Tools Used:
- Data Profiling: Alteryx
- Data Cleaning: Talend
- ETL Tool: Talend
- SQL Servers: MySQL, MSSQL
- Data Visualization: Power BI and Tableau
IMDb Movies Analysis Using Talend
Tools & Technologies Used: Talend, ER Studio, Alteryx, Microsoft SQL Server, MySQL, Tableau, Azure Data Studio, Power BI
• Executed data integration from diverse sources including MySQL (IMDb tables), TSV (revenue data), and JSON files (movie titles and actor name changes), ensuring comprehensive data consolidation
• Conducted in-depth data profiling and analysis using Alteryx, producing detailed reports and insights, complemented by a meticulous mapping document in Excel
• Developed a robust data model focusing on an SCD Type 2 Movie Titles Dimension table, enhancing data accuracy and historical tracking
• Designed and implemented ETL mappings in Talend, utilizing metadata-based connections, contexts, and environments, to streamline data processing workflows
• Created dynamic and interactive dashboards in Power BI and Tableau, ensuring SQL script outputs were consistent with visualized data, effectively communicating key metrics and trends
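The SCD Type 2 behaviour of the Movie Titles dimension can be illustrated with a minimal Python sketch. It is not the Talend job itself: the `TitleVersion` fields (`surrogate_key`, `valid_from`, `valid_to`, `is_current`) and the `apply_scd2` helper are hypothetical names chosen for illustration, and the actual dimension may use different columns. The idea is that a title change never overwrites a row; the current row is closed and a new current row is appended.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TitleVersion:
    """One historical version of a movie title (hypothetical column names)."""
    surrogate_key: int
    title_id: str              # natural key from the source system
    primary_title: str
    valid_from: date
    valid_to: Optional[date]   # None while the row is still current
    is_current: bool


def apply_scd2(dimension: List[TitleVersion], title_id: str,
               new_title: str, change_date: date) -> List[TitleVersion]:
    """Return a copy of the dimension with the change applied as SCD Type 2."""
    rows = list(dimension)
    current_idx = next(
        (i for i, r in enumerate(rows)
         if r.title_id == title_id and r.is_current),
        None,
    )
    if current_idx is not None:
        current = rows[current_idx]
        if current.primary_title == new_title:
            return rows  # nothing changed, so no new version is created
        # Close the existing version instead of overwriting it.
        rows[current_idx] = replace(
            current, valid_to=change_date, is_current=False
        )
    # Append the new current version with a fresh surrogate key.
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(
        TitleVersion(next_key, title_id, new_title, change_date, None, True)
    )
    return rows
```

Calling `apply_scd2` twice for the same `title_id` with different titles leaves two rows: the first closed with a `valid_to` date and the second flagged as current, which is what makes historical tracking possible.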
1. Alteryx:
Alteryx Workflow: Understanding data
Findings:
- Rank: The movie's rank varied from 1 to 55 during its box office run; the column also contains “-” values.
- Gross: Daily gross earnings ranged from a minimum of $357 to a maximum of about $28.27 million.
- Per Theater: Earnings per theater varied between $60 and $8,181.
- Total Gross: The cumulative gross earnings increased, reaching approximately $760.51 million.
- Days: The dataset covers 336 days from the movie's release.
- %LW and %YD: Both columns contain null values.
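The profiling itself was done in Alteryx, but the same kind of check (value range plus a count of missing or "-" entries per column) can be sketched in plain Python. The helper names (`parse_money`, `parse_rank`, `profile_column`) and the three sample rows are made up for illustration and are not taken from the project data.

```python
import csv
import io


def parse_money(text):
    """Convert a string like '$1,000' to 1000.0; blanks and '-' become None."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if cleaned in ("", "-"):
        return None
    return float(cleaned)


def parse_rank(text):
    """Convert a rank string to int; anything non-numeric (e.g. '-') becomes None."""
    cleaned = text.strip()
    return int(cleaned) if cleaned.isdigit() else None


def profile_column(rows, column, parser):
    """Summarise one column: row count, missing count, minimum and maximum."""
    values = [parser(row[column]) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Illustrative sample only; the real input is the TSV revenue file.
SAMPLE_TSV = "Rank\tGross\t%LW\n1\t$1,000\t\n-\t$357\t-10%\n55\t$2,500\t\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
rank_profile = profile_column(rows, "Rank", parse_rank)
gross_profile = profile_column(rows, "Gross", parse_money)
```

Treating "-" as missing rather than as a valid rank keeps the minimum and maximum meaningful, which matters when the column is later loaded into a numeric field.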
Insights and Observations
- Strong Initial Performance: "Avatar" had a powerful opening, indicated by the high initial daily and per-theater gross.
- Longevity in Theaters: The movie remained in theaters for a significant duration (336 days), highlighting its lasting appeal.
- Consistent Top Rankings: The movie consistently ranked well during its theatrical run despite fluctuations.
- Revenue Stability: After the initial spike, the total gross showed stability, indicating a steady influx of viewers over an extended period.
2. Navicat: Used to design the data model
Dimensional Model:
3. Talend Workflow Screenshots
Bridge Tables:
Movie-Genre Bridge Table:
Movie-Region Bridge Table:
Fact Tables:
BoxOfcFact:
FactTitle Principal:
Genre Fact:
Visualization Using Power BI (https://app.powerbi.com/groups/4245cd51-53a4-4aac-984f-18f6bde6a73e/reports/07948f86-f53d-4286-b8c0-efee8aaf52e1/ReportSection185e58af7ba5a1c2e3ef?experience=power-bi):