Skip to content

Designed Multi Star schema with 10 Fact & 33 Dimension tables & developed Data Integration Pipeline by ETL workflow to load all tables (380M+ rows) from multiple sources such as CSV, MySQL, MSSQL & PostgreSQL by using Talend. Implemented Data Profiling, Error Handling, Load Statistics, Cleansing and Performance Tuning to address data quality gap…

License

Notifications You must be signed in to change notification settings

kpkaranpatil600/Data-Warehousing-and-Business-Intelligence-for-IMDB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IMDB_Database_Warehousing_and_Business_Intelligence

Objective:

The aim of the project is to analyse the movies data from multiple sources such as IMDB MoviesLens, The Numbers and BoxOffice Mojo.com based on movies/cast/box office revenues, movie brands and franchises and perform ETL processes using Talend & Alteryx.

Description:

  • Designed Multi Star schema with 10 Fact & 33 Dimension tables & developed Data Integration Pipeline by ETL workflow to load all tables (380M+ rows) from multiple sources such as CSV, MySQL, MSSQL & PostgreSQL by using Talend
  • Implemented Data Profiling, Error Handling, Load Statistics, Cleansing and Performance Tuning to address data quality gaps & maintained referential integrity using Alteryx & Talend
  • Executed SQL scripts & created interactive visualization dashboards on PowerBI & Tableau for analyzing the data to draw better insights on sales & customer segmentation

The key topic areas are:

  • Data Architecture
  • Dimensional Data Modeling
  • Data Integration & ETL (Extract, Transform & Load)
  • Data Engineering & Data Preparation
  • Business Intelligence & Data Analytics Design
Tools Used:
Talend, Alteryx, SQL Server, MySQL, PostreSQL, PowerBI, Tableau, ER Studio

Instructor: Rick Sherman

Reference Book: Business Intelligence Guidebook: From Data Integration to Analytics

Dataset Links:

https://datasets.imdbws.com/

https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab

https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab

https://grouplens.org/datasets/movielens/25m/

https://www.the-numbers.com/movies/franchises

https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary

https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office

References:

https://elearning.tableau.com/

https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/8RlpZdAdKhP0IaMHXRV7yw

https://www.talend.com/

https://grouplens.org/datasets/movielens/

Talend Jobs:



IMDB_Staging - SQL Server:

IMDB_Integration - SQL Server:

IMDB_BI_Movies - SQL Server:

IMDB_BI_People - PostgreSQL:

IMDB_BI_TV - MySQL:



Alteryx Workflows:



IMDB_PostgreSQL:

IMDB - BI Movies:

• Worldwide Lifetime Gross based on Running Minutes

• Top 100 Movies by Genre, Rating & Running Time

• Top 25 Movies Based on Domestic and Foreign Revenue

• Top 25 Movies based on Number of Votes and Title Type

• Average Rating by Genre and Years

• Top 25 Title based on IMDb rating and Title Type

• People Involved with Movies

• Movies Gross

• Movies Selected Info

IMDB - BI TV:

• 100 Popular TV SHows based on User Votes

• People Detail with TV

• TV Titles based on Title Type and Avg Rating

• Ratings per Year

• Top 25 TV Title based on Rating and Title Type

• TV Series With Episode Number and Season Number

IMDB - BI People:

• People Overview

• Title Overview

• Title and People

• Director Overview

About

Designed Multi Star schema with 10 Fact & 33 Dimension tables & developed Data Integration Pipeline by ETL workflow to load all tables (380M+ rows) from multiple sources such as CSV, MySQL, MSSQL & PostgreSQL by using Talend. Implemented Data Profiling, Error Handling, Load Statistics, Cleansing and Performance Tuning to address data quality gap…

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages