
Developed a data analytics solution to process, clean, and analyze IMDB movie data, leveraging Snowflake for database management, Python for data preprocessing, and Power BI for visualization. This project demonstrates expertise in ETL processes, data modeling, and data visualization.


sysmith27/MovieDataAnalysis


MovieDataAnalysis


IMDB Movie Data Analytics Dashboard

An interactive data analytics project designed to process, clean, and visualize IMDB movie data. This project leverages Snowflake for database management, Python for data preprocessing, and Power BI for creating actionable insights.

Features

  • Data Preprocessing: Cleaned and transformed raw IMDB movie data using Python (pandas).
  • Database Integration: Designed and implemented a star schema in Snowflake for efficient querying.
  • ETL Workflow: Automated data extraction, transformation, and loading using Python.
  • Data Visualization: Built interactive dashboards in Power BI to analyze movie performance by genre, year, and more.
  • Scalable Design: The solution supports automation and scaling for larger datasets.

🛠️ Technologies Used

Tools/Technologies

  • Database: Snowflake
  • ETL & Preprocessing: Python (pandas, SQLAlchemy)
  • Visualization: Power BI
  • Data Modeling: Star Schema
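To make the star schema idea concrete, here is a minimal sketch in plain Python that splits flat movie rows into two dimension tables and one fact table keyed by surrogate IDs. The column names (`title`, `genre`, `certificate`, `rating`, `gross`) and the sample rows are assumptions for illustration only; the project's actual schema is the one defined in scripts/snowflake_schema.sql.

```python
# Hypothetical flat rows; the real IMDB columns may be named differently.
rows = [
    {"title": "Movie A", "genre": "Drama", "certificate": "R", "rating": 8.1, "gross": 120.5},
    {"title": "Movie B", "genre": "Comedy", "certificate": "PG", "rating": 6.9, "gross": 80.0},
    {"title": "Movie C", "genre": "Drama", "certificate": "PG", "rating": 7.4, "gross": 45.2},
]


def build_dimension(rows, column):
    """Map each distinct value of `column` to a surrogate integer key (1, 2, ...)."""
    keys = {}
    for row in rows:
        # setdefault only assigns a new key the first time a value is seen.
        keys.setdefault(row[column], len(keys) + 1)
    return keys


def build_fact(rows, genre_dim, certificate_dim):
    """Replace descriptive attributes with dimension keys; keep the measures."""
    return [
        {
            "title": row["title"],
            "genre_id": genre_dim[row["genre"]],
            "certificate_id": certificate_dim[row["certificate"]],
            "rating": row["rating"],
            "gross": row["gross"],
        }
        for row in rows
    ]


genre_dim = build_dimension(rows, "genre")
certificate_dim = build_dimension(rows, "certificate")
fact_movies = build_fact(rows, genre_dim, certificate_dim)
```

Keeping measures (`rating`, `gross`) in the fact table and descriptive attributes in small dimension tables is what lets a dashboard slice by genre or certificate with simple joins.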

Steps to Run the Project

  1. Set Up Snowflake Database: Install the Snowflake ODBC driver and set up a connection. Then run the SQL script scripts/snowflake_schema.sql to create the database schema.
  2. Clean and Transform Data: Run the Python ETL script to clean the data and load it into Snowflake.
  3. Build Dashboards in Power BI: Open the provided Power BI file (visuals/dashboard.pbix) and connect it to the Snowflake database to load live data.
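The cleaning part of step 2 might look roughly like the sketch below. It is not the repository's ETL script: the function name `clean_movies`, the column names, and the sample data are all assumptions chosen to show typical pandas cleaning steps (trimming text, parsing currency strings, dropping incomplete and duplicate rows).

```python
import pandas as pd


def clean_movies(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw movie data (hypothetical columns)."""
    df = raw.copy()
    # Trim stray whitespace so duplicate titles compare equal.
    df["title"] = df["title"].str.strip()
    # Turn strings like "$1,200,000" into floats; unparseable values become NaN.
    df["gross"] = pd.to_numeric(
        df["gross"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )
    df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
    # A row without a title or rating is not useful for the dashboards.
    df = df.dropna(subset=["title", "rating"])
    df = df.drop_duplicates(subset=["title", "year"])
    return df.reset_index(drop=True)


# Made-up sample input, for illustration only.
raw = pd.DataFrame(
    {
        "title": [" Movie A ", "Movie A", "Movie B", None],
        "year": [2001, 2001, 2010, 2015],
        "rating": ["8.1", "8.1", "6.5", "7.0"],
        "gross": ["$1,200,000", "$1,200,000", "n/a", None],
    }
)
clean = clean_movies(raw)

# Loading could then use pandas' DataFrame.to_sql with a SQLAlchemy engine.
# `engine` and the table name "movies" are assumptions, not taken from the repo:
# clean.to_sql("movies", engine, if_exists="append", index=False)
```

Coercing bad values to NaN instead of raising keeps the pipeline running on messy input, at the cost of needing a later check on how many rows were affected.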

Dashboard Overview

  • Revenue by Genre
  • Rating by Genre
  • Rating by Certificate (PG, G, R, etc.)
  • Revenue by Certificate
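The four views above reduce to two grouped aggregations, total revenue and average rating, taken once by genre and once by certificate. As a rough way to sanity-check dashboard figures outside Power BI, the same numbers can be computed in pandas. The helper `summarize`, the column names, and the data below are assumptions for illustration.

```python
import pandas as pd

# Made-up sample data; real column names in the project may differ.
movies = pd.DataFrame(
    {
        "genre": ["Drama", "Comedy", "Drama", "Action"],
        "certificate": ["R", "PG", "PG", "R"],
        "rating": [8.0, 6.0, 7.0, 7.5],
        "gross": [100.0, 80.0, 50.0, 200.0],
    }
)


def summarize(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Total revenue and mean rating per group, highest revenue first."""
    return (
        df.groupby(by)
        .agg(total_gross=("gross", "sum"), avg_rating=("rating", "mean"))
        .sort_values("total_gross", ascending=False)
    )


by_genre = summarize(movies, "genre")
by_certificate = summarize(movies, "certificate")
print(by_genre)
print(by_certificate)
```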

Future Developments

  • Add real-time data refresh using Snowflake tasks and Power BI Service.
  • Expand analytics to include actor and director performance metrics.
  • Integrate machine learning models for predictive analysis (e.g., predicting movie success).
