An interactive data analytics project designed to process, clean, and visualize IMDB movie data. This project leverages Snowflake for database management, Python for data preprocessing, and Power BI for creating actionable insights.
- Data Preprocessing: Cleaned and transformed raw IMDB movie data using Python (pandas).
- Database Integration: Designed and implemented a star schema in Snowflake for efficient querying.
- ETL Workflow: Automated data extraction, transformation, and loading using Python.
- Data Visualization: Built interactive dashboards in Power BI to analyze movie performance by genre, year, and more.
- Scalable Design: The solution supports automation and scaling for larger datasets.

# 🛠️ Technologies Used
- Database: Snowflake
- ETL & Preprocessing: Python (pandas, SQLAlchemy)
- Visualization: Power BI
- Data Modeling: Star Schema
# Steps to Run the Project
- Set Up Snowflake Database: Install the Snowflake ODBC driver and set up a connection. Run the SQL script from scripts/snowflake_schema.sql to create the database schema.
- Clean and Transform Data: Run the Python ETL script to clean the data and load it into Snowflake.
- Build Dashboards in Power BI: Open the provided Power BI file (visuals/dashboard.pbix). Connect it to the Snowflake database to load live data.
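
The clean-and-transform step can be pictured with a small pandas sketch. This is not the project's actual ETL script: the column names (`title`, `genre`, `certificate`, `rating`, `gross`), the sample rows, and the table names `dim_genre`, `dim_certificate`, and `fact_movie` are all assumptions made for illustration, and the real schema is whatever scripts/snowflake_schema.sql defines.

```python
import pandas as pd

# Hypothetical raw rows; the real IMDB export may use different columns.
raw = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie B", "Movie C"],
    "genre": ["Drama", "Action", "Action", None],
    "certificate": ["PG", "R", "R", "G"],
    "rating": ["8.1", "7.4", "7.4", "6.9"],
    "gross": ["1,200,000", "3,500,000", "3,500,000", None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, fill missing genres, and convert text to numbers."""
    out = df.drop_duplicates().copy()
    out["genre"] = out["genre"].fillna("Unknown")
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    # Strip thousands separators before converting revenue to a number.
    gross_text = out["gross"].str.replace(",", "", regex=False)
    out["gross"] = pd.to_numeric(gross_text, errors="coerce")
    return out.reset_index(drop=True)


def build_dimension(df: pd.DataFrame, column: str, key: str) -> pd.DataFrame:
    """Return a dimension table with one surrogate key per distinct value."""
    dim = df[[column]].drop_duplicates().sort_values(column).reset_index(drop=True)
    dim[key] = dim.index + 1
    return dim


movies = clean(raw)
dim_genre = build_dimension(movies, "genre", "genre_id")
dim_certificate = build_dimension(movies, "certificate", "certificate_id")

# The fact table keeps the measures and points at the dimensions by key.
fact_movie = (
    movies.merge(dim_genre, on="genre")
    .merge(dim_certificate, on="certificate")
    [["title", "genre_id", "certificate_id", "rating", "gross"]]
)
```

Given a SQLAlchemy engine for Snowflake (which typically requires the separate snowflake-sqlalchemy package), each frame could then be loaded with a call along the lines of `fact_movie.to_sql("fact_movie", engine, if_exists="append", index=False)`.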
- Revenue by Genre
- Rating by Genre
- Rating by Certificate (PG, G, R, etc.)
- Revenue by Certificate
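
Views like the ones listed above reduce to a join between the fact table and one dimension, followed by an aggregate. The sketch below uses Python's built-in `sqlite3` as a local stand-in for Snowflake so it runs without credentials. The table and column names are hypothetical, and summing revenue and averaging rating is an assumption about how the dashboard measures are defined.

```python
import sqlite3

# In-memory SQLite database standing in for the Snowflake star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_genre (genre_id INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE dim_certificate (certificate_id INTEGER PRIMARY KEY, certificate TEXT);
    CREATE TABLE fact_movie (
        title TEXT,
        genre_id INTEGER REFERENCES dim_genre(genre_id),
        certificate_id INTEGER REFERENCES dim_certificate(certificate_id),
        rating REAL,
        gross REAL
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO dim_genre VALUES (1, 'Action'), (2, 'Drama');
    INSERT INTO dim_certificate VALUES (1, 'PG'), (2, 'R');
    INSERT INTO fact_movie VALUES
        ('Movie A', 2, 1, 8.0, 1000000),
        ('Movie B', 1, 2, 7.0, 3000000),
        ('Movie C', 1, 1, 6.0, 2000000);
""")

# Revenue by Genre: total gross per genre, largest first.
revenue_by_genre = conn.execute("""
    SELECT g.genre, SUM(f.gross) AS total_gross
    FROM fact_movie AS f
    JOIN dim_genre AS g ON g.genre_id = f.genre_id
    GROUP BY g.genre
    ORDER BY total_gross DESC
""").fetchall()

# Rating by Certificate: average rating per certificate.
rating_by_certificate = conn.execute("""
    SELECT c.certificate, AVG(f.rating) AS avg_rating
    FROM fact_movie AS f
    JOIN dim_certificate AS c ON c.certificate_id = f.certificate_id
    GROUP BY c.certificate
    ORDER BY c.certificate
""").fetchall()

print(revenue_by_genre)
print(rating_by_certificate)
```

The remaining two views follow the same pattern with the dimension and the aggregate swapped. In Power BI the equivalent grouping would normally be done by the visuals themselves once the fact and dimension tables are related in the model.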
Add real-time data refresh using Snowflake tasks and Power BI Service. Expand analytics to include actor and director performance metrics. Integrate machine learning models for predictive analysis (e.g., predicting movie success).