This project dives into Netflix's vast dataset to uncover insights about:
- Which genres dominate viewership
- How ratings are distributed across shows and movies

To make the analysis versatile, it's implemented in both Python and R:
- Python: Handles data extraction, cleaning, and visualization of both genres and ratings.
- R: Focuses on a streamlined visualization of ratings distribution using ggplot2.

The repository contains:
- `netflix_analysis.py`: Extracts, cleans, and visualizes Netflix data (genres + ratings).
- `netflix_ratings_distribution.R`: R-based script for ratings visualization.
- `netflix_data.zip`: Compressed raw Netflix dataset.
- `Netflix_shows_movies.csv`: Cleaned dataset generated for analysis.

To run the analysis:
- Run `netflix_analysis.py` (Python): Extract, clean, and prepare the dataset, then visualize genres and ratings.
- Run `netflix_ratings_distribution.R` (R): Load the cleaned dataset and generate ratings distribution plots.

Requirements:
- Python 3.x: Install `pandas`, `seaborn`, and `matplotlib`
- R: Install `ggplot2` and `dplyr`

The Python script (`netflix_analysis.py`) works in four steps:
- Data Prep: Extracts `netflix_data.csv` from the ZIP and renames it to `Netflix_shows_movies.csv`.
- Data Cleaning: Fills missing values (director, cast, country, rating, date_added).
- Exploration: Generates descriptive stats to understand the dataset.
- Visualization: Plots the most-watched genres and the ratings distribution.
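
The data prep and cleaning steps above can be sketched as follows. This is a minimal illustration, not the code in `netflix_analysis.py`: the function names `extract_and_rename` and `fill_missing` are hypothetical, the `"Unknown"` placeholder is an assumption (the README does not say what value is filled in), and the standard-library `csv` module stands in for `pandas` so the sketch runs without extra installs. It also assumes `netflix_data.csv` sits at the root of the ZIP.

```python
import csv
import shutil
import zipfile
from pathlib import Path

# Column names come from the README; the "Unknown" placeholder is an assumption.
FILL_COLUMNS = ["director", "cast", "country", "rating", "date_added"]


def extract_and_rename(zip_path, out_dir, member="netflix_data.csv",
                       target="Netflix_shows_movies.csv"):
    """Extract `member` from the ZIP and save it in `out_dir` as `target`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target_path = out_dir / target
    with zipfile.ZipFile(zip_path) as archive:
        # Stream the member straight to its final name instead of
        # extracting first and renaming afterwards.
        with archive.open(member) as src, open(target_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target_path


def fill_missing(csv_path, placeholder="Unknown"):
    """Replace empty cells in FILL_COLUMNS with `placeholder`, in place."""
    csv_path = Path(csv_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    for row in rows:
        for col in FILL_COLUMNS:
            # Skip columns the file does not have; treat blank cells as missing.
            if col in row and not (row[col] or "").strip():
                row[col] = placeholder
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `fill_missing(extract_and_rename("netflix_data.zip", "."))` would leave a cleaned `Netflix_shows_movies.csv` in the current directory. With `pandas`, the cleaning step would more likely be a single `DataFrame.fillna` call with a per-column dictionary.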

The R script (`netflix_ratings_distribution.R`) works in three steps:
- Load Data: Reads in `Netflix_shows_movies.csv`.
- Data Cleaning: Handles missing values in key fields.
- Visualization: Builds clear ratings distribution charts with ggplot2.

Notes:
- Keep all files (`netflix_analysis.py`, `netflix_ratings_distribution.R`, `netflix_data.zip`, `Netflix_shows_movies.csv`) in the same directory before running.
- The Python script generates the cleaned CSV, which is then used as input for the R script.
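
Because the R script depends on the CSV that the Python script writes, the run order matters. The sketch below shows one way to enforce it. It is a hypothetical helper, not part of the repository: the names `missing_files` and `run_pipeline` are made up for illustration, and it assumes the `Rscript` command is available on `PATH`.

```python
import subprocess
import sys
from pathlib import Path

# File names as listed in this README.
PYTHON_SCRIPT = "netflix_analysis.py"
R_SCRIPT = "netflix_ratings_distribution.R"
RAW_ZIP = "netflix_data.zip"
CLEANED_CSV = "Netflix_shows_movies.csv"


def missing_files(directory, names):
    """Return the names from `names` that are not files in `directory`."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]


def run_pipeline(directory="."):
    """Run the Python script, then the R script, from `directory`."""
    missing = missing_files(directory, [PYTHON_SCRIPT, R_SCRIPT, RAW_ZIP])
    if missing:
        raise FileNotFoundError(f"Missing from {directory}: {', '.join(missing)}")
    # Step 1: the Python script produces the cleaned CSV.
    subprocess.run([sys.executable, PYTHON_SCRIPT], cwd=directory, check=True)
    # The R script reads the cleaned CSV, so confirm it exists before step 2.
    if missing_files(directory, [CLEANED_CSV]):
        raise FileNotFoundError(f"{CLEANED_CSV} was not generated")
    # Step 2: assumes `Rscript` is installed and on PATH.
    subprocess.run(["Rscript", R_SCRIPT], cwd=directory, check=True)
```

Running `run_pipeline()` from the project directory stops early with a `FileNotFoundError` if either script or the ZIP is missing, instead of failing partway through the R step.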

This project demonstrates data wrangling, visualization, and cross-language analytics, skills that directly translate into real-world business intelligence and data science roles.