
In this project I used clustering ML algorithms to group movies and TV shows on Netflix by factors such as genre, popularity, and target audience.


pyhtonman0101/Netflix-Movies-and-TV-Shows-Clustering


Netflix Movies And TV Shows Clustering

This project is part of the “Unsupervised Machine Learning” curriculum as a capstone project at AlmaBetter.

Project Status: [Completed]

💾 Problem Statement and Project Description

The Netflix Movies and TV Shows Clustering dataset consists of TV shows and movies available on Netflix as of 2019. The data was collected from Flixable, a third-party Netflix search engine. In 2018, Flixable released a report showing that the number of TV shows on Netflix has nearly tripled since 2010, while the service's number of movies has decreased by more than 2,000 titles over the same period. It will be interesting to explore what other insights can be obtained from this dataset.

-----------------------------------------------------

💾 Table of Contents

  • Problem Statement and Project Description
  • Project Files Description
  • Goal
  • Dataset Information
  • Exploratory Data Analysis
  • K Means Clustering
  • Agglomerative Hierarchical Clustering

-----------------------------------------------------

💾 Project Files Description

This project contains one executable file as follows:

Executable Files:

  • Netflix Movies and TV Shows Clustering - Mohd Zahid Ansari.ipynb - Google Colab notebook containing the data summary, exploration, visualisations, modeling, model performance, evaluation and conclusion.

Source Directory:

-----------------------------------------------------

📖 Goal:

Clustering algorithms in machine learning and data analysis group data points with similar characteristics together. Applied to Netflix, this approach can categorize movies and TV shows by factors such as genre, popularity, and target audience.

Netflix can leverage this information to provide personalized recommendations, improve its content library, and ultimately increase user engagement and satisfaction.

-----------------------------------------------------

📖 Dataset information:

Features in the dataset: Most of the fields are self-explanatory. The following are descriptions for those that aren't.

  • Show_id : Unique ID for every movie / TV show
  • Type : Identifier, either Movie or TV Show
  • Title : Title of the movie / TV show
  • Director : Director(s) of the movie / show
  • Cast : Actors involved in the movie / show
  • Country : Country where the movie / show was produced
  • Date_added : Date it was added to Netflix
  • Release_year : Actual release year of the movie / show
  • Rating : TV rating of the movie / show
  • Duration : Total duration, in minutes (movies) or number of seasons (TV shows)
  • Listed_in : Genre(s)
  • Description : Summary description
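As a minimal sketch of how these fields might be turned into numeric vectors for clustering, the Python below splits Duration into separate minutes and seasons columns and multi-hot encodes the comma-separated Listed_in genres. The raw string formats ("90 min", "2 Seasons", genres separated by commas), the example rows, and the helper names `parse_duration`, `parse_genres` and `build_features` are assumptions for illustration, not code taken from the notebook.

```python
import re

# Hypothetical example rows. The field names follow the list above, but the
# exact string formats ("90 min", "2 Seasons", comma-separated genres) are assumptions.
rows = [
    {"type": "Movie", "duration": "90 min", "listed_in": "Dramas, International Movies"},
    {"type": "TV Show", "duration": "2 Seasons", "listed_in": "International TV Shows, TV Dramas"},
    {"type": "TV Show", "duration": "1 Season", "listed_in": "Kids' TV"},
]


def parse_duration(text):
    """Split a Duration string into (minutes, seasons); the unused unit is 0."""
    match = re.fullmatch(r"\s*(\d+)\s*(min|Seasons?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return (value, 0) if unit == "min" else (0, value)


def parse_genres(text):
    """Split the comma-separated Listed_in field into clean genre names."""
    return [g.strip() for g in text.split(",") if g.strip()]


def build_features(rows):
    """Return (vocab, vectors): one 0/1 column per genre, then minutes and seasons."""
    vocab = sorted({g for row in rows for g in parse_genres(row["listed_in"])})
    vectors = []
    for row in rows:
        genres = set(parse_genres(row["listed_in"]))
        minutes, seasons = parse_duration(row["duration"])
        genre_part = [1.0 if g in genres else 0.0 for g in vocab]
        vectors.append(genre_part + [float(minutes), float(seasons)])
    return vocab, vectors


vocab, vectors = build_features(rows)
```

Before any distance-based clustering, the numeric columns would normally need scaling (for example, standardisation), since raw minute counts would otherwise dominate the 0/1 genre columns.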

-----------------------------------------------------

📈 Exploratory Data Analysis

The exploratory data analysis showed that Netflix has more movies than TV shows: 69% of titles are movies and 31% are TV shows.

Raúl Campos and Jan Suter have directed the highest number of films (18). The USA has the highest number of both films and TV shows, while India produces far more films than TV shows. Movie production mainly took off from the 2000s onward. The notebook contains many more key findings.

For more EDA visualizations, please refer to the notebook.

-----------------------------------------------------

📖 K-Means Clustering

K-Means Clustering is an unsupervised machine learning algorithm used for cluster analysis. It partitions the data into K clusters, where each data point belongs to the cluster with the nearest mean. The algorithm alternates between assigning each point to its nearest mean and recomputing each mean from the points assigned to it, until the assignments no longer change or a maximum number of iterations is reached.
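The assign-then-update loop described above can be sketched in plain Python as Lloyd's algorithm. This is an illustrative version, not necessarily how the notebook implements K-Means; the function name `kmeans`, the random choice of K data points as starting means, and the toy points are assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-Means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise the K means with K distinct data points chosen at random (assumed strategy).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the means would not change either
        labels = new_labels
        # Update step: move each mean to the average of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous mean if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Toy 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
```

With these toy points, the three points near (1, 1) end up sharing one label and the three near (8, 8) share the other. Because the result depends on the starting means, library implementations typically run several initialisations and keep the best one.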

-----------------------------------------------------

📖 Agglomerative Hierarchical Clustering

In agglomerative hierarchical clustering, the dendrogram is built by starting with each data point as its own cluster and repeatedly merging the two closest clusters until all data points belong to a single cluster. Dendrograms are commonly used in hierarchical clustering to represent the relationships between data points and the clusters they belong to. The dendrogram displays the cluster hierarchy by showing which clusters were merged and the distance at which each merge happened.
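A naive version of this merge loop can be sketched in Python to show what a dendrogram actually records: the sequence of merges and the distance at which each one happened. The linkage options (single, complete, average), the function name `agglomerative` and the 1-D toy points are assumptions for illustration; which linkage the notebook uses is not stated here.

```python
import math


def agglomerative(points, linkage="single"):
    """Naive agglomerative clustering; returns the merge history behind a dendrogram.

    Each merge is (members_a, members_b, distance), where members are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]  # every point starts as its own cluster

    def cluster_distance(a, b):
        pair_dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(pair_dists)  # closest pair of members
        if linkage == "complete":
            return max(pair_dists)  # farthest pair of members
        if linkage == "average":
            return sum(pair_dists) / len(pair_dists)  # mean over all member pairs
        raise ValueError(f"unknown linkage: {linkage!r}")

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        merges.append((a, b, cluster_distance(a, b)))
        # Replace the two clusters with their union (delete j first, since j > i).
        del clusters[j]
        del clusters[i]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Toy 1-D data: two tight pairs and one outlier.
merges = agglomerative([(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)])
```

Here points 0 and 1 merge first, then 2 and 3, then those two pairs, and the outlying point 4 joins last at distance 14. This naive version is roughly cubic in the number of points, so for real data SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` functions are the usual way to compute and plot the hierarchy.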

-----------------------------------------------------

📖 Technologies Used:

-----------------------------------------------------

📜 Credits

Mohd Zahid Ansari | Avid Learner | Data Scientist | Machine Learning Engineer | Deep Learning enthusiast

Contact me for Data Science Project Collaborations

