Netflix Movies And TV Shows Clustering

This project is part of the “Unsupervised Machine Learning” curriculum as a capstone project at AlmaBetter.

Project Status: [Completed]

💾 Problem Statement and Project Description

The Netflix Movies And TV Shows Clustering dataset consists of the TV shows and movies available on Netflix as of 2019. It was collected from Flixable, a third-party Netflix search engine. In 2018, Flixable released a report showing that the number of TV shows on Netflix has nearly tripled since 2010, while the number of movies has decreased by more than 2,000 titles over the same period. This project explores what other insights can be obtained from the same dataset.

💾 Table of Contents

Problem Statement and Project Description
Project Files Description
Goal
Dataset Information
Exploratory Data Analysis
K Means Clustering
Agglomerative Hierarchical Clustering

💾 Project Files Description

This project contains one executable file as follows:

Executable Files:

Netflix Movies and TV Shows Clustering - Mohd Zahid Ansari.ipynb - Google Colab notebook containing the data summary, exploration, visualisations, modeling, model performance, evaluation and conclusion.

Source Directory:

Data & Resources link : https://drive.google.com/file/d/1ShTrCx2DCug4SB1qTM5YiJPJy-B6SAMk/view?usp=share_link

📖 Goal:

By utilizing clustering algorithms in machine learning and data analysis, we can categorize data points with similar characteristics into groups. This approach can be employed to classify movies and TV shows on Netflix by factors such as their genre, popularity, and target audience.

Netflix can use these groupings to provide personalized recommendations, improve its content library, and ultimately increase user engagement and satisfaction.

📖 Dataset information:

Features in the dataset: Most of the fields are self-explanatory. The following are descriptions for those that aren't.

Show_id : Unique ID for every movie / TV show
Type : Identifier - a Movie or TV Show
Title : Title of the movie / TV show
Director : Director of the movie / TV show
Cast : Actors involved in the movie / TV show
Country : Country where the movie / TV show was produced
Date_added : Date it was added to Netflix
Release_year : Actual release year of the movie / TV show
Rating : TV rating of the movie / TV show
Duration : Total duration - in minutes or number of seasons
Listed_in : Genre
Description : Summary description
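Because the Duration field mixes two units (minutes for movies, seasons for TV shows), it usually has to be split before it can be used as a numeric feature. The sketch below is one possible way to do that with the standard library only. The string formats ("90 min", "2 Seasons") and the helper name `parse_duration` are assumptions made for illustration; they are not confirmed by this README, and the notebook may handle the field differently.

```python
import re

def parse_duration(value):
    """Split a duration string into (amount, unit).

    Assumed formats (not confirmed by this README): "90 min" for movies,
    "1 Season" / "2 Seasons" for TV shows.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(min|Seasons?)\s*", value)
    if match is None:
        return None  # unknown format: leave it for manual inspection
    amount = int(match.group(1))
    unit = "minutes" if match.group(2) == "min" else "seasons"
    return amount, unit

# Hypothetical sample values, for illustration only.
for raw in ["90 min", "1 Season", "3 Seasons"]:
    print(raw, "->", parse_duration(raw))
```

Returning `None` for unrecognised strings keeps unexpected values visible instead of silently coercing them.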

📈 Exploratory Data Analysis

Exploratory Data Analysis concluded that Netflix has a greater number of movies than TV shows, with 69% movies and 31% TV shows.

Raúl Campos and Jan Suter have directed the highest number of films, 18 in total. The USA has the highest number of films and TV shows, whereas India produces far more films than TV shows. Most movies in the catalogue were produced from the 2000s onward. The notebook contains many more key findings.

For more EDA visualizations, please refer to the notebook.
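A share such as the movie / TV show split reported above comes from a simple frequency count over the Type column. The following is a minimal sketch of that calculation using only the standard library. The toy list and the helper name `type_shares` are hypothetical; the real figures come from the full dataset in the notebook.

```python
from collections import Counter

def type_shares(types):
    """Return each content type's share of the catalogue as a percentage."""
    counts = Counter(types)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

# Hypothetical toy column, not the real dataset.
toy_types = ["Movie", "Movie", "TV Show", "Movie"]
shares = type_shares(toy_types)
for label, pct in shares.items():
    print(f"{label}: {pct:.0f}%")
```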

📖 K-Means Clustering

K-Means Clustering is an unsupervised machine learning algorithm used for cluster analysis. It partitions the data into K clusters, where each data point belongs to the cluster with the nearest mean. The algorithm iteratively updates the cluster means and the cluster assignments until the means no longer change or a maximum number of iterations is reached.
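The loop described above can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard assign-then-update iteration, not the code used in the notebook. The function names, the random initialisation from data points, the `seed` parameter, and the toy 2-D points are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) after running the K-Means iteration."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial means: k of the data points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each mean from the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster keeps its previous mean.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:  # means no longer change: converged
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of 2-D points.
toy_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
              (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(toy_points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; only which points share a label is meaningful.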

📖 Agglomerative Hierarchical Clustering

In agglomerative hierarchical clustering, the dendrogram is constructed by starting with each data point as a separate cluster and iteratively merging the closest clusters until all data points belong to a single cluster. Dendrograms are commonly used in hierarchical clustering to represent the relationships between data points and the clusters they belong to. The dendrogram displays the hierarchy of clusters by showing which clusters were merged and the distances between them.
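The bottom-up merging can be illustrated with a small pure-Python sketch that records each merge and its distance, which is the information a dendrogram draws. Single linkage (distance between the closest pair of members) is used here purely as an assumption; this README does not state which linkage the notebook uses. The function names and the toy points are likewise hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(cluster_a, cluster_b, points):
    """Cluster distance = distance between the closest pair of members."""
    return min(euclidean(points[i], points[j])
               for i in cluster_a for j in cluster_b)

def agglomerate(points):
    """Merge clusters bottom-up and return the merge history.

    Each entry is (members_a, members_b, distance): one junction of the
    dendrogram, drawn at height `distance`.
    """
    clusters = [(i,) for i in range(len(points))]  # every point starts alone
    history = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under single linkage.
        distance, ia, ib = min(
            (single_linkage(a, b, points), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        )
        a, b = clusters[ia], clusters[ib]
        history.append((a, b, distance))
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (ia, ib)]
        clusters.append(a + b)
    return history

# Hypothetical toy data: two tight pairs that are far from each other.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
history = agglomerate(toy_points)
for members_a, members_b, distance in history:
    print(members_a, "+", members_b, "at distance", round(distance, 3))
```

Cutting the merge history at a chosen distance yields a flat clustering: every merge above the cut is undone, and the remaining groups are the clusters.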

📖 Technologies Used:

📜 Credits

Mohd Zahid Ansari | Avid Learner | Data Scientist | Machine Learning Engineer | Deep Learning enthusiast

Contact me for Data Science Project Collaborations


pyhtonman0101/Netflix-Movies-and-TV-Shows-Clustering
