# **Movie Recommendation System Using Python**

## **Objective**

The goal of this project is to build a content-based recommendation system that, given a title, suggests similar Netflix titles by comparing their director, cast, country, genre (`listed_in`), and description.

## **Data Source**

The dataset used in this project is `netflix_titles.csv`, which contains information about movies and TV shows available on Netflix. It is available on Kaggle: https://www.kaggle.com/datasets/rahulvyasm/netflix-movies-and-tv-shows

## **Import Library**

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


## **Import Data**

In [14]:
# Load the dataset
df = pd.read_csv('netflix_titles.csv', encoding='ISO-8859-1')

# Display the first few rows of the dataset
df.head()


,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,...,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,...,,,,,,,,,,
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,...,,,,,,,,,,
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,...,,,,,,,,,,
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,...,,,,,,,,,,
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,...,,,,,,,,,,


## **Describe Data**

In [15]:
# Describe the data to understand its structure
df.info()

# Check for missing values
df.isnull().sum()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8809 entries, 0 to 8808
Data columns (total 26 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   show_id       8809 non-null   object 
 1   type          8809 non-null   object 
 2   title         8809 non-null   object 
 3   director      6175 non-null   object 
 4   cast          7984 non-null   object 
 5   country       7978 non-null   object 
 6   date_added    8799 non-null   object 
 7   release_year  8809 non-null   int64  
 8   rating        8805 non-null   object 
 9   duration      8806 non-null   object 
 10  listed_in     8809 non-null   object 
 11  description   8809 non-null   object 
 12  Unnamed: 12   0 non-null      float64
 13  Unnamed: 13   0 non-null      float64
 14  Unnamed: 14   0 non-null      float64
 15  Unnamed: 15   0 non-null      float64
 16  Unnamed: 16   0 non-null      float64
 17  Unnamed: 17   0 non-null      float64
 18  Unnamed: 18   0 non-null    

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
Unnamed: 12     8809
Unnamed: 13     8809
Unnamed: 14     8809
Unnamed: 15     8809
Unnamed: 16     8809
Unnamed: 17     8809
Unnamed: 18     8809
Unnamed: 19     8809
Unnamed: 20     8809
Unnamed: 21     8809
Unnamed: 22     8809
Unnamed: 23     8809
Unnamed: 24     8809
Unnamed: 25     8809
dtype: int64
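
The counts above show that the `Unnamed: 12` to `Unnamed: 25` columns contain no values at all. The notebook leaves them in place, and they do not affect the recommendations, but they could be dropped before further work. A minimal sketch on a toy frame (the `toy` and `cleaned` names are illustrative and not part of the original notebook):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Netflix data: two real columns plus
# two all-empty "Unnamed" columns like the ones reported above.
toy = pd.DataFrame({
    "title": ["Dick Johnson Is Dead", "Blood & Water"],
    "director": ["Kirsten Johnson", np.nan],
    "Unnamed: 12": [np.nan, np.nan],
    "Unnamed: 13": [np.nan, np.nan],
})

# Drop every column in which all values are missing.
# Columns with only some missing values (like "director") are kept.
cleaned = toy.dropna(axis="columns", how="all")
print(list(cleaned.columns))
```

Applied to the real data, the same call would be `df = df.dropna(axis="columns", how="all")`.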

## **Data Visualization**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Example: Distribution of Movies and TV Shows
sns.countplot(data=df, x='type')
plt.title('Distribution of Movies and TV Shows on Netflix')
plt.show()


## **Data Preprocessing**

In [16]:
# Fill NaN values with empty strings
df['director'] = df['director'].fillna('')
df['cast'] = df['cast'].fillna('')
df['country'] = df['country'].fillna('')
df['listed_in'] = df['listed_in'].fillna('')
df['description'] = df['description'].fillna('')

# Combine relevant features into a single string
df['combined_features'] = df['director'] + ' ' + df['cast'] + ' ' + df['country'] + ' ' + df['listed_in'] + ' ' + df['description']
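
To see what the combined string looks like, here is a small sketch on a toy frame with the same kind of gaps as the real data. It is a variant of the cell above rather than a copy: it joins a list of columns in one step and also collapses the double spaces left by empty fields. The names `toy`, `feature_cols`, and `combined` are illustrative.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "director": ["Kirsten Johnson", np.nan],
    "cast": [np.nan, "Ama Qamata, Khosi Ngema"],
    "country": ["United States", "South Africa"],
})

feature_cols = ["director", "cast", "country"]

# Replace missing values with empty strings, join the columns row by row,
# then normalize whitespace so empty fields leave no extra spaces behind.
combined = (
    toy[feature_cols]
    .fillna("")
    .agg(" ".join, axis=1)
    .str.split()
    .str.join(" ")
)
print(combined.tolist())
```

The extra whitespace is harmless for `CountVectorizer`, which tokenizes on word boundaries, so the normalization step is cosmetic.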


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
# There is no explicit target variable here. This is content-based
# filtering, so we work directly with the combined features.

## **Train Test Split**

In [None]:
# No train-test split is needed for this content-based recommender.
# Similarities are computed across all titles in the dataset.

## **Modeling**

In [17]:
# Feature extraction
cv = CountVectorizer(stop_words='english')
count_matrix = cv.fit_transform(df['combined_features'])

# Compute cosine similarity
cosine_sim = cosine_similarity(count_matrix, count_matrix)
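
To make the two steps above concrete, the following sketch runs them on three made-up feature strings. The strings and the `toy_` names are invented for illustration. Titles that share words get a similarity above zero, and titles with no words in common get exactly zero.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three invented "combined_features" strings.
docs = [
    "crime drama chemistry teacher",
    "crime drama lawyer",
    "romantic comedy wedding",
]

# Each row of toy_counts holds the word counts of one string.
toy_cv = CountVectorizer(stop_words="english")
toy_counts = toy_cv.fit_transform(docs)

# toy_sim[i, j] is the cosine of the angle between count vectors i and j.
toy_sim = cosine_similarity(toy_counts)
print(toy_sim.round(3))
```

The first two strings share two words ("crime" and "drama"), so their similarity is 2 / (sqrt(4) * sqrt(3)), about 0.577. The third string shares nothing with the others and scores 0 against both.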


## **Model Evaluation**

In [None]:
# For content-based systems, evaluation is typically qualitative,
# e.g. checking by hand whether the recommendations make sense.
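
One way to put a rough number on "making sense" is to measure how many recommendations share at least one `listed_in` category with the query title. This is only a sanity check, not something the original notebook does, and it is partly circular because `listed_in` is itself one of the combined features. The helper names and genre strings below are hypothetical.

```python
def split_genres(listed_in):
    """Turn a comma-separated 'listed_in' string into a set of genre labels."""
    return {g.strip() for g in listed_in.split(",") if g.strip()}


def genre_overlap_rate(query_listed_in, recommended_listed_in):
    """Fraction of recommendations sharing at least one genre with the query."""
    if not recommended_listed_in:
        return 0.0
    query = split_genres(query_listed_in)
    hits = sum(1 for rec in recommended_listed_in if query & split_genres(rec))
    return hits / len(recommended_listed_in)


# Illustrative values only, not taken from the dataset.
rate = genre_overlap_rate(
    "Crime TV Shows, TV Dramas, TV Thrillers",
    ["Crime TV Shows, TV Comedies", "Kids' TV", "TV Dramas"],
)
print(rate)
```

Here two of the three recommendations share a genre with the query, so the rate is 2/3.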

## **Prediction**

In [18]:
# Look up a title by its row position in the DataFrame
def get_title_from_index(index):
    return df['title'].iloc[index]

# Look up the row position of the first exact title match
# (raises IndexError when the title is not in the dataset)
def get_index_from_title(title):
    return df[df.title == title].index.values[0]

# Recommend the n titles most similar to a given title
def recommend_movies(movie_title, n=5):
    try:
        movie_index = get_index_from_title(movie_title)
    except IndexError:
        print(f"'{movie_title}' not found in the dataset.")
        return []

    # Pair each row position with its similarity score, excluding the query
    # title explicitly instead of assuming it sorts first
    similar_movies = [(i, score) for i, score in enumerate(cosine_sim[movie_index])
                      if i != movie_index]
    sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)

    # Slicing avoids an IndexError when n exceeds the number of other titles
    recommendations = [get_title_from_index(i) for i, _ in sorted_similar_movies[:n]]

    print(f"Top {len(recommendations)} movies similar to '{movie_title}' are:")
    for title in recommendations:
        print(title)
    return recommendations

# Example usage
recommend_movies('Breaking Bad', 10)  # Replace 'Breaking Bad' with any title in the dataset


Top 10 movies similar to 'Breaking Bad' are:
Better Call Saul
Get Shorty
Have You Ever Fallen in Love, Miss Jiang?
MINDHUNTER
Jack Taylor
Travelers
Dare Me
Re:Mind
Haunted
Marvel's The Punisher


['Better Call Saul',
 'Get Shorty',
 'Have You Ever Fallen in Love, Miss Jiang?',
 'MINDHUNTER',
 'Jack Taylor',
 'Travelers',
 'Dare Me',
 'Re:Mind',
 'Haunted',
 "Marvel's The Punisher"]

## **Explanation**

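The recommendation system uses content-based filtering. Each title's director, cast, country, genres, and description are combined into one string, converted to word counts with `CountVectorizer`, and compared with every other title using cosine similarity. For a given title, the recommendations are the titles with the highest similarity scores. Because the dataset mixes movies and TV shows, the results can include both, as the 'Breaking Bad' example shows.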