<a href="https://colab.research.google.com/github/richierichijkl/Projects/blob/main/Netflix_Recommendation_SYS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Netflix's recommendation system suggests movies and TV shows that match your interests. Because of its large user base, Netflix has a lot of data to learn from, and the system builds a personalised catalogue for you based on factors like:

    your viewing history
    the viewing history of other users with tastes and preferences similar to yours
    the genres, category, description, and other information about the content you watched in the past

In [2]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity

data = pd.read_csv("/content/drive/MyDrive/dataset/netflixData.csv")
print(data.head())

                                Show Id                          Title  \
0  cc1b6ed9-cf9e-4057-8303-34577fb54477                       (Un)Well   
1  e2ef4e91-fb25-42ab-b485-be8e3b23dedb                         #Alive   
2  b01b73b7-81f6-47a7-86d8-acb63080d525  #AnneFrank - Parallel Stories   
3  b6611af0-f53c-4a08-9ffa-9716dc57eb9c                       #blackAF   
4  7f2d4170-bab8-4d75-adc2-197f7124c070               #cats_the_mewvie   

                                         Description  \
0  This docuseries takes a deep dive into the luc...   
1  As a grisly virus rampages a city, a lone man ...   
2  Through her diary, Anne Frank's story is retol...   
3  Kenya Barris and his family navigate relations...   
4  This pawesome documentary explores how our fel...   

                      Director  \
0                          NaN   
1                       Cho Il   
2  Sabina Fedeli, Anna Migotto   
3                          NaN   
4             Michael Margolis   

             

In [22]:
# The dataset contains null values. Check how many each column has:
print(data.isnull().sum())
# Before removing the null values, select the columns we can use
# to build a Netflix recommendation system:
data = data[["Title", "Description", "Content Type", "Genres"]]
print(data.head())

Title           0
Description     0
Content Type    0
Genres          0
dtype: int64
                       Title  \
0                      unwel   
1                       aliv   
2  annefrank  parallel stori   
3                    blackaf   
4               catsthemewvi   

                                         Description Content Type  \
0  This docuseries takes a deep dive into the luc...      TV Show   
1  As a grisly virus rampages a city, a lone man ...        Movie   
2  Through her diary, Anne Frank's story is retol...        Movie   
3  Kenya Barris and his family navigate relations...      TV Show   
4  This pawesome documentary explores how our fel...        Movie   

                                           Genres  
0                                      Reality TV  
1  Horror Movies, International Movies, Thrillers  
2             Documentaries, International Movies  
3                                     TV Comedies  
4             Documentaries, International Movi

In [23]:
# Drop the rows containing null values before moving on.
data = data.dropna()

In [24]:
import re
import string

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    # Lowercase, then strip bracketed text, URLs, HTML tags, punctuation,
    # newlines, and any token that contains a digit.
    # Raw strings avoid invalid-escape warnings on newer Python versions.
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    # Remove English stopwords, then stem each remaining word.
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["Title"] = data["Title"].apply(clean)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [25]:
# Let's look at some sample titles before moving forward:
print(data.Title.sample(10))

1571                      fangbon
4253                         sign
5254      trial gabriel fernandez
2412         jojo bizarr adventur
3829                         race
767                         booba
2408    john mulaney comeback kid
2135                   realli bad
449                  audri  daisi
2161                   hymn death
Name: Title, dtype: object
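
To make the effect of the cleaning step on titles more concrete, here is a minimal sketch that applies only the regex steps of `clean` to a few strings. It deliberately leaves out stopword removal and stemming so that it runs without NLTK. The helper name `strip_noise` is hypothetical, and "Room 104b" is a made-up input used only to show how tokens containing digits are removed; the other three titles come from the `data.head()` output above.

```python
import re
import string

def strip_noise(text):
    """Apply only the regex steps of the cleaning function (no stopwords, no stemming)."""
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                    # bracketed text
    text = re.sub(r"https?://\S+|www\.\S+", "", text)      # URLs
    text = re.sub(r"<.*?>+", "", text)                     # HTML tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # punctuation
    text = re.sub(r"\n", "", text)                         # newlines
    text = re.sub(r"\w*\d\w*", "", text)                   # tokens containing a digit
    return text

# "Room 104b" is a made-up example, not a title taken from the dataset.
for raw in ["(Un)Well", "#Alive", "#AnneFrank - Parallel Stories", "Room 104b"]:
    print(repr(raw), "->", repr(strip_noise(raw)))
```

Removing the hyphen in "#AnneFrank - Parallel Stories" leaves two consecutive spaces, which is why the cleaned title in the output above also contains a double space. Any title passed to the recommendation lookup has to be in this cleaned form.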


In [26]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


feature = data["Genres"].tolist()

# Create an instance of TfidfVectorizer
tfidf = TfidfVectorizer(stop_words="english")

# Fit and transform the vectorizer on our corpus
tfidf_matrix = tfidf.fit_transform(feature)

# Compute the cosine similarity matrix
similarity = cosine_similarity(tfidf_matrix)
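
As a rough illustration of what the similarity matrix measures, the sketch below computes cosine similarity between three genre strings taken from the `data.head()` output, using only the standard library. It is a simplification and not what `TfidfVectorizer` does internally: it uses raw word counts, so it skips the TF-IDF weighting that down-weights common words such as "movies" and it does no stopword filtering. The helper names `term_counts` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def term_counts(genres):
    """Split a genre string into lowercase word tokens and count them."""
    tokens = genres.lower().replace(",", " ").split()
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

genres = [
    "Horror Movies, International Movies, Thrillers",
    "Documentaries, International Movies",
    "TV Comedies",
]
vectors = [term_counts(g) for g in genres]

# Pairwise similarity: entry [i][j] compares genre string i with genre string j.
toy_similarity = [[cosine(u, v) for v in vectors] for u in vectors]
for row in toy_similarity:
    print([round(x, 2) for x in row])
```

The first two strings share the words "international" and "movies", so they get a score between 0 and 1, while "TV Comedies" shares no words with either and scores 0 against both.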

In [27]:
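# `indices` is used below but is not defined in the cells shown here.
# Assumed reconstruction: map each cleaned title to its row position,
# keeping the first row when a title appears more than once.
indices = pd.Series(range(len(data)), index=data["Title"])
indices = indices[~indices.index.duplicated()]

def netFlix_recommendation(title, similarity = similarity):
    # Find the row for this title, rank every row by its similarity to it,
    # and return the titles of the ten highest-scoring rows.
    index = indices[title]
    similarity_scores = list(enumerate(similarity[index]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    similarity_scores = similarity_scores[0:10]
    movieindices = [i[0] for i in similarity_scores]
    return data['Title'].iloc[movieindices]

print(netFlix_recommendation("girlfriend"))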

3                          blackaf
285                     washington
417                 arrest develop
434     astronomi club sketch show
451    aunti donna big ol hous fun
656                      big mouth
752                bojack horseman
805                   brew brother
935                       champion
937                   chappel show
Name: Title, dtype: object
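
The lookup above raises a `KeyError` for a title that is not in the data, and the queried title itself can appear among the results because its similarity to itself is 1. The following is one possible sketch of a lookup that handles both cases, written with plain lists so it runs on its own. The function name `top_matches` and the parameter `k` are hypothetical, and the similarity scores are made-up toy values, not numbers computed from the dataset.

```python
def top_matches(title, titles, similarity, k=3):
    """Return up to k titles most similar to `title`, excluding the title itself."""
    if title not in titles:
        return []  # unknown title: return nothing instead of raising
    index = titles.index(title)
    # Pair each other row with its score, skipping the queried row itself.
    scores = [(i, s) for i, s in enumerate(similarity[index]) if i != index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in scores[:k]]

# Cleaned titles from the output above; the scores are made-up toy values.
titles = ["aliv", "annefrank  parallel stori", "blackaf", "catsthemewvi"]
toy_scores = [
    [1.0, 0.4, 0.0, 0.4],
    [0.4, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.4, 1.0, 0.0, 1.0],
]

print(top_matches("catsthemewvi", titles, toy_scores, k=2))
print(top_matches("not in catalogue", titles, toy_scores))
```

Because Python's sort is stable, rows with equal scores keep their original order, which is also why the notebook's output lists ties in index order.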


In [37]:
import ipywidgets as widgets
from IPython.display import display

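def netflix_recommendation(button):
    # Titles were cleaned earlier, so clean the input the same way before the lookup.
    title = clean(text_box.value)
    if title not in indices:
        result_label.value = "Title not found."
        return
    index = indices[title]
    similarity_scores = list(enumerate(similarity[index]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    similarity_scores = similarity_scores[0:10]
    movie_indices = [i[0] for i in similarity_scores]
    result = data['Title'].iloc[movie_indices]
    # result_label is an HTML widget, so separate titles with <br> rather than "\n".
    result_label.value = '<br>'.join(result)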

# Create widgets
text_box = widgets.Text(description="Enter a movie title 🎬")
recommend_button = widgets.Button(description="Recommend 🍿")
result_label = widgets.HTML(value="")

# Register event handler for the button click
recommend_button.on_click(netflix_recommendation)

# Display widgets
display(text_box)
display(recommend_button)
display(result_label)


Text(value='', description='Enter a movie title 🎬')

Button(description='Recommend 🍿', style=ButtonStyle())

HTML(value='')