<h2 style="text-align: center;"> Movie Recommendation System </h2>

<b>Goal:</b>
<p style="text-align: justify;"> Recommender systems have become integral to various online platforms, providing personalized suggestions to users across a wide range of entities such as products, movies, and services. Examples include Amazon's product recommendations, Netflix's movie suggestions, and YouTube's video recommendations.</p>

<p style="text-align: justify;">
There are different types of recommender systems such as Collaborative filtering methods, which rely on past user-item interactions, Content-based recommenders which leverage metadata associated with items to suggest similar items of interest, and Simple Rule-based recommenders which is based on global metrics like popularity, global ratings etc.</p>

<p style="text-align: justify;">
In this project, my focus is on building a Content-based movie recommendation system using Natural Language Processing (NLP) techniques. By analyzing textual features such as movie descriptions, genres, and titles, I aim to recommend movies that share similar characteristics with specific movies of interest tailored towards user preferences and tastes. </p>

<img src="minions.jpg" alt="netflix_recommendation" style="width:800px">

### Libraries Used


In [1]:
import pandas as pd
import numpy as np

import nltk
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


### Loading dataset

In [2]:
df = pd.read_csv('movies.csv')
display(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

None

### Selecting necessaary columns

In [3]:
df = df[['title', 'tagline', 'overview', 'genres', 'popularity']]

display(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   title       4803 non-null   object 
 1   tagline     3959 non-null   object 
 2   overview    4800 non-null   object 
 3   genres      4803 non-null   object 
 4   popularity  4803 non-null   float64
dtypes: float64(1), object(4)
memory usage: 187.7+ KB


None

### Creating a new column, "Description" by adding 'tagline' and 'overview'

In [4]:
df['description'] = df['tagline'].map(str) + ' ' + df['overview']

display (df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   title        4803 non-null   object 
 1   tagline      3959 non-null   object 
 2   overview     4800 non-null   object 
 3   genres       4803 non-null   object 
 4   popularity   4803 non-null   float64
 5   description  4800 non-null   object 
dtypes: float64(1), object(5)
memory usage: 225.3+ KB


None

### Dealing with missing values

In [5]:
df.tagline.fillna('', inplace=True)
df.dropna(inplace=True)

display (df.info())

display (df.head())

<class 'pandas.core.frame.DataFrame'>
Index: 4800 entries, 0 to 4802
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   title        4800 non-null   object 
 1   tagline      4800 non-null   object 
 2   overview     4800 non-null   object 
 3   genres       4800 non-null   object 
 4   popularity   4800 non-null   float64
 5   description  4800 non-null   object 
dtypes: float64(1), object(5)
memory usage: 262.5+ KB


None

Unnamed: 0,title,tagline,overview,genres,popularity,description
0,Avatar,Enter the World of Pandora.,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",150.437577,Enter the World of Pandora. In the 22nd centur...
1,Pirates of the Caribbean: At World's End,"At the end of the world, the adventure begins.","Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",139.082615,"At the end of the world, the adventure begins...."
2,Spectre,A Plan No One Escapes,A cryptic message from Bond’s past sends him o...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",107.376788,A Plan No One Escapes A cryptic message from B...
3,The Dark Knight Rises,The Legend Ends,Following the death of District Attorney Harve...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",112.31295,The Legend Ends Following the death of Distric...
4,John Carter,"Lost in our world, found in another.","John Carter is a war-weary, former military ca...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",43.926995,"Lost in our world, found in another. John Cart..."


### Building the Movie Recommender function by doing basic text pre-processing, feature engineering and document similarity computation. 


In [6]:
#text pre-processing
stop_words = nltk.corpus.stopwords.words('english')

def normalize_document(description):
    description = re.sub(r'[^a-zA-Z0-9\s]', '', description, re.I|re.A)
    description = description.lower()
    description = description.strip()
    tokens = nltk.word_tokenize(description)
    filtered_tokens = [token for token in tokens if token not in stop_words]
    description = ' '.join(filtered_tokens)
    return description

normalize_description = np.vectorize(normalize_document)

norm_description = normalize_description(list(df['description']))
len(norm_description)

4800

In [7]:
# Extracting TFIDF Features and computing Pairwise Document Similarity

tf = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
tfidf_matrix = tf.fit_transform(norm_description)

movie_sim = cosine_similarity(tfidf_matrix)
movie_sim_df = pd.DataFrame(movie_sim)

movie_sim_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,4790,4791,4792,4793,4794,4795,4796,4797,4798,4799
0,1.0,0.010701,0.0,0.01903,0.028687,0.024901,0.0,0.026516,0.0,0.00742,...,0.009702,0.0,0.023336,0.033549,0.0,0.0,0.0,0.00688,0.0,0.0
1,0.010701,1.0,0.011891,0.0,0.041623,0.0,0.014564,0.027122,0.034688,0.007614,...,0.009956,0.0,0.004818,0.0,0.0,0.012593,0.0,0.022351,0.013724,0.0
2,0.0,0.011891,1.0,0.0,0.0,0.0,0.0,0.022242,0.015854,0.004891,...,0.042617,0.0,0.0,0.0,0.016501,0.0,0.0,0.011661,0.0,0.003994
3,0.01903,0.0,0.0,1.0,0.008793,0.0,0.015976,0.023172,0.027452,0.07361,...,0.0,0.0,0.009667,0.0,0.0,0.0,0.0,0.028304,0.021785,0.027696
4,0.028687,0.041623,0.0,0.008793,1.0,0.0,0.022912,0.028676,0.0,0.023538,...,0.0148,0.0,0.0,0.0,0.0,0.01076,0.0,0.010495,0.0,0.0


### Getting the list of Movie Titles in the dataset

In [8]:
movie_titles = df['title'].values
print(movie_titles, '\n', movie_titles.shape)

['Avatar' "Pirates of the Caribbean: At World's End" 'Spectre' ...
 'Signed, Sealed, Delivered' 'Shanghai Calling' 'My Date with Drew'] 
 (4800,)


### The function to recommend top 5 similar movies for any movie in the dataset

In [9]:
def movie_recommender(movie_title, movies=movie_titles, movie_sims=movie_sim_df):
    
    movie_index = np.where(movies == movie_title)[0][0]
    movie_similarities = movie_sims.iloc[movie_index].values
    similar_movie_idxs = np.argsort(-movie_similarities)[1:6]
    similar_movies = movies[similar_movie_idxs]
    
    return similar_movies

## Sorting the dataset by Popular Movies

In [10]:
pop_movies = df.sort_values(by='popularity', ascending=False)
pop_movies.head(10)

Unnamed: 0,title,tagline,overview,genres,popularity,description
546,Minions,"Before Gru, they had a history of bad bosses","Minions Stuart, Kevin and Bob are recruited by...","[{""id"": 10751, ""name"": ""Family""}, {""id"": 16, ""...",875.581305,"Before Gru, they had a history of bad bosses M..."
95,Interstellar,Mankind was born on Earth. It was never meant ...,Interstellar chronicles the adventures of a gr...,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 18, ""...",724.247784,Mankind was born on Earth. It was never meant ...
788,Deadpool,Witness the beginning of a happy ending,Deadpool tells the origin story of former Spec...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",514.569956,Witness the beginning of a happy ending Deadpo...
94,Guardians of the Galaxy,All heroes start somewhere.,"Light years from Earth, 26 years after being a...","[{""id"": 28, ""name"": ""Action""}, {""id"": 878, ""na...",481.098624,All heroes start somewhere. Light years from E...
127,Mad Max: Fury Road,What a Lovely Day.,An apocalyptic story set in the furthest reach...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",434.278564,What a Lovely Day. An apocalyptic story set in...
28,Jurassic World,The park is open.,Twenty-two years after the events of Jurassic ...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",418.708552,The park is open. Twenty-two years after the e...
199,Pirates of the Caribbean: The Curse of the Bla...,Prepare to be blown out of the water.,"Jack Sparrow, a freewheeling 17th-century pira...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",271.972889,Prepare to be blown out of the water. Jack Spa...
82,Dawn of the Planet of the Apes,One last chance for peace.,A group of scientists in San Francisco struggl...,"[{""id"": 878, ""name"": ""Science Fiction""}, {""id""...",243.791743,One last chance for peace. A group of scientis...
200,The Hunger Games: Mockingjay - Part 1,Fire burns brighter in the darkness,Katniss Everdeen reluctantly becomes the symbo...,"[{""id"": 878, ""name"": ""Science Fiction""}, {""id""...",206.227151,Fire burns brighter in the darkness Katniss Ev...
88,Big Hero 6,From the creators of Wreck-it Ralph and Frozen,The special bond that develops between plus-si...,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 10751...",203.73459,From the creators of Wreck-it Ralph and Frozen...


In [11]:
# Top 10 Movies by Popularity
popular_movies = pop_movies['title'][:10].tolist()
popular_movies

['Minions',
 'Interstellar',
 'Deadpool',
 'Guardians of the Galaxy',
 'Mad Max: Fury Road',
 'Jurassic World',
 'Pirates of the Caribbean: The Curse of the Black Pearl',
 'Dawn of the Planet of the Apes',
 'The Hunger Games: Mockingjay - Part 1',
 'Big Hero 6']

## Top 5 recommended Movies for the top 10 selected movies using the Recommender system created

In [12]:
recomm_list = []

for movie in popular_movies:
    recomm = {
        'Movie': movie,
        'Recommended Movies': movie_recommender(movie_title=movie)
    }
    recomm_list.append(recomm)

recomm_df = pd.DataFrame(recomm_list)

pd.set_option('display.max_colwidth', None)
display(recomm_df)

Unnamed: 0,Movie,Recommended Movies
0,Minions,"[Despicable Me 2, Despicable Me, Teenage Mutant Ninja Turtles: Out of the Shadows, Superman, Rise of the Guardians]"
1,Interstellar,"[Gattaca, Space Cowboys, Space Pirate Captain Harlock, Starship Troopers, Final Destination 2]"
2,Deadpool,"[Silent Trigger, Underworld: Evolution, Bronson, Shaft, Don Jon]"
3,Guardians of the Galaxy,"[Chasing Mavericks, E.T. the Extra-Terrestrial, American Sniper, The Amazing Spider-Man 2, Hoop Dreams]"
4,Mad Max: Fury Road,"[The 6th Day, Star Trek Beyond, Kites, The Orphanage, The Water Diviner]"
5,Jurassic World,"[Jurassic Park, The Lost World: Jurassic Park, The Nut Job, National Lampoon's Vacation, Vacation]"
6,Pirates of the Caribbean: The Curse of the Black Pearl,"[Pirates of the Caribbean: Dead Man's Chest, The Pirate, Pirates of the Caribbean: On Stranger Tides, The Pirates! In an Adventure with Scientists!, Joyful Noise]"
7,Dawn of the Planet of the Apes,"[Battle for the Planet of the Apes, Groove, The Other End of the Line, Chicago Overcoat, Definitely, Maybe]"
8,The Hunger Games: Mockingjay - Part 1,"[The Hunger Games: Catching Fire, The Hunger Games: Mockingjay - Part 2, John Carter, For Greater Glory - The True Story of Cristiada, The Proposition]"
9,Big Hero 6,"[Wreck-It Ralph, A Home at the End of the World, Phat Girlz, Splice, U.F.O.]"


## Conclusion
<p style="text-align: justify;"> The movie recommendation function successfully generates top 5 recommendations for each given movie title. For instance, for the movie "Minions", the recommended movies include "Despicable Me 2", "Despicable Me", "Teenage Mutant Ninja Turtles: Out of the Shadows", "Superman", and "Rise of the Guardians". Similarly, for "Interstellar", recommended movies are "Gattaca", "Space Cowboys", "Space Pirate Captain Harlock", "Starship Troopers", and "Final Destination 2". These recommendations are derived from an analysis of movie metadata, enabling the system to suggest similar movies based on their textual descriptions. </p>