# Movie Recommendation System

In this project, we build a movie recommendation system. The idea is simple: we combine each movie's main features (genres, keywords, cast, production companies, and title) into a single text string and measure how similar those strings are across movies. The intuition is that movies sharing genres, keywords, or cast members tend to be of a similar kind, since actors often appear in similar types of films.

([source link](https://machinelearningprojects.net/movie-recommendation-system/))

## Importing libraries

In [1]:
import pandas as pd

## Loading the datasets

In [2]:
pd.read_csv("data/tmdb_5000_movies.csv").head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800


In [3]:
pd.read_csv("data/tmdb_5000_credits.csv").head(1)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


## Merging the two datasets on `title`

In [4]:
df1 = pd.read_csv("data/tmdb_5000_movies.csv")
df2 = pd.read_csv("data/tmdb_5000_credits.csv")
org_movies = pd.merge(df1, df2, on="title")
org_movies.head(2)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
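
One thing to watch with a merge on `title`: titles are not guaranteed to be unique. If a title appears more than once in both files, every matching row on the left is paired with every matching row on the right, which adds extra rows. The sketch below illustrates this with toy data (the rows are made up, not taken from the TMDB files) and compares it with a merge on the numeric id, assuming `id` in the movies file corresponds to `movie_id` in the credits file, as the Avatar row above suggests.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical rows, not real TMDB data).
toy_movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Alpha", "Remake", "Remake"],
})
toy_credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Alpha", "Remake", "Remake"],
    "cast": ["cast A", "cast B", "cast C"],
})

# Merging on title: each "Remake" row matches both "Remake" credits (2 x 2 rows).
by_title = pd.merge(toy_movies, toy_credits, on="title")

# Merging on the numeric id: exactly one match per movie.
by_id = pd.merge(
    toy_movies,
    toy_credits.drop(columns="title"),
    left_on="id",
    right_on="movie_id",
)

print(len(by_title))  # 5
print(len(by_id))     # 3
```

The rest of this notebook keeps the merge on `title`, so the outputs shown below are unchanged.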


## Checking the columns of the dataset

In [5]:
org_movies.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'vote_average',
       'vote_count', 'movie_id', 'cast', 'crew'],
      dtype='object')

## Keeping only the important columns

In [6]:
important_columns = org_movies[["genres", "keywords", "cast", "production_companies", "title"]]
important_columns.head()

Unnamed: 0,genres,keywords,cast,production_companies,title
0,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""name"": ""Ingenious Film Partners"", ""id"": 289...",Avatar
1,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",Pirates of the Caribbean: At World's End
2,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",Spectre
3,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",The Dark Knight Rises
4,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",John Carter


## Cleaning the dataset

In [7]:
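import yaml

def extract_name_value(x):
    """Parse a JSON-like list of dicts and join their "name" values with spaces."""
    x = yaml.safe_load(x)  # the JSON text in these columns is parsed here as YAML
    x = " ".join(item.get("name") for item in x)
    return x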

In [8]:
movies = important_columns.copy()
for column in movies.columns[:-1]:
    movies[column] = movies[column].apply(extract_name_value)

movies.head()

Unnamed: 0,genres,keywords,cast,production_companies,title
0,Action Adventure Fantasy Science Fiction,culture clash future space war space colony so...,Sam Worthington Zoe Saldana Sigourney Weaver S...,Ingenious Film Partners Twentieth Century Fox ...,Avatar
1,Adventure Fantasy Action,ocean drug abuse exotic island east india trad...,Johnny Depp Orlando Bloom Keira Knightley Stel...,Walt Disney Pictures Jerry Bruckheimer Films S...,Pirates of the Caribbean: At World's End
2,Action Adventure Crime,spy based on novel secret agent sequel mi6 bri...,Daniel Craig Christoph Waltz Léa Seydoux Ralph...,Columbia Pictures Danjaq B24,Spectre
3,Action Crime Drama Thriller,dc comics crime fighter terrorist secret ident...,Christian Bale Michael Caine Gary Oldman Anne ...,Legendary Pictures Warner Bros. DC Entertainme...,The Dark Knight Rises
4,Action Adventure Science Fiction,based on novel mars medallion space travel pri...,Taylor Kitsch Lynn Collins Samantha Morton Wil...,Walt Disney Pictures,John Carter
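
The raw columns hold JSON-encoded lists of objects, and `yaml.safe_load` handles them because this kind of JSON is also readable as YAML. As a possible alternative, the standard library's `json` module can parse the same strings without an extra dependency. The sketch below is illustrative only: `extract_names` is a hypothetical name, and the sample string is a shortened version of the `genres` value shown earlier.

```python
import json

def extract_names(raw):
    """Join the "name" field of every entry in a JSON-encoded list."""
    items = json.loads(raw)
    return " ".join(item["name"] for item in items)

sample = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
print(extract_names(sample))        # Action Adventure
print(repr(extract_names("[]")))    # ''
```

An empty list yields an empty string, which is consistent with every column being non-null after cleaning.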


## Checking info of the dataset

In [9]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   genres                4809 non-null   object
 1   keywords              4809 non-null   object
 2   cast                  4809 non-null   object
 3   production_companies  4809 non-null   object
 4   title                 4809 non-null   object
dtypes: object(5)
memory usage: 188.0+ KB


## Creating a `combined_features` column

In [10]:
movies["combined_features"] = movies["genres"] +' '+ movies["keywords"] +' '+ movies["cast"] +' '+ movies["production_companies"] +' '+ movies["title"]
movies.head()

Unnamed: 0,genres,keywords,cast,production_companies,title,combined_features
0,Action Adventure Fantasy Science Fiction,culture clash future space war space colony so...,Sam Worthington Zoe Saldana Sigourney Weaver S...,Ingenious Film Partners Twentieth Century Fox ...,Avatar,Action Adventure Fantasy Science Fiction cultu...
1,Adventure Fantasy Action,ocean drug abuse exotic island east india trad...,Johnny Depp Orlando Bloom Keira Knightley Stel...,Walt Disney Pictures Jerry Bruckheimer Films S...,Pirates of the Caribbean: At World's End,Adventure Fantasy Action ocean drug abuse exot...
2,Action Adventure Crime,spy based on novel secret agent sequel mi6 bri...,Daniel Craig Christoph Waltz Léa Seydoux Ralph...,Columbia Pictures Danjaq B24,Spectre,Action Adventure Crime spy based on novel secr...
3,Action Crime Drama Thriller,dc comics crime fighter terrorist secret ident...,Christian Bale Michael Caine Gary Oldman Anne ...,Legendary Pictures Warner Bros. DC Entertainme...,The Dark Knight Rises,Action Crime Drama Thriller dc comics crime fi...
4,Action Adventure Science Fiction,based on novel mars medallion space travel pri...,Taylor Kitsch Lynn Collins Samantha Morton Wil...,Walt Disney Pictures,John Carter,Action Adventure Science Fiction based on nove...


Observe the first entry in the combined feature column.

In [11]:
movies.iloc[0]["combined_features"]

'Action Adventure Fantasy Science Fiction culture clash future space war space colony society space travel futuristic romance space alien tribe alien planet cgi marine soldier battle love affair anti war power relations mind and soul 3d Sam Worthington Zoe Saldana Sigourney Weaver Stephen Lang Michelle Rodriguez Giovanni Ribisi Joel David Moore CCH Pounder Wes Studi Laz Alonso Dileep Rao Matt Gerald Sean Anthony Moran Jason Whyte Scott Lawrence Kelly Kilgour James Patrick Pitt Sean Patrick Murphy Peter Dillon Kevin Dorman Kelson Henderson David Van Horn Jacob Tomuri Michael Blain-Rozgay Jon Curry Luke Hawker Woody Schultz Peter Mensah Sonia Yee Jahnel Curfman Ilram Choi Kyla Warren Lisa Roumain Debra Wilson Chris Mala Taylor Kibby Jodie Landau Julie Lamm Cullen B. Madden Joseph Brady Madden Frankie Torres Austin Wilson Sara Wilson Tamica Washington-Miller Lucy Briant Nathan Meister Gerry Blair Matthew Chamberlain Paul Yates Wray Wilson James Gaylyn Melvin Leno Clark III Carvon Futrell 
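
Note that multi-word names run together in this string, so a word-level tokenizer sees "Science" and "Fiction", or "Sam" and "Worthington", as unrelated words. A possible variation, which this notebook does not use, is to remove the spaces inside each name before joining so that every genre or person becomes one token. The helper name `join_as_single_tokens` below is hypothetical.

```python
def join_as_single_tokens(names):
    """Join names with spaces after removing the spaces inside each name."""
    return " ".join(name.replace(" ", "") for name in names)

genres = ["Science Fiction", "Action"]
cast = ["Sam Worthington", "Zoe Saldana"]

combined = join_as_single_tokens(genres) + " " + join_as_single_tokens(cast)
print(combined)  # ScienceFiction Action SamWorthington ZoeSaldana
```

With this variation, two movies match on "SamWorthington" only when they share that actor, not whenever any cast member is named Sam.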

## Initializing CountVectorizer

In [12]:
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
count_matrix = cv.fit_transform(movies["combined_features"])

In [13]:
pd.DataFrame(count_matrix.todense())

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,45666,45667,45668,45669,45670,45671,45672,45673,45674,45675
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4804,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4805,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4806,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4807,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Finding similarities between different entries

In [14]:
from sklearn.metrics.pairwise import cosine_similarity

cs = cosine_similarity(count_matrix)
cs.shape

(4809, 4809)
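
The result is a square matrix because every movie is compared with every other movie: entry `cs[i][j]` is the cosine of the angle between the count vectors of movies `i` and `j`, that is, their dot product divided by the product of their lengths. For count vectors, which have no negative entries, the value lies between 0 (no shared words) and 1 (same direction). The sketch below computes it by hand for two small vectors; returning 0 for an all-zero vector is a choice made here for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an empty vector as similar to nothing
    return dot / (norm_u * norm_v)

a = [1, 1, 0, 1, 0, 1]
b = [1, 0, 1, 0, 1, 0]

print(cosine(a, a))            # 1.0
print(round(cosine(a, b), 4))  # 0.2887
```

The two vectors share a single word, so their similarity is 1 / (2 * sqrt(3)), roughly 0.29.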

## Printing all movie names

In [15]:
# Print the titles three per row, each truncated and padded to 40 characters.
for i, movie in enumerate(movies["title"].values, start=1):
    print(f"{movie[:40]:40}", end="\n" if i % 3 == 0 else " | ")

Avatar                                   | Pirates of the Caribbean: At World's End | Spectre                                 
The Dark Knight Rises                    | John Carter                              | Spider-Man 3                            
Tangled                                  | Avengers: Age of Ultron                  | Harry Potter and the Half-Blood Prince  
Batman v Superman: Dawn of Justice       | Superman Returns                         | Quantum of Solace                       
Pirates of the Caribbean: Dead Man's Che | The Lone Ranger                          | Man of Steel                            
The Chronicles of Narnia: Prince Caspian | The Avengers                             | Pirates of the Caribbean: On Stranger Ti
Men in Black 3                           | The Hobbit: The Battle of the Five Armie | The Amazing Spider-Man                  
Robin Hood                               | The Hobbit: The Desolation of Smaug      | The Golden Compass       

## Two utility functions

In [16]:
def get_index_from_movie_name(title):
    """Return an array with the row index (or indices) of the movies with this exact title."""
    return org_movies[org_movies["title"] == title].index.values

def get_movie_name_from_index(index):
    """Return the title of the movie stored at the given row index."""
    return org_movies[org_movies.index == index]["title"].values[0]
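
The lookup by title is an exact string comparison, so a query such as "iron man" or a small typo returns an empty array. One possible way to make the lookup more forgiving, shown here as a sketch rather than as part of the notebook, is to resolve the query to the closest known title first with the standard library's `difflib`. The function name `find_title`, the sample titles, and the `cutoff` value are assumptions for illustration.

```python
import difflib

def find_title(query, titles, cutoff=0.6):
    """Return the best-matching known title, or None if nothing is close enough."""
    # Compare in lower case so "iron man" still finds "Iron Man".
    lowered = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

known_titles = ["Iron Man", "Iron Man 2", "Avatar", "Spectre"]

print(find_title("iron man", known_titles))  # Iron Man
print(find_title("Avatr", known_titles))     # Avatar
print(find_title("zzzzzz", known_titles))    # None
```

The resolved title could then be passed to `get_index_from_movie_name`, with `None` handled as "movie not found".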

## Live prediction

In [17]:
def live_prediction(movie_title, number_of_similar=5):
    # Row index (or indices) of the requested title in the merged DataFrame.
    movie_index = get_index_from_movie_name(movie_title)
    # Cosine similarities between the first matching movie and every movie.
    movie_scores = enumerate(cs[movie_index][0])
    # Sort (index, score) pairs from most to least similar.
    sorted_similar_movies = sorted(movie_scores, key=lambda x: x[1], reverse=True)
    similar_movie_index = [index for index, _ in sorted_similar_movies]
    print(f"Top {number_of_similar} recommended movies for {movie_title!r}:\n")
    # Start at 1: position 0 is expected to be the movie itself (similarity 1.0).
    for i in range(1, number_of_similar+1):
        index = similar_movie_index[i]
        name = get_movie_name_from_index(index)
        print(f"{i:2}: {name}")

In [18]:
test_movie_title = "Iron Man"
live_prediction(test_movie_title, number_of_similar=10)

Top 10 recommended movies for 'Iron Man':

 1: Iron Man 3
 2: Iron Man 2
 3: The Avengers
 4: The Incredible Hulk
 5: Thor: The Dark World
 6: Avengers: Age of Ultron
 7: Ant-Man
 8: Thor
 9: Captain America: The First Avenger
10: Captain America: The Winter Soldier
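
`live_prediction` sorts all similarity scores and skips position 0 on the assumption that the queried movie ranks first. A possible alternative, sketched below with made-up scores, is to exclude the queried index explicitly and pick only the top `k` entries with `heapq.nlargest`, which avoids sorting the full list. `top_similar` is a hypothetical helper, not part of the notebook.

```python
import heapq

def top_similar(scores, query_index, k):
    """Indices of the k highest scores, skipping the query movie itself."""
    candidates = (
        (score, index)
        for index, score in enumerate(scores)
        if index != query_index
    )
    return [index for _, index in heapq.nlargest(k, candidates)]

# Toy similarity row for movie 0 (made-up values).
toy_scores = [1.0, 0.2, 0.9, 0.5, 0.7]

print(top_similar(toy_scores, query_index=0, k=3))  # [2, 4, 3]
```

Because the tuples are compared as `(score, index)`, movies with equal scores are returned with the higher index first.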
