# Movie Recommendation System

In [1]:
import pandas as pd
import numpy as np
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity

## Content Based Filtering 

*We compute the similarity between movies based on certain metrics and suggest the movies most similar to one the user liked.*

*Content based filtering is built on movie overviews, taglines, cast, crew, keywords and genre.*
*We use a subset of the movie data.*

In [2]:
try:  #small_movies_data
    smd = pd.read_csv('description.csv')
except FileNotFoundError:
    md = pd.read_csv('movies_metadata.csv',
                     skiprows=[19731, 29504, 35588],
                     dtype={'id': int},
                     usecols=['title', 'id', 'overview', 'tagline'])
    links_small = pd.read_csv('links_small.csv')['tmdbId']
    links_small = links_small.dropna().astype(int)
    smd = md[md['id'].isin(links_small)].copy()
    smd['description'] = smd['overview'].fillna('') + ' ' + smd['tagline'].fillna('')
    smd = smd[['id', 'title', 'description']].drop_duplicates()
    smd.to_csv('description.csv', index=False)
    smd = smd.reset_index(drop=True)

smd.shape

(9082, 3)

*We start by building the recommender system from movie overviews and taglines.*

In [3]:
smd['description'] = smd['description'].fillna('')
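# min_df=1 keeps every term, the same effect as min_df=0, which recent scikit-learn versions reject
tf = TfidfVectorizer(ngram_range=(1, 2), min_df=1, stop_words='english')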

In [4]:
tfidf_matrix = tf.fit_transform(smd['description'])
tfidf_matrix.shape

(9082, 267952)
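
The column count above is large partly because `ngram_range=(1, 2)` adds every two-word phrase to the vocabulary alongside the single words. A minimal sketch on two made-up descriptions (the sentences and variable names are illustrative assumptions, not data from this notebook) shows the effect:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two made-up descriptions, used only for illustration.
toy_docs = ["crime family saga", "crime family drama"]

# Unigrams only: one column per distinct word.
unigram_tf = TfidfVectorizer(ngram_range=(1, 1)).fit(toy_docs)

# Unigrams and bigrams: adjacent word pairs become columns as well.
bigram_tf = TfidfVectorizer(ngram_range=(1, 2)).fit(toy_docs)

n_unigram_terms = len(unigram_tf.vocabulary_)
n_bigram_terms = len(bigram_tf.vocabulary_)
print(n_unigram_terms, n_bigram_terms)
print(sorted(bigram_tf.vocabulary_))
```

Bigrams let the model tell apart phrases such as "crime family" from the individual words, at the cost of a much wider matrix.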

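**Cosine Similarity**

**This gives us a numeric quantity that represents the similarity between two movies.**

**We use `linear_kernel` because it is faster than `cosine_similarity`. `TfidfVectorizer` L2-normalizes each row by default, so the plain dot product computed by `linear_kernel` equals the cosine similarity.**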

In [5]:
cosine_sim = linear_kernel(tfidf_matrix)
cosine_sim


array([[1.        , 0.00680302, 0.        , ..., 0.        , 0.        ,
        0.00477808],
       [0.00680302, 1.        , 0.01530688, ..., 0.        , 0.00175214,
        0.00367921],
       [0.        , 0.01530688, 1.        , ..., 0.00192587, 0.00221235,
        0.        ],
       ...,
       [0.        , 0.        , 0.00192587, ..., 1.        , 0.        ,
        0.        ],
       [0.        , 0.00175214, 0.00221235, ..., 0.        , 1.        ,
        0.00146392],
       [0.00477808, 0.00367921, 0.        , ..., 0.        , 0.00146392,
        1.        ]])
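
The claim that `linear_kernel` can stand in for `cosine_similarity` on TF-IDF output can be checked on a small example. This is a sketch on three made-up sentences (the data and the `toy_` names are assumptions for illustration); it relies on `TfidfVectorizer` using its default `norm='l2'`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Made-up descriptions, used only for illustration.
toy_docs = [
    "a crime family fights for power",
    "a masked hero protects the city from crime",
    "toys come to life when nobody is watching",
]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_docs)

# With the default norm='l2', every row has unit length.
row_norms = np.sqrt(np.asarray(toy_matrix.multiply(toy_matrix).sum(axis=1))).ravel()

# For unit-length rows the dot product already is the cosine similarity.
toy_linear = linear_kernel(toy_matrix)
toy_cosine = cosine_similarity(toy_matrix)
kernels_agree = np.allclose(toy_linear, toy_cosine)
print(row_norms, kernels_agree)
```

`cosine_similarity` normalizes its inputs before taking the dot product, which is redundant work when the rows are already normalized.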

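**We build a function called `recommend` that returns the movies most similar to a given title, ranked by cosine similarity score. It keeps every movie whose score exceeds 0.01, so the length of the result varies from title to title.**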

In [6]:
def recommend(title):
    """Return movies similar to `title`, most similar first, indexed by TMDB id."""
    movie = smd[smd['title'] == title]
    if movie.empty:
        print(f"No movie titled {title!r} found.")
        return None
    if len(movie) > 1:
        print("Several movies share this title. Pick an index and call get_recommendations(idx).")
        print(movie)
        return None
    indexes = get_recommendations(movie.index[0])
    # The first hit is normally the movie itself (similarity 1.0), so skip it.
    return smd.iloc[indexes][1:].set_index('id')


def get_recommendations(idx):
    """Return row positions with similarity above 0.01, sorted by descending score."""
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    return [i for i, score in sim_scores if score > 0.01]
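
The functions above return every movie whose score exceeds 0.01, so the result can be long. If a fixed-size list is wanted, one option is to cap the ranking at the top `n` entries. The sketch below is a hypothetical variant (`top_n_similar` and the toy matrix are assumptions, not part of this notebook) that uses `numpy.argsort` instead of sorting a Python list:

```python
import numpy as np


def top_n_similar(sim_matrix, idx, n=30, min_score=0.01):
    """Return up to n row positions most similar to row idx, excluding idx itself."""
    scores = np.asarray(sim_matrix[idx], dtype=float)
    # Sort positions by descending score; a stable sort keeps ties in row order.
    order = np.argsort(-scores, kind="stable")
    # Drop the query row explicitly instead of assuming it sorts first.
    order = order[order != idx]
    # Keep only positions whose score clears the threshold.
    keep = order[scores[order] > min_score]
    return keep[:n].tolist()


# Made-up 4x4 similarity matrix for illustration.
toy_sim = np.array([
    [1.0, 0.5, 0.0, 0.2],
    [0.5, 1.0, 0.3, 0.005],
    [0.0, 0.3, 1.0, 0.0],
    [0.2, 0.005, 0.0, 1.0],
])
print(top_n_similar(toy_sim, 0))
print(top_n_similar(toy_sim, 1, n=1))
```

Removing the query row by position, rather than slicing off the first hit, also behaves sensibly when another movie has an identical description.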

In [7]:
recommend('The Godfather').head(10)

id,title,description
240,The Godfather: Part II,In the continuing saga of the Corleone crime f...
112205,The Family,"The Manzoni family, a notorious mafia clan, is..."
15745,Made,Two aspiring boxers lifelong friends get invol...
16806,Johnny Dangerously,"Set in the 1930s, an honest, goodhearted man i..."
37557,Shanghai Triad,A provincial boy related to a Shanghai crime f...
14615,Fury,When a prisoner barely survives a lynch mob at...
14242,American Movie,AMERICAN MOVIE is the story of filmmaker Mark ...
242,The Godfather: Part III,In the midst of trying to legitimize his busin...
1958,8 Women,Eight women gather to celebrate Christmas in a...
10279,Summer of Sam,"Spike Lee's take on the ""Son of Sam"" murders i..."


In [8]:
recommend('The Dark Knight').head(10)

id,title,description
49026,The Dark Knight Rises,Following the death of District Attorney Harve...
414,Batman Forever,The Dark Knight of Gotham City confronts a das...
364,Batman Returns,"Having defeated the Joker, Batman now faces th..."
142061,"Batman: The Dark Knight Returns, Part 2",Batman has stopped the reign of terror that Th...
40662,Batman: Under the Red Hood,Batman faces his ultimate challenge as the mys...
268,Batman,The Dark Knight of Gotham City begins his war ...
69735,Batman: Year One,Two men come to Gotham City: Bruce Wayne after...
14919,Batman: Mask of the Phantasm,An old flame of Bruce Wayne's strolls into tow...
820,JFK,New Orleans District Attorney Jim Garrison dis...
123025,"Batman: The Dark Knight Returns, Part 1",Batman has not been seen for ten years. A new ...


## Collaborative Filtering 
*We make recommendations based on the interests of other users: we take ratings from users similar to ours and recommend what they liked.*




In [9]:
from surprise import Reader, Dataset, SVD
from surprise import accuracy
from surprise.model_selection import train_test_split

ratings = pd.read_csv('ratings_small.csv')
ratings.head()

,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


**We use Singular Value Decomposition (SVD) for collaborative filtering.**
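
As a rough picture of what the `SVD` model in Surprise estimates: each rating is predicted as the global mean plus a user bias, an item bias, and the dot product of learned user and item factor vectors, and the result is clipped to the rating scale (1 to 5 with the default `Reader()`). The sketch below evaluates that formula on made-up numbers; `predict_rating` and all the values are hypothetical, and the real model learns its biases and factors from the ratings.

```python
import numpy as np


def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors,
                   rating_scale=(1.0, 5.0)):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    raw = global_mean + user_bias + item_bias + float(np.dot(item_factors, user_factors))
    low, high = rating_scale
    return float(np.clip(raw, low, high))


# Made-up parameters for one user and one movie.
toy_est = predict_rating(
    global_mean=3.5,
    user_bias=0.2,      # this user rates slightly above average
    item_bias=-0.1,     # this movie is rated slightly below average
    user_factors=np.array([0.5, -1.0]),
    item_factors=np.array([0.4, 0.1]),
)

# An estimate that would exceed the scale is clipped to its upper bound.
clipped_est = predict_rating(4.8, 0.3, 0.2, np.zeros(2), np.zeros(2))
print(toy_est, clipped_est)
```

The clipping step is why an estimate can come out as exactly 5.0.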


In [10]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], Reader())

# sample random trainset and testset
# test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=.25)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)

RMSE: 0.8980


0.8979582061202923
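
For reference, RMSE is the square root of the mean squared difference between the true and the estimated ratings, so lower is better. The sketch below recomputes it by hand on made-up numbers; the `rmse` helper is hypothetical and only illustrates the metric, it is not the `accuracy.rmse` implementation.

```python
import numpy as np


def rmse(true_ratings, estimated_ratings):
    """Root mean squared error between two equally long rating sequences."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    estimated_ratings = np.asarray(estimated_ratings, dtype=float)
    return float(np.sqrt(np.mean((true_ratings - estimated_ratings) ** 2)))


# Made-up ratings: errors of 0.5, 0.0 and 1.0 stars.
toy_rmse = rmse([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
print(round(toy_rmse, 4))
```

Read this way, an RMSE near 0.9 means the estimates are typically off by a bit less than one star.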

**We now retrain the model on the full dataset and then make predictions.**


In [11]:
trainset = data.build_full_trainset()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7feab8dc9790>

In [12]:
algo.predict(1, 302)

Prediction(uid=1, iid=302, r_ui=None, est=2.6056834603369237, details={'was_impossible': False})

**As we can see, the estimated rating of user 1 for movieId 302 is about 2.61; similarly, for user 2 and movieId 54 (next cell) it is about 3.54.**

**The model sees only the user ID and the movie ID; it knows nothing about the movie's content and bases its estimate on how other users rated movies.**


In [13]:
algo.predict(2, 54)

Prediction(uid=2, iid=54, r_ui=None, est=3.5390814260127947, details={'was_impossible': False})

## Hybrid Recommender
*We combine the content based and collaborative filtering techniques to build a hybrid model.*

*It takes a user ID and a movie title and returns similar movies, ranked by the ratings expected from that particular user.*

In [14]:
id_map = pd.read_csv('links_small.csv',
                     usecols=['movieId', 'tmdbId'])
id_map = id_map.dropna().astype(int).set_index('tmdbId')


def hybrid(userid, title):
    movies = recommend(title)
    movies['est'] = [algo.predict(userid, id_map.loc[x]['movieId']).est for x in movies.index]
    movies = movies.sort_values('est', ascending=False)
    return movies.head(10)
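
The re-ranking idea in `hybrid` can be tried without the trained model. The sketch below is a hypothetical stand-alone version: `rerank_by_estimate`, the toy candidates, the ID mapping and the score table are all assumptions for illustration, with a plain function standing in for `algo.predict(...).est`. Unlike `hybrid`, it skips candidates that have no entry in the ID mapping instead of raising an error.

```python
import pandas as pd


def rerank_by_estimate(candidates, id_to_movie, estimate, user_id, top_n=10):
    """Sort content-based candidates by a user's estimated rating."""
    # Keep only candidates whose TMDB id can be mapped to a ratings movieId.
    ranked = candidates[candidates.index.isin(list(id_to_movie))].copy()
    ranked["est"] = [estimate(user_id, id_to_movie[tmdb_id]) for tmdb_id in ranked.index]
    return ranked.sort_values("est", ascending=False).head(top_n)


# Made-up candidates indexed by TMDB id; id 30 has no mapping and is dropped.
toy_candidates = pd.DataFrame(
    {"title": ["A", "B", "C"]},
    index=pd.Index([10, 20, 30], name="id"),
)
toy_id_map = {10: 101, 20: 102}

# Made-up estimates keyed by (user id, movie id), standing in for a trained model.
toy_scores = {(1, 101): 3.0, (1, 102): 4.5, (2, 101): 4.0, (2, 102): 2.0}


def toy_estimate(user_id, movie_id):
    return toy_scores[(user_id, movie_id)]


for_user_1 = rerank_by_estimate(toy_candidates, toy_id_map, toy_estimate, user_id=1)
for_user_2 = rerank_by_estimate(toy_candidates, toy_id_map, toy_estimate, user_id=2)
print(for_user_1)
print(for_user_2)
```

The same candidate list comes back in a different order for each user, which is the behaviour the hybrid model is after.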

In [15]:
hybrid(1, 'Avatar')

id,title,description,est
530,A Grand Day Out,Wallace and Gromit have run out of cheese and ...,3.487713
17431,Moon,With only three weeks left in his three year c...,3.47281
8321,In Bruges,"Ray and Ken, two hit men, are in Bruges, Belgi...",3.460227
975,Paths of Glory,"During World War I, commanding officer General...",3.400233
11645,Ran,"Set in Japan in the 16th century (or so), an e...",3.372326
947,Lawrence of Arabia,An epic about British officer T.E. Lawrence's ...,3.351677
274,The Silence of the Lambs,"FBI trainee, Clarice Starling ventures into a ...",3.322705
31657,Coming Home,"The wife of a Marine serving in Vietnam, Sally...",3.309543
603,The Matrix,"Set in the 22nd century, The Matrix tells the ...",3.283087
12429,Ponyo,"The son of a sailor, 5-year old Sosuke lives a...",3.250127


In [16]:
hybrid(7, 'Avatar')

id,title,description,est
11645,Ran,"Set in Japan in the 16th century (or so), an e...",4.180582
530,A Grand Day Out,Wallace and Gromit have run out of cheese and ...,4.154222
8321,In Bruges,"Ray and Ken, two hit men, are in Bruges, Belgi...",4.046262
4485,Diva,"Jules, a young Parisian postman, secretly reco...",4.040304
975,Paths of Glory,"During World War I, commanding officer General...",3.995641
828,The Day the Earth Stood Still,An alien and a robot land on earth after World...,3.966788
274,The Silence of the Lambs,"FBI trainee, Clarice Starling ventures into a ...",3.93209
17431,Moon,With only three weeks left in his three year c...,3.918428
947,Lawrence of Arabia,An epic about British officer T.E. Lawrence's ...,3.916376
1091,The Thing,Scientists in the Antarctic are confronted by ...,3.887658


**The hybrid model gives different recommendations to different users even though the movie is the same, so the recommendations are personalized to each user.**


In [17]:
hybrid(4, 'Toy Story')

id,title,description,est
278,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,5.0
627,Trainspotting,"Renton, deeply immersed in the Edinburgh drug ...",4.855822
11362,The Count of Monte Cristo,Edmond Dantés's life and plans to marry the be...,4.850431
585,"Monsters, Inc.","James Sullivan and Mike Wazowski are monsters,...",4.783175
108,Three Colors: Blue,A woman struggles to find a way to live her li...,4.766249
3082,Modern Times,The Tramp struggles to live in modern industri...,4.736223
10404,Raise the Red Lantern,"China in the 1920s. After her father's death, ...",4.735197
655,"Paris, Texas",A man wanders out of the desert not knowing wh...,4.723092
205,Hotel Rwanda,"Inspired by true events, this film takes place...",4.714372
31225,Paris is Burning,A chronicle of New York City's drag scene in t...,4.713063


In [18]:
hybrid(30, 'Toy Story')

id,title,description,est
278,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,4.730916
863,Toy Story 2,"Andy heads off to Cowboy Camp, leaving his toy...",4.619507
3082,Modern Times,The Tramp struggles to live in modern industri...,4.575196
10193,Toy Story 3,"Woody, Buzz, and the rest of Andy's toys haven...",4.562126
205,Hotel Rwanda,"Inspired by true events, this film takes place...",4.497844
221,Rebel Without a Cause,"After moving to a new town, troublemaking teen...",4.48231
11929,Dolores Claiborne,Dolores Claiborne was accused of killing her a...,4.405071
11362,The Count of Monte Cristo,Edmond Dantés's life and plans to marry the be...,4.382163
108,Three Colors: Blue,A woman struggles to find a way to live her li...,4.364793
22292,The Ghost and Mrs. Muir,"In 1900, strong-willed widow Lucy Muir goes to...",4.357535
