# Introduction
This notebook is part of our final project and contains its source code. The project is a movie recommendation system.
For more information about the project, please refer to the project proposal.

Import the necessary libraries:
- TfidfVectorizer computes TF-IDF (term frequency-inverse document frequency) weights for the movie overviews.
- CountVectorizer converts a collection of text documents into a matrix of token counts.
- linear_kernel computes the pairwise dot products between the rows of two matrices.
- pandas is used to read the CSV files and manipulate the data.
- NumPy is used for array operations.
- cosine_similarity computes the pairwise cosine similarity between movie vectors.
- json is used to parse the JSON-encoded columns (cast, crew, genres and keywords).

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json

## Data Collection

We cannot fetch the data programmatically because the Kaggle API is not available in our country, so we downloaded the datasets manually from the following links:
- https://www.kaggle.com/rounakbanik/the-movies-dataset
- https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata

In [2]:
tmdb_movies = pd.read_csv('data/tmdb/tmdb_5000_movies.csv')
tmdb_credits = pd.read_csv('data/tmdb/tmdb_5000_credits.csv')

# Ingest the data

In [3]:
tmdb_movies.head(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466


In [4]:
tmdb_credits.head(3)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."


In [5]:
print(tmdb_movies.shape)
print(tmdb_credits.shape)

(4803, 20)
(4803, 4)


# Content-Based Filtering
![Content-based filtering](https://miro.medium.com/max/1400/1*Lr6qL0YjY_WqVK5u-AYHAQ.png)
### Overview-Based Recommender

Rename the movie_id column of the credits dataframe to id so that the two dataframes can be merged on it.

In [6]:
tmdb_credits.columns = ['id','title','cast','crew']
tmdb_credits.head(3)

Unnamed: 0,id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."


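Merge the two dataframes on id to get all the required data in one dataframe. Because both contain a title column, pandas keeps both and renames them title_x (from the movies data) and title_y (from the credits data).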

In [7]:
tmdb_all_data = tmdb_movies.merge(tmdb_credits, on='id')
tmdb_all_data.head(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title_x,vote_average,vote_count,title_y,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."


In [8]:
tmdb_all_data.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title_x', 'vote_average',
       'vote_count', 'title_y', 'cast', 'crew'],
      dtype='object')
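The title_x and title_y columns above are pandas' default merge suffixes (`_x` for the left frame, `_y` for the right) applied to the overlapping `title` column. A toy sketch of that behavior, using two made-up two-row frames, plus one way to keep a single `title` column by dropping the credits copy before merging (the notebook itself keeps both):

```python
import pandas as pd

# Toy frames that, like tmdb_movies and tmdb_credits, both carry a 'title' column
movies = pd.DataFrame({'id': [19995, 285],
                       'title': ['Avatar', "Pirates of the Caribbean: At World's End"]})
credits = pd.DataFrame({'id': [19995, 285],
                        'title': ['Avatar', "Pirates of the Caribbean: At World's End"],
                        'cast': ['[]', '[]']})

# Default suffixes split the overlapping non-key column into title_x and title_y
merged = movies.merge(credits, on='id')
print(list(merged.columns))  # ['id', 'title_x', 'title_y', 'cast']

# Dropping the redundant column before merging keeps a single 'title'
merged_clean = movies.merge(credits.drop(columns='title'), on='id')
print(list(merged_clean.columns))  # ['id', 'title', 'cast']
```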

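Clean the overview column by replacing missing overviews with empty strings, since TfidfVectorizer cannot process NaN values.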

In [9]:
tmdb_all_data['overview'] = tmdb_all_data['overview'].fillna('')

Compute the TF-IDF matrix of the overview column. Setting stop_words='english' removes common English words (such as "the" and "and") that carry little meaning.

In [10]:
tfidf = TfidfVectorizer(stop_words='english')

In [11]:
transformed_overviews = tfidf.fit_transform(tmdb_all_data['overview'])
transformed_overviews.shape

(4803, 20978)

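We use linear_kernel to compute the similarity between overviews. TfidfVectorizer L2-normalizes each row by default, so the dot product of two rows is already their cosine similarity, and linear_kernel computes it without the extra normalization step that cosine_similarity performs.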

In [12]:
cosine_similarities = linear_kernel(transformed_overviews, transformed_overviews)
cosine_similarities.shape

(4803, 4803)
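A quick sanity check of that claim on three made-up overviews (not taken from the dataset): on L2-normalised TF-IDF rows, `linear_kernel` and `cosine_similarity` return the same matrix, and every overview has similarity 1 with itself:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

overviews = ['A paraplegic marine is dispatched to the moon Pandora.',
             'A cryptic message sends Bond on a trail to uncover a sinister organization.',
             'Bond uncovers a secret organization behind a cryptic message.']

X = TfidfVectorizer(stop_words='english').fit_transform(overviews)

dot = linear_kernel(X, X)      # plain dot products of the (unit-length) rows
cos = cosine_similarity(X, X)  # dot products divided by the row norms
print(np.allclose(dot, cos))             # True
print(np.allclose(np.diag(dot), 1.0))    # True
```

The two Bond overviews share terms such as "cryptic", "message" and "organization", so they score well above zero with each other, while neither shares a term with the first overview.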

We use the function below to get the top 10 most similar movies.
- It takes a movie title and a similarity matrix as input, looks up that movie's row of similarity scores, sorts the other movies by score in descending order, and returns the titles of the top N (10 by default), excluding the movie itself.
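The same selection step can also be written with NumPy's `argsort` instead of sorting Python tuples. This is an illustrative sketch on a hypothetical 4×4 similarity matrix, not the notebook's function; `top_n_similar` is a made-up name. It masks the query movie explicitly rather than relying on it being ranked first, and tied scores may come out in a different order than with Python's stable `sorted`:

```python
import numpy as np

# Toy similarity matrix: entry [i, j] is the similarity of movie i to movie j
similarity = np.array([[1.0, 0.2, 0.9, 0.4],
                       [0.2, 1.0, 0.1, 0.3],
                       [0.9, 0.1, 1.0, 0.5],
                       [0.4, 0.3, 0.5, 1.0]])

def top_n_similar(similarity, idx, top=2):
    scores = similarity[idx].astype(float)  # astype returns a copy, so the matrix is untouched
    scores[idx] = -np.inf                   # exclude the query movie itself
    order = np.argsort(scores)[::-1]        # indices sorted by descending score
    return order[:top].tolist()

print(top_n_similar(similarity, 0))  # [2, 3]
print(top_n_similar(similarity, 1))  # [3, 0]
```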


In [13]:
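def get_movie_recommendation_with_similarities(title, similarity, top=10):
    # Row index of the first movie whose title matches exactly
    matches = tmdb_all_data.index[tmdb_all_data['title_x'] == title]
    if len(matches) == 0:
        raise ValueError(f'Movie not found: {title}')
    idx = matches[0]
    # Pair every movie index with its similarity to the query movie, dropping the query itself
    sim_scores = [(i, score) for i, score in enumerate(similarity[idx]) if i != idx]
    # Sort by similarity, highest first, and keep the top N
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top]
    movie_indices = [i[0] for i in sim_scores]
    return pd.DataFrame(tmdb_all_data['title_x'].iloc[movie_indices])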

Let's get a random movie from the dataset and find the top 10 similar movies.

In [57]:
random_movie = tmdb_all_data.sample(1)['title_x'].values[0]
print('Movie: ', random_movie)
get_movie_recommendation_with_similarities(random_movie, cosine_similarities, top=10)

Movie:  S.W.A.T.


|      | title_x |
|------|---------|
| 1084 | The Glimmer Man |
| 3306 | Code of Honor |
| 4502 | Water & Power |
| 976  | Escape from L.A. |
| 3206 | Polisse |
| 738  | Joy |
| 2020 | The Rookie |
| 4763 | Smiling Fish & Goat On Fire |
| 3293 | 10th & Wolf |
| 2371 | RockNRolla |


### Cast, Crew, Genres and Keywords Based Recommender

We will use CountVectorizer to turn these features into a single vector per movie. First, let's see how it works on a small example.

In [58]:
vectorizer = CountVectorizer()
sample_data = ['This is the first document.', 'This document is the second document.', 'And this is the third one.', 'Is this the first document?']
X = vectorizer.fit_transform(sample_data)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
print(vectorizer.get_feature_names_out().tolist())
print(X.toarray())

['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]




In [59]:
vectorizer = CountVectorizer()

Now that we know how to convert text into count vectors, let's concatenate all of these features into one string per movie and vectorize it.

First, extract the cast, crew, keyword and genre names from the JSON-encoded columns.

In [60]:
extended_data = tmdb_all_data.copy()

def extract_names(value):
    # Each cell is a JSON list of objects; keep only their 'name' fields (empty list for missing values)
    return [item['name'] for item in json.loads(value)] if isinstance(value, str) else []

for column in ['cast', 'crew', 'genres', 'keywords']:
    extended_data[column] = extended_data[column].apply(extract_names)

extended_data.head(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title_x,vote_average,vote_count,title_y,cast,crew
0,237000000,"[Action, Adventure, Fantasy, Science Fiction]",http://www.avatarmovie.com/,19995,"[culture clash, future, space war, space colon...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Avatar,"[Sam Worthington, Zoe Saldana, Sigourney Weave...","[Stephen E. Rivkin, Rick Carter, Christopher B..."
1,300000000,"[Adventure, Fantasy, Action]",http://disney.go.com/disneypictures/pirates/,285,"[ocean, drug abuse, exotic island, east india ...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Pirates of the Caribbean: At World's End,"[Johnny Depp, Orlando Bloom, Keira Knightley, ...","[Dariusz Wolski, Gore Verbinski, Jerry Bruckhe..."
2,245000000,"[Action, Adventure, Crime]",http://www.sonypictures.com/movies/spectre/,206647,"[spy, based on novel, secret agent, sequel, mi...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Spectre,"[Daniel Craig, Christoph Waltz, Léa Seydoux, R...","[Thomas Newman, Sam Mendes, Anna Pinnock, John..."


Join each list of names into a single space-separated string, since CountVectorizer expects text input.

In [61]:
extended_data['genres'] = extended_data['genres'].apply(lambda x: ' '.join(x))
extended_data['cast'] = extended_data['cast'].apply(lambda x: ' '.join(x))
extended_data['crew'] = extended_data['crew'].apply(lambda x: ' '.join(x))
extended_data['keywords'] = extended_data['keywords'].apply(lambda x: ' '.join(x))
extended_data.head(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title_x,vote_average,vote_count,title_y,cast,crew
0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Avatar,Sam Worthington Zoe Saldana Sigourney Weaver S...,Stephen E. Rivkin Rick Carter Christopher Boye...
1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Pirates of the Caribbean: At World's End,Johnny Depp Orlando Bloom Keira Knightley Stel...,Dariusz Wolski Gore Verbinski Jerry Bruckheime...
2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6 bri...,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Spectre,Daniel Craig Christoph Waltz Léa Seydoux Ralph...,Thomas Newman Sam Mendes Anna Pinnock John Log...
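Joining names with spaces has a side effect worth knowing: `CountVectorizer`'s default analyzer lowercases the text and keeps tokens matching `r"(?u)\b\w\w+\b"`, so every name is split into separate words. Actor Sam Worthington and director Sam Mendes therefore share the token `sam`, and single-letter initials disappear. The sketch below shows this, and a hypothetical variant (not used in this notebook) that strips spaces and periods inside each name so that every person becomes one token:

```python
from sklearn.feature_extraction.text import CountVectorizer

analyzer = CountVectorizer().build_analyzer()

# Default behaviour: names are split into words and 1-character tokens are dropped
split_tokens = analyzer('Sam Worthington Sam Mendes Stephen E. Rivkin')
print(split_tokens)
# ['sam', 'worthington', 'sam', 'mendes', 'stephen', 'rivkin']

# Hypothetical variant: collapse each name into a single token before joining
names = ['Sam Worthington', 'Sam Mendes', 'Stephen E. Rivkin']
joined = ' '.join(name.replace(' ', '').replace('.', '') for name in names)
joined_tokens = analyzer(joined)
print(joined_tokens)
# ['samworthington', 'sammendes', 'stephenerivkin']
```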


Concatenate the genres, cast, crew and keywords into a new concated_data column.

In [62]:
extended_data['concated_data'] = extended_data['genres'] + ' ' + extended_data['cast'] + ' ' + extended_data['crew'] + ' ' + extended_data['keywords']
extended_data.head(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,spoken_languages,status,tagline,title_x,vote_average,vote_count,title_y,cast,crew,concated_data
0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Avatar,Sam Worthington Zoe Saldana Sigourney Weaver S...,Stephen E. Rivkin Rick Carter Christopher Boye...,Action Adventure Fantasy Science Fiction Sam W...
1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Pirates of the Caribbean: At World's End,Johnny Depp Orlando Bloom Keira Knightley Stel...,Dariusz Wolski Gore Verbinski Jerry Bruckheime...,Adventure Fantasy Action Johnny Depp Orlando B...
2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6 bri...,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Spectre,Daniel Craig Christoph Waltz Léa Seydoux Ralph...,Thomas Newman Sam Mendes Anna Pinnock John Log...,Action Adventure Crime Daniel Craig Christoph ...


Build the token-count matrix from the concatenated features.

In [63]:
count_matrix = vectorizer.fit_transform(extended_data['concated_data'])
count_matrix.shape

(4803, 62236)

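Compute the cosine similarities between the count vectors. Unlike the TF-IDF vectors, the count vectors are not normalized, so here we use cosine_similarity rather than linear_kernel.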

In [64]:
cosine_similarity_for_concated_data = cosine_similarity(count_matrix, count_matrix)
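Why the normalization matters for raw counts: cosine similarity is $\frac{a \cdot b}{\lVert a\rVert \, \lVert b\rVert}$, while a plain dot product grows with the number of tokens, so it would favor movies with long token lists (for example, large crews). A toy check on three made-up "documents":

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

docs = ['action adventure spy',
        'action adventure spy spy spy',
        'action drama']
counts = CountVectorizer().fit_transform(docs)  # vocabulary: action, adventure, drama, spy

raw_dot = linear_kernel(counts, counts)    # unnormalised dot products
cos = cosine_similarity(counts, counts)    # dot / (||a|| * ||b||)

# Raw dot products: document 1 outscores document 0's match with itself (values 5 and 3)
print(raw_dot[0, 1], raw_dot[0, 0])

# Manual cosine for documents 0 and 1 agrees with scikit-learn
a, b = counts.toarray()[0], counts.toarray()[1]
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(manual, cos[0, 1]))  # True
```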

Let's see what it recommends for a few random movies.

In [68]:
random_movie = tmdb_all_data.sample(1)['title_x'].values[0]
print('Movie: ', random_movie)
get_movie_recommendation_with_similarities(random_movie, cosine_similarity_for_concated_data, top=10)

Movie:  Richard III


|      | title_x |
|------|---------|
| 3616 | Robin and Marian |
| 3177 | Henry V |
| 1412 | Predator 2 |
| 2024 | Gandhi |
| 1930 | Stone Cold |
| 845  | Instinct |
| 432  | Deep Impact |
| 508  | The Lost World: Jurassic Park |
| 2327 | Predator |
| 1133 | 15 Minutes |


In [82]:
random_movie = tmdb_all_data.sample(1)['title_x'].values[0]
print('Movie: ', random_movie)
get_movie_recommendation_with_similarities(random_movie, cosine_similarity_for_concated_data, top=10)

Movie:  The Girl with the Dragon Tattoo


|      | title_x |
|------|---------|
| 1161 | The Social Network |
| 693  | Gone Girl |
| 28   | Jurassic World |
| 100  | The Curious Case of Benjamin Button |
| 191  | Harry Potter and the Prisoner of Azkaban |
| 197  | Harry Potter and the Philosopher's Stone |
| 276  | Harry Potter and the Chamber of Secrets |
| 1422 | The X Files: I Want to Believe |
| 2151 | The Bank Job |
| 1133 | 15 Minutes |


### Let's see more random movies.

In [85]:
for i in range(10):
    random_movie = tmdb_all_data.sample(1)['title_x'].values[0]
    print('Movie: ', random_movie)
    print(get_movie_recommendation_with_similarities(random_movie, cosine_similarity_for_concated_data, top=10))
    print('--------------------------------------------------------')

Movie:  PCU
                                       title_x
4014                Kevin Hart: Let Me Explain
2418         Dickie Roberts: Former Child Star
2545                              End of Watch
838                                     Alien³
3713                              Mean Machine
1540                            American Pie 2
4242                                 Road Hard
182                                    Ant-Man
191   Harry Potter and the Prisoner of Azkaban
950                             The Negotiator
--------------------------------------------------------
Movie:  I, Robot
                                 title_x
28                        Jurassic World
1133                          15 Minutes
1374                   L.A. Confidential
544                Flight of the Phoenix
3                  The Dark Knight Rises
365                              Contact
72                         Suicide Squad
93    Terminator 3: Rise of the Machines
412                         

# Collaborative Filtering

![Collaborative filtering](https://miro.medium.com/max/1400/1*6_NlX6CJYhtxzRM-t6ywkQ.png)

Unlike content-based filtering, collaborative filtering does not require item metadata such as genre, overview, cast or crew. Instead, it learns from the ratings that many users have given to many items: a user's rating for an item they have not seen is predicted from the rating patterns of users with similar tastes. The underlying assumption is that users who agreed in the past (rated the same items similarly) will tend to agree in the future.

We're going to use the [Surprise](https://surpriselib.com/) library to implement collaborative filtering. Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data. It also implements various other recommendation algorithms.

In [86]:
import warnings
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

warnings.simplefilter('ignore')  # silence library warnings in the notebook output


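The Reader class tells Surprise how to parse the ratings, most importantly their scale. Reader() defaults to a scale of 1 to 5, but the ratings in ratings_small.csv use half-star steps from 0.5 to 5, so we pass the scale explicitly.

In [87]:
# Ratings in ratings_small.csv range from 0.5 to 5 in half-star steps
reader = Reader(rating_scale=(0.5, 5))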

In [88]:
ratings = pd.read_csv('./data/movies-dataset/ratings_small.csv')
ratings.head()

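|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|-----------|
| 0 | 1 | 31   | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |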


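Load the userId, movieId and rating columns into a Surprise dataset. load_from_df expects exactly these three columns, in user, item, rating order.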

In [89]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

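We use the SVD algorithm to get the predictions. The name comes from Singular Value Decomposition, a matrix factorization technique that generalizes the eigendecomposition of a square matrix to any m x n matrix. <br>
Because most entries of the user-item rating matrix are missing, Surprise's SVD does not decompose the matrix exactly; it learns user and item latent factors that reproduce the observed ratings. <br>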
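According to the Surprise documentation, its `SVD` is the biased matrix factorization model popularized by Simon Funk: it predicts $\hat r_{ui} = \mu + b_u + b_i + q_i^\top p_u$ (global mean, user bias, item bias, and the dot product of item and user factor vectors) and fits these parameters by stochastic gradient descent on the squared error of the observed ratings, with L2 regularization. The NumPy sketch below illustrates that update rule on a hypothetical 3-user, 3-item toy dataset; `train_biased_mf` is a made-up name and its hyperparameters are illustrative, not Surprise's defaults:

```python
import numpy as np

def train_biased_mf(ratings, n_users, n_items, n_factors=2,
                    lr=0.01, reg=0.02, n_epochs=200, seed=0):
    """Sketch of SGD-trained biased matrix factorization.
    ratings: list of (user_index, item_index, rating) with 0-based indices."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])         # global mean rating
    bu = np.zeros(n_users)                           # user biases
    bi = np.zeros(n_items)                           # item biases
    pu = rng.normal(0, 0.1, (n_users, n_factors))    # user latent factors
    qi = rng.normal(0, 0.1, (n_items, n_factors))    # item latent factors
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + qi[i] @ pu[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Both right-hand sides are evaluated before either vector is overwritten
            pu[u], qi[i] = (pu[u] + lr * (err * qi[i] - reg * pu[u]),
                            qi[i] + lr * (err * pu[u] - reg * qi[i]))

    def predict(u, i):
        return mu + bu[u] + bi[i] + qi[i] @ pu[u]
    return predict

toy = [(0, 0, 5), (0, 1, 4), (0, 2, 1),
       (1, 0, 4), (1, 1, 5), (1, 2, 2),
       (2, 0, 1), (2, 2, 5)]
predict = train_biased_mf(toy, n_users=3, n_items=3)

# Training error of the model vs. always predicting the global mean
mu = np.mean([r for _, _, r in toy])
baseline = np.sqrt(np.mean([(r - mu) ** 2 for _, _, r in toy]))
trained = np.sqrt(np.mean([(r - predict(u, i)) ** 2 for u, i, r in toy]))
print(baseline, trained)
print(predict(2, 1))  # estimate for a (user, item) pair with no rating
```

The last line is the point of the model: user 2 never rated item 1, yet the learned biases and factors still produce an estimate.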

---

We use 5-fold cross-validation to measure the model's accuracy in terms of RMSE and MAE.

In [90]:
svd = SVD()
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8894  0.8938  0.9037  0.8997  0.8894  0.8952  0.0057  
MAE (testset)     0.6847  0.6896  0.6970  0.6917  0.6856  0.6897  0.0045  
Fit time          0.57    0.53    0.47    0.46    0.46    0.50    0.04    
Test time         0.09    0.09    0.08    0.14    0.08    0.10    0.02    


{'test_rmse': array([0.88940607, 0.89378114, 0.90365023, 0.89973632, 0.88944893]),
 'test_mae': array([0.68473337, 0.68955905, 0.69704913, 0.6916795 , 0.68556042]),
 'fit_time': (0.5658848285675049,
  0.527008056640625,
  0.4735832214355469,
  0.4582850933074951,
  0.4593019485473633),
 'test_time': (0.09052801132202148,
  0.09227299690246582,
  0.07813072204589844,
  0.13817906379699707,
  0.0784449577331543)}
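For reference, on each held-out fold RMSE is $\sqrt{\frac{1}{N}\sum (r - \hat r)^2}$ and MAE is $\frac{1}{N}\sum |r - \hat r|$, both in rating points. RMSE penalizes large errors more heavily, so it is never smaller than MAE. A worked example on four made-up (actual, predicted) pairs:

```python
import numpy as np

def rmse(actual, predicted):
    # Root mean squared error: squares errors before averaging, so big misses dominate
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    # Mean absolute error: the average size of a miss, in rating points
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]   # errors: 0.5, 0, 1, 1
print(rmse(actual, predicted))     # 0.75  (sqrt of 2.25 / 4)
print(mae(actual, predicted))      # 0.625 (2.5 / 4)
```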

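Retrain the model on the full dataset and predict the rating that user 1 would give the movie with id 302. The optional third argument, r_ui, is the true rating if known; it is only echoed back in the returned Prediction and does not affect the estimate est.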

In [91]:
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f98b95fd400>

In [92]:
svd.predict(1, 302, 3)

Prediction(uid=1, iid=302, r_ui=3, est=2.8140266221053625, details={'was_impossible': False})

# Composite Movie Recommender System
Until now, we have built two recommendation systems: one based on content and the other based on collaborative filtering. We will now combine them into a hybrid recommender system. <br>
> To do this, we take a userId and a movie title as input, use content-based filtering to get the 50 movies most similar to the title, predict the user's rating for each of them with the collaborative filtering model, and return the 10 movies with the highest predicted rating.

In [93]:
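def composite(userId, title):
    # Content-based step: the 50 movies most similar to the given title
    similars = get_movie_recommendation_with_similarities(title, cosine_similarity_for_concated_data, top=50)
    similar_indices = similars.index.tolist()

    # .copy() avoids a SettingWithCopyWarning when the 'est' column is added
    movies = tmdb_all_data.iloc[similar_indices][['original_title', 'vote_count', 'vote_average', 'id']].copy()
    # Collaborative step: estimated rating of each candidate for this user
    movies['est'] = movies['id'].apply(lambda x: svd.predict(userId, x).est)
    # Re-rank the candidates by estimated rating and keep the top 10
    movies = movies.sort_values('est', ascending=False)
    return movies.head(10)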
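One caveat for the composite function: ratings_small.csv identifies movies by their MovieLens `movieId`, while the `id` column of the TMDB data is a TMDB id. Passing a TMDB id to `svd.predict` therefore either scores an unrelated MovieLens movie that happens to share the number or, for an id the model never saw, returns an estimate that ignores the movie entirely. The Movies Dataset ships a `links_small.csv` file with `movieId` and `tmdbId` columns. The sketch below assumes that file has been loaded into a DataFrame (for example from `./data/movies-dataset/links_small.csv`, a path we have not verified) and maps ids before predicting. Both function names are hypothetical, and the prediction function is passed in so the sketch runs on its own:

```python
import pandas as pd

def tmdb_to_movielens(links):
    # links: DataFrame with MovieLens 'movieId' and TMDB 'tmdbId' columns (tmdbId may be missing)
    links = links.dropna(subset=['tmdbId'])
    return dict(zip(links['tmdbId'].astype(int), links['movieId'].astype(int)))

def rank_by_predicted_rating(candidates, user_id, id_map, predict, top=10):
    # candidates: DataFrame with a TMDB 'id' column, e.g. the content-based top 50
    # predict(user_id, movielens_id) -> estimated rating
    ranked = candidates.copy()
    ranked['movieId'] = ranked['id'].map(id_map)
    ranked = ranked[ranked['movieId'].notna()].copy()  # no MovieLens id: no rating history
    ranked['movieId'] = ranked['movieId'].astype(int)
    ranked['est'] = [predict(user_id, m) for m in ranked['movieId']]
    return ranked.sort_values('est', ascending=False).head(top)

# Toy usage with made-up ids and a stand-in predictor
links = pd.DataFrame({'movieId': [1, 2, 3], 'tmdbId': [862.0, 8844.0, None]})
id_map = tmdb_to_movielens(links)
candidates = pd.DataFrame({'original_title': ['Toy Story', 'Jumanji', 'Unknown'],
                           'id': [862, 8844, 999]})
scores = {1: 3.0, 2: 4.5}
result = rank_by_predicted_rating(candidates, 7, id_map, lambda u, i: scores[i])
print(result[['original_title', 'movieId', 'est']])
```

In the notebook, the predictor would be `lambda u, i: svd.predict(u, i).est` and the candidates would be the rows that composite selects.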

Let's test the same movie with different users.

In [94]:
composite(1, 'The Fast and the Furious')

|      | original_title | vote_count | vote_average | id | est |
|------|----------------|------------|--------------|----|-----|
| 150  | Men in Black II | 3114 | 6.0 | 608 | 3.587824 |
| 878  | Flags of Our Fathers | 526 | 6.7 | 3683 | 3.319325 |
| 197  | Harry Potter and the Philosopher's Stone | 7006 | 7.5 | 671 | 3.217031 |
| 2743 | The Butterfly Effect | 2060 | 7.3 | 1954 | 3.126574 |
| 568  | xXx | 1424 | 5.8 | 7451 | 2.915823 |
| 762  | Mercury Rising | 368 | 6.0 | 8838 | 2.815337 |
| 1133 | 15 Minutes | 191 | 5.7 | 2749 | 2.79352 |
| 1374 | L.A. Confidential | 1310 | 7.7 | 2118 | 2.755965 |
| 813  | Superman | 1022 | 6.9 | 1924 | 2.736223 |
| 28   | Jurassic World | 8662 | 6.5 | 135397 | 2.692632 |


In [97]:
composite(22, 'The Fast and the Furious')

|      | original_title | vote_count | vote_average | id | est |
|------|----------------|------------|--------------|----|-----|
| 878  | Flags of Our Fathers | 526 | 6.7 | 3683 | 4.023625 |
| 1374 | L.A. Confidential | 1310 | 7.7 | 2118 | 3.87822 |
| 150  | Men in Black II | 3114 | 6.0 | 608 | 3.842554 |
| 573  | Die Hard 2 | 1896 | 6.6 | 1573 | 3.664204 |
| 2743 | The Butterfly Effect | 2060 | 7.3 | 1954 | 3.599843 |
| 197  | Harry Potter and the Philosopher's Stone | 7006 | 7.5 | 671 | 3.584388 |
| 762  | Mercury Rising | 368 | 6.0 | 8838 | 3.497385 |
| 303  | Catwoman | 808 | 4.2 | 314 | 3.493384 |
| 813  | Superman | 1022 | 6.9 | 1924 | 3.463328 |
| 568  | xXx | 1424 | 5.8 | 7451 | 3.420842 |


As the two outputs above show, the same movie produces a different ranking for each user, because the estimated ratings are personalized. <br>

# Conclusion

- We built a movie recommendation system using the TMDB 5000 Movie Dataset and The Movies Dataset. We used three approaches: content-based filtering, collaborative filtering, and a composite recommender that combines the two. <br>
- We built the collaborative filtering model with the Surprise library and its SVD algorithm, and measured its accuracy with 5-fold cross-validation (mean RMSE 0.8952, mean MAE 0.6897). <br>
- We used TfidfVectorizer to convert the overview column to vectors and CountVectorizer to convert the cast, crew, keywords and genres to vectors. We computed similarity scores with linear_kernel for the overview vectors and with cosine_similarity for the other features. <br>


### Further Improvements
The system works, but there is room for improvement. It could be improved by training on more data, by trying other algorithms such as KNN or NMF (both available in Surprise), or by using deep learning techniques such as LSTM-based models.

### Contributors

- Kıraç Acar Apaydın : Collaborative Filtering
- Berkkan Bütün : Content Based Filtering
- We ingested and cleaned the data together.
- We then built the recommendation systems separately.
- Finally, we combined them into a composite recommendation system.
