# Content Based Recommendation System Based on Movie Genres
When I search for movies I might be interested in, genre is a very important factor: I tend to watch movies in genres that I like. So I will build this content-based recommendation system on movie genres.

In [83]:
#load python packages
import os
import pandas as pd
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

## Load the data

In [84]:
# Raw strings keep the Windows backslashes literal (no accidental escape sequences such as \r)
ratings = pd.read_csv(r'D:\Jupyter_Notebook\Movie_Recommendation_System\data\ratings_featured.csv')
users = pd.read_csv(r'D:\Jupyter_Notebook\Movie_Recommendation_System\data\users_featured.csv')
movies = pd.read_csv(r'D:\Jupyter_Notebook\Movie_Recommendation_System\data\movies_featured.csv')

In [85]:
movies.shape

(3883, 21)

In [86]:
movies.head()

Unnamed: 0,movie_id,title,genres,War,Comedy,Musical,Thriller,Sci-Fi,Fantasy,Drama,...,Animation,Mystery,Western,Crime,Adventure,Documentary,Film-Noir,Horror,Romance,Children's
0,1,Toy Story (1995),Animation|Children's|Comedy,0,1,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,1
1,2,Jumanji (1995),Adventure|Children's|Fantasy,0,0,0,0,0,1,0,...,0,0,0,0,1,0,0,0,0,1
2,3,Grumpier Old Men (1995),Comedy|Romance,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
3,4,Waiting to Exhale (1995),Comedy|Drama,0,1,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
4,5,Father of the Bride Part II (1995),Comedy,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Building the content based engine
I am going to build a content-based recommendation engine that computes the similarity between movies based on their genres. Given a particular movie, it will suggest the movies whose genres are most similar. To do so, I will use the movies DataFrame loaded above from movies_featured.csv.

In [87]:
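# Split the pipe-separated genre string into a list of genres
movies['genres_list'] = movies['genres'].str.split('|')
# Convert the list to its string representation (missing values become "")
# Note: genres_list is not used below; the vectorizer is fit on movies['genres']
movies['genres_list'] = movies['genres_list'].fillna("").astype('str')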

In [88]:
# Use scikit-learn's TfidfVectorizer, which transforms text into feature vectors that can be used as input to an estimator.
from sklearn.feature_extraction.text import TfidfVectorizer
# min_df=1 (the default) keeps every term, like min_df=0 did; some recent scikit-learn versions reject the integer 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(movies['genres'])
tfidf_matrix.shape

(3883, 127)
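
To see what the 127 columns represent, it helps to look at how a single genre string is split into terms. The sketch below approximates the vectorizer's default word tokenization (lowercased tokens of two or more word characters) and the `ngram_range=(1, 2)` setting using only the standard library. The helper name `genre_ngrams` is hypothetical, and stop-word filtering is not modeled.

```python
import re

# Approximation of scikit-learn's default token pattern: runs of 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def genre_ngrams(genres):
    """Return the unigrams and bigrams produced from one genre string.

    Hypothetical helper for illustration; stop-word removal is not modeled.
    """
    tokens = TOKEN_PATTERN.findall(genres.lower())
    # Adjacent token pairs, joined with a space, form the bigrams.
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

print(genre_ngrams("Animation|Children's|Comedy"))
print(genre_ngrams("Action|Sci-Fi"))
```

The apostrophe and the hyphen act as separators, so `Children's` becomes `children` and `Sci-Fi` becomes the two tokens `sci` and `fi`. Each distinct unigram or bigram seen across all movies becomes one column of the TF-IDF matrix.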

Next, use cosine similarity to get a numeric score for how similar two movies are. Because TfidfVectorizer L2-normalizes each row by default, the dot product of two rows is already their cosine similarity score. Therefore, we will use sklearn's linear_kernel instead of cosine_similarity, since it skips the redundant normalization and is faster.
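
The equivalence used here can be checked by hand: once two vectors have unit length, the denominator of the cosine formula is 1, so cosine similarity reduces to the dot product. The following is a minimal standard-library sketch with made-up weights; the vectors and helper names are illustrative and do not come from the notebook's data.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity computed from the full formula."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Made-up term weights for two movies (not taken from the notebook's data).
a = l2_normalize([1.0, 2.0, 0.0, 2.0])
b = l2_normalize([0.0, 3.0, 4.0, 0.0])

# For unit-length vectors the two quantities agree up to floating-point error.
print(math.isclose(dot(a, b), cosine(a, b)))
```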

In [89]:
from sklearn.metrics.pairwise import linear_kernel
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim[:4, :4]

array([[1.        , 0.14193614, 0.09010857, 0.1056164 ],
       [0.14193614, 1.        , 0.        , 0.        ],
       [0.09010857, 0.        , 1.        , 0.1719888 ],
       [0.1056164 , 0.        , 0.1719888 , 1.        ]])

In [90]:
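# Build a Series of movie titles and a reverse lookup from title to row index
titles = movies['title']
indices = pd.Series(movies.index, index=movies['title'])

# Function that gets movie recommendations based on the cosine similarity score of movie genres
def genre_recommendations(title):
    # Row position of the requested movie
    idx = indices[title]
    # Pair every movie index with its similarity to the requested movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort by similarity, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip the first entry (expected to be the movie itself) and keep the next 20
    sim_scores = sim_scores[1:21]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]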

In [91]:
genre_recommendations('Toy Story (1995)').head(20)

1050               Aladdin and the King of Thieves (1996)
2072                             American Tail, An (1986)
2073           American Tail: Fievel Goes West, An (1991)
2285                            Rugrats Movie, The (1998)
2286                                 Bug's Life, A (1998)
3045                                   Toy Story 2 (1999)
3542                                Saludos Amigos (1943)
3682                                   Chicken Run (2000)
3685       Adventures of Rocky and Bullwinkle, The (2000)
236                                 Goofy Movie, A (1995)
12                                           Balto (1995)
241                               Gumby: The Movie (1995)
310                             Swan Princess, The (1994)
592                                      Pinocchio (1940)
612                                Aristocats, The (1970)
700                               Oliver & Company (1988)
876     Land Before Time III: The Time of the Great Gi...
1010          

In [92]:
genre_recommendations('Jumanji (1995)').head(20)

55                         Kids of the Round Table (1995)
59                     Indian in the Cupboard, The (1995)
124                     NeverEnding Story III, The (1994)
996                       Escape to Witch Mountain (1975)
1898                                     Labyrinth (1986)
1936                                  Goonies, The (1985)
1974            Darby O'Gill and the Little People (1959)
2092                        NeverEnding Story, The (1984)
2093    NeverEnding Story II: The Next Chapter, The (1...
2330                        Santa Claus: The Movie (1985)
1489                            Warriors of Virtue (1997)
1542                                Simple Wish, A (1997)
1006                  20,000 Leagues Under the Sea (1954)
1698                                      Star Kid (1997)
2024                                  Return to Oz (1985)
7                                     Tom and Huck (1995)
144                   Amazing Panda Adventure, The (1995)
156           

In [93]:
genre_recommendations('Gone in 60 Seconds (2000)').head(20)

985                       Set It Off (1996)
1727                King of New York (1990)
2191                          Wisdom (1986)
2239                    Detroit 9000 (1973)
2499                  Mod Squad, The (1999)
2547                      Dick Tracy (1990)
3196    Hard-Boiled (Lashou shentan) (1992)
3647                    Fatal Beauty (1987)
3648              Gone in 60 Seconds (2000)
3660                           Shaft (1971)
3675                           Shaft (2000)
3712                 Shaft in Africa (1973)
3713              Shaft's Big Score! (1972)
41                   Dead Presidents (1995)
489                Menace II Society (1993)
847                   Godfather, The (1972)
1203         Godfather: Part II, The (1974)
1954        Godfather: Part III, The (1990)
2125               Untouchables, The (1987)
2843                      Limey, The (1999)
Name: title, dtype: object

I tried three movies, "Toy Story", "Jumanji", and "Gone in 60 Seconds", and the genre recommendation engine returned movies in similar genres for each. On inspection, the recommendations look acceptable.
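
One detail in the third result is worth noting: "Gone in 60 Seconds (2000)" appears in its own recommendation list. The slice `sim_scores[1:21]` drops whichever entry sorts first, and when other movies have identical genres (similarity 1.0) that entry is not necessarily the query movie. A possible fix, sketched here on a toy similarity row with plain Python lists, is to filter out the query index explicitly. The function name `top_similar` and the toy scores are assumptions for illustration.

```python
def top_similar(sim_row, query_idx, k=20):
    """Return indices of the k most similar items, excluding the query itself."""
    scored = [(i, s) for i, s in enumerate(sim_row) if i != query_idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scored[:k]]

# Toy row for query index 2: item 0 has identical genres (score 1.0).
sim_row = [1.0, 0.3, 1.0, 0.0, 0.7]

# Slicing off the first sorted entry removes item 0 and keeps the query (index 2).
naive = sorted(enumerate(sim_row), key=lambda pair: pair[1], reverse=True)[1:3]
print([i for i, _ in naive])

# Filtering by index removes the query regardless of ties.
print(top_similar(sim_row, query_idx=2, k=2))
```

In the notebook's function, the same idea would amount to skipping `idx` when building `sim_scores` instead of relying on the slice.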

This recommendation system does not need data on other users, so we can recommend movies to a user based on that user's own tastes. However, it will not recommend movies outside the user's content profile.

So this recommendation engine could be used under the following circumstances:
1. A new user registers on the movie website. The website shows a list of movies and lets the user choose the ones they like. We then use this recommendation system to recommend similar movies based on what the user chose.
2. A user gives a high rating to a certain movie. The website can then use this recommendation engine to recommend similar movies.
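
The first scenario, where a user picks several movies, could be served by averaging the similarity rows of the chosen movies and ranking the remaining ones. This is only a sketch under assumptions: `recommend_from_likes` is a hypothetical helper, and the small matrix of made-up values stands in for `cosine_sim`.

```python
def recommend_from_likes(sim_matrix, liked, k=2):
    """Rank unseen items by their mean similarity to the liked items."""
    liked_set = set(liked)
    scores = []
    for j in range(len(sim_matrix)):
        if j in liked_set:
            continue  # do not recommend what the user already picked
        mean_sim = sum(sim_matrix[i][j] for i in liked) / len(liked)
        scores.append((j, mean_sim))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [j for j, _ in scores[:k]]

# Toy symmetric similarity matrix for four movies (made-up values).
toy_sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]

print(recommend_from_likes(toy_sim, liked=[0, 2], k=2))
```

With a single liked movie, this reduces to ranking one similarity row, which matches the second scenario.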