# Content-based recommendation system

![](https://cdn.analyticsvidhya.com/wp-content/uploads/2018/06/Screenshot-from-2018-06-21-10-57-38.png)

### Advantages

1. This type of recommender does not need user data to train on.
2. It requires only item data.
3. The core technique is Natural Language Processing, so there is a ready-made preprocessing pipeline to follow that carries over to most domains.
4. It behaves like a script that can be run once some item data is available, which makes it a good fit for early-stage start-ups.
5. It needs few resources (training time, processing power) because the algorithm is standard, and its results are highly explainable.

***

### Disadvantages

1. Every item **must** have an item name and an item description.
2. Because the code runs as a script, recommendations may be skewed when little data is available. The remedy is more data: the more item data there is, the better the recommendations.
3. Item names and descriptions must follow some naming convention so that the algorithm can interpret them.
4. The regex filtering changes from domain to domain.

### Imports

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import sigmoid_kernel
import warnings
warnings.filterwarnings('ignore')
import scipy
import os

### Directory Structure

In [2]:
cwd = os.getcwd()
parent = os.path.dirname(cwd)
print(parent)

D:\MSU\CMSE 831 Project


In [3]:
# os.path.join keeps the paths portable across operating systems
credits = pd.read_csv(os.path.join(parent, 'data', 'tmdb_5000_credits.csv'))
movies_df = pd.read_csv(os.path.join(parent, 'data', 'tmdb_5000_movies.csv'))

In [4]:
credits.head()

| | movie_id | title | cast | crew |
|---|---|---|---|---|
| 0 | 19995 | Avatar | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | 285 | Pirates of the Caribbean: At World's End | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | 206647 | Spectre | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | 49026 | The Dark Knight Rises | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | 49529 | John Carter | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


In [5]:
movies_df.head()

| | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | [{"iso_3166_1": "US", "name": "United States o... | 2009-12-10 | 2787965087 | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 |
| 1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2007-05-19 | 961000000 | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 |
| 2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | [{"iso_3166_1": "GB", "name": "United Kingdom"... | 2015-10-26 | 880674609 | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 |
| 3 | 250000000 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | http://www.thedarkknightrises.com/ | 49026 | [{"id": 849, "name": "dc comics"}, {"id": 853,... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | [{"name": "Legendary Pictures", "id": 923}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2012-07-16 | 1084939099 | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 |
| 4 | 260000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://movies.disney.com/john-carter | 49529 | [{"id": 818, "name": "based on novel"}, {"id":... | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | [{"name": "Walt Disney Pictures", "id": 2}] | [{"iso_3166_1": "US", "name": "United States o... | 2012-03-07 | 284139100 | 132.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 |


## Merging both dataframes and keeping only required columns

In [6]:
credits_column_renamed = credits.rename(columns = {"movie_id": "id"})
movies_df_merged = movies_df.merge(credits_column_renamed, on= 'id')
movies_df_merged.head()

| | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | runtime | spoken_languages | status | tagline | title_x | vote_average | vote_count | title_y | cast | crew |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | ... | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | Avatar | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | ... | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | Pirates of the Caribbean: At World's End | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | ... | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | Spectre | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | 250000000 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | http://www.thedarkknightrises.com/ | 49026 | [{"id": 849, "name": "dc comics"}, {"id": 853,... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | [{"name": "Legendary Pictures", "id": 923}, {"... | ... | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 | The Dark Knight Rises | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | 260000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://movies.disney.com/john-carter | 49529 | [{"id": 818, "name": "based on novel"}, {"id":... | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | [{"name": "Walt Disney Pictures", "id": 2}] | ... | 132.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 | John Carter | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |
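
The `title_x` and `title_y` columns in the merged frame appear because both inputs carry a `title` column that is not part of the join key, so pandas adds its default suffixes. A minimal sketch on two invented frames (`left`, `right` and their contents are hypothetical) shows the effect, and one possible way to avoid it by dropping the duplicate column before merging:

```python
import pandas as pd

# Hypothetical miniature versions of the two tables, invented for illustration
left = pd.DataFrame({"id": [1, 2], "title": ["A", "B"], "overview": ["x", "y"]})
right = pd.DataFrame({"id": [1, 2], "title": ["A", "B"], "cast": ["c1", "c2"]})

# Both frames have a non-key column called "title", so pandas suffixes them
merged = left.merge(right, on="id")
print(list(merged.columns))

# Dropping the duplicate column first keeps a single "title" column
merged_clean = left.merge(right.drop(columns="title"), on="id")
print(list(merged_clean.columns))
```

Since only `id`, `original_title` and `overview` are kept afterwards, the suffixed columns do not affect the rest of the notebook.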


In [7]:
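# .copy() gives an independent frame, so later column assignments do not act on a slice of movies_df_merged
movies_cleaned_df = movies_df_merged[['id', 'original_title', 'overview']].copy()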
movies_cleaned_df.head()

| | id | original_title | overview |
|---|---|---|---|
| 0 | 19995 | Avatar | In the 22nd century, a paraplegic Marine is di... |
| 1 | 285 | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... |
| 2 | 206647 | Spectre | A cryptic message from Bond’s past sends him o... |
| 3 | 49026 | The Dark Knight Rises | Following the death of District Attorney Harve... |
| 4 | 49529 | John Carter | John Carter is a war-weary, former military ca... |


In this type of recommendation system, we try to measure the similarity between items. There are two ways to do it:

- Statistical approach: a weighted hybrid technique that requires item data plus generic data (total ratings, popularity).
- NLP approach: requires item data only and uses standard preprocessing steps. It can be run as a script.
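
To make the NLP approach concrete, here is a minimal sketch on three invented descriptions. The texts and the `toy_*` names are hypothetical, and cosine similarity is used here only as a simple illustration of text similarity; it is not necessarily the kernel applied later in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item descriptions, invented for illustration only
toy_overviews = [
    "a spy chases a criminal across europe",
    "a secret agent hunts a criminal mastermind",
    "a group of toys goes on an adventure",
]

# Turn each description into a TF-IDF vector (default settings)
toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_overviews)

# Pairwise cosine similarity between every pair of descriptions
toy_similarity = cosine_similarity(toy_matrix)

# Descriptions 0 and 1 share the word "criminal", so they score higher
# with each other than either does with description 2
print(toy_similarity.round(2))
```

Only item text goes in, and a square item-by-item similarity matrix comes out, which is why no user data is needed.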

Example overview from the first row of the dataset:

In [8]:
movies_cleaned_df['overview'].iloc[0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.'

***The only thing to take care of is that the regex differs between use cases.***
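
As a rough illustration of why the regex matters, the sketch below tokenizes one invented product-style description with two patterns. The sample text and the second pattern are assumptions made up for this example; only `r'\w{1,}'` comes from this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical product-style description, invented for illustration
sample_text = "USB-C cable, 2m, supports 4K@60Hz"

# Pattern used in this notebook: any run of one or more word characters
word_analyzer = TfidfVectorizer(token_pattern=r'\w{1,}').build_analyzer()
word_tokens = word_analyzer(sample_text)

# Alternative (assumed) pattern: runs of two or more letters only
alpha_analyzer = TfidfVectorizer(token_pattern=r'[a-zA-Z]{2,}').build_analyzer()
alpha_tokens = alpha_analyzer(sample_text)

print(word_tokens)
print(alpha_tokens)
```

The first pattern keeps fragments such as `2m` and `60hz`, while the second discards anything containing digits. Which behaviour is preferable depends on whether such fragments carry meaning in the domain.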

In [9]:
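tfv = TfidfVectorizer(
    min_df=3,                   # ignore terms that appear in fewer than 3 overviews
    max_features=None,          # no cap on the vocabulary size
    strip_accents='unicode',
    analyzer='word',
    token_pattern=r'\w{1,}',    # a token is any run of one or more word characters
    ngram_range=(1, 3),         # unigrams, bigrams and trigrams
)

# Fill NaNs with empty strings so the vectorizer receives only strings
movies_cleaned_df['overview'] = movies_cleaned_df['overview'].fillna('')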

In [10]:
# Fit on the overviews; the result is a sparse (movies x terms) TF-IDF matrix
tfv_matrix = tfv.fit_transform(movies_cleaned_df.overview)
tfv_matrix.shape

(4803, 26568)
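
The number of columns is driven largely by `ngram_range` and `min_df`. A small sketch on an invented four-document corpus (the documents and variable names are hypothetical) shows how n-grams grow the vocabulary and how `min_df` prunes it again:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus, invented for illustration
docs = [
    "secret agent saves the world",
    "secret agent goes rogue",
    "secret agent retires",
    "toys come alive",
]

# Single words only
unigrams = TfidfVectorizer(ngram_range=(1, 1)).fit(docs)

# Words, word pairs and word triples
up_to_trigrams = TfidfVectorizer(ngram_range=(1, 3)).fit(docs)

# Same n-grams, but keep only terms found in at least 3 documents
frequent_only = TfidfVectorizer(ngram_range=(1, 3), min_df=3).fit(docs)

print(len(unigrams.vocabulary_))
print(len(up_to_trigrams.vocabulary_))
print(sorted(frequent_only.vocabulary_))
```

In this toy corpus only `secret`, `agent` and `secret agent` occur in three documents, so they are the only terms that survive `min_df=3`.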

## Computing pairwise similarity from tfv_matrix using the sigmoid kernel

![](https://qph.fs.quoracdn.net/main-qimg-6ab7369356c16f17ac39fbb83d5d56c1)

In [11]:
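# Pairwise similarity between all overviews: tanh(gamma * <x, y> + coef0)
# Values lie in (-1, 1), not [0, 1]; with the defaults (gamma = 1/n_features, coef0 = 1)
# they cluster just above tanh(1), roughly 0.7616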
sig = sigmoid_kernel(tfv_matrix, tfv_matrix)

In [12]:
sig[0]

array([0.76160996, 0.76159451, 0.7615945 , ..., 0.76159452, 0.76159439,
       0.76159432])
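
scikit-learn documents `sigmoid_kernel` as `tanh(gamma * <X, Y> + coef0)`, with `gamma` defaulting to `1 / n_features` and `coef0` to `1`. With 26568 features, `gamma` is tiny, which would explain why every score above sits so close to `tanh(1)`. The sketch below checks the formula on an invented corpus (`toy_docs` and the other names are hypothetical) and also compares the resulting ranking with the plain dot product:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel, sigmoid_kernel

# Hypothetical mini-corpus, invented for illustration
toy_docs = [
    "secret agent chases a criminal",
    "secret agent retires",
    "a criminal escapes",
    "toys come alive",
]
toy_tfidf = TfidfVectorizer().fit_transform(toy_docs)

# Plain dot products; TF-IDF rows are L2-normalised by default,
# so these are also cosine similarities
dots = linear_kernel(toy_tfidf, toy_tfidf)

# Rebuild the sigmoid kernel by hand from its documented formula
gamma = 1.0 / toy_tfidf.shape[1]
manual = np.tanh(gamma * dots + 1.0)
kernel = sigmoid_kernel(toy_tfidf, toy_tfidf)
print(np.allclose(manual, kernel))

# Value that the scores approach as gamma * <x, y> goes to zero
print(np.tanh(1.0))

# tanh is increasing, so the ranking matches the dot-product ranking
order_sigmoid = np.argsort(-kernel[0], kind="stable")
order_linear = np.argsort(-dots[0], kind="stable")
print(order_sigmoid, order_linear)
```

Because the ordering is the same, the choice between the sigmoid kernel and cosine similarity should not change which movies are recommended, only the scale of the scores.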

## Reverse mapping from movie titles to indices

In [13]:
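# Note: Series.drop_duplicates() compares the values (the row positions), not the titles in the index,
# so it does not remove repeated titles
indices = pd.Series(movies_cleaned_df.index, index = movies_cleaned_df.original_title).drop_duplicates()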
indices

original_title
Avatar                                         0
Pirates of the Caribbean: At World's End       1
Spectre                                        2
The Dark Knight Rises                          3
John Carter                                    4
                                            ... 
El Mariachi                                 4798
Newlyweds                                   4799
Signed, Sealed, Delivered                   4800
Shanghai Calling                            4801
My Date with Drew                           4802
Length: 4803, dtype: int64
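
One caveat with this mapping: `Series.drop_duplicates()` compares values, not index labels, so repeated titles stay in the index and a lookup on such a title returns several rows instead of one position. The sketch below uses invented titles (`toy_titles` and the other names are hypothetical) to show the behaviour and one possible way to keep only the first occurrence of each title:

```python
import pandas as pd

# Hypothetical titles with one repeat, invented for illustration
toy_titles = pd.Series(["Title A", "Title B", "Title A"])
toy_indices = pd.Series(toy_titles.index, index=toy_titles.values)

# The values 0, 1, 2 are all distinct, so nothing is dropped
after_drop = toy_indices.drop_duplicates()
print(len(after_drop))

# A repeated label returns a Series rather than a single position
print(type(after_drop["Title A"]))

# Keep only the first row for each title by filtering on the index instead
unique_indices = toy_indices[~toy_indices.index.duplicated(keep="first")]
print(unique_indices["Title A"])
```

Whether the first occurrence is the right one to keep is a judgment call; a title plus release year would be a less ambiguous key.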

In [14]:
def give_rec(title, sig=sig):
    # Row position of the movie with this original_title
    idx = indices[title]

    # Pair every row position with its similarity score to the given movie
    sig_scores = list(enumerate(sig[idx]))

    # Sort by score, highest first
    sig_scores = sorted(sig_scores, key=lambda x: x[1], reverse=True)

    # Skip the first entry (expected to be the movie itself) and keep the next 10
    sig_scores = sig_scores[1:11]

    # Row positions of the recommended movies
    movies_indices = [i[0] for i in sig_scores]

    # Titles of the 10 most similar movies
    return movies_cleaned_df.original_title.iloc[movies_indices]
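
The same lookup can be sketched with `np.argsort`, which avoids building and sorting a Python list of tuples. This is an alternative sketch, not the function used in this notebook: `top_k_similar`, its parameters and the toy data are hypothetical, and it assumes titles are unique.

```python
import numpy as np
import pandas as pd

def top_k_similar(title, titles, similarity, k=10):
    """Return the k titles most similar to `title`, excluding the title itself."""
    # Map each title to its row position (assumes titles are unique)
    positions = pd.Series(range(len(titles)), index=titles)
    if title not in positions.index:
        raise KeyError(f"Unknown title: {title!r}")
    idx = positions[title]

    # Row positions ordered by similarity, highest score first
    order = np.argsort(-similarity[idx], kind="stable")

    # Remove the query item explicitly instead of assuming it ranks first
    order = order[order != idx][:k]
    return [titles[i] for i in order]

# Hypothetical similarity matrix for four items, invented for illustration
toy_titles = ["A", "B", "C", "D"]
toy_similarity = np.array([
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.9],
    [0.3, 0.4, 0.9, 1.0],
])

print(top_k_similar("A", toy_titles, toy_similarity, k=2))
```

Filtering out the query index explicitly keeps the result correct even when another item ties with the query for the top score.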

## Kids recommendation (animated)

In [15]:
give_rec("Toy Story 3")

1541                     Toy Story
343                    Toy Story 2
1779        The 40 Year Old Virgin
160     How to Train Your Dragon 2
1086           Aliens in the Attic
143                            Pan
3065                    Heartbeeps
3873                 Class of 1984
4387          A LEGO Brickumentary
3379                  Factory Girl
Name: original_title, dtype: object

## Action movie recommendation (James Bond)

In [16]:
give_rec("Spectre")

3162                    Thunderball
1837                 Romeo Must Die
29                          Skyfall
1343          Never Say Never Again
4473                 The Innkeepers
3351    The Man with the Golden Gun
1200           The Living Daylights
4071          From Russia with Love
11                Quantum of Solace
4009          2016: Obama's America
Name: original_title, dtype: object

## Romance recommendation

In [17]:
give_rec("Newlyweds")

2689         Our Family Wedding
504     The Secret Life of Pets
1576                 Bride Wars
1364            Horrible Bosses
1110      Underworld: Evolution
616                       Ted 2
869          You, Me and Dupree
866          Bullet to the Head
3253             Drive Me Crazy
1071                    Blended
Name: original_title, dtype: object