## Movie recommender using Cosine Similarity

When a new user arrives, there are no previous ratings on which to base a recommendation, so latent matrix factorisation cannot produce meaningful suggestions for that user.
One way to tackle this is to suggest items similar to the item the user most recently viewed or opened. Similarity can be based on any available attributes, such as genres, cast, directors, production unit, language availability, etc.

The idea is to compute the cosine similarity between the feature vectors of different movies. Cosine similarity measures how similar two vectors are by taking the cosine of the angle between them, which is the dot product of the two vectors divided by the product of their magnitudes.

$$\cos{\theta} = \frac{\vec{A} \cdot \vec{B}}{\left\|\vec{A}\right\| \cdot \left\|\vec{B}\right\|}$$
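As a quick check of the formula, the sketch below computes the cosine similarity of two small binary vectors. The vectors and the helper name `cosine_similarity` are illustrative assumptions, not part of the notebook; they stand for a movie tagged with two genres and another that shares both and adds a third.

```python
import numpy as np
from numpy.linalg import norm


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (norm(a) * norm(b)))


# u has two genres set; v shares both and has one extra genre set.
u = [1, 1, 0]
v = [1, 1, 1]

print(cosine_similarity(u, u))  # identical vectors, angle of zero
print(cosine_similarity(u, v))  # 2 / (sqrt(2) * sqrt(3))
```

The second value is 2 / sqrt(6), roughly 0.816, which is the score the notebook later reports for a three-genre movie compared against a two-genre reference that it fully contains.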


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from numpy.linalg import norm

In [2]:
address = 'D:\\Project Ideas\\Confusionlist\\Recommender\\ml-latest-small\\ml-latest-small\\'
movies = pd.read_csv(address + 'movies.csv')

movies.tail()

Unnamed: 0,movieId,title,genres
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation
9741,193609,Andrew Dice Clay: Dice Rules (1991),Comedy


###### Converting the genres column into lists for later use

In [3]:
movies['genres'] = movies['genres'].fillna('NIL')

movies['genres'] = movies['genres'].apply(lambda x: x.split('|'))


In [4]:
genre_strings = movies['genres']

#### Creating list of unique genres from the dataframe

In [5]:
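gen = set()
for genre_string in genre_strings:
    # Only the first genre of each movie is inspected; .split() breaks the
    # '(no genres listed)' placeholder into '(no', 'genres' and 'listed)'
    gen.update(genre_string[0].split())

# convert the set to a list and sort it
unique_genres = list(gen)
unique_genres.sort()

# Drop the three placeholder fragments: '(no' sorts first, 'genres' and 'listed)' sort last
genres = unique_genres[1:-2]
genres.append('NIL')
print(genres)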

['Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western', 'NIL']
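
The loop above looks only at the first entry of each movie's genre list, so a genre that never appears first is left out of the vocabulary. One alternative, sketched here on a small made-up frame, is to flatten every list with `Series.explode` and take the sorted unique values. The sample rows and the names `toy` and `all_genres` are assumptions for illustration; switching the notebook to such a vocabulary would change the vector length and therefore the similarity scores shown later.

```python
import pandas as pd

# Made-up sample in the same shape as movies['genres'] after the split.
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'genres': [['Action', 'IMAX'], ['Comedy'], ['(no genres listed)']],
})

# explode() gives one row per (movie, genre) pair; unique() removes repeats.
all_genres = sorted(toy['genres'].explode().unique())

# Drop the placeholder MovieLens uses for untagged movies, if present.
all_genres = [g for g in all_genres if g != '(no genres listed)']
print(all_genres)  # ['Action', 'Comedy', 'IMAX']
```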


In [6]:
## Function to create a vector of 0s and 1s marking which of the known genres a movie has
def compare_genre(name):
    # One slot per entry in the global `genres` list, all zero to start with
    cos = np.zeros(len(genres))

    for i, genre in enumerate(genres):
        # `name` is the movie's list of genres; flag the slot if it is present
        if genre in name:
            cos[i] = 1

    return cos


#### New column holding a 0/1 vector that marks each movie's genres against the genre list
#### This is effectively a multi-hot (one-hot style) encoding of the genres

In [7]:
df = movies.copy()

df['genres_copy'] = df['genres'].apply(compare_genre)
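
For comparison, pandas can build the same kind of 0/1 genre columns directly from raw pipe-separated strings with `Series.str.get_dummies`. The sketch below uses a few made-up rows; the frame `raw` and the variable `encoded` are assumptions for illustration, and the column order it produces (alphabetical over every genre seen) is not necessarily the order of the `genres` list used in this notebook.

```python
import pandas as pd

# Made-up rows in the raw movies.csv format, before splitting on '|'.
raw = pd.DataFrame({
    'title': ['Toy-like', 'Heist', 'Romcom'],
    'genres': ['Adventure|Animation|Comedy', 'Action|Crime', 'Comedy|Romance'],
})

# One 0/1 column per distinct genre found in the strings.
encoded = raw['genres'].str.get_dummies(sep='|')
encoded.index = raw['title']
print(encoded)
```

Each row of `encoded` plays the same role as a `genres_copy` vector, but is stored as separate columns instead of one array per cell.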


In [8]:
movname = 'Toy Story (1995)'
TopSimilar = 15

b = list(df[df['title'] == movname]['genres_copy'])

df['ref'] = b*len(df) # Repeat the reference vector in every row of df
df.head(10)

Unnamed: 0,movieId,title,genres,genres_copy,ref
0,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
1,2,Jumanji (1995),"[Adventure, Children, Fantasy]","[0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
2,3,Grumpier Old Men (1995),"[Comedy, Romance]","[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
3,4,Waiting to Exhale (1995),"[Comedy, Drama, Romance]","[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
4,5,Father of the Bride Part II (1995),[Comedy],"[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
5,6,Heat (1995),"[Action, Crime, Thriller]","[1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
6,7,Sabrina (1995),"[Comedy, Romance]","[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
7,8,Tom and Huck (1995),"[Adventure, Children]","[0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
8,9,Sudden Death (1995),[Action],"[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."
9,10,GoldenEye (1995),"[Action, Adventure, Thriller]","[1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, ..."


### Finding the movies most cosine-similar to 'Toy Story (1995)' and taking the top 15

In [9]:

# Cosine similarity between each movie's genre vector and the reference vector.
# Movies with no recognised genre have an all-zero vector, so the division yields NaN for them.
df['cos_sim'] = df.apply(lambda row: np.dot(row['genres_copy'], row['ref'])/(np.linalg.norm(row['genres_copy']) * np.linalg.norm(row['ref'])), axis=1)

print(f'Top {TopSimilar} movies similar to {movname}')
# Take the top rows first, then drop the query movie itself if it is among them
top = df.sort_values(by='cos_sim', ascending=False).head(TopSimilar)
top.loc[top['title'] != movname, ['movieId', 'title', 'genres', 'cos_sim']]

Top 15 movies similar to Toy Story (1995)


Unnamed: 0,movieId,title,genres,cos_sim
7355,78499,Toy Story 3 (2010),"[Adventure, Animation, Children, Comedy, Fanta...",1.0
6260,47124,"Ant Bully, The (2006)","[Adventure, Animation, Children, Comedy, Fanta...",1.0
6948,65577,"Tale of Despereaux, The (2008)","[Adventure, Animation, Children, Comedy, Fantasy]",1.0
6194,45074,"Wild, The (2006)","[Adventure, Animation, Children, Comedy, Fantasy]",1.0
2355,3114,Toy Story 2 (1999),"[Adventure, Animation, Children, Comedy, Fantasy]",1.0
8219,103755,Turbo (2013),"[Adventure, Animation, Children, Comedy, Fantasy]",1.0
3568,4886,"Monsters, Inc. (2001)","[Adventure, Animation, Children, Comedy, Fantasy]",1.0
8927,136016,The Good Dinosaur (2015),"[Adventure, Animation, Children, Comedy, Fantasy]",1.0
7360,78637,Shrek Forever After (a.k.a. Shrek: The Final C...,"[Adventure, Animation, Children, Comedy, Fanta...",1.0
2809,3754,"Adventures of Rocky and Bullwinkle, The (2000)","[Adventure, Animation, Children, Comedy, Fantasy]",1.0
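
The row-wise `apply` above recomputes the reference norm for every movie and needs the helper `ref` column. A vectorised variant, sketched here with NumPy on a tiny made-up matrix, stacks the genre vectors into a 2-D array and gets every score from one matrix-vector product. The array `X`, the function name `cosine_to_reference`, and the choice to score all-zero rows as 0 instead of NaN are assumptions for illustration.

```python
import numpy as np


def cosine_to_reference(X, ref):
    """Cosine similarity of each row of X to the vector ref (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    ref = np.asarray(ref, dtype=float)
    dots = X @ ref                                    # one dot product per row
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(ref)
    sims = np.zeros(len(X))
    # Divide only where the denominator is non-zero; zero rows keep score 0.
    np.divide(dots, norms, out=sims, where=norms > 0)
    return sims


# Made-up multi-hot rows over three genres.
X = np.array([
    [1, 1, 0],   # same genres as the reference
    [1, 1, 1],   # reference genres plus one more
    [0, 0, 1],   # nothing in common
    [0, 0, 0],   # no recognised genre
])
print(cosine_to_reference(X, X[0]))
```

In the notebook, something like `np.vstack(df['genres_copy'].to_list())` could serve as `X`, with the chosen movie's vector as `ref`.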


#### Generalised movie selection based on genre similarity using cosine similarity

In [10]:
movname = input("Enter the Movie for selection : ")
TopSimilar = 15

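b = list(df[df['title'] == movname]['genres_copy'])
if not b:
    # Fail early with a clear message when the typed title is not in the data
    raise ValueError('No movie titled ' + repr(movname) + ' in the dataset')

df['ref'] = b*len(df) # Repeat the reference vector in every row of df

df['cos_sim'] = df.apply(lambda x: np.dot(x['genres_copy'], x['ref'])/(np.linalg.norm(x['genres_copy']) * np.linalg.norm(x['ref'])), axis=1)

# Take the top rows first, then drop the query movie itself if it is among them
top = df.sort_values(by='cos_sim', ascending=False).head(TopSimilar)
top.loc[top['title'] != movname, ['movieId', 'title', 'genres', 'cos_sim']]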

Enter the Movie for selection : Bungo Stray Dogs: Dead Apple (2018)




Unnamed: 0,movieId,title,genres,cos_sim
8931,136297,Mortal Kombat: The Journey Begins (1995),"[Action, Animation]",1.0
7896,95004,Superman/Doomsday (2007),"[Action, Animation]",1.0
5597,26913,Street Fighter II: The Animated Movie (Sutorît...,"[Action, Animation]",1.0
8080,99813,"Batman: The Dark Knight Returns, Part 2 (2013)","[Action, Animation]",1.0
7380,79274,Batman: Under the Red Hood (2010),"[Action, Animation]",1.0
7737,90746,"Adventures of Tintin, The (2011)","[Action, Animation, Mystery, IMAX]",0.816497
8841,132362,Patlabor 2: The Movie (1993),"[Action, Animation, Sci-Fi]",0.816497
8930,136024,The Professional: Golgo 13 (1983),"[Action, Animation, Crime]",0.816497
6815,60979,Batman: Gotham Knight (2008),"[Action, Animation, Crime]",0.816497
9452,167746,The Lego Batman Movie (2017),"[Action, Animation, Comedy]",0.816497
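
The cell above can be folded into a reusable function so that the title lookup, scoring, and ranking happen in one call. The version below is a sketch on a made-up frame: the name `recommend_by_genre`, the `genre_vec` column, and the decision to exclude the query movie before taking the top rows are assumptions, not the notebook's own code.

```python
import numpy as np
import pandas as pd


def recommend_by_genre(frame, title, top_n=15, vec_col='genre_vec'):
    """Rank movies in frame by cosine similarity of vec_col to the given title."""
    match = frame.loc[frame['title'] == title, vec_col]
    if match.empty:
        raise KeyError(f'title not found: {title!r}')
    ref = np.asarray(match.iloc[0], dtype=float)

    # Stack the per-movie vectors into one 2-D array and score all rows at once.
    X = np.vstack(frame[vec_col].to_list()).astype(float)
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(ref)
    sims = np.zeros(len(frame))
    np.divide(X @ ref, norms, out=sims, where=norms > 0)

    scored = frame.assign(cos_sim=sims)
    scored = scored[scored['title'] != title]   # drop the query movie first
    # A stable sort keeps the original row order among tied scores.
    scored = scored.sort_values('cos_sim', ascending=False, kind='stable')
    return scored.head(top_n)[['title', 'cos_sim']]


# Made-up catalogue with multi-hot vectors over [Action, Animation, Crime].
toy = pd.DataFrame({
    'title': ['Ref', 'Twin', 'Plus crime', 'Crime only'],
    'genre_vec': [[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1]],
})
print(recommend_by_genre(toy, 'Ref', top_n=2))
```

Filtering out the query movie before calling `head` means the function returns up to `top_n` other movies, even when many titles tie with the reference at a score of 1.0.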
