# Movie Recommender System
Using the [datasets](https://github.com/warriorkitty/orientlens/tree/master/movielens) colleccted from the Movielens [website](http://movielens.org), we construct a function that take the user index as input and prints out titles to 10 movies that users like them have watched but they haven't.


In [1]:
import numpy as np
import pandas as pd
dfmovies = pd.read_csv('https://raw.githubusercontent.com/warriorkitty/orientlens/master/movielens/movies.csv')
dfratings = pd.read_csv('https://raw.githubusercontent.com/warriorkitty/orientlens/master/movielens/ratings.csv')

In [2]:
dfmovies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [3]:
num_m = dfmovies.shape[0]   #total number of movies
num_m

8927

In [12]:
dfratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,5.0,847117005
1,1,2,3.0,847642142
2,1,10,3.0,847641896
3,1,32,4.0,847642008
4,1,34,4.0,847641956


In [4]:
num_u = int(dfratings.iloc[-1]['userId'])    #total number of users
num_u

718

## Restructuring the dataset
to a dictionary where dcuser[user1][movie1] is the rating of movie1 by user1. 

In [5]:
dcuser = {}
for i,row in dfratings.iterrows():
    m,u,r = row['movieId'],row['userId'],row['rating']
    if u not in dcuser:
        dcuser[u] = {}
    dcuser[u][m] = r

## Cosine Similarity
We consider each user as a vector with dimension = #no of movies, and the value at a coordinate = rating of the respective movie. The cosine similarity between 2 users is defined as the cosine of the angle between their 2 vectors.

In [6]:
def cossim(user):    # returns a list siml (where siml[other] = cosine(user,other) the user-wise list of similarities.
    #if o == u:       
     #   return 0
    ubuck = list(dcuser[user].keys())       #co-ords with non-zero values
    urat = list(dcuser[user].values())      #values at those co-ords
    usize = np.linalg.norm(urat)            #magnitude of vector
    
    siml = [0]*(1+num_u)                    
    
    if usize == 0:                          #if no movie has been watched, this will lead to output of most watched movies
        return [1]*(1+num_u)
    
    
    for other in dcuser:                           #for every user in data
        obuck = list(dcuser[other].keys())    
        orat = list(dcuser[other].values())
        osize = np.linalg.norm(orat)
    
        if osize != 0:
            tot = 0
            for m in ubuck:
                if m in obuck:
                    tot+= dcuser[user][m]*dcuser[other][m]       #multiply same co-ords and then add to get dot product
            siml[int(other)] = (tot)/usize/osize                 #cosine similarity
    return siml    

## Likeness Score
Calculated for each user and movie as a sum of ratings weighted by user similarity.

In [7]:
def score(m_ind,siml):          #given the list of user similarity, and the index of a movies, returns a score
    s = 0
    m = dfmovies.iloc[m_ind]['movieId']     
    for other in dcuser:         
        if m in dcuser[other]:       #if other has watched the movie     
            
            s+= dcuser[other][m]*siml[int(other)]        #weighted sum of rating by similarity
    return s

In [8]:
score(3,cossim(389))        #the 3rd movie for the 389th user

1.577541030396603

## Output function
Print 10 recommendations.

In [9]:
def printrecom(u):
    movieinds = range(num_m)        #list of all movies
    siml = cossim(u)                

    ranks = sorted(movieinds,key = lambda m: score(m,siml),reverse = True) #reverse sort of movies by score
    count = 0
    for m in ranks: 
        if m not in dcuser[u]:             #ignore the movies already watched
            print(dfmovies['title'][m])
            count+=1
        else:
            print(m)
        if count == 10:                    #break after printing 10 titles
            break        

In [10]:
printrecom(3)

Star Wars: Episode IV - A New Hope (1977)
Back to the Future (1985)
Pulp Fiction (1994)
Forrest Gump (1994)
Shawshank Redemption, The (1994)
Star Wars: Episode V - The Empire Strikes Back (1980)
Silence of the Lambs, The (1991)
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
Matrix, The (1999)
Toy Story (1995)


In [11]:
printrecom(718)

Pulp Fiction (1994)
Forrest Gump (1994)
Shawshank Redemption, The (1994)
Silence of the Lambs, The (1991)
Matrix, The (1999)
Jurassic Park (1993)
Star Wars: Episode IV - A New Hope (1977)
Toy Story (1995)
Back to the Future (1985)
Terminator 2: Judgment Day (1991)
