## What to do?

- <s>Recommend movies to users based on implicit recommend function</s>
- <s>Because you've watched feature</s>
- <s>Trending Movies</s>
- <s>Export model and data for production</s>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import implicit
import random

import scipy.sparse as sparse

%matplotlib inline



In [2]:
ratings = pd.read_csv("data/ml-latest-small/ratings.csv")

In [3]:
ratings = ratings[["userId", "movieId", "rating"]]

users = list(np.sort(ratings.userId.unique())) # Get our unique customers
movies = list(ratings.movieId.unique()) # Get our unique products that were purchased
rating = list(ratings.rating) # All of our purchases

rows = ratings.userId.astype('category', categories = users).cat.codes 
# Get the associated row indices
cols = ratings.movieId.astype('category', categories = movies).cat.codes 
# Get the associated column indices
user_item = sparse.csr_matrix((rating, (rows, cols)), shape=(len(users), len(movies)))

matrix_size = user_item.shape[0]*user_item.shape[1] # Number of possible interactions in the matrix
num_purchases = len(user_item.nonzero()[0]) # Number of items interacted with
sparsity = 100*(1 - (1.0*num_purchases/matrix_size))
print (sparsity)

user_item

98.3560858391


  import sys
  if __name__ == '__main__':


<671x9066 sparse matrix of type '<type 'numpy.float64'>'
	with 100004 stored elements in Compressed Sparse Row format>

## Recommending Movies to users

In [4]:
model = implicit.als.AlternatingLeastSquares(factors=10, 
                                             iterations=20, 
                                             regularization=0.1, 
                                             num_threads=4)
model.fit(user_item.T)

First let's write a function that returns the movies that a particular user had rated

In [5]:
def get_rated_movies_ids(user_id, user_item, users, movies):
    """
    Input
    -----
    
    user_id: int
        User ID
        
    user_item: scipy.Sparse Matrix
        User item interaction matrix
        
    users: np.array
        Mapping array between user ID and index in the user item matrix
        
    movies: np.array
        Mapping array between movie ID and index in the user item matrix
        
    Output
    -----
    
    movieTableIDs: python list
        List of movie IDs that the user had rated
    
    """
    user_id = users.index(user_id)
    # Get matrix ids of rated movies by selected user
    ids = user_item[user_id].nonzero()[1]
    # Convert matrix ids to movies IDs
    movieTableIDs = [movies[item] for item in ids]
    
    return movieTableIDs

In [6]:
movieTableIDs = get_rated_movies_ids(1, user_item, users, movies)

In [7]:
rated_movies = pd.DataFrame(movieTableIDs, columns=['movieId'])
rated_movies

Unnamed: 0,movieId
0,31
1,1029
2,1061
3,1129
4,1172
5,1263
6,1287
7,1293
8,1339
9,1343


In [8]:
movies_table = pd.read_csv("data/ml-latest-small/movies.csv")
movies_table.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [9]:
rated_movies = pd.merge(rated_movies, movies_table, on='movieId', how='left')
rated_movies

Unnamed: 0,movieId,title,genres
0,31,Dangerous Minds (1995),Drama
1,1029,Dumbo (1941),Animation|Children|Drama|Musical
2,1061,Sleepers (1996),Thriller
3,1129,Escape from New York (1981),Action|Adventure|Sci-Fi|Thriller
4,1172,Cinema Paradiso (Nuovo cinema Paradiso) (1989),Drama
5,1263,"Deer Hunter, The (1978)",Drama|War
6,1287,Ben-Hur (1959),Action|Adventure|Drama
7,1293,Gandhi (1982),Drama
8,1339,Dracula (Bram Stoker's Dracula) (1992),Fantasy|Horror|Romance|Thriller
9,1343,Cape Fear (1991),Thriller


In [10]:
def get_movies(movieTableIDs, movies_table):
    """
    Input
    -----
    
    movieTableIDs: python list
        List of movie IDs that the user had rated
        
    movies_table: pd.DataFrame
        DataFrame of movies info
        
    Output
    -----
    
    rated_movies: pd.DataFrame
        DataFrame of rated movies
    
    """
    
    rated_movies = pd.DataFrame(movieTableIDs, columns=['movieId'])
    
    rated_movies = pd.merge(rated_movies, movies_table, on='movieId', how='left')
    
    return rated_movies

In [11]:
movieTableIDs = get_rated_movies_ids(1, user_item, users, movies)
df = get_movies(movieTableIDs, movies_table)
df

Unnamed: 0,movieId,title,genres
0,31,Dangerous Minds (1995),Drama
1,1029,Dumbo (1941),Animation|Children|Drama|Musical
2,1061,Sleepers (1996),Thriller
3,1129,Escape from New York (1981),Action|Adventure|Sci-Fi|Thriller
4,1172,Cinema Paradiso (Nuovo cinema Paradiso) (1989),Drama
5,1263,"Deer Hunter, The (1978)",Drama|War
6,1287,Ben-Hur (1959),Action|Adventure|Drama
7,1293,Gandhi (1982),Drama
8,1339,Dracula (Bram Stoker's Dracula) (1992),Fantasy|Horror|Romance|Thriller
9,1343,Cape Fear (1991),Thriller


In [12]:
def recommend_movie_ids(user_id, model, user_item, users, movies, N=5):
    """
    Input
    -----
    
    user_id: int
        User ID
        
    model: ALS model
        Trained ALS model
    
    user_item: sp.Sparse Matrix
        User item interaction matrix so that we do not recommend already rated movies
        
    users: np.array
        Mapping array between User ID and user item index
        
    movies: np.array
        Mapping array between Movie ID and user item index
        
    N: int (default =5)
        Number of recommendations
        
    Output
    -----
    
    movies_ids: python list
        List of movie IDs
    """
    
    user_id = users.index(user_id)
    
    recommendations = model.recommend(user_id, user_item, N=N)
    
    recommendations = [item[0] for item in recommendations]
    
    movies_ids = [movies[ids] for ids in recommendations]
    
    return movies_ids

In [13]:
movies_ids = recommend_movie_ids(1, model, user_item, users, movies, N=5)
movies_ids

[1374, 1127, 1214, 1376, 541]

In [14]:
movies_rec = get_movies(movies_ids, movies_table)
movies_rec

Unnamed: 0,movieId,title,genres
0,1374,Star Trek II: The Wrath of Khan (1982),Action|Adventure|Sci-Fi|Thriller
1,1127,"Abyss, The (1989)",Action|Adventure|Sci-Fi|Thriller
2,1214,Alien (1979),Horror|Sci-Fi
3,1376,Star Trek IV: The Voyage Home (1986),Adventure|Comedy|Sci-Fi
4,541,Blade Runner (1982),Action|Sci-Fi|Thriller


In [15]:
df

Unnamed: 0,movieId,title,genres
0,31,Dangerous Minds (1995),Drama
1,1029,Dumbo (1941),Animation|Children|Drama|Musical
2,1061,Sleepers (1996),Thriller
3,1129,Escape from New York (1981),Action|Adventure|Sci-Fi|Thriller
4,1172,Cinema Paradiso (Nuovo cinema Paradiso) (1989),Drama
5,1263,"Deer Hunter, The (1978)",Drama|War
6,1287,Ben-Hur (1959),Action|Adventure|Drama
7,1293,Gandhi (1982),Drama
8,1339,Dracula (Bram Stoker's Dracula) (1992),Fantasy|Horror|Romance|Thriller
9,1343,Cape Fear (1991),Thriller


## Add posters data

In [16]:
metadata = pd.read_csv('data/movies_metadata.csv')

image_data = metadata[['imdb_id', 'poster_path']]

links = pd.read_csv("data/links.csv")

links = links[['movieId', 'imdbId']]

image_data = image_data[~ image_data.imdb_id.isnull()]

def app(x):
    try:
        return int(x[2:])
    except ValueError:
        print x
        
image_data['imdbId'] = image_data.imdb_id.apply(app)
image_data = image_data[~ image_data.imdbId.isnull()]
image_data.imdbId = image_data.imdbId.astype(int)
image_data = image_data[['imdbId', 'poster_path']]


posters = pd.merge(image_data, links, on='imdbId', how='left')

posters = posters[['movieId', 'poster_path']]

posters = posters[~ posters.movieId.isnull()]

posters.movieId = posters.movieId.astype(int)

movies_table = pd.merge(movies_table, posters, on='movieId', how='left')
movies_table.head()

  interactivity=interactivity, compiler=compiler, result=result)


0
0
0


Unnamed: 0,movieId,title,genres,poster_path
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,/rhIRbceoE9lR4veEXuwCC2wARtG.jpg
1,2,Jumanji (1995),Adventure|Children|Fantasy,/vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg
2,3,Grumpier Old Men (1995),Comedy|Romance,/6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance,/16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg
4,5,Father of the Bride Part II (1995),Comedy,/e64sOI48hQXyru7naBFyssKFxVd.jpg


In [17]:
from IPython.display import HTML
from IPython.display import display

def display_posters(df):
    
    images = ''
    for ref in df.poster_path:
            if ref != '':
                link = 'http://image.tmdb.org/t/p/w185/' + ref
                images += "<img style='width: 120px; margin: 0px; \
                  float: left; border: 1px solid black;' src='%s' />" \
              % link
    display(HTML(images))

In [18]:
movies_rec = get_movies(movies_ids, movies_table)
movies_rec

Unnamed: 0,movieId,title,genres,poster_path
0,1374,Star Trek II: The Wrath of Khan (1982),Action|Adventure|Sci-Fi|Thriller,/7VKpj4Xl3hTzgAS3xpVuOyqNnSv.jpg
1,1127,"Abyss, The (1989)",Action|Adventure|Sci-Fi|Thriller,/kRP5dGXDhKt7bDpXX4YBa4dRwlL.jpg
2,1214,Alien (1979),Horror|Sci-Fi,/2h00HrZs89SL3tXB4nbkiM7BKHs.jpg
3,1376,Star Trek IV: The Voyage Home (1986),Adventure|Comedy|Sci-Fi,/62nATuMKuaLhd5VHKumHOrJnCZa.jpg
4,541,Blade Runner (1982),Action|Sci-Fi|Thriller,/p64TtbZGCElxQHpAMWmDHkWJlH2.jpg


In [19]:
display_posters(movies_rec)

In [20]:
movies_ids = recommend_movie_ids(100, model, user_item, users, movies, N=7)
movies_rec = get_movies(movies_ids, movies_table)
display_posters(movies_rec)

## Because You've watched

Let's implement Netflix latest features. It's about recommending movies based on what you've watched. This is similar to what we already did, but this time, it's more selective. Here's how we will do it: We will choose random 5 movies that a user had watched and for each movie recommend similar movies to it. Finally, we display all of them in a one page layout

In [21]:
def similar_items(item_id, movies_table, movies, N=5):
    """
    Input
    -----
    
    item_id: int
        MovieID in the movies table
    
    movies_table: DataFrame
        DataFrame with movie ids, movie title and genre
        
    movies: np.array
        Mapping between movieID in the movies_table and id in the item user matrix
        
    N: int
        Number of similar movies to return
        
    Output
    -----
    df: DataFrame
        DataFrame with selected movie in first row and similar movies for N next rows
    """
    # Get movie user index from the mapping array
    user_item_id = movies.index(item_id)
    # Get similar movies from the ALS model
    similars = model.similar_items(user_item_id, N=N+1)    
    # ALS similar_items provides (id, score), we extract a list of ids
    l = [item[0] for item in similars[1:]]
    # Convert those ids to movieID from the mapping array
    ids = [movies[ids] for ids in l]
    # Make a dataFrame of the movieIds
    ids = pd.DataFrame(ids, columns=['movieId'])
    # Add movie title and genres by joining with the movies table
    recommendation = pd.merge(ids, movies_table, on='movieId', how='left')
    
    return recommendation

In [49]:
def display_recommendations(df):
    
    images = ''
    for ref in df.poster_path:
            if ref != '':
                link = 'http://image.tmdb.org/t/p/w185/' + ref
                images += "<img style='width: 120px; margin: 0px; \
                  float: left; border: 1px solid black;' src='%s' />" \
              % link
    display(HTML(images))

In [57]:
def similar_and_display(item_id, movies_table, movies, N=5):
    
    df = similar_items(item_id, movies_table, movies, N=N)
    
    df.dropna(inplace=True)
    
    display_recommendations(df)

In [24]:
movieTableIDs = get_rated_movies_ids(1, user_item, users, movies)
df = get_movies(movieTableIDs, movies_table)
df

Unnamed: 0,movieId,title,genres,poster_path
0,31,Dangerous Minds (1995),Drama,/y5Jee3QmYOlpqfaPPbfvtdVc5wj.jpg
1,1029,Dumbo (1941),Animation|Children|Drama|Musical,/r5IqhwZ61OuKlsyDwvXWyWQZK30.jpg
2,1061,Sleepers (1996),Thriller,/cDqEv4Fw4JZh2zCfecqw3z09L8z.jpg
3,1129,Escape from New York (1981),Action|Adventure|Sci-Fi|Thriller,/z1KTbKJh7vSTTXllxxE4DWG63rT.jpg
4,1172,Cinema Paradiso (Nuovo cinema Paradiso) (1989),Drama,/xKAweeF2ZPMNn6ce4GclSbr59Pv.jpg
5,1263,"Deer Hunter, The (1978)",Drama|War,/slNJESItHPqp1CENEJQUPw8d7WE.jpg
6,1287,Ben-Hur (1959),Action|Adventure|Drama,/syPMBvvZsADTTRu3UKuxO1Wflq.jpg
7,1293,Gandhi (1982),Drama,/2z9A4FSu1YySrhhcuqkdMIXpgyN.jpg
8,1339,Dracula (Bram Stoker's Dracula) (1992),Fantasy|Horror|Romance|Thriller,/ioHxm3D3JdSXR61LRhcVb8KdZOz.jpg
9,1343,Cape Fear (1991),Thriller,/4KvrvcqckdupXXO2YnANtyG7QLK.jpg


In [25]:
def because_you_watched(user, user_item, users, movies, k=5, N=5):
    """
    Input
    -----
    
    user: int
        User ID
        
    user_item: scipy sparse matrix
        User item interaction matrix
        
    users: np.array
        Mapping array between User ID and user item index
        
    movies: np.array
        Mapping array between Movie ID and user item index
        
    k: int
        Number of recommendations per movie
        
    N: int
        Number of movies already watched chosen
    
    """
    
    movieTableIDs = get_rated_movies_ids(user, user_item, users, movies)
    df = get_movies(movieTableIDs, movies_table)
    
    movieIDs = random.sample(df.movieId, N)
    
    for movieID in movieIDs:
        title = df[df.movieId == movieID].iloc[0].title
        print("Because you've watched ", title)
        similar_and_display(movieID, movies_table, movies, k)

In [56]:
because_you_watched(500, user_item, users, movies, k=5, N=5)

("Because you've watched ", '3 Ninjas: High Noon On Mega Mountain (1998)')
   movieId                       title                  genres  \
0     8911     Raise Your Voice (2004)                 Romance   
1     4750  3 Ninjas Knuckle Up (1995)         Action|Children   
2     7624          School Ties (1992)                   Drama   
3     4749   3 Ninjas Kick Back (1994)  Action|Children|Comedy   
4    31433    Wedding Date, The (2005)          Comedy|Romance   

                        poster_path  
0  /b1fdAQC87xpxKzfP1akrrmsDR6R.jpg  
1  /bd9if2VRAxBrXxkhT28hXMiusY1.jpg  
2  /poV3j71mcmQkmjezc2H35xJsAhD.jpg  
3  /paFMTv9IuZJaVkPZZvwtS5Ta5D9.jpg  
4  /A2m90ko1FCnJqEkpbHwMq3BDgx6.jpg  


("Because you've watched ", 'Saved! (2004)')
   movieId                        title                      genres  \
0     5505        Good Girl, The (2002)                Comedy|Drama   
1     3079        Mansfield Park (1999)        Comedy|Drama|Romance   
2     3189           My Dog Skip (1999)              Children|Drama   
3     6218  Bend It Like Beckham (2002)        Comedy|Drama|Romance   
4     5991               Chicago (2002)  Comedy|Crime|Drama|Musical   

                        poster_path  
0  /gXUh1LQ8g2rOKNzmOCe3tjfnsZR.jpg  
1  /reh2SdnPDmFbjWWAEbTv9GgQbke.jpg  
2  /t1jsRTNF1LcVBTfpXWNfrPjgoG1.jpg  
3  /aaLAIPw9vJSVyQLy390TJrntEwF.jpg  
4  /18pCc2XZ5MO7wsywOYEbhoeuxNw.jpg  


("Because you've watched ", 'Stranger than Fiction (2006)')
   movieId                        title                              genres  \
0    69757  (500) Days of Summer (2009)                Comedy|Drama|Romance   
1    56367                  Juno (2007)                Comedy|Drama|Romance   
2    54732         Balls of Fury (2007)                              Comedy   
3    68954                    Up (2009)  Adventure|Animation|Children|Drama   
4    57669             In Bruges (2008)         Comedy|Crime|Drama|Thriller   

                        poster_path  
0  /5SjtNPD1bb182vzQccvEUpXHFjN.jpg  
1  /eE64N6PYCSRW2mtQucfK2av5Wk2.jpg  
2  /ouNyskL3MjSr1SZe5rIfxmQ1E4M.jpg  
3  /nk11pvocdb5zbFhX5oq5YiLPYMo.jpg  
4  /kBABboeLU2HsKWSG7DwiF9saHl5.jpg  


("Because you've watched ", 'Lord of the Rings: The Two Towers, The (2002)')
   movieId                                              title  \
0     7153  Lord of the Rings: The Return of the King, The...   
1     4993  Lord of the Rings: The Fellowship of the Ring,...   
2     4306                                       Shrek (2001)   
3     4886                              Monsters, Inc. (2001)   
4     6539  Pirates of the Caribbean: The Curse of the Bla...   

                                              genres  \
0                     Action|Adventure|Drama|Fantasy   
1                                  Adventure|Fantasy   
2  Adventure|Animation|Children|Comedy|Fantasy|Ro...   
3        Adventure|Animation|Children|Comedy|Fantasy   
4                    Action|Adventure|Comedy|Fantasy   

                        poster_path  
0  /uexxR7Kw1qYbZk0RYaF9Rx5ykbj.jpg  
1  /bxVxZb5O9OxCO0oRUNdCnpy9NST.jpg  
2  /140ewbWv8qHStD3mlBDvvGd0Zvu.jpg  
3  /93Y9BGx8blzmZOPSoivkFfaifqU.jpg  
4  /t

("Because you've watched ", 'Da Vinci Code, The (2006)')
   movieId                    title                                 genres  \
0    45517              Cars (2006)              Animation|Children|Comedy   
1    33679  Mr. & Mrs. Smith (2005)        Action|Adventure|Comedy|Romance   
2    55872       August Rush (2007)                          Drama|Musical   
3    59784     Kung Fu Panda (2008)  Action|Animation|Children|Comedy|IMAX   
4    47610  Illusionist, The (2006)          Drama|Fantasy|Mystery|Romance   

                        poster_path  
0  /5damnMcRFKSjhCirgX3CMa88MBj.jpg  
1  /dqs5BmwSULtB28Kls3IB6khTQwp.jpg  
2  /j8kUdhnIlYsgvL0u4EUKEzWUrbo.jpg  
3  /2Paj1nufT0jeSY0G4u3RC31HIGT.jpg  
4  /sRYw9oAiporMpq1GWcYHqmpdeAO.jpg  


## Trending movies

Let's also implement trending movies. In our context, trending movies are movies that been rated the most by users

In [27]:
binary = user_item.copy()
binary[binary !=0] = 1
binary.shape

(671, 9066)

In [28]:
print (user_item.shape)
populars = np.array(binary.sum(axis=0)).reshape(-1)
print (populars.shape)

(671, 9066)
(9066,)


In [29]:
populars

array([ 42.,  42.,  33., ...,   1.,   1.,   1.])

In [30]:
populars.argsort()[::-1][:5]

array([ 57,  49,  99,  92, 143])

In [31]:
populars

array([ 42.,  42.,  33., ...,   1.,   1.,   1.])

In [32]:
binary = user_item.copy()
binary[binary !=0] = 1
    
populars = np.array(binary.sum(axis=0)).reshape(-1)
    
movieIDs = populars.argsort()[::-1][:5]
    
movies_rec = get_movies(movieIDs, movies_table)

In [33]:
movies_rec.head()

Unnamed: 0,movieId,title,genres,poster_path
0,57,Home for the Holidays (1995),Drama,/97EKLaslQ4gzkvALtqGCDOmsGHk.jpg
1,49,When Night Is Falling (1995),Drama|Romance,/wnUuYc9XnWrTTAGz8YjjULA3Zmr.jpg
2,99,Heidi Fleiss: Hollywood Madam (1995),Documentary,/j9k3UTx7OZ0xoCws1oACYQZFF5N.jpg
3,92,Mary Reilly (1996),Drama|Horror|Thriller,/Oesh4pN5J2ing4flkgkiVdxILP.jpg
4,143,,,


In [59]:
def get_trending(user_item, movies, movies_table, N=5):
    """
    Input
    -----
    
    user_item: scipy sparse matrix
        User item interaction matrix to use to extract popular movies
        
    movies: np.array
        Mapping array between movieId and ID in the user_item matrix
        
    movies_table: pd.DataFrame
        DataFrame for movies information
        
    N: int
        Top N most popular movies to return
    
    """
    
    binary = user_item.copy()
    binary[binary !=0] = 1
    
    populars = np.array(binary.sum(axis=0)).reshape(-1)
    
    movieIDs = populars.argsort()[::-1][:N]
    
    movies_rec = get_movies(movieIDs, movies_table)
    
    movies_rec.dropna(inplace=True)
    
    print("Trending Now")
    
    display_posters(movies_rec)

In [53]:
get_trending(user_item, movies, movies_table, N=6)

   movieId                                 title                 genres  \
0       57          Home for the Holidays (1995)                  Drama   
1       49          When Night Is Falling (1995)          Drama|Romance   
2       99  Heidi Fleiss: Hollywood Madam (1995)            Documentary   
3       92                    Mary Reilly (1996)  Drama|Horror|Thriller   
5       72          Kicking and Screaming (1995)           Comedy|Drama   

                        poster_path  
0  /97EKLaslQ4gzkvALtqGCDOmsGHk.jpg  
1  /wnUuYc9XnWrTTAGz8YjjULA3Zmr.jpg  
2  /j9k3UTx7OZ0xoCws1oACYQZFF5N.jpg  
3   /Oesh4pN5J2ing4flkgkiVdxILP.jpg  
5  /urPxxjylNUHmoNcGkMgHStMTxhF.jpg  
Trending Now


In [60]:
def my_timeline(user, user_item, users, movies, movies_table, k=5, N=5):
    
    get_trending(user_item, movies, movies_table, N=N)
    
    because_you_watched(user, user_item, users, movies, k=k, N=N)

In [61]:
my_timeline(500, user_item, users, movies, movies_table, k=5, N=5)

Trending Now


("Because you've watched ", 'Out of Africa (1985)')


("Because you've watched ", 'Boy in the Striped Pajamas, The (Boy in the Striped Pyjamas, The) (2008)')


("Because you've watched ", 'Star Trek: Generations (1994)')


("Because you've watched ", 'Saved! (2004)')


("Because you've watched ", "Pirates of the Caribbean: Dead Man's Chest (2006)")


## Export trained models to be used in production

At this point, we want to get our model into production. We want to create a web service where a user will provide a userid to the service and the service will return all of the recommendations including the trending and the "because you've watched". We first export the trained model and the used data for use in the web service.

In [62]:
import scipy.sparse
scipy.sparse.save_npz('model/user_item.npz', user_item)

In [63]:
np.save('model/movies.npy', movies)
np.save('model/users.npy', users)
movies_table.to_csv('model/movies_table.csv', index=False)

In [65]:
from sklearn.externals import joblib
joblib.dump(model, 'model/model.pkl') 

['model/model.pkl']