# Recommender System with GraphLab
In which we demonstrate some features of GraphLab while building a movie recommendation system.

In [1]:
import graphlab as gl



The factorization recommender tries to minimize the function:
\begin{equation}
\min_{\mathbf{w},\mathbf{a},\mathbf{b},\mathbf{U},\mathbf{V}} \frac{1}{\lvert\mathcal{D}\rvert} \sum_{(i,j,r_{i,j}) \in \mathcal{D}} \mathcal{L}\bigl(\mathrm{score}(i,j),r_{i,j}\bigr) + \lambda_{1}\left(\lVert\mathbf{w}\rVert_{2}^{2} + \lVert\mathbf{a}\rVert_{2}^{2} + \lVert\mathbf{b}\rVert_{2}^{2}\right) + \lambda_{2}\left(\lVert\mathbf{U}\rVert_{2}^{2} + \lVert\mathbf{V}\rVert_{2}^{2}\right)
\end{equation}
where:
\begin{equation}
\mathrm{score}(i,j) = \mu + w_i + w_j + \mathbf{a}^{T}\mathbf{x}_{i} + \mathbf{b}^{T}\mathbf{y}_{j} + \mathbf{u}_{i}^{T}\mathbf{v}_{j}
\end{equation}
and
$\mathcal{D}$ is the set of observed ratings $r_{i,j}$, $\mathcal{L}$ is the loss, $\mu$ is the overall average rating, $w_i$ is the user bias, $w_j$ is the item bias, $\mathbf{x}_i$ is the user data with weights $\mathbf{a}$, $\mathbf{y}_j$ is the item data with weights $\mathbf{b}$, $\mathbf{u}_i$ and $\mathbf{v}_j$ are the user and item factors, and $\lambda_1$ and $\lambda_2$ are the regularization weights.
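
To make the score concrete, here is a small plain-Python sketch that evaluates it for one user and one item. It only illustrates the formula above. The helper names (`dot`, `score`) and every number are made up for the example, and nothing here calls GraphLab.

```python
# Toy illustration of score(i, j); all values below are invented.

def dot(p, q):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(p, q))


def score(mu, w_user, w_item, a, x_i, b, y_j, u_i, v_j):
    """mu + w_i + w_j + a^T x_i + b^T y_j + u_i^T v_j"""
    return mu + w_user + w_item + dot(a, x_i) + dot(b, y_j) + dot(u_i, v_j)


predicted = score(
    mu=3.5,                           # overall average rating
    w_user=0.2,                       # user bias w_i
    w_item=-0.1,                      # item bias w_j
    a=[0.5], x_i=[1.0],               # user side-data weights and features
    b=[0.25, 0.0], y_j=[1.0, 1.0],    # item side-data weights and features
    u_i=[1.0, 2.0], v_j=[0.5, 0.25],  # user and item latent factors
)
print(predicted)
```

Training adjusts the biases, side-data weights and factors so that scores like this one land close to the observed ratings, while the two regularization terms keep the parameters small.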

We can do a lot with a recommender model. First, we can get recommendations for a user:
```python
top_movies = recommender.recommend([1])['movieId']
movies.filter_by(top_movies, 'movieId')
```

We can also find items similar to a given movie:
```python
inception_id = movies.filter_by('Inception (2010)', 'title')['movieId']
similar_movies = recommender.get_similar_items(inception_id)['similar']
movies.filter_by(similar_movies, 'movieId')
```


In [2]:
def add_path(base, name):
    return base+name



if __name__ == '__main__':
    base_path = "./data/sample-movie-recommender-master/dataset/ml-20m/"
    ratings_path = add_path(base_path, "ratings.csv")
    movies_path = add_path(base_path, "movies.csv")
    ratings = gl.SFrame.read_csv(ratings_path)
    movies = gl.SFrame.read_csv(movies_path)

    





------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,float,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


In [9]:
movies.head()
movies[0]
movies[movies['movieId'] < 10]
movies[movies['movieId'] < 10].shape

(9, 3)

In [10]:
df = movies.to_dataframe()
df.info()

In [17]:
sf = gl.SFrame(df)
sf.head()

movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy ...
2,Jumanji (1995),Adventure|Children|Fantasy ...
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995) ...,Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action
10,GoldenEye (1995),Action|Adventure|Thriller


In [25]:
train, test = gl.recommender.util.random_split_by_user(ratings, 'userId', 'movieId', max_num_users=1000, item_test_proportion=0.2 )
#or train, test = ratings.random_split(0.8, seed=10)
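
A rough sketch of the idea behind a per-user split, in plain Python. It assumes the split samples at most `max_num_users` users and moves roughly `item_test_proportion` of their ratings into the test set, with every other row staying in train. That assumption is consistent with the train set here keeping nearly all of the ratings. The helper `split_by_user` and the toy rows are hypothetical and are not the GraphLab implementation.

```python
import random


def split_by_user(rows, max_num_users=2, item_test_proportion=0.5, seed=0):
    """Split (user, item, rating) rows, holding out ratings only for sampled users."""
    rng = random.Random(seed)
    users = sorted({user for user, _item, _rating in rows})
    # Only these users can contribute rows to the test set.
    sampled = set(rng.sample(users, min(max_num_users, len(users))))
    train, test = [], []
    for row in rows:
        if row[0] in sampled and rng.random() < item_test_proportion:
            test.append(row)
        else:
            train.append(row)
    return train, test


# Toy data: 5 users, 4 ratings each.
rows = [(user, movie, 3.0) for user in range(5) for movie in range(4)]
toy_train, toy_test = split_by_user(rows)
print(len(toy_train), len(toy_test))
```

Because only a bounded sample of users gives up rows, the test set stays small no matter how large the ratings table is.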

In [26]:
train.shape

(19972071, 4)

In [58]:
recommender = gl.recommender.factorization_recommender.create(train, 'userId', 'movieId', 'rating', max_iterations=5)

In [59]:
top_movies = recommender.recommend([1])['movieId']
movies.filter_by(top_movies, 'movieId')

movieId,title,genres
110,Braveheart (1995),Action|Drama|War
356,Forrest Gump (1994),Comedy|Drama|Romance|War
527,Schindler's List (1993),Drama|War
665,Underground (1995),Comedy|Drama|War
2324,Life Is Beautiful (La Vita è bella) (1997) ...,Comedy|Drama|Romance|War
2571,"Matrix, The (1999)",Action|Sci-Fi|Thriller
3147,"Green Mile, The (1999)",Crime|Drama
3578,Gladiator (2000),Action|Adventure|Drama
5008,Witness for the Prosecution (1957) ...,Drama|Mystery|Thriller
58559,"Dark Knight, The (2008)",Action|Crime|Drama|IMAX


In [60]:
# Note: this model is fit on the `test` split; the `train` variant is kept commented out below.
#recommender_with_side_data = gl.factorization_recommender.create(train, 'userId', 'movieId', 'rating', item_data=movies, max_iterations=5)
recommender_with_side_data = gl.factorization_recommender.create(test, 'userId', 'movieId', 'rating', item_data=movies, max_iterations=5)



In [61]:
top_movies = recommender_with_side_data.recommend([1])['movieId']
movies.filter_by(top_movies, 'movieId')

movieId,title,genres
26978,Kiss or Kill (1997),Crime|Drama|Thriller
27081,Claire Dolan (1998),Drama
53651,Once You're Born You Can No Longer Hide (Quando ...,Adventure|Drama
96631,Deathstalker II (1987),Action|Adventure|Comedy|Fantasy ...
102541,As You Like It (1978),Comedy
102852,With Great Power: The Stan Lee Story (2012) ...,Documentary
108132,Rhino Season (Fasle kargadan) (2012) ...,Drama|Fantasy|Film-Noir|Mystery|Romance|War ...
112942,Sky Murder (1940),Action|Adventure|Crime|Mystery|Thriller ...
116169,Reign of Assassins (2010),Action
117924,The Bloody Olive (1997),Comedy|Crime|Film-Noir


In [62]:
inception_id = movies.filter_by('Inception (2010)', 'title')['movieId']
similar_movies = recommender.get_similar_items(inception_id)['similar']
movies.filter_by(similar_movies, 'movieId')

movieId,title,genres
4011,Snatch (2000),Comedy|Crime|Thriller
4678,UHF (1989),Comedy
7323,"Good bye, Lenin! (2003)",Comedy|Drama
8914,Primer (2004),Drama|Sci-Fi
46347,Metal: A Headbanger's Journey (2005) ...,Documentary
52319,Inglorious Bastards (Quel maledetto treno blind ...,Action|Adventure|Drama|War ...
53123,Once (2006),Drama|Musical|Romance
55814,"Diving Bell and the Butterfly, The ...",Drama
78574,Winter's Bone (2010),Drama|Thriller
89864,50/50 (2011),Comedy|Drama
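
One way to picture how a factorization model can rank similar items is to compare their latent factor vectors. The sketch below uses cosine similarity on made-up two-dimensional factors. Treat the choice of cosine similarity as an assumption about how `get_similar_items` behaves for this model type. The helpers `cosine` and `most_similar` and the factor values are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


def most_similar(item, factors, k=2):
    """Return the k items whose factor vectors point most nearly the same way."""
    scored = [(cosine(factors[item], vec), other)
              for other, vec in factors.items() if other != item]
    return [other for _sim, other in sorted(scored, reverse=True)[:k]]


# Invented item factors; real models typically use more dimensions.
item_factors = {
    'A': [1.0, 0.0],
    'B': [0.9, 0.1],
    'C': [0.0, 1.0],
    'D': [-1.0, 0.0],
}
nearest = most_similar('A', item_factors)
print(nearest)
```

Under this view, two movies count as similar when users' tastes load on them in the same direction, regardless of whether their genres match.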


In [63]:
similar_movies = recommender_with_side_data.get_similar_items(inception_id)['similar']
movies.filter_by(similar_movies, 'movieId')

movieId,title,genres
49272,Casino Royale (2006),Action|Adventure|Thriller
54736,"Kingdom, The (2007)",Action|Drama|Thriller
55765,American Gangster (2007),Crime|Drama|Thriller
58559,"Dark Knight, The (2008)",Action|Crime|Drama|IMAX
80906,Inside Job (2010),Documentary
91529,"Dark Knight Rises, The (2012) ...",Action|Adventure|Crime|IMAX ...
91658,"Girl with the Dragon Tattoo, The (2011) ...",Drama|Thriller
96610,Looper (2012),Action|Crime|Sci-Fi
106100,Dallas Buyers Club (2013),Drama
109487,Interstellar (2014),Sci-Fi|IMAX


In [64]:
similar_users = recommender.get_similar_users([1])['similar']

In [65]:
users = ratings.groupby(key_columns='userId', 
                        operations={'avg_rating':gl.aggregate.AVG('rating'), 'count':gl.aggregate.COUNT()})

In [66]:
users.head()

userId,count,avg_rating
21855,22,4.36363636364
88004,34,3.70588235294
79732,24,3.5
63664,43,3.76744186047
127950,78,3.80128205128
7899,730,3.57191780822
25263,22,3.95454545455
130872,75,3.84
87629,38,3.86842105263
30621,247,3.53441295547
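
For readers less familiar with grouped aggregation, the same per-user summary can be sketched in plain Python on a few toy rows. The helper `summarize_users` and the sample ratings are made up for illustration, and this is not how GraphLab computes it internally.

```python
from collections import defaultdict


def summarize_users(rows):
    """Compute the rating count and average rating for each userId."""
    totals = defaultdict(lambda: [0, 0.0])  # userId -> [count, sum of ratings]
    for user_id, _movie_id, rating in rows:
        totals[user_id][0] += 1
        totals[user_id][1] += rating
    return {user_id: {'count': count, 'avg_rating': total / count}
            for user_id, (count, total) in totals.items()}


# (userId, movieId, rating) rows, invented for the example.
toy_ratings = [(1, 10, 4.0), (1, 11, 2.0), (2, 10, 5.0)]
summary = summarize_users(toy_ratings)
print(summary)
```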


In [67]:
users.filter_by(similar_users, 'userId')

userId,count,avg_rating
61477,56,3.64285714286
13007,157,3.1847133758
67224,852,3.6220657277
39687,73,3.87671232877
121576,342,4.04970760234
116668,40,3.4
13030,115,3.83043478261
95383,20,3.375
74688,250,3.72
56394,562,3.6165480427


In [54]:
users.shape


(138493, 3)

In [56]:
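# SFrame.append returns a new SFrame and leaves `users` unchanged,
# so the shape reported below is the same as before.
users.append(gl.SFrame({'userId':[100000000], 'count':[2], 'avg_rating':[1.1]}))
users.shape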

(138493, 3)

In [69]:
ratings.show()

Canvas is accessible via web browser at the URL: http://localhost:46754/index.html
Opening Canvas in default web browser.
