# Try a Recommender System in GraphLab Create

## Small Example

In [1]:
```python
import pandas as pd
import graphlab as gl
```

In [2]:
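```python
# Toy ratings in long format: one row per (user, item, score) observation
df = pd.DataFrame(
    data=[
        (1, 'a', 1),
        (1, 'd', 1),
        (1, 'b', 5),
        (2, 'a', 2),
        (2, 'b', 5),
        (2, 'e', 2),
        (3, 'b', 1),
        (3, 'c', 2),
        (3, 'e', 5),
        (3, 'd', 5),
        (4, 'a', 4),
        (4, 'd', 5),
        (4, 'c', 2),
    ],
    columns=['user_id', 'item_id', 'score'])
# Reshape to a users x items matrix; pairs without a rating become NaN
pdf = df.pivot(index='user_id', columns='item_id')
pdf
```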

| user_id \ item_id | a | b | c | d | e |
|---|---|---|---|---|---|
| 1 | 1.0 | 5.0 | NaN | 1.0 | NaN |
| 2 | 2.0 | 5.0 | NaN | NaN | 2.0 |
| 3 | NaN | 1.0 | 2.0 | 5.0 | 5.0 |
| 4 | 4.0 | NaN | 2.0 | 5.0 | NaN |


In [3]:
```python
sf = gl.SFrame(df)
# recommender.create() chooses a model automatically (see the PROGRESS log below)
model = gl.recommender.create(sf,
          user_id='user_id',
          item_id='item_id',
          target='score')
recommendations = model.recommend()
print recommendations
```

```
[INFO] This trial license of GraphLab Create is assigned to e-hkosc15@hupili.net and will expire on July 26, 2015. Please contact trial@dato.com for licensing options or to request a free non-commercial license for personal or academic use.

[INFO] Start server at: ipc:///tmp/graphlab_server-56186 - Server binary: /Users/hupili/Desktop/hkosc2015-workshop/graphlab-venv/dato-env/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1435376583.log
[INFO] GraphLab Server Version: 1.4.1

PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 13 observations with 4 users and 5 items.
PROGRESS:     Data prepared in: 0.003097s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | sgd      |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coeff
```

In [4]:
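```python
# recommend() only returns items a user has not rated yet, so stacking the predictions
# under the observed ratings creates no duplicate (user, item) pairs and pivot() fills the gaps
pd.concat([df, recommendations[['user_id', 'item_id', 'score']].to_dataframe()]).pivot(
    index='user_id', columns='item_id')
```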

| user_id \ item_id | a | b | c | d | e |
|---|---|---|---|---|---|
| 1 | 1.0 | 5.0 | -0.815725 | 1.0 | 0.870254 |
| 2 | 2.0 | 5.0 | -0.073497 | 1.505843 | 2.0 |
| 3 | 3.283033 | 1.0 | 2.0 | 5.0 | 5.0 |
| 4 | 4.0 | 0.607049 | 2.0 | 5.0 | 3.833822 |
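The cells that were empty in the first table now hold predicted scores. These are model scores, not ratings kept on the 1 to 5 scale (user 1 gets -0.82 for item c). To illustrate how a factorization model fills such cells, here is a minimal sketch of a generic biased matrix factorization (global mean + user bias + item bias + dot product of latent factors) trained with plain SGD on squared error. It is a stand-in, not GraphLab's implementation: `ranking_factorization_recommender` uses its own ranking-oriented objective and defaults (32 factors, per the log), so the numbers will not match. The name `train_mf`, the 2 latent factors, the learning rate, the regularization and the epoch count are all hypothetical choices for this example.

```python
import random

def train_mf(ratings, num_factors=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Hypothetical sketch: biased matrix factorization fitted with SGD on squared error."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(s for _, _, s in ratings) / float(len(ratings))  # global mean rating
    bu = {u: 0.0 for u in users}  # per-user bias
    bi = {i: 0.0 for i in items}  # per-item bias
    P = {u: [rng.gauss(0, 0.1) for _ in range(num_factors)] for u in users}  # user factors
    Q = {i: [rng.gauss(0, 0.1) for _ in range(num_factors)] for i in items}  # item factors

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, s in data:
            err = s - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(num_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict

ratings = [(1, 'a', 1), (1, 'd', 1), (1, 'b', 5), (2, 'a', 2), (2, 'b', 5), (2, 'e', 2),
           (3, 'b', 1), (3, 'c', 2), (3, 'e', 5), (3, 'd', 5), (4, 'a', 4), (4, 'd', 5),
           (4, 'c', 2)]
predict = train_mf(ratings)
# Fill in a cell that has no observed rating, e.g. user 1 and item 'c'
print(predict(1, 'c'))
```

Seeding `random.Random` makes the run reproducible; since the model has more parameters than the 13 observed ratings, `reg` is what keeps the factors from simply memorizing the training data.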


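## Plug and Play with Other Models/Algorithms

`gl.recommender.create` picks a model automatically (the log above shows it chose `ranking_factorization_recommender`). Each model module under `gl.recommender`, such as `item_similarity_recommender` or `popularity_recommender`, has its own `create` function that takes the same `user_id`, `item_id` and `target` arguments, so the training code from above can be turned into a function that receives the model module as a parameter. The original code:

```python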
sf = gl.SFrame(df)
model = gl.recommender.create(sf,
          user_id='user_id',
          item_id='item_id',
          target='score')
recommendations = model.recommend()
print recommendations
```

In [5]:
```python
def try_model(model_module):
    """Train `model_module` on the toy ratings in `df` and print its recommendations."""
    sf = gl.SFrame(df)
    model = model_module.create(sf,
              user_id='user_id',
              item_id='item_id',
              target='score')
    recommendations = model.recommend()
    print recommendations
```

In [6]:
```python
try_model(gl.recommender.item_similarity_recommender)
```

```
PROGRESS: Recsys training: model = item_similarity
PROGRESS: Preparing data set.
PROGRESS:     Data has 13 observations with 4 users and 5 items.
PROGRESS:     Data prepared in: 0.00452s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 5 items:
PROGRESS: Finished training in 0.000794s
PROGRESS: Finished prediction in 0.000802s
+---------+---------+---------------+------+
| user_id | item_id |     score     | rank |
+---------+---------+---------------+------+
|    1    |    e    | 3.28571428571 |  1   |
|    1    |    c    | 1.85714285714 |  2   |
|    2    |    d    |      3.2      |  1   |
|    2    |    c    |      2.9      |  2   |
|    3    |    a    | 3.16666666667 |  1   |
|    4    |    b    |      4.0      |  1   |
|    4    |    e    |      3.5      |  2   |
+---------+---------+---------------+------+
[7 rows x 4 columns]
```
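The item-similarity scores above can be reproduced by hand. The sketch below assumes Jaccard similarity between two items (computed over the sets of users who rated them) and scores an unrated item by the similarity-weighted average of the user's own ratings. For this toy data those assumptions give exactly the numbers in the table (for example 23/7 ≈ 3.2857 for user 1 and item e), but GraphLab's actual implementation may differ in details; the helper names `jaccard`, `score` and `recommend` are hypothetical.

```python
from collections import defaultdict

ratings = [(1, 'a', 1), (1, 'd', 1), (1, 'b', 5), (2, 'a', 2), (2, 'b', 5), (2, 'e', 2),
           (3, 'b', 1), (3, 'c', 2), (3, 'e', 5), (3, 'd', 5), (4, 'a', 4), (4, 'd', 5),
           (4, 'c', 2)]

user_ratings = defaultdict(dict)  # user -> {item: score}
item_users = defaultdict(set)     # item -> set of users who rated it
for u, i, s in ratings:
    user_ratings[u][i] = s
    item_users[i].add(u)

def jaccard(i, j):
    """Jaccard similarity of the user sets of items i and j."""
    a, b = item_users[i], item_users[j]
    return len(a & b) / float(len(a | b))

def score(user, item):
    """Similarity-weighted average of the user's own ratings (assumed scoring rule)."""
    rated = user_ratings[user]
    weights = {j: jaccard(item, j) for j in rated}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[j] * s for j, s in rated.items()) / total

def recommend(user, k=10):
    """Items the user has not rated, ordered by score, highest first."""
    candidates = [i for i in item_users if i not in user_ratings[user]]
    ranked = sorted(candidates, key=lambda i: score(user, i), reverse=True)
    return [(i, score(user, i)) for i in ranked[:k]]

for user in sorted(user_ratings):
    print(recommend(user))
```

Item e, for instance, shares user 2 with item a, user 3 with item d and users 2 and 3 with item b, so user 1's rating of b (5) gets the largest weight (2/3) and pulls the score up to 23/7.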



In [7]:
```python
try_model(gl.recommender.popularity_recommender)
```

```
PROGRESS: Recsys training: model = popularity
PROGRESS: Preparing data set.
PROGRESS:     Data has 13 observations with 4 users and 5 items.
PROGRESS:     Data prepared in: 0.004618s
PROGRESS: 13 observations to process; with 5 unique items.
PROGRESS: 13 observations processed.
PROGRESS: Number observations / second: 151163
PROGRESS: Computing nearest neighbors model for item similarity queries.
PROGRESS: +------------+--------------+
PROGRESS: | Tree level | Elapsed Time |
PROGRESS: +------------+--------------+
PROGRESS: | 0          | 286us        |
PROGRESS: +------------+--------------+
+---------+---------+---------------+------+
| user_id | item_id |     score     | rank |
+---------+---------+---------------+------+
|    1    |    e    |      3.5      |  1   |
|    1    |    c    |      2.0      |  2   |
|    2    |    d    | 3.66666666667 |  1   |
|    2    |    c    |      2.0      |  2   |
|    3    |    a    | 2.33333333333 |  1   |
|    4    |    b    | 3.66666666667 |  1 
```
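The popularity scores match the mean rating each item received: item e was rated 2 and 5 (mean 3.5), item d was rated 1, 5 and 5 (mean 3.67). A minimal sketch of that idea, assuming the model gives every unrated item its mean rating, so the score is the same for all users; `popularity` and `recommend` are hypothetical names:

```python
from collections import defaultdict

ratings = [(1, 'a', 1), (1, 'd', 1), (1, 'b', 5), (2, 'a', 2), (2, 'b', 5), (2, 'e', 2),
           (3, 'b', 1), (3, 'c', 2), (3, 'e', 5), (3, 'd', 5), (4, 'a', 4), (4, 'd', 5),
           (4, 'c', 2)]

item_scores = defaultdict(list)  # item -> all ratings it received
user_items = defaultdict(set)    # user -> items already rated
for u, i, s in ratings:
    item_scores[i].append(s)
    user_items[u].add(i)

# Popularity of an item = mean of its ratings, independent of who is asking
popularity = {i: sum(v) / float(len(v)) for i, v in item_scores.items()}

def recommend(user, k=10):
    """Unrated items ordered by mean rating, highest first."""
    candidates = [i for i in popularity if i not in user_items[user]]
    candidates.sort(key=lambda i: popularity[i], reverse=True)
    return [(i, popularity[i]) for i in candidates[:k]]

print(recommend(2))
```

Because the score ignores the user entirely, this model is a useful baseline: any personalized model should beat it.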

In [8]:
```python
try_model(gl.recommender.ranking_factorization_recommender)
```

```
PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 13 observations with 4 users and 5 items.
PROGRESS:     Data prepared in: 0.004671s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | sgd      |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coeff
```

## A Real Dataset

The dataset is a 2.7 MB CSV file of movie ratings: http://s3.amazonaws.com/GraphLab-Datasets/movie_ratings/training_data.csv

In [9]:
```python
# Download and parse the CSV; force the rating column to be read as integers
data = gl.SFrame.read_csv("http://s3.amazonaws.com/GraphLab-Datasets/movie_ratings/training_data.csv", column_type_hints={"rating": int})
data.head()
```

```
PROGRESS: Downloading http://s3.amazonaws.com/GraphLab-Datasets/movie_ratings/training_data.csv to /var/tmp/graphlab-hupili/56186/000000.csv
PROGRESS: Finished parsing file http://s3.amazonaws.com/GraphLab-Datasets/movie_ratings/training_data.csv
PROGRESS: Parsing completed. Parsed 82068 lines in 0.068877 secs.
```


| user | movie | rating |
|---|---|---|
| Jacob Smith | Flirting with Disaster | 4 |
| Jacob Smith | Indecent Proposal | 3 |
| Jacob Smith | Runaway Bride | 2 |
| Jacob Smith | Swiss Family Robinson | 1 |
| Jacob Smith | The Mexican | 2 |
| Jacob Smith | Maid in Manhattan | 4 |
| Jacob Smith | A Charlie Brown Thanksgiving / The ... | 3 |
| Jacob Smith | Brazil | 1 |
| Jacob Smith | Forrest Gump | 3 |
| Jacob Smith | It Happened One Night | 4 |


In [10]:
```python
# Convert the SFrame to a pandas DataFrame for quick exploration
data_df = data.to_dataframe()
```

In [22]:
```python
all_users = data_df['user'].unique()
all_movies = data_df['movie'].unique()
print 'number of users', len(all_users)
print 'number of movies', len(all_movies)
```

```
number of users 334
number of movies 7714
```
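These counts, together with the 82068 ratings reported by the parser, show how sparse the user-movie rating matrix is: only about 3% of all pairs have a rating, so the model must predict the vast majority of the matrix. The arithmetic:

```python
num_ratings = 82068  # "Parsed 82068 lines" in the log above
num_users = 334
num_movies = 7714

# Fraction of all (user, movie) pairs that carry a rating
density = num_ratings / float(num_users * num_movies)
print('fraction of (user, movie) pairs rated: %.4f' % density)
print('ratings per user: %.1f' % (num_ratings / float(num_users)))
print('ratings per movie: %.1f' % (num_ratings / float(num_movies)))
```

That is roughly 246 ratings per user but fewer than 11 per movie, which is why the per-movie counts below drop off quickly.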


In [12]:
```python
data_df['user'].value_counts()
```

```
Zion Smith         2024
Ivan Smith         1147
Beau Smith         1057
Damian Smith        908
Gabriel Smith       843
Paul Smith          823
Brady Smith         769
Cristian Smith      762
Adam Smith          746
Edwin Smith         734
Bryson Smith        732
Richard Smith       732
Shawn Smith         730
Asher Smith         716
Kameron Smith       701
Camden Smith        672
Aden Smith          670
August Smith        668
Jayce Smith         654
Bryce Smith         638
Aaron Smith         630
Daniel Smith        627
Ryder Smith         625
Mason Smith         623
Alejandro Smith     606
William Smith       599
Javier Smith        597
Charles Smith       591
Drew Smith          588
Jaden Smith         586
                   ... 
Sergio Smith         48
Kaleb Smith          48
Ashton Smith         48
Kaden Smith          47
Emmanuel Smith       47
Roberto Smith        44
Johnathan Smith      42
Graham Smith         41
Simon Smith          40
Santiago Smith       40
Zayden Smith
```

In [13]:
```python
data_df['movie'].value_counts()
```

```
Something's Gotta Give                                             252
Pirates of the Caribbean: The Curse of the Black Pearl             170
Pretty Woman                                                       168
How to Lose a Guy in 10 Days                                       166
The Godfather                                                      162
Miss Congeniality                                                  161
Ocean's Eleven                                                     159
Lost in Translation                                                157
The Bourne Identity                                                156
Two Weeks Notice                                                   156
Bruce Almighty                                                     155
The Italian Job                                                    153
Sweet Home Alabama                                                 153
Catch Me If You Can                                                150
Mystic
```

In [14]:
```python
model = gl.recommender.create(data,
          user_id='user',
          item_id='movie',
          target='rating')
```

```
PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 82068 observations with 334 users and 7714 items.
PROGRESS:     Data prepared in: 0.131875s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | sgd      |
PROGRESS: | linear_regularization          | L2 Regularization on Line
```

In [15]:
```python
model.recommend(k=5)
```

| user | movie | score | rank |
|---|---|---|---|
| Jacob Smith | Shall We Dance? | 5.141178594 | 1 |
| Jacob Smith | Love Actually | 4.76000283754 | 2 |
| Jacob Smith | The Red Violin | 4.68698619402 | 3 |
| Jacob Smith | Bridget Jones's Diary | 4.55182908571 | 4 |
| Jacob Smith | Shine | 4.45206521547 | 5 |
| Mason Smith | Mulholland Drive | 5.8094079355 | 1 |
| Mason Smith | Love Actually | 5.56564925707 | 2 |
| Mason Smith | Phone Booth | 5.55061172045 | 3 |
| Mason Smith | Deliverance | 5.2154667238 | 4 |
| Mason Smith | Lawrence of Arabia | 5.13001082933 | 5 |
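`recommend(k=5)` returns, for each user, the 5 highest-scoring movies together with their rank. Judging from the small example, movies the user has already rated are left out. A sketch of that selection step for a single user, assuming a dict of already-predicted scores; the function `top_k` and the scores are hypothetical:

```python
import heapq

def top_k(scores, already_rated, k=5):
    """Return (item, score, rank) for the k best-scoring items the user has not rated."""
    candidates = ((item, s) for item, s in scores.items() if item not in already_rated)
    best = heapq.nlargest(k, candidates, key=lambda pair: pair[1])
    return [(item, s, rank) for rank, (item, s) in enumerate(best, start=1)]

# Hypothetical predicted scores for one user
scores = {'m1': 4.2, 'm2': 3.1, 'm3': 4.9, 'm4': 2.0, 'm5': 3.8}
print(top_k(scores, already_rated={'m3'}, k=2))
```

`heapq.nlargest` avoids sorting every candidate, which matters when a user could be shown any of the 7714 movies but only the top few are needed.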


In [16]:
```python
model.recommend(users=['Mason Smith'], k=2)
```

| user | movie | score | rank |
|---|---|---|---|
| Mason Smith | Mulholland Drive | 5.8094079355 | 1 |
| Mason Smith | Love Actually | 5.56564925707 | 2 |


In [18]:
```python
model.recommend(users=['Mason Smith'], items=['The Hours'])
```

| user | movie | score | rank |
|---|---|---|---|
| Mason Smith | The Hours | 2.82095609701 | 1 |


In [20]:
```python
model.recommend(users=['William Smith'], items=['The Corner'])
```

| user | movie | score | rank |
|---|---|---|---|
| William Smith | The Corner | 0.93975679672 | 1 |


In [27]:
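```python
# Given a movie to promote, which users should it be recommended to?
# Score it for every user, then sort by predicted score, highest first.
model.recommend(items=['The Corner']).sort('score', ascending=False)
```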

| user | movie | score | rank |
|---|---|---|---|
| Jeffrey Smith | The Corner | 2.96076010263 | 1 |
| Reid Smith | The Corner | 2.51614062584 | 1 |
| Brennan Smith | The Corner | 2.39484571016 | 1 |
| Kevin Smith | The Corner | 2.28503529823 | 1 |
| Thomas Smith | The Corner | 2.24449376858 | 1 |
| Trevor Smith | The Corner | 2.21904902256 | 1 |
| Xander Smith | The Corner | 2.21528212822 | 1 |
| Paxton Smith | The Corner | 2.20255379474 | 1 |
| Brandon Smith | The Corner | 2.16120879448 | 1 |
| Micah Smith | The Corner | 2.08845846451 | 1 |
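The `rank` column is 1 in every row because each user's candidate list contains only this one movie; the ordering across users comes entirely from sorting by `score`. The same idea in plain Python, as a sketch over a hypothetical table of predicted scores (user to {movie: score}); `best_users_for_item` is a hypothetical helper:

```python
def best_users_for_item(predicted, item, n=3):
    """Users sorted by their predicted score for `item`, highest first."""
    pairs = [(user, s[item]) for user, s in predicted.items() if item in s]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]

# Hypothetical predictions, not taken from the model above
predicted = {
    'user_a': {'movie_x': 2.1, 'movie_y': 4.0},
    'user_b': {'movie_x': 3.4},
    'user_c': {'movie_x': 0.9, 'movie_y': 1.5},
}
print(best_users_for_item(predicted, 'movie_x', n=2))
```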
