### Business Understanding

Our dataset comes from a GitHub gist that collects datasets for recommendation and ratings.  It can be found here: https://gist.github.com/entaroadun/1653794.  The dataset was created for collaborative filtering on a Czech dating site.  It is technically two datasets: one contains a userID, the userID of the person they rated, and the rating itself; the other contains the gender for each userID.  For any recommender or collaborative filtering dataset, the purpose is usually to build a model that accurately recommends items to users based on their scores for other items.  For example, Netflix would want a recommender system that suggests what a user should watch next after they rate a certain movie.  A dating site such as the one behind our dataset wants a recommender system that uses each user's ratings to suggest profiles that are likely to lead to matches.  With that in mind, our project is just what the dataset was created for.  

A good recommender algorithm should have high precision and recall with a low RMSE.  For our dataset we will compare four recommender models: a popularity model, a ranked factorization model, an item-item matrix model, and a standard matrix factorization model.  We will measure effectiveness by comparing the precision/recall and RMSE values for each model, and by running a model comparison across all four.  The popularity model will act as a baseline for the other three models.  
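As a rough illustration of the three metrics named above, the following sketch computes precision and recall at a cutoff and the RMSE for a single user.  The profile ids and scores are made up, and the function names are our own; this is not GraphLab's implementation, only the textbook definitions.

```python
import math

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall for the top-k of a ranked recommendation list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

def rmse(predicted, actual):
    """Root mean squared error between predicted and observed scores."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))

# Hypothetical data: five ranked recommendations and three profiles the user liked.
recommended = ['u7', 'u3', 'u9', 'u1', 'u4']
relevant = {'u3', 'u4', 'u8'}
p_at_3, r_at_3 = precision_recall_at_k(recommended, relevant, k=3)

# Hypothetical predicted vs. observed scores on the 1-10 scale.
error = rmse([8.0, 6.0, 9.0], [7.0, 6.0, 10.0])
```

Precision and recall judge the ranking of the recommendations, while RMSE judges how close the predicted scores are to the observed ones, so a model can do well on one and poorly on the other.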
 
Recommender systems normally use a set of items, or users and items, along with optional features, as part of a linear model to predict ratings or calculate similarities between items. Our training set contains a user id, the id of another user rated by that user, and a rating or score. This is an unusual situation, one not common in recommenders for music or movies: the items being rated are themselves users, so we have essentially created a user-user matrix. There is an obvious relationship between the users doing the rating and the users being rated, which may imply that an item-item (user-user) recommender should be used. We will investigate multiple recommender models to determine which works best.

For the stakeholder, the Czech dating site, the item-item matrix model should work because it finds recommendations for users based on the ratings that they give to other users.  While there are not many side features that we can include in the model, we assume that users rate other users based on the information contained in their profiles.  If this is true, the scores they give should be enough to support a recommender model based only on ratings.  The dating site's goal is most likely to present the best possible matches to its users.  Finding a recommender model that creates more successful matches would help achieve that goal.


## Item-Item Matrix Model

The item-item recommender model, also known as item-based collaborative filtering, is an algorithm that compares two items and determines the similarity between them.  This can then be used to recommend item-item or user-item pairs in the future.  In GraphLab Create there are three different similarity measures that can be used in the item similarity recommender: Jaccard, Cosine, and Pearson.  A brief summary of each is below. 

#### Jaccard Similarity

The Jaccard similarity (the default in GraphLab) measures the similarity between two items and is calculated with the following equation, where $U_i$ is the set of users who rated item $i$:

$$\mbox{JS}(i,j)
= \frac{|U_i \cap U_j|}{|U_i \cup U_j|}$$

It is best used when you only care whether the items have been rated, because the Jaccard similarity does not take the score itself into account.
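A minimal sketch of the Jaccard formula, treating each rated profile as the set of users who rated it.  The rater ids below are hypothetical and the function name is our own.

```python
def jaccard_similarity(raters_i, raters_j):
    """|U_i intersect U_j| / |U_i union U_j| for two sets of rater ids."""
    union = raters_i | raters_j
    if not union:
        return 0.0  # neither item has been rated by anyone
    return len(raters_i & raters_j) / float(len(union))

# Hypothetical rater sets for two profiles.
raters_of_a = {'u1', 'u2', 'u3'}
raters_of_b = {'u2', 'u3', 'u4', 'u5'}

# Two shared raters out of five distinct raters.
js = jaccard_similarity(raters_of_a, raters_of_b)
```

Note that the scores never enter the calculation, only the fact that a rating exists.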

#### Cosine Similarity

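The Cosine similarity measures the similarity between two items for users that have rated either one or both of the items.  Here $U_{ij}$ is the set of users who rated both items $i$ and $j$, and $r_{ui}$ is the rating user $u$ gave item $i$.  The equation for Cosine similarity is as follows: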

$$\mbox{CS}(i,j)
= \frac{\sum_{u\in U_{ij}} r_{ui}r_{uj}}
    {\sqrt{\sum_{u\in U_{i}} r_{ui}^2}
     \sqrt{\sum_{u\in U_{j}} r_{uj}^2}}$$

An issue that can arise from using the Cosine similarity is that the mean and variance of the scores/ratings are not taken into account in the calculation.  If the means and/or variances differ greatly between items, the Cosine similarity metric can become skewed.  This is where the Pearson similarity comes in. 
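The Cosine formula above can be sketched as follows, with each profile represented as a dictionary from rater id to score.  The ratings are hypothetical; the numerator runs over users who rated both profiles and each norm runs over all raters of that profile, following the equation.

```python
import math

def cosine_similarity(ratings_i, ratings_j):
    """ratings_i, ratings_j: dicts mapping rater id -> score for items i and j."""
    shared = set(ratings_i) & set(ratings_j)
    numerator = sum(ratings_i[u] * ratings_j[u] for u in shared)
    norm_i = math.sqrt(sum(r ** 2 for r in ratings_i.values()))
    norm_j = math.sqrt(sum(r ** 2 for r in ratings_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # an item with no (or all-zero) ratings has no direction
    return numerator / (norm_i * norm_j)

# Hypothetical ratings: u2 and u3 rated both profiles.
ratings_a = {'u1': 8, 'u2': 6, 'u3': 10}
ratings_b = {'u2': 7, 'u3': 9, 'u4': 3}
cs = cosine_similarity(ratings_a, ratings_b)
```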

#### Pearson Similarity 

The Pearson similarity, like the Cosine similarity, measures the similarity between two items for users having rated both or just one of the items.  It is calculated using the following equation, where $\bar{r}_i$ is the mean rating of item $i$:

$$\mbox{PS}(i,j)
= \frac{\sum_{u\in U_{ij}} (r_{ui} - \bar{r}_i)
                            (r_{uj} - \bar{r}_j)}
    {\sqrt{\sum_{u\in U_{ij}} (r_{ui} - \bar{r}_i)^2}
     \sqrt{\sum_{u\in U_{ij}} (r_{uj} - \bar{r}_j)^2}}$$

Unlike the Cosine similarity metric, the Pearson similarity removes the effect of the mean and variance from its calculation.  
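A minimal sketch of the Pearson formula, again with hypothetical ratings.  One assumption to flag: the item mean is taken here over all raters of the item, while the sums run over users who rated both items.

```python
import math

def pearson_similarity(ratings_i, ratings_j):
    """ratings_i, ratings_j: dicts mapping rater id -> score for items i and j."""
    shared = sorted(set(ratings_i) & set(ratings_j))
    if not shared:
        return 0.0
    # Assumption: item means are computed over every rater of the item.
    mean_i = sum(ratings_i.values()) / float(len(ratings_i))
    mean_j = sum(ratings_j.values()) / float(len(ratings_j))
    dev_i = [ratings_i[u] - mean_i for u in shared]
    dev_j = [ratings_j[u] - mean_j for u in shared]
    numerator = sum(a * b for a, b in zip(dev_i, dev_j))
    denominator = (math.sqrt(sum(a * a for a in dev_i)) *
                   math.sqrt(sum(b * b for b in dev_j)))
    if denominator == 0:
        return 0.0  # no variation among the shared ratings
    return numerator / denominator

# Hypothetical ratings: profile b gets the same pattern as a, shifted down by 3.
ratings_a = {'u1': 8, 'u2': 6, 'u3': 10}
ratings_b = {'u1': 5, 'u2': 3, 'u3': 7}
ps = pearson_similarity(ratings_a, ratings_b)
```

Because the means are subtracted first, the two profiles in this example come out as essentially perfectly similar even though one is consistently rated three points lower.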

Based on our dataset it is most likely that the Cosine and Pearson similarity metrics will work best, although depending on the mean and variance of ratings in the dataset one will be better than the other.  

Information was obtained from the following sources: http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html & https://turi.com/products/create/docs/generated/graphlab.recommender.item_similarity_recommender.ItemSimilarityRecommender.html

In [1]:
import graphlab as gl
import graphlab.aggregate as agg
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
gl.canvas.set_target("ipynb")

[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1471485228.log




In [2]:
ratings = gl.SFrame.read_csv('~/desktop/ratings.dat', header=False, verbose=False)
genders = gl.SFrame.read_csv('~/desktop/gender.dat', header=False, verbose=False)

In [3]:
ratings.rename({'X1':'user_id',
                'X2':'user_rated',
                'X3':'score'})

genders.rename({'X1':'user_id',
                'X2':'gender'})

print ratings, genders

+---------+------------+-------+
| user_id | user_rated | score |
+---------+------------+-------+
|    1    |    133     |   8   |
|    1    |    720     |   6   |
|    1    |    971     |   10  |
|    1    |    1095    |   7   |
|    1    |    1616    |   10  |
|    1    |    1978    |   7   |
|    1    |    2145    |   8   |
|    1    |    2211    |   8   |
|    1    |    3751    |   7   |
|    1    |    4062    |   3   |
+---------+------------+-------+
[17359346 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
+---------+--------+
| user_id | gender |
+---------+--------+
|    1    |   F    |
|    2    |   F    |
|    3    |   U    |
|    4    |   F    |
|    5    |   F    |
|    6    |   F    |
|    7    |   F    |
|    8    |   M    |
|    9    |   M    |
|    10   |   M    |
+---------+--------+
[220970 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [4]:
ratings['user_id'] = ratings['user_id'].astype(str)
ratings['user_rated'] = ratings['user_rated'].astype(str)
genders['user_id'] = genders['user_id'].astype(str)
ratings.show(), genders.show()

(None, None)

In [5]:
# join the rater's gender, then the rated user's gender, onto each rating
complete = ratings.join(genders).join(genders,on={'user_rated':'user_id'})
# rename for simplicity
complete.rename({'gender':'user_gender',
                 'gender.1':'gender_rated'})
# show a snapshot of complete SFrame
complete


user_id,user_rated,score,user_gender,gender_rated
1,133,8,F,M
1,720,6,F,F
1,971,10,F,M
1,1095,7,F,M
1,1616,10,F,M
1,1978,7,F,M
1,2145,8,F,M
1,2211,8,F,M
1,3751,7,F,M
1,4062,3,F,M


In [6]:
# split out male and female ratings
female_ratings = complete[complete['user_gender'] == 'F']['user_id','user_rated','score','gender_rated']
male_ratings = complete[complete['user_gender'] == 'M']['user_id','user_rated','score','gender_rated']


In [7]:
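user_genders = genders.copy()
# rename() changes the SFrame in place and returns it, so rename a copy;
# otherwise item_genders would just be another name for genders
item_genders = genders.copy()
item_genders.rename({'user_id':'user_rated'})
genders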

user_id,gender
1,F
2,F
3,U
4,F
5,F
6,F
7,F
8,M
9,M
10,M


In [8]:
# establish main training and test sets
# only use high rated as test
high_rated = ratings[ratings['score'] >= 7]
low_rated = ratings[ratings['score'] < 7]

train_data_1, test = gl.recommender.util.random_split_by_user(high_rated, 
                                                              user_id='user_id', 
                                                              item_id='user_rated',
                                                              max_num_users=27000,
                                                              item_test_proportion=0.2)
train = train_data_1.append(low_rated)

# now let's set up a validation set from our training data that we can grid search through in our lifetimes
# on more complicated recommenders like ranked matrix factorization recommenders

high_rated_sample = train[train['score'] >= 7]
low_rated_sample = train[train['score'] < 7]


train_sample_1, validation = gl.recommender.util.random_split_by_user(high_rated_sample,
                                                                      user_id='user_id',
                                                                      item_id='user_rated',
                                                                      max_num_users=27000,
                                                                      item_test_proportion=0.2)
train_sample = train_sample_1.append(low_rated_sample)
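To make the idea behind the per-user split concrete, here is a small plain-Python sketch: pick a limited number of users, then hold out a fraction of each chosen user's ratings.  It mirrors the idea only and is not GraphLab's implementation; the function name, the seed argument, and the toy rows are our own assumptions.

```python
import random

def split_by_user(rows, max_num_users, item_test_proportion, seed=0):
    """rows: list of (user_id, item_id, score). Returns (train_rows, test_rows)."""
    rng = random.Random(seed)
    users = sorted(set(row[0] for row in rows))
    # Only these users can contribute rows to the held-out set.
    test_users = set(rng.sample(users, min(max_num_users, len(users))))
    train_rows, test_rows = [], []
    for row in rows:
        if row[0] in test_users and rng.random() < item_test_proportion:
            test_rows.append(row)
        else:
            train_rows.append(row)
    return train_rows, test_rows

# Hypothetical ratings: 5 users who each rated 10 profiles with scores of 7-10.
rows = [(u, i, 7 + (u + i) % 4) for u in range(1, 6) for i in range(10, 20)]
train_rows, test_rows = split_by_user(rows, max_num_users=2,
                                      item_test_proportion=0.2)
```

Every row ends up in exactly one of the two sets, so a model fit on the training rows never sees the held-out rows.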

In [9]:
train_sample_small, _ = train_sample.random_split(0.10)


In [10]:
train_sample_small.join(genders).show()

In [11]:
#the first default item-item recommender
item_item = gl.recommender.item_similarity_recommender.create(train,
                                                             user_id='user_id',
                                                             item_id='user_rated',
                                                             target='score',
                                                             verbose=False)

In [17]:
item_eval = item_item.evaluate(validation, verbose=False)
print 'Overall RMSE: {0:.3}\n\nOverall Precision & Recall by Cutoff:\n{1}'.format(item_eval['rmse_overall'],
                                                                                  item_eval['precision_recall_overall'])

Overall RMSE: 8.87

Overall Precision & Recall by Cutoff:
+--------+-----------+--------+
| cutoff | precision | recall |
+--------+-----------+--------+
|   1    |    0.0    |  0.0   |
|   2    |    0.0    |  0.0   |
|   3    |    0.0    |  0.0   |
|   4    |    0.0    |  0.0   |
|   5    |    0.0    |  0.0   |
|   6    |    0.0    |  0.0   |
|   7    |    0.0    |  0.0   |
|   8    |    0.0    |  0.0   |
|   9    |    0.0    |  0.0   |
|   10   |    0.0    |  0.0   |
+--------+-----------+--------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


That did not work out well at all.  The precision and recall are zero at every cutoff shown, and the RMSE is very high.  Let us look at using the Cosine similarity now.

In [15]:
#the second (cosine) item-item recommender
cosine_item_item = gl.recommender.item_similarity_recommender.create(train,
                                                             user_id='user_id',
                                                             item_id='user_rated',
                                                             target='score',
                                                             similarity_type='cosine',
                                                             verbose=False)

In [16]:
cosine_item_eval = cosine_item_item.evaluate(validation, verbose=False)
print 'Overall RMSE: {0:.3}\n\nOverall Precision & Recall by Cutoff:\n{1}'.format(cosine_item_eval['rmse_overall'],
                                                                                  cosine_item_eval['precision_recall_overall'])

Overall RMSE: 8.71

Overall Precision & Recall by Cutoff:
+--------+-----------+--------+
| cutoff | precision | recall |
+--------+-----------+--------+
|   1    |    0.0    |  0.0   |
|   2    |    0.0    |  0.0   |
|   3    |    0.0    |  0.0   |
|   4    |    0.0    |  0.0   |
|   5    |    0.0    |  0.0   |
|   6    |    0.0    |  0.0   |
|   7    |    0.0    |  0.0   |
|   8    |    0.0    |  0.0   |
|   9    |    0.0    |  0.0   |
|   10   |    0.0    |  0.0   |
+--------+-----------+--------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


This was just as bad as the Jaccard model in terms of precision and recall.  A possible explanation is that differences in the rating means and variances are affecting the model.  Let's look at the last similarity metric, Pearson, to see if it improves the model.

In [18]:
#the final (pearson) item-item recommender
pearson_item_item = gl.recommender.item_similarity_recommender.create(train,
                                                             user_id='user_id',
                                                             item_id='user_rated',
                                                             target='score',
                                                             similarity_type='pearson',
                                                             verbose=False)

In [19]:
pearson_item_eval = pearson_item_item.evaluate(validation, verbose=False)
print 'Overall RMSE: {0:.3}\n\nOverall Precision & Recall by Cutoff:\n{1}'.format(pearson_item_eval['rmse_overall'],
                                                                                  pearson_item_eval['precision_recall_overall'])

Overall RMSE: 2.12

Overall Precision & Recall by Cutoff:
+--------+-----------+--------+
| cutoff | precision | recall |
+--------+-----------+--------+
|   1    |    0.0    |  0.0   |
|   2    |    0.0    |  0.0   |
|   3    |    0.0    |  0.0   |
|   4    |    0.0    |  0.0   |
|   5    |    0.0    |  0.0   |
|   6    |    0.0    |  0.0   |
|   7    |    0.0    |  0.0   |
|   8    |    0.0    |  0.0   |
|   9    |    0.0    |  0.0   |
|   10   |    0.0    |  0.0   |
+--------+-----------+--------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


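This is surprising.  All three item-item models had a precision and recall of 0.  The difference here, though, is that the overall RMSE using the Pearson similarity is significantly smaller, at 2.12.  Based purely on these results, it looks like the Pearson similarity metric for the item similarity recommender works best for our dataset.  The zeros should be read with caution, however: the models were fit on train, which still contains the validation rows, so if the evaluation leaves out profiles a user already rated in training, none of the validation profiles can ever be recommended.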
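One thing worth ruling out before trusting a precision of exactly zero is overlap between the training and evaluation data.  In the cells above, validation is split out of train, and the models are fit on train rather than train_sample.  The toy sketch below assumes a recommender that never recommends profiles the user already rated in training (an assumption about the evaluation defaults, not something verified here), and shows how that overlap alone forces zero precision.  All names and scores are hypothetical.

```python
def recommend(candidate_scores, seen_items, k):
    """Return the k best-scoring candidates the user has not already rated."""
    unseen = [item for item in candidate_scores if item not in seen_items]
    return sorted(unseen, key=lambda item: candidate_scores[item],
                  reverse=True)[:k]

def precision_at_k(recommended, held_out, k):
    return len(set(recommended[:k]) & set(held_out)) / float(k)

# Hypothetical scores a model assigns to candidate profiles for one user.
candidate_scores = {'p1': 0.9, 'p2': 0.8, 'p3': 0.7, 'p4': 0.6, 'p5': 0.5}
held_out = {'p1', 'p3'}

# Case 1: the held-out rows are still in the training data, so they are
# filtered out of every recommendation list.
leaky_training_items = {'p1', 'p3', 'p5'}
leaky_precision = precision_at_k(
    recommend(candidate_scores, leaky_training_items, 2), held_out, 2)

# Case 2: the held-out rows were removed from the training data first.
clean_training_items = {'p5'}
clean_precision = precision_at_k(
    recommend(candidate_scores, clean_training_items, 2), held_out, 2)
```

If this is what happened, fitting the models on train_sample and evaluating on validation would be the fairer comparison.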

In [20]:
#comparison to Cory's, double checking that it was 0.
results = pearson_item_item.evaluate(validation, verbose=False)
agg_results = [agg.AVG('precision'),agg.STD('precision'),agg.AVG('recall'),agg.STD('recall')]
results['precision_recall_by_user'].groupby('cutoff',agg_results).sort('cutoff')

cutoff,Avg of precision,Stdv of precision,Avg of recall,Stdv of recall
1,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,0.0


In [21]:
results_similar = item_item.get_similar_items(k=3)
results_similar.head()

user_rated,similar,score,rank
133,36964,0.199293851852,1
133,26084,0.195980548859,2
133,65679,0.193401873112,3
971,78232,0.0679012537003,1
971,161850,0.066445171833,2
971,116832,0.0657370686531,3
1095,96985,0.208222806454,1
1095,164736,0.177483439445,2
1095,69710,0.148916959763,3
1616,174757,0.111430823803,1


In [25]:
results_similar_cosine = cosine_item_item.get_similar_items(k=3)
results_similar_cosine.head()

user_rated,similar,score,rank
133,26084,0.328164756298,1
133,36964,0.326726615429,2
133,180504,0.316426873207,3
971,8410,0.179827690125,1
971,64018,0.158206284046,2
971,84403,0.157812893391,3
1095,164736,0.271158754826,1
1095,51524,0.248242497444,2
1095,83206,0.24770373106,3
1616,193687,0.269618630409,1


In [24]:
results_similar_pearson = pearson_item_item.get_similar_items(k=3)
results_similar_pearson.head()

user_rated,similar,score,rank
133,56397,0.17421656847,1
133,176744,0.161558866501,2
133,60983,0.152879357338,3
971,176194,0.414553105831,1
971,136386,0.359934151173,2
971,34228,0.353657603264,3
1095,88066,0.16590899229,1
1095,117063,0.162141025066,2
1095,69710,0.146988511086,3
1616,190758,0.110662937164,1


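Getting similar items with the three similarity metrics does not single out a clear winner.  In the rows shown, Jaccard and Cosine agree on some neighbors (for example, 26084 and 36964 for user 133), while Pearson returns a different set.  In all three cases the similarity scores are low (at most about 0.41 in the rows shown), so the users selected as similar are not very similar to the rated users.  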