# **TuriCreate Recommender Model**
This notebook walks through an example of importing data, training a recommender model, and evaluating it on a validation set.

In [1]:
import turicreate as tc
import time

from skafossdk import *
ska = Skafos()  # initialize Skafos

## **The Data**
There are two main data inputs for a recommender model:
- **items**: the things we want to recommend to a given user, e.g. apples
- **actions**: interactions users have had with items, e.g. John bought apples

In this example, the items are movies and the actions are the ratings users gave those movies.

In [2]:
%%capture 
actions = tc.SFrame.read_csv('https://s3.amazonaws.com/skafos.example.data/MovieLensDataset/ml-20m/ratings.csv'); 
items = tc.SFrame.read_csv('https://s3.amazonaws.com/skafos.example.data/MovieLensDataset/ml-20m/movies.csv');

In [3]:
actions.head(5)

userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
1,32,3.5,1112484819
1,47,3.5,1112484727
1,50,3.5,1112484580


In [4]:
items.head(5)

movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy ...
2,Jumanji (1995),Adventure|Children|Fantasy ...
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995) ...,Comedy
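
The two tables are linked by `movieId`, and `genres` is a `|`-separated string. As a minimal sketch of that relationship, the plain-Python snippet below uses only rows that appear complete in the previews above. The names `titles`, `genres`, `describe` and `genre_lists` are illustrative and not part of TuriCreate or the notebook.

```python
# Rows copied from the previews above; truncated fields are left out.
actions = [
    {"userId": 1, "movieId": 2, "rating": 3.5, "timestamp": 1112486027},
    {"userId": 1, "movieId": 29, "rating": 3.5, "timestamp": 1112484676},
]
titles = {
    2: "Jumanji (1995)",
    3: "Grumpier Old Men (1995)",
    4: "Waiting to Exhale (1995)",
}
genres = {
    3: "Comedy|Romance",
    4: "Comedy|Drama|Romance",
}


def describe(action):
    """Look up the movie title for one action via the shared movieId key."""
    title = titles.get(action["movieId"], "<title not in preview>")
    return f"user {action['userId']} rated {title!r} {action['rating']}"


# Split the pipe-separated genre string into a list per movie.
genre_lists = {movie_id: text.split("|") for movie_id, text in genres.items()}

for action in actions:
    print(describe(action))
print(genre_lists)
```

Movie 29 is rated in the actions preview but does not appear in the five-row items preview, so the lookup falls back to a placeholder for it.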


In [5]:
%%capture 
# split the actions into training and validation sets, holding out some items per user
training_data, validation_data = tc.recommender.util.random_split_by_user(actions, 'userId', 'movieId')
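
To make the idea of a per-user split concrete, here is a rough plain-Python sketch. It assumes the split works by sampling a limited number of users and moving a fraction of each sampled user's actions into the validation set. The parameter names `max_num_users` and `item_test_proportion` and their defaults mirror what we understand the library's keyword arguments to be, but the function itself is hypothetical and is not TuriCreate's implementation.

```python
import random


def split_by_user(rows, max_num_users=1000, item_test_proportion=0.2, seed=0):
    """Split (user, item) rows into train/validation lists.

    Only users in a random sample of at most `max_num_users` contribute
    validation rows; each of their rows is held out with probability
    `item_test_proportion`. All other rows stay in the training set.
    """
    rng = random.Random(seed)
    users = sorted({user for user, _ in rows})
    sampled = set(rng.sample(users, min(max_num_users, len(users))))

    train, validation = [], []
    for row in rows:
        user = row[0]
        if user in sampled and rng.random() < item_test_proportion:
            validation.append(row)
        else:
            train.append(row)
    return train, validation


# Hypothetical toy data: 20 users who each interacted with 30 items.
rows = [(user, item) for user in range(20) for item in range(30)]
train, validation = split_by_user(rows, max_num_users=5)
print(len(train), len(validation))
print(sorted({user for user, _ in validation}))
```

Every sampled user keeps most of their history in the training set, so the model can still produce recommendations for the users it is later evaluated on.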

In [6]:
%%capture 
# build the recommender
model = tc.recommender.create(training_data, 'userId', 'movieId')
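
No target column is passed in the call above, so the model is fit on which user interacted with which movie rather than on the rating values. To our understanding, `tc.recommender.create` then falls back to an item-similarity model; that is an assumption about the library, not something this notebook confirms. The sketch below is a simplified illustration of item-item Jaccard similarity in plain Python, with hypothetical function names and toy data. It is not TuriCreate's implementation.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(actions):
    """Return {(item_a, item_b): similarity} from (user, item) pairs.

    Jaccard similarity = users who touched both items / users who touched either.
    """
    users_by_item = defaultdict(set)
    for user, item in actions:
        users_by_item[item].add(user)

    sims = {}
    for a, b in combinations(sorted(users_by_item), 2):
        shared = len(users_by_item[a] & users_by_item[b])
        if shared:
            either = len(users_by_item[a] | users_by_item[b])
            sims[(a, b)] = sims[(b, a)] = shared / either
    return sims


def recommend(user, actions, sims, k=3):
    """Rank unseen items by their mean similarity to the user's seen items."""
    seen = {item for u, item in actions if u == user}
    if not seen:
        return []
    candidates = {item for _, item in actions} - seen
    scores = {}
    for cand in candidates:
        score = sum(sims.get((cand, s), 0.0) for s in seen) / len(seen)
        if score > 0:
            scores[cand] = score
    # Highest score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Hypothetical toy interactions (user, item).
toy_actions = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "B"), ("u3", "C"),
    ("u4", "A"), ("u4", "D"),
]
toy_sims = jaccard_item_similarity(toy_actions)
print(recommend("u1", toy_actions, toy_sims))
```

In the toy data, user `u1` has seen `A` and `B`. Item `C` shares more users with those two than `D` does, so it ranks first.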

In [7]:
%%capture
# generate recommendations for the users in the training data
results = model.recommend();

In [8]:
# print the validation data
validation_data.print_rows(num_rows=10)

+--------+---------+--------+------------+
| userId | movieId | rating | timestamp  |
+--------+---------+--------+------------+
|  239   |    1    |  5.0   | 1245047671 |
|  239   |   261   |  2.5   | 1245045481 |
|  239   |   586   |  5.0   | 1245047731 |
|  239   |   588   |  5.0   | 1245047291 |
|  239   |   596   |  4.5   | 1245045442 |
|  239   |   783   |  4.0   | 1245049069 |
|  239   |   788   |  4.5   | 1245047974 |
|  239   |   919   |  3.5   | 1245047817 |
|  239   |   1025  |  4.0   | 1245049826 |
|  239   |   1073  |  4.0   | 1245047806 |
+--------+---------+--------+------------+
[30051 rows x 4 columns]



In [9]:
# evaluate the model
model.evaluate(validation_data)

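# save the model
# NOTE: model_name and model_data are not defined earlier in this notebook; set them before running this line
ska.engine.save_model(model_name, model_data, tags=["0.1.0", "latest"], access="private").result()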




Precision and recall summary statistics by cutoff
+--------+---------------------+----------------------+
| cutoff |    mean_precision   |     mean_recall      |
+--------+---------------------+----------------------+
|   1    | 0.19299999999999992 | 0.011474344152001532 |
|   2    | 0.19200000000000006 | 0.02151962653993981  |
|   3    | 0.17599999999999985 | 0.03046727939478849  |
|   4    | 0.17249999999999993 | 0.03746688013994392  |
|   5    |  0.1677999999999999 | 0.043714852248053805 |
|   6    |  0.1673333333333332 | 0.05243984787763719  |
|   7    | 0.16457142857142879 | 0.058756643128236904 |
|   8    | 0.16400000000000006 | 0.06613798714477744  |
|   9    | 0.16322222222222213 | 0.07323819649870088  |
|   10   |  0.1629000000000002 | 0.08065950600378116  |
+--------+---------------------+----------------------+
[10 rows x 3 columns]
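
For one user, precision at cutoff k is the share of the top k recommendations that appear in that user's held-out items, and recall is the share of the held-out items that appear in the top k. We assume the summary table averages these per-user values over the evaluated users. The sketch below is plain Python with hypothetical movie ids. Its ranked list is constructed so that ranks 1, 5 and 9 are hits against 50 held-out items, which is one arrangement consistent with the rows printed for user 239 in the per-user part of this output.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 50 held-out movie ids for a single user.
relevant = set(range(1000, 1050))
# Hypothetical ranked recommendations; positions 1, 5 and 9 are held-out items.
recommended = [1000, 1, 2, 3, 1001, 4, 5, 6, 1002, 7]

for k in range(1, 11):
    precision, recall = precision_recall_at_k(recommended, relevant, k)
    print(k, round(precision, 4), round(recall, 2))
```

Precision drops between hits because the denominator k keeps growing, while recall only moves when a new hit enters the top k. That is why recall stays low here even when precision is high: 10 recommendations can cover at most 10 of the 50 held-out items.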



{'precision_recall_by_user': Columns:
 	userId	int
 	cutoff	int
 	precision	float
 	recall	float
 	count	int
 
 Rows: 18000
 
 Data:
 +--------+--------+--------------------+--------+-------+
 | userId | cutoff |     precision      | recall | count |
 +--------+--------+--------------------+--------+-------+
 |  239   |   1    |        1.0         |  0.02  |   50  |
 |  239   |   2    |        0.5         |  0.02  |   50  |
 |  239   |   3    | 0.3333333333333333 |  0.02  |   50  |
 |  239   |   4    |        0.25        |  0.02  |   50  |
 |  239   |   5    |        0.4         |  0.04  |   50  |
 |  239   |   6    | 0.3333333333333333 |  0.04  |   50  |
 |  239   |   7    | 0.2857142857142857 |  0.04  |   50  |
 |  239   |   8    |        0.25        |  0.04  |   50  |
 |  239   |   9    | 0.3333333333333333 |  0.06  |   50  |
 |  239   |   10   |        0.3         |  0.06  |   50  |
 +--------+--------+--------------------+--------+-------+
 [18000 rows x 5 columns]
 Note: Only the