## **Book Recommendation with a Target**

In this example we use book ratings instead of movie data. The problem is essentially the same, but here we add a target (the rating column) to the recommender engine. Specifying a target creates a more complex model, which increases run time.

In [1]:
import pandas as pd
import turicreate as tc
from skafossdk import *
from s3fs.core import S3FileSystem



In [2]:
ska = Skafos() # set up Skafos

2018-11-26 16:15:40,919 - skafossdk.data_engine - INFO - Connecting to DataEngine
2018-11-26 16:15:40,952 - skafossdk.data_engine - INFO - DataEngine Connection Opened


## **Get the Data** 
We have uploaded the data to a public S3 bucket, but the original data can be found [here](http://www2.informatik.uni-freiburg.de/~cziegler/BX/).

In [3]:
s3 = S3FileSystem(anon=True)
file = s3.open("s3://skafos.example.data/BX-Book-Ratings.csv", "r", encoding = 'latin1', errors='replace')

In [4]:
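# the file is semicolon-separated; rows with too many fields are skipped
# (on pandas >= 1.3, use on_bad_lines="skip" instead of error_bad_lines=False)
data = pd.read_csv(file, sep=";", error_bad_lines=False)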
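To make the two `read_csv` arguments concrete, here is a minimal standard-library sketch of what they amount to: split each line on `;` and drop any row that has more fields than the header. The sample text and the `read_ratings` helper are hypothetical and only mimic the layout of the ratings file; pandas itself does considerably more (type inference, padding short rows, and so on).

```python
import csv
import io

# Hypothetical sample in the same layout as the ratings file:
# semicolon-separated, quoted fields, one header row.
SAMPLE = (
    '"User-ID";"ISBN";"Book-Rating"\n'
    '"276725";"034545104X";"0"\n'
    '"276726";"0155061224";"5"\n'
    '"276727";"0446520802";"0";"extra"\n'  # malformed: one field too many
    '"276729";"052165615X";"3"\n'
)


def read_ratings(text, delimiter=";"):
    """Parse delimited text, skipping rows with more fields than the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows, skipped = [], 0
    for fields in reader:
        if len(fields) > len(header):
            skipped += 1  # same spirit as skipping "bad lines" in pandas
            continue
        rows.append(dict(zip(header, fields)))
    return rows, skipped


rows, skipped = read_ratings(SAMPLE)
print(len(rows), "rows kept,", skipped, "skipped")
```

Skipping rows silently is convenient for a demo, but it is worth checking how many rows were dropped before trusting the result.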

In [5]:
data.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


## **Prep the data**
We convert the `Book-Rating` column to integers, convert the data to an SFrame (Turi Create's dataframe data structure), and split it into training and validation sets.

In [6]:
# convert the book rating column to an integer
data['Book-Rating'] = data['Book-Rating'].astype(int)

In [7]:
# convert to an SFrame
data = tc.SFrame(data)

In [8]:
# split the training and validation sets up
training_data, validation_data = tc.recommender.util.random_split_by_user(data, 'User-ID', 'ISBN')
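For intuition, the sketch below shows the general idea of a per-user split in plain Python: pick a sample of users, then hold out a share of each sampled user's ratings for validation while everything else stays in the training set. This is our understanding of the approach, not Turi Create's implementation; the `split_by_user` helper, its parameters, and the toy ratings are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_user(rows, user_key, max_num_users=2, item_test_proportion=0.5, seed=0):
    """Hold out a share of ratings for a random sample of users (illustrative only)."""
    rng = random.Random(seed)

    # group the ratings by user
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[user_key]].append(row)

    # only the sampled users contribute rows to the validation set
    users = sorted(by_user)
    test_users = set(rng.sample(users, min(max_num_users, len(users))))

    train, test = [], []
    for user in users:
        for row in by_user[user]:
            if user in test_users and rng.random() < item_test_proportion:
                test.append(row)
            else:
                train.append(row)
    return train, test


# toy ratings (hypothetical values)
toy = [
    {"User-ID": 1, "ISBN": "a", "Book-Rating": 5},
    {"User-ID": 1, "ISBN": "b", "Book-Rating": 0},
    {"User-ID": 2, "ISBN": "a", "Book-Rating": 7},
    {"User-ID": 2, "ISBN": "c", "Book-Rating": 3},
    {"User-ID": 3, "ISBN": "d", "Book-Rating": 9},
    {"User-ID": 3, "ISBN": "b", "Book-Rating": 6},
]

train, test = split_by_user(toy, "User-ID")
print(len(train), "training rows,", len(test), "validation rows")
```

Splitting within users, rather than across all rows at random, keeps some history for every validation user in the training set, so the model has something to base its recommendations on.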

## **Build the model**
Here we build the model. Unlike the pre-baked Turi Create example, this one specifies a target. For more information, check out the [Turi Create documentation](https://turi.com/learn/userguide/recommender/choosing-a-model.html).

In [9]:

# build the recommender; passing a target makes the model use the ratings themselves
# equivalent call with positional arguments:
# model = tc.recommender.create(training_data, 'User-ID', 'ISBN', target='Book-Rating')
model = tc.recommender.create(observation_data=training_data, user_id='User-ID', item_id='ISBN', target='Book-Rating')

## **Evaluate Results**

In [10]:

# grab the model's recommendations (careful: this takes a while)
# results = model.recommend()
# results

In [11]:
# evaluate the validation data
model.evaluate(validation_data)


Precision and recall summary statistics by cutoff
+--------+-----------------------+-----------------------+
| cutoff |     mean_precision    |      mean_recall      |
+--------+-----------------------+-----------------------+
|   1    |          0.0          |          0.0          |
|   2    |          0.0          |          0.0          |
|   3    |          0.0          |          0.0          |
|   4    |          0.0          |          0.0          |
|   5    | 0.0004889975550122248 |  9.77995110024451e-05 |
|   6    | 0.0004074979625101878 |  9.77995110024451e-05 |
|   7    | 0.0003492839678658751 |  9.77995110024451e-05 |
|   8    | 0.0003056234718826404 |  9.77995110024451e-05 |
|   9    | 0.0008149959250203748 | 0.0027465362673186624 |
|   10   | 0.0009779951100244496 |  0.005191524042379783 |
+--------+-----------------------+-----------------------+
[10 rows x 3 columns]


Overall RMSE: 5.8487388102239635

Per User RMSE (best)
+---------+----------------------+-------+
|

{'precision_recall_by_user': Columns:
 	User-ID	int
 	cutoff	int
 	precision	float
 	recall	float
 	count	int
 
 Rows: 7362
 
 Data:
 +---------+--------+-----------+--------+-------+
 | User-ID | cutoff | precision | recall | count |
 +---------+--------+-----------+--------+-------+
 |    9    |   1    |    0.0    |  0.0   |   1   |
 |    9    |   2    |    0.0    |  0.0   |   1   |
 |    9    |   3    |    0.0    |  0.0   |   1   |
 |    9    |   4    |    0.0    |  0.0   |   1   |
 |    9    |   5    |    0.0    |  0.0   |   1   |
 |    9    |   6    |    0.0    |  0.0   |   1   |
 |    9    |   7    |    0.0    |  0.0   |   1   |
 |    9    |   8    |    0.0    |  0.0   |   1   |
 |    9    |   9    |    0.0    |  0.0   |   1   |
 |    9    |   10   |    0.0    |  0.0   |   1   |
 +---------+--------+-----------+--------+-------+
 [7362 rows x 5 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns
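
To help read the numbers above, here is a small sketch of how precision and recall at a cutoff and RMSE are commonly defined. It follows the standard textbook definitions and is not taken from Turi Create's source; the helper names and the toy inputs are hypothetical.

```python
import math


def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall over the top-k recommended items."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# toy example: ranked recommendations vs. the books a user actually rated
recommended = ["b1", "b2", "b3", "b4", "b5"]
relevant = {"b2", "b5", "b9"}

for k in (1, 2, 5):
    p, r = precision_recall_at_k(recommended, relevant, k)
    print(f"cutoff={k} precision={p:.3f} recall={r:.3f}")

print("rmse:", rmse([0, 5, 10], [1, 5, 8]))
```

With these definitions, a precision of 0.0 at a cutoff means none of the top recommendations appeared among the user's held-out books, and the RMSE is expressed in the same units as the ratings.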