## **Book Recommendation Engine Example**

In this example, we use book ratings instead of movie data. The underlying problem is the same, but here the users' ratings of books are included when training the recommendation model. This increases training time because the resulting model is more complex, but such a model tends to be more accurate.

In [None]:
# Import necessary libraries
import turicreate as tc
from skafossdk import Skafos

In [None]:
ska = Skafos() # initialize Skafos

## **Load the data** 
The data used in this example is stored in a public S3 bucket. The raw data is also available [here](http://www2.informatik.uni-freiburg.de/~cziegler/BX/).

In [None]:
# Load the data from S3 bucket
s3_path = 'skafos.example.data/Recommender/BX-Book-Ratings.csv'

# Convert to SFrame
data = tc.SFrame.read_csv(
    url='https://s3.amazonaws.com/' + s3_path,
    delimiter=';',
    error_bad_lines=False,
    verbose=False
)

In [None]:
# Split the data into training and test sets, holding out items per user
train_data, test_data = tc.recommender.util.random_split_by_user(
    dataset=data,
    user_id='User-ID',
    item_id='ISBN'
)
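To make the idea of a per-user split concrete, here is a minimal pure-Python sketch. It is not Turi Create's implementation: the helper `split_by_user`, its `test_fraction` parameter, and the sample ratings are hypothetical and exist only for illustration. As a simplification, it splits every user, whereas `random_split_by_user` is understood to sample a limited number of users and hold out a proportion of their items.

```python
import random
from collections import defaultdict

def split_by_user(rows, user_key, test_fraction=0.2, seed=0):
    """Hold out roughly `test_fraction` of each user's rows for testing."""
    rng = random.Random(seed)

    # Group the observations by user.
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[user_key]].append(row)

    train, test = [], []
    for user_rows in by_user.values():
        shuffled = user_rows[:]
        rng.shuffle(shuffled)
        # Always leave at least one row in training so that every
        # user in the test set is also known to the model.
        n_test = min(int(len(shuffled) * test_fraction), len(shuffled) - 1)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

# Made-up ratings for illustration only: 3 users, 10 books each.
ratings = [
    {"User-ID": u, "ISBN": f"isbn-{i}", "Book-Rating": (u + i) % 10 + 1}
    for u in (1, 2, 3)
    for i in range(10)
]
train_rows, test_rows = split_by_user(ratings, user_key="User-ID")
```

Splitting within each user, rather than across all rows at random, means the model is evaluated on users it has already seen, which is the setting a recommender is normally used in.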

In [None]:
# Take a look at the training data (notice the ratings this time)
train_data.head(5)

## **Train the model**
Here we build the model. Note how this example differs from the pre-baked Turi Create example in that we specify a target. This makes the model an explicit recommendation engine and means it will likely use a ranking factorization algorithm, which is more powerful. For more information, check out the [Turi Create documentation](https://turi.com/learn/userguide/recommender/choosing-a-model.html).
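For intuition about what a factorization model learns from explicit ratings, the sketch below fits a plain matrix factorization with stochastic gradient descent in pure Python. It is an illustration under stated assumptions, not the algorithm Turi Create runs: the function `train_factorization`, its hyperparameters, and the sample ratings are all hypothetical, and a ranking factorization model is understood to also account for unobserved user-item pairs, which this sketch ignores.

```python
import math
import random

def train_factorization(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit rating ~ mean + user_bias + item_bias + dot(user_vec, item_vec) with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mean = sum(r for _, _, r in ratings) / len(ratings)

    user_bias = {u: 0.0 for u in users}
    item_bias = {i: 0.0 for i in items}
    user_vec = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    item_vec = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        dot = sum(p * q for p, q in zip(user_vec[u], item_vec[i]))
        return mean + user_bias[u] + item_bias[i] + dot

    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient step on squared error with L2 regularization.
            user_bias[u] += lr * (err - reg * user_bias[u])
            item_bias[i] += lr * (err - reg * item_bias[i])
            for k in range(n_factors):
                p, q = user_vec[u][k], item_vec[i][k]
                user_vec[u][k] += lr * (err * q - reg * p)
                item_vec[i][k] += lr * (err * p - reg * q)
    return predict

# Made-up (user, book, rating) triples for illustration only.
ratings = [
    ("alice", "book_a", 9), ("alice", "book_b", 3),
    ("bob", "book_a", 8), ("bob", "book_c", 2),
    ("carol", "book_b", 4), ("carol", "book_c", 1),
]
predict = train_factorization(ratings)

mean_rating = sum(r for _, _, r in ratings) / len(ratings)
baseline_rmse = math.sqrt(sum((r - mean_rating) ** 2 for _, _, r in ratings) / len(ratings))
trained_rmse = math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))
```

The target column is what makes this kind of fit possible: without ratings there is no numeric value to regress against, only the fact that a user interacted with an item.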

In [None]:
# Train the recommender - note the target variable that makes this an explicit recommender
model = tc.recommender.create(
    observation_data=train_data,
    user_id='User-ID',
    item_id='ISBN',
    target='Book-Rating'
)

## **Model Evaluation**

In [None]:
# Calculate the average prediction error for each user-item pair in the test set
# The RMSE represents the error between a user's actual rating of a book and the model's prediction
# Lower RMSE is better
results = model.evaluate_rmse(
    dataset=test_data,
    target='Book-Rating'
)

In [None]:
# RMSE per user, sorted from lowest to highest (best to worst)
results['rmse_by_user'].sort(key_column_names='rmse')

In [None]:
# Average error overall
results['rmse_overall']
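To show how the per-user and overall RMSE values relate, here is a small pure-Python sketch that computes both by hand. The `(user, actual, predicted)` rows are made up for illustration and are not output from the model above.

```python
import math
from collections import defaultdict

def rmse(errors):
    """Root mean squared error of a list of (actual - predicted) differences."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Made-up (user, actual rating, predicted rating) rows for illustration only.
rows = [
    ("u1", 8, 7),
    ("u1", 6, 8),
    ("u2", 10, 10),
    ("u2", 4, 7),
]

# Collect the prediction errors for each user.
errors_by_user = defaultdict(list)
for user, actual, predicted in rows:
    errors_by_user[user].append(actual - predicted)

# One RMSE per user, and one RMSE over all rows pooled together.
rmse_by_user = {user: rmse(errs) for user, errs in errors_by_user.items()}
rmse_overall = rmse([e for errs in errors_by_user.values() for e in errs])
```

Because the overall figure pools every squared error before taking the square root, it is generally not the same as the average of the per-user values.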