## Building and Evaluating a Deep Learning Based Book Recommendation System

**Event: Strata Conference, San Francisco, 2019**  

In this notebook, we build and evaluate a deep-learning-based book recommendation system.

### Environment Setup

#### Installing Required Packages

In [1]:
!pip install pandas --user



#### Restart Kernel

In [None]:
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

#### Import Libraries

In [1]:
# utility packages
import os
import warnings
from datetime import datetime
import shutil

# data processing and visualization packages
import numpy as np
import pandas as pd


# tensorflow packages
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Flatten, Dot, Dense, Concatenate
from tensorflow.keras.models import Model

# ignore warnings 
warnings.filterwarnings('ignore')

In [2]:
# tensorflow version 
print(tf.__version__)

1.12.0


## Import data

In [3]:
# rating dataset
rating_dataset = pd.read_csv("data/ratings.csv")

In [4]:
# explore head
rating_dataset.head()

Unnamed: 0,book_id,user_id,rating
0,1,314,5
1,1,439,3
2,1,588,5
3,1,1169,4
4,1,1185,4


In [5]:
print("Number of ratings record : ", len(rating_dataset))
# number of users and books
n_users = len(rating_dataset.user_id.unique())
n_items = len(rating_dataset.book_id.unique())
print("Number of unique users : ", n_users)
print("Number of unique items : ", n_items)

('Number of ratings record : ', 981756)
('Number of unique users : ', 53424)
('Number of unique items : ', 10000)


In [6]:
# book metadata 
book_dataset = pd.read_csv("data/books.csv")

In [7]:
book_dataset.head()

Unnamed: 0,id,book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
0,1,2767052,2767052,2792775,272,439023483,9780439000000.0,Suzanne Collins,2008.0,The Hunger Games,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
1,2,3,3,4640799,491,439554934,9780440000000.0,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,...,4602479,4800065,75867,75504,101676,455024,1156318,3011543,https://images.gr-assets.com/books/1474154022m...,https://images.gr-assets.com/books/1474154022s...
2,3,41865,41865,3212258,226,316015849,9780316000000.0,Stephenie Meyer,2005.0,Twilight,...,3866839,3916824,95009,456191,436802,793319,875073,1355439,https://images.gr-assets.com/books/1361039443m...,https://images.gr-assets.com/books/1361039443s...
3,4,2657,2657,3275794,487,61120081,9780061000000.0,Harper Lee,1960.0,To Kill a Mockingbird,...,3198671,3340896,72586,60427,117415,446835,1001952,1714267,https://images.gr-assets.com/books/1361975680m...,https://images.gr-assets.com/books/1361975680s...
4,5,4671,4671,245494,1356,743273567,9780743000000.0,F. Scott Fitzgerald,1925.0,The Great Gatsby,...,2683664,2773745,51992,86236,197621,606158,936012,947718,https://images.gr-assets.com/books/1490528560m...,https://images.gr-assets.com/books/1490528560s...


In [8]:
book_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 23 columns):
id                           10000 non-null int64
book_id                      10000 non-null int64
best_book_id                 10000 non-null int64
work_id                      10000 non-null int64
books_count                  10000 non-null int64
isbn                         9300 non-null object
isbn13                       9415 non-null float64
authors                      10000 non-null object
original_publication_year    9979 non-null float64
original_title               9415 non-null object
title                        10000 non-null object
language_code                8916 non-null object
average_rating               10000 non-null float64
ratings_count                10000 non-null int64
work_ratings_count           10000 non-null int64
work_text_reviews_count      10000 non-null int64
ratings_1                    10000 non-null int64
ratings_2                    10000 n

#### Merged Dataset

In [9]:
dataset = pd.merge(rating_dataset, book_dataset, how='left',left_on='book_id', right_on='id')

In [10]:
dataset.head()

Unnamed: 0,book_id_x,user_id,rating,id,book_id_y,best_book_id,work_id,books_count,isbn,isbn13,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
0,1,314,5,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
1,1,439,3,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
2,1,588,5,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
3,1,1169,4,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
4,1,1185,4,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...


In [11]:
n_users = 53424
n_items = 10000

## Simple Matrix Factorization Based Model



### Explicit feedback: supervised ratings prediction

For each (user, item) pair, the model tries to predict the rating the user would give to the item.

This is the classical setup for building recommender systems from offline data with an explicit supervision signal.


### Predicting ratings as a regression problem

The code below implements this architecture:

<img src="images/01_matrix_factorization.png" style="width: 600px;" />
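
As a rough illustration of what this architecture computes, the sketch below uses NumPy in place of Keras: each user and each item gets an embedding vector, and the predicted rating is their dot product. The toy table sizes, the random values, and the `predict_rating` helper are assumptions for illustration and are not part of the notebook's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; the notebook uses 53,424 users and 10,000 items.
n_toy_users, n_toy_items, embedding_size = 5, 4, 3

# One embedding row per ID. IDs in the ratings data appear to start at 1,
# so row 0 goes unused (the Keras layers likewise allocate n + 1 rows).
user_embeddings = rng.normal(size=(n_toy_users + 1, embedding_size))
item_embeddings = rng.normal(size=(n_toy_items + 1, embedding_size))

def predict_rating(user_id, item_id):
    """Predicted rating = dot product of the user and item embedding vectors."""
    return float(np.dot(user_embeddings[user_id], item_embeddings[item_id]))

# Score every item for one user with a single matrix-vector product.
scores_for_user_1 = item_embeddings @ user_embeddings[1]
print(predict_rating(1, 2), scores_for_user_1[2])
```

Training adjusts the embedding rows so that these dot products move closer to the observed ratings.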

#### Helper Functions

In [12]:
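def parser(item_id, user_id, rating):
    """
    Map one CSV row (book_id, user_id, rating) to a (features, label) pair.
    The feature keys match the names of the Keras Input layers.
    """
    x = {
        'User-Input': user_id,
        'Item-Input': item_id
    }

    y = rating
    return x, y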


def train_input_fn(csv_path, batch_size=1024, buffer_size=1024):
    """
    train input function 
    """
    dataset = (
        tf.data.experimental.CsvDataset(
            filenames=csv_path,
            record_defaults=[tf.int32, tf.int32, tf.int32],
            select_cols=[0, 1, 2],
            field_delim=",",
            header=True)
        .map(parser)
        .shuffle(buffer_size=buffer_size)
        .batch(batch_size)
        .prefetch(batch_size)
    )
    iterator = dataset.make_one_shot_iterator()
    batch_feats, batch_labels = iterator.get_next()
    return batch_feats, batch_labels

def eval_input_fn(csv_path, batch_size=1000):
    """
    eval input function
    """
    dataset = (
        tf.data.experimental.CsvDataset(
            filenames=csv_path,
            record_defaults=[tf.int32, tf.int32, tf.int32],
            select_cols=[0, 1, 2],
            field_delim=",",
            header=True)
        .map(parser)
        .batch(batch_size)
    )
    iterator = dataset.make_one_shot_iterator()
    batch_feats, batch_labels = iterator.get_next()
    return batch_feats, batch_labels
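
Both input functions read from whatever `csv_path` they are given, so evaluating on held-out ratings only requires two separate files. The sketch below shows one possible way to make such a split with NumPy on a tiny synthetic ratings array. The 80/20 ratio, the seed, and the `split_ratings` helper are assumptions for illustration and do not appear in the original notebook.

```python
import numpy as np

def split_ratings(ratings, eval_fraction=0.2, seed=42):
    """Shuffle row indices once and split the rows into train and eval subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ratings))
    n_eval = int(len(ratings) * eval_fraction)
    eval_rows = ratings[order[:n_eval]]
    train_rows = ratings[order[n_eval:]]
    return train_rows, eval_rows

# Synthetic (book_id, user_id, rating) rows standing in for data/ratings.csv.
toy_ratings = np.array([[1, 314, 5], [1, 439, 3], [1, 588, 5], [1, 1169, 4], [1, 1185, 4],
                        [2, 314, 4], [2, 439, 2], [2, 588, 3], [2, 1169, 5], [2, 1185, 1]])

train_rows, eval_rows = split_ratings(toy_ratings)
print(len(train_rows), len(eval_rows))
```

Each subset could then be written to its own CSV file (with a header row, since the input functions use `header=True`) and passed as `csv_path` to `train_input_fn` and `eval_input_fn` respectively.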


#### Model Estimator (Simple Model)

In [13]:
def get_estimator(tf_embedding_size, tf_model_dir):
    
    # creating book embedding path
    item_input = Input(shape=[1], name="Item-Input")
    item_embedding = Embedding(n_items+1, tf_embedding_size, name="Item-Embedding")(item_input)
    item_vec = Flatten(name="Flatten-Items")(item_embedding)

    # creating user embedding path
    user_input = Input(shape=[1], name="User-Input")
    user_embedding = Embedding(n_users+1, tf_embedding_size, name="User-Embedding")(user_input)
    user_vec = Flatten(name="Flatten-Users")(user_embedding)

    # performing dot product and creating model
    prod = Dot(name="Dot-Product", axes=1)([item_vec, user_vec])
    model = Model([user_input, item_input], prod)
    model.compile('adam', 'mean_squared_error')
    model.summary()
    return tf.keras.estimator.model_to_estimator(keras_model=model,model_dir=tf_model_dir)
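
The model is compiled with the `mean_squared_error` loss. As a small worked example of that quantity, the sketch below computes MSE and its square root (RMSE, which is in the same units as the ratings) with NumPy. The rating and prediction values are made up for illustration.

```python
import numpy as np

# Hypothetical true ratings and model predictions for five (user, item) pairs.
true_ratings = np.array([5.0, 3.0, 5.0, 4.0, 4.0])
predicted = np.array([4.5, 3.5, 4.0, 4.0, 3.0])

errors = predicted - true_ratings
mse = np.mean(errors ** 2)   # the quantity minimized during training
rmse = np.sqrt(mse)          # same units as the ratings

print(mse, rmse)
```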

#### Prepare data and model

In [14]:
# settings
tf_model_dir = "/tmp/model_1/"
tf_data_dir = "data/ratings.csv"
tf_batch_size = 1024
tf_train_steps = 200
tf_embedding_size = 10

# train and eval spec
train_spec = tf.estimator.TrainSpec(input_fn = lambda: train_input_fn(tf_data_dir, batch_size=tf_batch_size, buffer_size=tf_batch_size), max_steps=tf_train_steps)
eval_spec = tf.estimator.EvalSpec(input_fn = lambda: eval_input_fn(tf_data_dir, batch_size=tf_batch_size) ,steps=1,throttle_secs=1,
                                      start_delay_secs=1 )

# model 
estimator = get_estimator(tf_embedding_size, tf_model_dir)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
Item-Input (InputLayer)         (None, 1)            0                                            
__________________________________________________________________________________________________
User-Input (InputLayer)         (None, 1)            0                                            
__________________________________________________________________________________________________
Item-Embedding (Embedding)      (None, 1, 10)        100010      Item-Input[0][0]                 
__________________________________________________________________________________________________
User-Embedding (Embedding)      (None, 1, 10)        534250      User-Input[0][0]                 
__________________________________________________________________________________________________
Flatten-It
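
The embedding parameter counts in the summary can be checked by hand: an embedding layer stores one vector of length `tf_embedding_size` per row, and the layers are created with `n_items + 1` and `n_users + 1` rows. A short arithmetic check using only numbers from this notebook:

```python
n_users = 53424
n_items = 10000
tf_embedding_size = 10

# Embedding parameters = number of rows * embedding dimension.
item_embedding_params = (n_items + 1) * tf_embedding_size
user_embedding_params = (n_users + 1) * tf_embedding_size

print(item_embedding_params)  # 100010, matching Item-Embedding in the summary
print(user_embedding_params)  # 534250, matching User-Embedding in the summary
```

The Flatten and Dot layers have no trainable weights, so in the simple model all parameters live in the two embedding tables.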

#### Train Estimator

In [15]:
print("Train and evaluate")
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
print("Training done")

Train and evaluate
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/model_1/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: ('/tmp/model_1/keras/keras_model.ckpt',)
INFO:tensorflow:Warm-starting variable: training/Adam/Variable; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Adam/decay; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: training/Adam/Variable_4; prev_var_name: Unchanged
INFO:tensorfl
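
With `tf_batch_size = 1024` and `tf_train_steps = 200`, training stops well before a full pass over the 981,756 ratings. The arithmetic below makes that explicit; it only restates numbers already shown in this notebook.

```python
import math

n_ratings = 981756      # rows in data/ratings.csv
tf_batch_size = 1024
tf_train_steps = 200

examples_seen = tf_batch_size * tf_train_steps
fraction_of_epoch = examples_seen / n_ratings
steps_per_epoch = math.ceil(n_ratings / tf_batch_size)

print(examples_seen)                 # 204800
print(round(fraction_of_epoch, 3))   # 0.209
print(steps_per_epoch)               # 959
```

If the CSV is ordered by `book_id`, as the first rows suggest, a shuffle buffer of 1,024 rows only mixes nearby records, so these 200 steps would mostly cover the books that appear early in the file. A larger `buffer_size` or a pre-shuffled file would be one way to address that.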

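## A Deep Recommender Model

Instead of taking a dot product, we can concatenate the user and item embeddings and feed them through several fully connected layers to predict the rating. Dropout layers could be added between them, although the model defined in this section uses only fully connected layers.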

<img src="images/02_deep_recsys.png" style="width: 600px;" />
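
To make the data flow concrete, here is a NumPy sketch of the forward pass for one (user, item) pair: look up both embeddings, concatenate them, then apply a 128-unit ReLU layer, a 32-unit ReLU layer, and a single linear output unit. The weights are random and untrained, and the `dense` helper is an assumption for illustration; only the layer sizes come from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_size = 10

# Random stand-ins for the learned embedding vectors of one user and one item.
user_vec = rng.normal(size=embedding_size)
item_vec = rng.normal(size=embedding_size)

def dense(x, n_out, activation=None):
    """Fully connected layer with freshly drawn (untrained) weights."""
    w = rng.normal(scale=0.1, size=(x.shape[0], n_out))
    b = np.zeros(n_out)
    y = x @ w + b
    return np.maximum(y, 0.0) if activation == "relu" else y

conc = np.concatenate([item_vec, user_vec])   # shape (20,)
fc1 = dense(conc, 128, activation="relu")     # shape (128,)
fc2 = dense(fc1, 32, activation="relu")       # shape (32,)
out = dense(fc2, 1)                           # shape (1,): the predicted rating

# Weights + biases per dense layer, given the 20-dimensional concatenated input.
dense_params = [20 * 128 + 128, 128 * 32 + 32, 32 * 1 + 1]
print(conc.shape, fc1.shape, fc2.shape, out.shape, dense_params)
```

Compared with the dot-product model, the dense layers let the network learn non-linear interactions between the user and item vectors at the cost of a few thousand extra parameters.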



In [16]:
def get_estimator(tf_embedding_size, tf_model_dir):
    # creating book embedding path
    item_input = Input(shape=[1], name="Item-Input")
    item_embedding = Embedding(n_items+1, tf_embedding_size, name="Item-Embedding")(item_input)
    item_vec = Flatten(name="Flatten-Items")(item_embedding)

    # creating user embedding path
    user_input = Input(shape=[1], name="User-Input")
    user_embedding = Embedding(n_users+1, tf_embedding_size, name="User-Embedding")(user_input)
    user_vec = Flatten(name="Flatten-Users")(user_embedding)

    # concatenate features
    conc = Concatenate()([item_vec, user_vec])

    # add fully-connected-layers
    fc1 = Dense(128, activation='relu')(conc)
    fc2 = Dense(32, activation='relu')(fc1)
    out = Dense(1)(fc2)

    # Create model and compile it
    model = Model([user_input, item_input], out)
    model.compile('adam', 'mean_squared_error')
    model.summary()
    return tf.keras.estimator.model_to_estimator(keras_model=model,model_dir=tf_model_dir)

#### Prepare data and model

In [17]:
# settings
tf_model_dir = "/tmp/model_2/"
tf_data_dir = "data/ratings.csv"
tf_batch_size = 1024
tf_train_steps = 200
tf_embedding_size = 10

# train and eval spec
train_spec = tf.estimator.TrainSpec(input_fn = lambda: train_input_fn(tf_data_dir, batch_size=tf_batch_size, buffer_size=tf_batch_size), max_steps=tf_train_steps)
eval_spec = tf.estimator.EvalSpec(input_fn = lambda: eval_input_fn(tf_data_dir, batch_size=tf_batch_size) ,steps=1,throttle_secs=1,
                                      start_delay_secs=1 )

# model 
estimator = get_estimator(tf_embedding_size, tf_model_dir)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
Item-Input (InputLayer)         (None, 1)            0                                            
__________________________________________________________________________________________________
User-Input (InputLayer)         (None, 1)            0                                            
__________________________________________________________________________________________________
Item-Embedding (Embedding)      (None, 1, 10)        100010      Item-Input[0][0]                 
__________________________________________________________________________________________________
User-Embedding (Embedding)      (None, 1, 10)        534250      User-Input[0][0]                 
__________________________________________________________________________________________________
Flatten-It

#### Train Estimator

In [18]:
print("Train and evaluate")
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
print("Training done")

Train and evaluate
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/model_2/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: ('/tmp/model_2/keras/keras_model.ckpt',)
INFO:tensorflow:Warm-starting variable: Adam/decay; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: User-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: training/Adam/Variable; prev_var_name: Unchanged
INFO:tensorf

### Exporting Model

In [19]:
# setup feature specification for serving
tf_export_dir = '/tmp/export/'
feature_spec = {
    'User-Input' : tf.FixedLenFeature(shape=[1], dtype=np.float32),
    'Item-Input' : tf.FixedLenFeature(shape=[1], dtype=np.float32)
}
print("Export saved model")
serving_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
export_dir = estimator.export_savedmodel(tf_export_dir, 
                               serving_input_receiver_fn=serving_fn)

print("Done exporting the model")

Export saved model
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Restoring parameters from /tmp/model_2/model.ckpt-200
Instructions for updating:
Pass your op to the equivalent parameter main_op instead.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: /tmp/export/temp-1553621841/saved_model.pb
Done exporting the model


In [20]:
!ls /tmp/export/

1553621841


#### Inspect Model

In [21]:
!saved_model_cli show --dir /tmp/export/* --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['examples'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_example_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_2'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: dense_2/BiasAdd:0
  Method name is: tensorflow/serving/predict


### Making Predictions

In [22]:
predict_fn = tf.contrib.predictor.from_saved_model("/tmp/export/1553621841")

INFO:tensorflow:Restoring parameters from /tmp/export/1553621841/variables/variables


In [23]:
# creating data for prediction

# all items
item_data = np.array(list(set(dataset.id)))

# we need to create user data of the same shape
user_to_predict = 1  # User ID 
user_data = np.array([user_to_predict for i in range(len(item_data))]) # repeat user ID 

In [24]:
# Test inputs represented by Pandas DataFrame.
inputs = pd.DataFrame({
    'User-Input': user_data,
    'Item-Input': item_data
})

inputs.head()


Unnamed: 0,Item-Input,User-Input
0,1,1
1,2,1
2,3,1
3,4,1
4,5,1


In [25]:
# Convert input data into serialized Example strings.
examples = []
for index, row in inputs.iterrows():
    feature = {}
    for col, value in row.iteritems():
        feature[col] = tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
    example = tf.train.Example(
        features=tf.train.Features(
            feature=feature
        )
    )
    examples.append(example.SerializeToString())
    
predictions = predict_fn({'examples': examples})



In [26]:
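# reuse the predictions computed in the previous cell
pred = predictions['dense_2'].flatten()  # 'dense_2' is the output name in the serving signature
print(-np.sort(-pred)[:10])  # ten highest predicted ratings
# top 10 items
# NOTE: argsort returns positions into pred / item_data, not book IDs;
# item_data[positions] would convert these positions to IDs
recommended_item_ids = (-pred).argsort()[:10]
print(recommended_item_ids)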

[3.7116911 3.7008858 3.700807  3.6999967 3.6968253 3.69587   3.6929643
 3.6915388 3.6907532 3.6906826]
[1044 1625  300  366  775  986  578 1611  523 1624]
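
Note that `argsort` returns positions within `pred`, which line up with positions in `item_data` rather than with book IDs. The sketch below, on made-up scores, shows how positions could be mapped back to IDs and how already-rated books could be excluded before ranking. The arrays and the `rated_ids` values are hypothetical.

```python
import numpy as np

# Hypothetical item IDs (starting at 1) and predicted scores in the same order.
item_data = np.array([1, 2, 3, 4, 5, 6])
pred = np.array([3.1, 3.9, 2.7, 3.6, 3.8, 3.0])

# Positions of the three highest scores, best first.
top_positions = (-pred).argsort()[:3]
# Map positions back to item IDs.
top_item_ids = item_data[top_positions]
print(top_positions, top_item_ids)

# Optionally drop items the user has already rated before ranking.
rated_ids = np.array([2])
mask = ~np.isin(item_data, rated_ids)
unseen_ids = item_data[mask]
unseen_pred = pred[mask]
top_unseen_ids = unseen_ids[(-unseen_pred).argsort()[:3]]
print(top_unseen_ids)
```

In this toy case the positions are `[1, 4, 3]` while the IDs are `[2, 5, 4]`: with IDs that start at 1, using positions directly as IDs is off by one.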


In [27]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 981756 entries, 0 to 981755
Data columns (total 26 columns):
book_id_x                    981756 non-null int64
user_id                      981756 non-null int64
rating                       981756 non-null int64
id                           981756 non-null int64
book_id_y                    981756 non-null int64
best_book_id                 981756 non-null int64
work_id                      981756 non-null int64
books_count                  981756 non-null int64
isbn                         914534 non-null object
isbn13                       925621 non-null float64
authors                      981756 non-null object
original_publication_year    979774 non-null float64
original_title               926219 non-null object
title                        981756 non-null object
language_code                876635 non-null object
average_rating               981756 non-null float64
ratings_count                981756 non-null int64
work_rating

#### Books Rated By User

In [30]:
dataset[dataset.user_id == user_to_predict][["original_title","small_image_url","rating"]]

Unnamed: 0,original_title,small_image_url,rating
117889,The Forty Rules of Love,https://s.gr-assets.com/assets/nophoto/book/50...,4
488112,Brunelleschi's Dome: How a Renaissance Genius ...,https://images.gr-assets.com/books/1309288056s...,3
625717,Born on a Blue Day: Inside the Extraordinary M...,https://s.gr-assets.com/assets/nophoto/book/50...,4


#### Books Recommended

In [31]:
book_dataset[book_dataset['id'].isin(recommended_item_ids)][["original_title","small_image_url"]]

Unnamed: 0,original_title,small_image_url
299,The Boy in the Striped Pyjamas,https://images.gr-assets.com/books/1366228171s...
365,John Adams,https://images.gr-assets.com/books/1478144278s...
522,The Things They Carried,https://images.gr-assets.com/books/1424663847s...
577,Christine,https://images.gr-assets.com/books/1327270815s...
774,Just Kids,https://images.gr-assets.com/books/1259762407s...
985,The Girl Who Loved Tom Gordon,https://s.gr-assets.com/assets/nophoto/book/50...
1043,Why Not Me?,https://images.gr-assets.com/books/1442548684s...
1610,Rise of the Evening Star,https://images.gr-assets.com/books/1386633982s...
1623,La Nausée,https://images.gr-assets.com/books/1377674928s...
1624,,https://s.gr-assets.com/assets/nophoto/book/50...
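
Filtering with `isin` returns rows in the table's own order, so the ranking by predicted score is lost. One possible way to keep the ranking, sketched on a tiny made-up table, is to index by `id` and select with `.loc`. The titles and IDs below are placeholders, not the notebook's results.

```python
import pandas as pd

# Hypothetical stand-in for book_dataset with only the columns needed here.
books = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "original_title": ["Book A", "Book B", "Book C", "Book D", "Book E"],
})

# Recommended IDs ordered from highest to lowest predicted score.
ranked_ids = [4, 1, 5]

# isin keeps the table order (1, 4, 5) ...
by_isin = books[books["id"].isin(ranked_ids)]["original_title"].tolist()
# ... while .loc on an id index follows the order of ranked_ids (4, 1, 5).
by_rank = books.set_index("id").loc[ranked_ids, "original_title"].tolist()

print(by_isin)
print(by_rank)
```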
