## Building and Evaluating Deep Learning Based Book Recommendation System

**Author : Abhishek Kumar**  
**Event : Strata Conference , San Francisco, 2019**  

In this notebook, we will build and evaluate deep learning based book recommendation system.

### Envionrment Setup

Base enviornment for running this notebook is **gcr.io/kubeflow-images-public/tensorflow-1.12.0-notebook-cpu@sha256:cbbe925d2985bcf9f14a36ae9468c03512e258dbc2e714356c1845980a268a0f**. The notebook can be run using docker easily

```
docker run -it --rm -p 8888:8888 -v "$PWD":/home/jovyan/work gcr.io/kubeflow-images-public/tensorflow-1.12.0-notebook-cpu@sha256:cbbe925d2985bcf9f14a36ae9468c03512e258dbc2e714356c1845980a268a0f
```

#### Installing Required Packages

In [1]:
!pip3 install scikit-learn matplotlib pandas --user

[33mThe directory '/home/jovyan/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.[0m
[33mThe directory '/home/jovyan/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.[0m
Collecting scikit-learn
[?25l  Downloading https://files.pythonhosted.org/packages/5e/82/c0de5839d613b82bddd088599ac0bbfbbbcbd8ca470680658352d2c435bd/scikit_learn-0.20.3-cp36-cp36m-manylinux1_x86_64.whl (5.4MB)
[K    100% |████████████████████████████████| 5.4MB 1.3MB/s ta 0:00:011  8% |██▊                             | 460kB 876kB/s eta 0:00:06    32% |██████████▍                     | 1.8MB 1.7MB/s eta 0:00:03    38% |████████████▏                   | 2.0MB 687kB/s eta 0:00:05    

#### Restart Kernel

In [None]:
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

#### Import Libraries

In [1]:
# utitlity packages
import os
import warnings
from datetime import datetime
import shutil

# data processing and visualization packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# tensorflow packages
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Flatten, Dot, Dense, Concatenate
from tensorflow.keras.models import Model

# ignore warnings 
warnings.filterwarnings('ignore')
%matplotlib inline

## Import data

In [2]:
# rating dataset
rating_dataset = pd.read_csv("data/ratings.csv")

In [3]:
# explore head
rating_dataset.head()

Unnamed: 0,book_id,user_id,rating
0,1,314,5
1,1,439,3
2,1,588,5
3,1,1169,4
4,1,1185,4


In [4]:
print("Number of ratings record : ", len(rating_dataset))
# number of users and books
n_users = len(rating_dataset.user_id.unique())
n_items = len(rating_dataset.book_id.unique())
print("Number of unique users : ", n_users)
print("Number of unique items : ", n_items)

Number of ratings record :  981756
Number of unique users :  53424
Number of unique items :  10000


In [5]:
# book metadata 
book_dataset = pd.read_csv("data/books.csv")

In [6]:
book_dataset.head()

Unnamed: 0,id,book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
0,1,2767052,2767052,2792775,272,439023483,9780439000000.0,Suzanne Collins,2008.0,The Hunger Games,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
1,2,3,3,4640799,491,439554934,9780440000000.0,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,...,4602479,4800065,75867,75504,101676,455024,1156318,3011543,https://images.gr-assets.com/books/1474154022m...,https://images.gr-assets.com/books/1474154022s...
2,3,41865,41865,3212258,226,316015849,9780316000000.0,Stephenie Meyer,2005.0,Twilight,...,3866839,3916824,95009,456191,436802,793319,875073,1355439,https://images.gr-assets.com/books/1361039443m...,https://images.gr-assets.com/books/1361039443s...
3,4,2657,2657,3275794,487,61120081,9780061000000.0,Harper Lee,1960.0,To Kill a Mockingbird,...,3198671,3340896,72586,60427,117415,446835,1001952,1714267,https://images.gr-assets.com/books/1361975680m...,https://images.gr-assets.com/books/1361975680s...
4,5,4671,4671,245494,1356,743273567,9780743000000.0,F. Scott Fitzgerald,1925.0,The Great Gatsby,...,2683664,2773745,51992,86236,197621,606158,936012,947718,https://images.gr-assets.com/books/1490528560m...,https://images.gr-assets.com/books/1490528560s...


In [7]:
book_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 23 columns):
id                           10000 non-null int64
book_id                      10000 non-null int64
best_book_id                 10000 non-null int64
work_id                      10000 non-null int64
books_count                  10000 non-null int64
isbn                         9300 non-null object
isbn13                       9415 non-null float64
authors                      10000 non-null object
original_publication_year    9979 non-null float64
original_title               9415 non-null object
title                        10000 non-null object
language_code                8916 non-null object
average_rating               10000 non-null float64
ratings_count                10000 non-null int64
work_ratings_count           10000 non-null int64
work_text_reviews_count      10000 non-null int64
ratings_1                    10000 non-null int64
ratings_2                    10000 n

#### merged dataset

In [8]:
dataset = pd.merge(rating_dataset, book_dataset, how='left',left_on='book_id', right_on='id')

In [9]:
dataset.head()

Unnamed: 0,book_id_x,user_id,rating,id,book_id_y,best_book_id,work_id,books_count,isbn,isbn13,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
0,1,314,5,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
1,1,439,3,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
2,1,588,5,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
3,1,1169,4,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
4,1,1185,4,1,2767052,2767052,2792775,272,439023483,9780439000000.0,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...


### Train Test Split

In [12]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(dataset, test_size=0.2, random_state=0)

In [13]:
train.head()

Unnamed: 0,book_id_x,user_id,rating,id,book_id_y,best_book_id,work_id,books_count,isbn,isbn13,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
456599,4575,31335,5,4575,867248,867248,650820,21,689831870,9780690000000.0,...,24556,24814,604,275,735,4007,7408,12389,https://images.gr-assets.com/books/1344390790m...,https://images.gr-assets.com/books/1344390790s...
72367,724,28407,3,724,2233407,2233407,3159807,67,441015891,9780441000000.0,...,151095,161814,3500,946,5941,37809,62516,54602,https://s.gr-assets.com/assets/nophoto/book/11...,https://s.gr-assets.com/assets/nophoto/book/50...
103825,1039,8820,3,1039,5899779,5899779,6072122,92,1594743347,9781595000000.0,...,103995,110252,12184,10085,17255,33627,29961,19324,https://images.gr-assets.com/books/1320449653m...,https://images.gr-assets.com/books/1320449653s...
225380,2256,16884,5,2256,381421,381421,371207,75,1591451884,9781591000000.0,...,35219,38748,984,683,781,2888,7127,27269,https://s.gr-assets.com/assets/nophoto/book/11...,https://s.gr-assets.com/assets/nophoto/book/50...
154477,1545,53293,5,1545,776407,776407,3244521,146,525444440,9780525000000.0,...,69102,71538,866,689,1444,9318,20394,39693,https://images.gr-assets.com/books/1348195621m...,https://images.gr-assets.com/books/1348195621s...


In [14]:
test.head()

Unnamed: 0,book_id_x,user_id,rating,id,book_id_y,best_book_id,work_id,books_count,isbn,isbn13,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
350097,3506,10626,4,3506,856917,856917,3874446,9,1421500167.0,9781422000000.0,...,29385,29550,400,1197,2156,5661,6559,13977,https://s.gr-assets.com/assets/nophoto/book/11...,https://s.gr-assets.com/assets/nophoto/book/50...
333722,3341,40305,5,3341,18739426,18739426,26616838,21,,,...,28866,33232,2379,42,249,2519,12186,18236,https://images.gr-assets.com/books/1450134973m...,https://images.gr-assets.com/books/1450134973s...
552434,5544,22396,5,5544,10677277,10677277,15586973,17,765329581.0,9780765000000.0,...,23732,24996,2552,1007,2127,6014,8111,7737,https://images.gr-assets.com/books/1306520962m...,https://images.gr-assets.com/books/1306520962s...
227221,2274,12806,5,2274,9370,9370,1231351,29,1842430343.0,9781842000000.0,...,32991,36509,1288,490,1508,7482,13828,13201,https://s.gr-assets.com/assets/nophoto/book/11...,https://s.gr-assets.com/assets/nophoto/book/50...
710188,7146,32365,4,7146,164323,164323,2888612,48,553383663.0,9780553000000.0,...,11158,11814,471,239,705,3257,4429,3184,https://images.gr-assets.com/books/1320394284m...,https://images.gr-assets.com/books/1320394284s...


## Creating dot product model
Most recommendation systems are build using a simple dot product as shown below but newer ones are now implementing a neural network instead of the simple dot product.

In [10]:
# creating book embedding path
item_input = Input(shape=[1], name="Item-Input")
item_embedding = Embedding(n_items+1, 5, name="Item-Embedding")(item_input)
item_vec = Flatten(name="Flatten-Items")(item_embedding)

# creating user embedding path
user_input = Input(shape=[1], name="User-Input")
user_embedding = Embedding(n_users+1, 5, name="User-Embedding")(user_input)
user_vec = Flatten(name="Flatten-Users")(user_embedding)

# performing dot product and creating model
prod = Dot(name="Dot-Product", axes=1)([item_vec, user_vec])
model_1 = Model([user_input, item_input], prod)
model_1.compile('adam', 'mean_squared_error')
model_1.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
Item-Input (InputLayer)         (None, 1)            0                                            
__________________________________________________________________________________________________
User-Input (InputLayer)         (None, 1)            0                                            
__________________________________________________________________________________________________
Item-Embedding (Embedding)      (None, 1, 5)         50005       Item-Input[0][0]                 
__________________________________________________________________________________________________
User-Embedding (Embedding)      (None, 1, 5)         267125      User-Input[0][0]                 
__________________________________________________________________________________________________
Flatten-It

#### Convert Keras Model to Estimator

In [51]:
estimator_1 = tf.keras.estimator.model_to_estimator(keras_model=model_1)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using the Keras model provided.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp92d9h2ce', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f7531af03c8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


#### Train Estimator

In [52]:
model_1.input_names

['User-Input', 'Item-Input']

In [53]:
def parser(item_id, user_id, rating):
    x = {
        'User-Input': user_id,
        'Item-Input': item_id
     }
    
    y = rating
    return x,y    


def my_input_fn(csv_path, shuffle=True):
    dataset = (
        tf.data.experimental.CsvDataset(
            filenames=csv_path,
            record_defaults=[tf.int32, tf.int32, tf.int32],
            select_cols=[0, 1, 2],
            field_delim=",",
            header=True)
        .map(parser)
#         .shuffle(shuffle)
         .batch(10000)
    )
    iterator = dataset.make_one_shot_iterator()
    batch_feats, batch_labels = iterator.get_next()
    return batch_feats, batch_labels

In [54]:
train_spec = tf.estimator.TrainSpec(input_fn = lambda: my_input_fn('data/ratings.csv') , max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn = lambda: my_input_fn('data/ratings.csv', shuffle=False) )

In [55]:
tf.estimator.train_and_evaluate(estimator_1, train_spec, eval_spec)

INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp92d9h2ce/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: ('/tmp/tmp92d9h2ce/keras/keras_model.ckpt',)
INFO:tensorflow:Warm-starting variable: User-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Item-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Adam/iterations; prev_var_name: Unchanged
INFO:tensorflow

({'loss': 15.844675, 'global_step': 100}, [])

In [259]:
def input_function(model, user_ids, item_ids, labels=None, is_eval=False):
    if not is_eval:
        input_fn = tf.estimator.inputs.numpy_input_fn(
            x = {
                model.input_names[0]: user_ids,
                model.input_names[1]: item_ids
            }, 
            y = labels,
            shuffle=True,
            batch_size = 1000,
            num_epochs = 500
        )
        return input_fn
    else:
        input_fn = tf.estimator.inputs.numpy_input_fn(
            x = {
                model.input_names[0]: user_ids,
                model.input_names[1]: item_ids
            }, 
            y = labels,
            shuffle=False
        )
        return input_fn

In [260]:
# setup input function 
# for training
input_fn= input_function(model_1, train.user_id.values, train.id.values,train.rating.values,is_eval=False)
# for evaluation
eval_fn=input_function(model_1,test.user_id.values, test.id.values,test.rating.values,is_eval=True)

In [None]:
estimator_1.train()

In [261]:
# train estimator
estimator_1.train(input_fn, max_steps=10000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp46de9l9e/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: ('/tmp/tmp46de9l9e/keras/keras_model.ckpt',)
INFO:tensorflow:Warm-starting variable: User-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Item-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Adam/iterations; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Adam/lr; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Adam/beta_1; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Adam/beta_2; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Adam/decay; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: training/Ada

INFO:tensorflow:loss = 1.6387509, step = 6601 (1.389 sec)
INFO:tensorflow:global_step/sec: 72.0462
INFO:tensorflow:loss = 1.53079, step = 6701 (1.387 sec)
INFO:tensorflow:global_step/sec: 73.7704
INFO:tensorflow:loss = 1.5608109, step = 6801 (1.356 sec)
INFO:tensorflow:global_step/sec: 65.8414
INFO:tensorflow:loss = 1.491625, step = 6901 (1.518 sec)
INFO:tensorflow:global_step/sec: 65.247
INFO:tensorflow:loss = 1.4734572, step = 7001 (1.533 sec)
INFO:tensorflow:global_step/sec: 64.4005
INFO:tensorflow:loss = 1.5142448, step = 7101 (1.554 sec)
INFO:tensorflow:global_step/sec: 65.591
INFO:tensorflow:loss = 1.4290537, step = 7201 (1.525 sec)
INFO:tensorflow:global_step/sec: 65.0822
INFO:tensorflow:loss = 1.4109411, step = 7301 (1.534 sec)
INFO:tensorflow:global_step/sec: 74.3792
INFO:tensorflow:loss = 1.4072437, step = 7401 (1.345 sec)
INFO:tensorflow:global_step/sec: 69.796
INFO:tensorflow:loss = 1.3634636, step = 7501 (1.432 sec)
INFO:tensorflow:global_step/sec: 77.9636
INFO:tensorflow:

<tensorflow.python.estimator.estimator.Estimator at 0x7f97207b8b00>

In [262]:
score = estimator_1.evaluate(eval_fn)
print(score)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-03-21-19:21:25
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp46de9l9e/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-03-21-19:21:27
INFO:tensorflow:Saving dict for global step 10000: global_step = 10000, loss = 1.3661216
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/tmp46de9l9e/model.ckpt-10000
{'loss': 1.3661216, 'global_step': 10000}


## Creating Neural Network
Neural Networks proved there effectivness for almost every machine learning problem as of now and they also perform exceptionally well for recommendation systems.

In [56]:
# creating book embedding path
item_input = Input(shape=[1], name="Item-Input")
item_embedding = Embedding(n_items+1, 10, name="Item-Embedding")(item_input)
item_vec = Flatten(name="Flatten-Items")(item_embedding)

# creating user embedding path
user_input = Input(shape=[1], name="User-Input")
user_embedding = Embedding(n_users+1, 10, name="User-Embedding")(user_input)
user_vec = Flatten(name="Flatten-Users")(user_embedding)

# concatenate features
conc = Concatenate()([item_vec, user_vec])

# add fully-connected-layers
fc1 = Dense(128, activation='relu')(conc)
fc2 = Dense(32, activation='relu')(fc1)
out = Dense(1)(fc2)

# Create model and compile it
model_2 = Model([user_input, item_input], out)
model_2.compile('adam', 'mean_squared_error')
model_2.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
Item-Input (InputLayer)         (None, 1)            0                                            
__________________________________________________________________________________________________
User-Input (InputLayer)         (None, 1)            0                                            
__________________________________________________________________________________________________
Item-Embedding (Embedding)      (None, 1, 10)        100010      Item-Input[0][0]                 
__________________________________________________________________________________________________
User-Embedding (Embedding)      (None, 1, 10)        534250      User-Input[0][0]                 
__________________________________________________________________________________________________
Flatten-It

#### Keras Model to Estimator

In [57]:
estimator_2 = tf.keras.estimator.model_to_estimator(keras_model = model_2)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using the Keras model provided.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpgcd9su9u', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f7531b802b0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [58]:
tf.estimator.train_and_evaluate(estimator_2, train_spec, eval_spec)

INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmpgcd9su9u/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: ('/tmp/tmpgcd9su9u/keras/keras_model.ckpt',)
INFO:tensorflow:Warm-starting variable: User-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Item-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: dense/kernel; prev_var_name: Unchanged
INFO:tensorflow:Wa

({'loss': 1.0810875, 'global_step': 100}, [])

In [268]:
# setup input function 
# for training
input_fn= input_function(model_2, train.user_id.values, train.id.values,train.rating.values,is_eval=False)
# for evaluation
eval_fn=input_function(model_2,test.user_id.values, test.id.values,test.rating.values,is_eval=True)

In [266]:
# train estimator
estimator_2.train(input_fn, max_steps=10000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp6wz4ghhb/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: ('/tmp/tmp6wz4ghhb/keras/keras_model.ckpt',)
INFO:tensorflow:Warm-starting variable: User-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: Item-Embedding/embeddings; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: dense_18/kernel; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: dense_18/bias; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: dense_19/kernel; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: dense_19/bias; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: dense_20/kernel; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting varia

INFO:tensorflow:loss = 1.0262319, step = 4401 (1.698 sec)
INFO:tensorflow:global_step/sec: 47.1492
INFO:tensorflow:loss = 0.95939904, step = 4501 (2.118 sec)
INFO:tensorflow:global_step/sec: 48.7905
INFO:tensorflow:loss = 0.98500043, step = 4601 (2.051 sec)
INFO:tensorflow:global_step/sec: 53.6693
INFO:tensorflow:loss = 0.97792536, step = 4701 (1.862 sec)
INFO:tensorflow:global_step/sec: 55.1558
INFO:tensorflow:loss = 0.8980318, step = 4801 (1.813 sec)
INFO:tensorflow:global_step/sec: 53.7712
INFO:tensorflow:loss = 0.95397365, step = 4901 (1.861 sec)
INFO:tensorflow:global_step/sec: 46.2103
INFO:tensorflow:loss = 0.99737346, step = 5001 (2.166 sec)
INFO:tensorflow:global_step/sec: 50.8114
INFO:tensorflow:loss = 1.000003, step = 5101 (1.965 sec)
INFO:tensorflow:global_step/sec: 60.2153
INFO:tensorflow:loss = 1.0240545, step = 5201 (1.663 sec)
INFO:tensorflow:global_step/sec: 60.3721
INFO:tensorflow:loss = 0.9357433, step = 5301 (1.655 sec)
INFO:tensorflow:global_step/sec: 59.5267
INFO:t

<tensorflow.python.estimator.estimator.Estimator at 0x7f96ff910358>

In [269]:
# evaluate estimator
score = estimator_2.evaluate(eval_fn)
print(score)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-03-21-19:25:48
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp6wz4ghhb/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-03-21-19:25:51
INFO:tensorflow:Saving dict for global step 10000: global_step = 10000, loss = 0.971013
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/tmp6wz4ghhb/model.ckpt-10000
{'loss': 0.971013, 'global_step': 10000}


### Exporting Model

In [270]:
model_2.input_names

['User-Input', 'Item-Input']

In [273]:
# setup feature specification
feature_spec = {
    model_2.input_names[0] : tf.FixedLenFeature(shape=[1], dtype=np.float32),
    model_2.input_names[1] : tf.FixedLenFeature(shape=[1], dtype=np.float32)
}
# serving function
serving_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

# export 

# clean the output foler
dirpath = os.path.join(os.path.curdir,"export")
if os.path.exists(dirpath) and os.path.isdir(dirpath):
    shutil.rmtree(dirpath)
    
# create export folder
os.makedirs(dirpath)

# export model for serving
export_dir = estimator_2.export_savedmodel(export_dir_base="export", 
                                       serving_input_receiver_fn=serving_fn)


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmp/tmp6wz4ghhb/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: export/temp-b'1553196473'/saved_model.pb


In [274]:
!ls export/*

saved_model.pb	variables


##### Inspect Model

In [275]:
!saved_model_cli show --dir export/* --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['examples'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_example_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_20'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: dense_20/BiasAdd:0
  Method name is: tensorflow/serving/predict


### Making Predictions

In [276]:
predict_fn = tf.contrib.predictor.from_saved_model(export_dir)

INFO:tensorflow:Restoring parameters from export/1553196473/variables/variables


In [303]:
# creating data for prediction

# all items
item_data = np.array(list(set(dataset.id)))

# we need to create user data of the same shape
user_to_predict = 7  # let's predict for first user
user_data = np.array([user_to_predict for i in range(len(item_data))]) # repeating the user_id to the times of each unique item

In [304]:
# Test inputs represented by Pandas DataFrame.
inputs = pd.DataFrame({
    'User-Input': user_data,
    'Item-Input': item_data
})

inputs.head()


Unnamed: 0,User-Input,Item-Input
0,7,1
1,7,2
2,7,3
3,7,4
4,7,5


In [305]:
# Convert input data into serialized Example strings.
examples = []
for index, row in inputs.iterrows():
    feature = {}
    for col, value in row.iteritems():
        feature[col] = tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
    example = tf.train.Example(
        features=tf.train.Features(
            feature=feature
        )
    )
    examples.append(example.SerializeToString())
    
predictions = predict_fn({'examples': examples})



In [306]:
pred = predict_fn({'examples': examples})
pred = pred[model_2.output_names[0]].flatten()
print(-np.sort(-pred)[:10])
# top 10 items 
recommended_item_ids = (-pred).argsort()[:10]
print(recommended_item_ids)

[3.8708832 3.8651752 3.8641806 3.863225  3.862073  3.8602185 3.8601568
 3.859663  3.8596017 3.8595254]
[8225 9126 1252 3219 9543 1153 9923 6421 5775 9327]


In [307]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 981756 entries, 0 to 981755
Data columns (total 26 columns):
book_id_x                    981756 non-null int64
user_id                      981756 non-null int64
rating                       981756 non-null int64
id                           981756 non-null int64
book_id_y                    981756 non-null int64
best_book_id                 981756 non-null int64
work_id                      981756 non-null int64
books_count                  981756 non-null int64
isbn                         914534 non-null object
isbn13                       925621 non-null float64
authors                      981756 non-null object
original_publication_year    979774 non-null float64
original_title               926219 non-null object
title                        981756 non-null object
language_code                876635 non-null object
average_rating               981756 non-null float64
ratings_count                981756 non-null int64
work_rating

In [325]:
# books rating by the user
'\n'.join([str(x) for x in list(dataset[dataset.user_id == user_to_predict]["original_title"].values)])

"デスノート #1 (Desu Nōto) Taikutsu (退屈)\nOld Man's War\nSurely You're Joking, Mr. Feynman! Adventures of a Curious Character\nStone of Tears\nY: The Last Man, Vol. 1: Unmanned\nPersepolis\nThe Fall of Hyperion\nThe Diamond Age\nBlood of the Fold\nA Scanner Darkly \nShadow of the Hegemon\nThe Sword of Shannara\nThe Yiddish Policemen's Union\nThe Elfstones Of Shannara\nBarrel Fever: Stories and Essays\nCollapse: How Societies Chose to Fail or Succeed\nConsider Phlebas\nGod Emperor of Dune\nAnathem\nThe Windup Girl\nGhost World\nThe Ghost Brigades\nAltered Carbon\nThe Player of Games\nThud!\nPerdido Street Station\nHeretics of Dune\nShadow of the Giant\nA Fire Upon The Deep\nBlack Hole\nTransmetropolitan, Vol. 1: Back on the Street\nFlatland: A Romance of Many Dimensions\nThe City & The City\nRevelation Space\nThe Talismans of Shannara\nThe Rise of Endymion\nEndymion\nQuicksilver\nY: The Last Man Vol. 2: Cycles\nShip Breaker\nEaters of the Dead\nMona Lisa Overdrive\nThe Druid of Shannara\nAlv

In [354]:
# recommended books
#'\n'.join([str(x) for x in list(book_dataset[book_dataset['id'].isin(recommended_item_ids)]["original_title"].values)])
dict({ x[0]: x[1] for x in book_dataset[book_dataset['id'].isin(recommended_item_ids)][["original_title","small_image_url"]].values})

#book_dataset[book_dataset['id'].isin(recommended_item_ids)][["id","original_title","authors"]]

{'A Dirty Job': 'https://images.gr-assets.com/books/1331323415s/33456.jpg',
 'Behind the Beautiful Forevers: Life, Death, and Hope in a Mumbai Undercity': 'https://images.gr-assets.com/books/1315601232s/11869272.jpg',
 'The Little Friend': 'https://images.gr-assets.com/books/1327936589s/775346.jpg',
 'Cold Fire': 'https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png',
 'Teardrop': 'https://images.gr-assets.com/books/1360596375s/16070143.jpg',
 'Princess on the Brink': 'https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png',
 'Superman for All Seasons': 'https://images.gr-assets.com/books/1343797123s/106859.jpg',
 'Artemis Fowl Boxed Set (Artemis Fowl, #1-5)': 'https://images.gr-assets.com/books/1279206196s/2358870.jpg',
 nan: 'https://images.gr-assets.com/books/1348436233s/16041169.jpg',
 'The Green Mile, Part 5: Night Journey': 'https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png'}

In [351]:
{ x[0]: x[1] for x in dataset[dataset.user_id == user_to_predict][["original_title","small_image_url"]].values}

{'デスノート #1 (Desu Nōto) Taikutsu (退屈)': 'https://images.gr-assets.com/books/1419952134s/13615.jpg',
 "Old Man's War": 'https://images.gr-assets.com/books/1487044882s/51964.jpg',
 "Surely You're Joking, Mr. Feynman! Adventures of a Curious Character": 'https://images.gr-assets.com/books/1348445281s/5544.jpg',
 'Stone of Tears': 'https://images.gr-assets.com/books/1478930875s/234184.jpg',
 'Y: The Last Man, Vol. 1: Unmanned': 'https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png',
 'Persepolis': 'https://images.gr-assets.com/books/1327876995s/991197.jpg',
 'The Fall of Hyperion': 'https://images.gr-assets.com/books/1429215870s/77565.jpg',
 'The Diamond Age': 'https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png',
 'Blood of the Fold': 'https://images.gr-assets.com/books/1443563626s/43892.jpg',
 'A Scanner Darkly ': 'https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png',
 'Shadow of the He