## Import Packages

In [None]:
import pandas as pd
import numpy as np

from keras.layers import Input, Embedding, Flatten, Dot
from keras.models import Model
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

import seaborn as sns
from sqlalchemy import create_engine

import plotly.express as px
import plotly.graph_objects as go

## Create Engine to Connect to the RDS Database

In [None]:
engine = create_engine("mysql+pymysql://{user}:{pw}@{host}/{db}"
                       .format(user="admin",
                               pw="}G~j_?DwNLe{|4Q{]#",
                               host="database-1.clar7sbwghxi.eu-west-1.rds.amazonaws.com",
                               db="recommender"))
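
Hardcoding the password in the notebook exposes it to anyone who can read the file, and characters such as `@` or `/` in a password can be misparsed when pasted directly into a connection URL. A minimal sketch of one alternative, assuming the credentials are supplied through environment variables. The variable names `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME` and the helper `build_db_url` are hypothetical and not part of the original project:

```python
import os
from urllib.parse import quote_plus


def build_db_url(user, password, host, db):
    """Build a MySQL connection URL, percent-encoding the credentials."""
    return "mysql+pymysql://{user}:{pw}@{host}/{db}".format(
        user=quote_plus(user),
        pw=quote_plus(password),
        host=host,
        db=db,
    )


# Hypothetical environment variable names; the defaults are placeholders only.
db_url = build_db_url(
    user=os.environ.get("DB_USER", "admin"),
    password=os.environ.get("DB_PASSWORD", "change-me"),
    host=os.environ.get("DB_HOST", "localhost"),
    db=os.environ.get("DB_NAME", "recommender"),
)
# The resulting string can then be passed on: engine = create_engine(db_url)
```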

## Recover Data from SQL Tables

The book_tags and tags queries are commented out because they were not used in this iteration of the project due to lack of time.

In [None]:
ratings = pd.read_sql_query('SELECT * FROM rating', engine)
books = pd.read_sql_query('SELECT * FROM book', engine)
# book_tags = pd.read_sql_query('SELECT * FROM books_tags', engine)
# tags = pd.read_sql_query('SELECT * FROM tag', engine)

Split the ratings into training and test sets (80/20), then count the distinct users and books.

In [None]:
train, test = train_test_split(ratings, test_size=0.2, random_state=42)
n_users = ratings['user_id'].nunique()  # number of distinct users
n_books = ratings['book_id'].nunique()  # number of distinct books
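
The embedding layers built later have `n_users + 1` and `n_books + 1` rows, so every id must be smaller than that size. This holds only if the ids run contiguously from 0 or 1, and the notebook does not say whether they do. A sketch of one way to guard against gaps, assuming the raw ids are available as a NumPy array (the sample ids below are made up):

```python
import numpy as np

# Made-up raw ids with gaps, standing in for ratings['user_id'].
raw_user_ids = np.array([10, 42, 10, 7, 99, 42])

# np.unique returns the sorted distinct ids and, with return_inverse=True,
# the position of each original id within that sorted array.
unique_ids, user_idx = np.unique(raw_user_ids, return_inverse=True)

n_users = len(unique_ids)  # number of distinct users
# user_idx now holds values in 0..n_users-1, which always fit an
# embedding table with n_users rows.
```

The same remapping would apply to the book ids; `unique_ids[user_idx]` recovers the original ids.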

Now we build the model. The first chunk takes a book id as input and embeds the book in a 5-dimensional space, then flattens the result so that we have a vector.

The same is done for the user, using the user id.

The last chunk takes the dot product of these two vectors, which produces a single number: the predicted rating. We then define the model with the user and book ids as inputs and the dot product of their latent embeddings as output.

In [None]:
book_input = Input(shape=[1], name="Book-Input")
book_embedding = Embedding(n_books+1, 5, name="Book-Embedding")(book_input)
book_vec = Flatten(name="Flatten-Books")(book_embedding)

user_input = Input(shape=[1], name="User-Input")
user_embedding = Embedding(n_users+1, 5, name="User-Embedding")(user_input)
user_vec = Flatten(name="Flatten-Users")(user_embedding)

prod = Dot(name="Dot-Product", axes=1)([book_vec, user_vec])
model = Model([user_input, book_input], prod)
model.compile('adam', 'mean_squared_error')
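
To make the computation concrete, here is a rough NumPy sketch of what this model computes for a single (user, book) pair: two table lookups followed by a dot product. The table sizes and random weights are illustrative assumptions; in the real model the tables are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_books, dim = 4, 6, 5  # toy sizes; dim matches the 5-d embeddings

# Randomly initialised tables stand in for the learned embedding weights.
user_table = rng.normal(size=(n_users + 1, dim))
book_table = rng.normal(size=(n_books + 1, dim))


def predict_rating(user_id, book_id):
    """Look up both 5-dimensional vectors and return their dot product."""
    user_vec = user_table[user_id]
    book_vec = book_table[book_id]
    return float(np.dot(user_vec, book_vec))


score = predict_rating(user_id=2, book_id=3)
```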

Fit the model for 10 epochs, with verbose mode on to follow how the loss evolves.

In [None]:
history = model.fit([train['user_id'], train['book_id']], train['rating'], epochs=10, verbose=1)
model.save('recommender_model.h5')

Check the summary

In [None]:
model.summary()

Embeddings are weights learned to represent a specific variable, in our case books and users. We can therefore use them not only to get good results on our problem but also to extract insight about our data.

In [None]:
# Extract embeddings
book_em = model.get_layer('Book-Embedding')
book_em_weights = book_em.get_weights()[0]
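
One simple way to draw insight from an embedding matrix is to look for the rows that lie closest to a given item. The sketch below uses cosine similarity on a small made-up matrix; `most_similar` and `demo_weights` are hypothetical names, and in the notebook `book_em_weights` would take the place of `demo_weights`.

```python
import numpy as np


def most_similar(weights, item_id, top_k=3):
    """Return the ids of the top_k rows closest to item_id by cosine similarity."""
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    unit = weights / np.clip(norms, 1e-12, None)  # avoid division by zero
    sims = unit @ unit[item_id]
    sims[item_id] = -np.inf  # exclude the item itself
    return np.argsort(-sims)[:top_k]


# Placeholder matrix standing in for book_em_weights.
demo_weights = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
neighbours = most_similar(demo_weights, item_id=0, top_k=2)
```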

In [None]:
pca = PCA(n_components=2)
pca_result = pca.fit_transform(book_em_weights)
fig = px.scatter(x=pca_result[:,0], y=pca_result[:,1])
fig.show()

In [None]:
# Note: recent scikit-learn releases rename n_iter to max_iter
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(book_em_weights)
fig = px.scatter(x=tsne_results[:,0], y=tsne_results[:,1])
fig.show()

Next, we can visualize how the loss decreases over the epochs.

In [None]:
loss = pd.Series(history.history['loss'])
fig = px.line(x=loss.index, y=loss, title='Loss evolution per epoch', log_y=True)
fig.update_xaxes(title_text='Epochs')
fig.update_yaxes(title_text='Loss')
fig.show()

We can now evaluate the model on the test data by predicting a rating for every (user, book) pair in it.

In [None]:
predictions = model.predict([test['user_id'], test['book_id']])
predictions = predictions.flatten()  # shape (n, 1) -> (n,)

In [None]:
test = test.copy()  # work on a copy to avoid pandas' SettingWithCopyWarning
test['predicted rating'] = predictions

In [None]:
test

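As the table shows, the predicted ratings look fairly close to the actual ones, although an error metric such as RMSE on the test set would be needed to quantify this.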
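
The visual comparison can be backed by a number. A minimal sketch of two common regression metrics, root mean squared error and mean absolute error, computed with NumPy; the sample values are made up and the helper names `rmse` and `mae` are not from the original notebook.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


# Made-up ratings for illustration only.
demo_actual = [4, 5, 3, 2]
demo_predicted = [3.5, 4.5, 3.5, 2.5]
demo_rmse = rmse(demo_actual, demo_predicted)
demo_mae = mae(demo_actual, demo_predicted)
```

With the notebook's data the call would be `rmse(test['rating'], test['predicted rating'])`; both metrics are in rating units, so lower is better.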