# Recommender system: the MovieLens case
We work with the MovieLens-100k dataset: https://grouplens.org/datasets/movielens/

This notebook is partly inspired by https://github.com/m2dsupsdlclass/lectures-labs/blob/master/labs/03_neural_recsys/Explicit_Feedback_Neural_Recommender_System_rendered.ipynb


## 1. Data analysis, visualization and enrichment


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


import os.path as op

from zipfile import ZipFile
try:
    from urllib.request import urlretrieve
except ImportError:  # Python 2 compat
    from urllib import urlretrieve


ML_100K_URL = "http://files.grouplens.org/datasets/movielens/ml-100k.zip"
ML_100K_FILENAME = ML_100K_URL.rsplit('/', 1)[1]
ML_100K_FOLDER = 'ml-100k'

if not op.exists(ML_100K_FILENAME):
    print('Downloading %s to %s...' % (ML_100K_URL, ML_100K_FILENAME))
    urlretrieve(ML_100K_URL, ML_100K_FILENAME)

if not op.exists(ML_100K_FOLDER):
    print('Extracting %s to %s...' % (ML_100K_FILENAME, ML_100K_FOLDER))
    ZipFile(ML_100K_FILENAME).extractall('.')

In [None]:
df = pd.read_csv('ml-100k/u.data', sep='\t', names=['user_id', 'item_id', 'rating', 'timestamp'])
df.head()

<font color="red">With the describe() method, analyze the ratings column.

---

How many ratings in total? Min and max values? Average, etc.</font>

In [None]:
df['rating'].describe()

<font color="red">
Evaluate the number of unique users, the number of unique items.</font>

In [None]:
# Number of users, number of movies
n_user = df['user_id'].nunique()
n_item = df['item_id'].nunique()
print('n_user =',n_user)
print('n_item =',n_item)

### Item metadata: data enrichment

The item metadata file contains information such as the title of the movie or its release date. It also contains columns indicating the genres of each movie; we only load the first five columns of the file with `usecols`.

In [None]:
m_cols = ['item_id', 'title', 'release_date', 'video_release_date', 'imdb_url']
items = pd.read_csv('ml-100k/u.item', sep='|',names=m_cols, usecols=range(5), encoding='latin-1')
items.head()

We extract the release year as an integer value. Missing or malformed dates are replaced by the marker value 1920:

In [None]:
def extract_year(release_date):
    if hasattr(release_date, 'split'):
        components = release_date.split('-')
        if len(components) == 3:
            return int(components[2])
    # Missing value marker
    return 1920

items['release_year'] = items['release_date'].map(extract_year)
items.head()


In [None]:
items.hist('release_year',bins=50)

Let's further enrich the data by adding the popularity of each movie, defined as the number of ratings it received.

In [None]:
popularity = df.groupby('item_id').size().reset_index(name='popularity')
items = pd.merge(popularity, items)
items.head()

We enrich the raw ratings with the metadata collected in this way: we will use this metadata later in this lab.

In [None]:
all_ratings = pd.merge(items, df)
all_ratings.head()

<font color="red">

Using groupby, create a new dataframe 'rating_movies' containing the average rating of each movie.
Rank the movies in decreasing order of average rating. Which movies have the highest ratings?</font>

In [None]:
rating_movies = pd.DataFrame(all_ratings.groupby('title')['rating'].mean())
rating_movies.sort_values(by = 'rating',ascending=False).head(10)

<font color="red">
Represent the histogram of the average ratings of the films. What do the "peaks" at 1 and 5 correspond to?</font>

In [None]:
rating_movies.hist(bins=50);

<font color="red">
Add to the dataframe 'rating_movies' a new column 'num of ratings' containing the number of ratings of each movie. Provide your comments.</font>

In [None]:
rating_movies['num of ratings'] = all_ratings.groupby('title')['rating'].count()
rating_movies.sort_values(by='rating',ascending=False).head()

This confirms that the highest-rated films have few ratings.

<font color="red">
Display the titles of films with the most ratings and discuss their ratings.
</font>

In [None]:
rating_movies.sort_values(by='num of ratings',ascending=False).head()

<font color="red">
With sns.jointplot, represent the point cloud (average rating, number of ratings) on all the films (use the alpha= parameter of your choice).
Discuss the graph.
</font>

In [None]:
import seaborn as sns
sns.jointplot(x='rating',y='num of ratings',data=rating_movies,alpha=0.5);

- Movies with an average rating of exactly 1 or 5 have a small number of ratings.
- Trend: films with a large number of ratings are generally highly rated.
- A frequently rated film rarely has an average rating very close to 5.

## 2. Evaluation of similarity between items
In this part, we fix an item, say 'Star Wars (1977)'. We want to find films "similar" to Star Wars and rank them in order of similarity, in order to recommend them.
<font color="red">
With pivot_table, create the interaction matrix: its rows are indexed by user_id, its columns by movie (here, by title), and its entries are the ratings.
</font>

In [None]:
moviemat = pd.pivot_table(data=all_ratings,index='user_id',columns='title',values='rating')
moviemat.head()

<font color="red">
With the .corr method, calculate the correlation of the column-vector 'Star Wars (1977)' with the column 'Liar Liar (1997)'
</font>

In [None]:
starwars_user_ratings = moviemat['Star Wars (1977)']
liarliar_user_ratings = moviemat['Liar Liar (1997)']
starwars_user_ratings.corr(liarliar_user_ratings)

<font color="red">
What does the result of the corrwith() method below provide?
</font>

In [None]:
moviemat.corrwith(starwars_user_ratings)

<font color="red">
Transform the series above into a dataframe, and display the movie titles in decreasing order of their correlation with Star Wars.
Discuss the result.</font>

In [None]:
similar_to_starwars = pd.DataFrame(moviemat.corrwith(starwars_user_ratings),columns=['Corr'])
similar_to_starwars.sort_values(by='Corr',ascending=False).head(10)

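The problem is that each correlation is computed only on the users who rated both films (the non-NaN values).

If a film shares only two raters with Star Wars, the correlation is computed on two points and equals 1 or -1 (or NaN if one of the two films received the same rating twice), whatever the real similarity between the films.

To avoid this, we only keep the correlations of films with more than 100 ratings.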
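The effect can be reproduced on a toy table. The sketch below uses made-up ratings and film names (assumptions, not MovieLens data). It also shows the `min_periods` argument of `Series.corr`, which counts co-raters; this is a slightly different criterion from the filter on the total number of ratings used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, NaN means "not rated"
toy = pd.DataFrame({
    'Popular film': [5, 4, 2, 1, 3],
    'Obscure film': [np.nan, 5, np.nan, 2, np.nan],
})

# Only two users rated both films, so the correlation is computed on
# two points and is necessarily +1 or -1
corr_two_points = toy['Popular film'].corr(toy['Obscure film'])

# Requiring at least 3 co-raters turns this unreliable value into NaN
corr_min3 = toy['Popular film'].corr(toy['Obscure film'], min_periods=3)

print(corr_two_points, corr_min3)
```

A perfect correlation obtained from two co-raters says very little about the films, which is why a minimum amount of data is required before ranking.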

<font color="red">

Add to the similar_to_starwars dataframe a new column corresponding to the number of ratings of each movie.</font>

In [None]:
similar_to_starwars['num of ratings'] = rating_movies['num of ratings']
similar_to_starwars.head()

<font color="red">
Filter the rows to display only titles with a "sufficient" number of ratings for you to retain them in the ranking.
</font>

In [None]:
# .copy() avoids modifying a view of similar_to_starwars (SettingWithCopyWarning)
most_similar_movies = similar_to_starwars[similar_to_starwars['num of ratings']>100].copy()
most_similar_movies.sort_values(by='Corr',ascending=False,inplace=True)
most_similar_movies.head(10)

<font color="red">
If this is not already done, sort by decreasing correlation values in "inplace" mode.
Then, with reset_index, turn the index into a new column containing the title of each movie.</font>

In [None]:
most_similar_movies.reset_index(inplace=True)
most_similar_movies.head(10)

## 3. Collaborative filtering
We want to predict the unobserved ratings from the n_user x n_item interaction matrix.
<br>
<font color="red">
- Replace all NaNs with zeros with the fillna(0) method
- Extract values as numpy.array
</font>

In [None]:
ratings = moviemat.fillna(0).values
ratings[:5,:5]

<font color="red">


How many non-zero elements are there? Evaluate the percentage of non-zero elements in this matrix.
</font>

In [None]:
(ratings>0).sum()/(ratings.shape[0] * ratings.shape[1])*100

We compute a similarity matrix $S=(S_{i,j})$ between users.
We consider the 'centered cosine similarity':
$$
S_{i,j} = \frac{\langle \bar r_i,\bar r_j\rangle}{\|\bar r_i\|\,\|\bar r_j\|}
$$
where $\bar r_i$ is the rating vector of user $i$, centered by subtracting that user's average rating from each rating they gave (unrated entries stay at 0). The following function computes the similarity matrix; negative similarities are truncated to 0 by `phi`.

In [None]:
def phi(x):
    # Truncate negative similarities to zero
    return np.maximum(x,0)

def similarity(ratings):

    # vector containing for each user the number of ratings given
    r_user = (ratings>0).sum(axis=1)

    # vector containing for each user the average of the ratings given
    # (out= guarantees a 0 for users without ratings instead of uninitialized memory)
    m_user = np.divide(ratings.sum(axis=1), r_user, out=np.zeros(len(r_user)), where=r_user!=0)

    # Ratings centered by the per-user average: each row i contains the vector \bar r_i
    ratings_ctr = ratings.T - ((ratings.T!=0) * m_user)
    ratings_ctr = ratings_ctr.T

    # Gram matrix containing inner products
    sim = ratings_ctr.dot(ratings_ctr.T)

    # Renormalization
    norms = np.array([np.sqrt(np.diagonal(sim))])
    sim = sim / norms / norms.T
    sim = phi(sim)

    return sim
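To make the formula concrete, here is a small standalone check on a hypothetical 3 x 4 rating matrix (the values are invented for illustration). It recomputes the centered cosine similarity step by step, with the same truncation of negative values as `phi`.

```python
import numpy as np

# Hypothetical ratings: 3 users x 4 items, 0 means "not rated"
toy = np.array([[5., 3., 0., 1.],
                [4., 2., 0., 0.],
                [1., 0., 5., 4.]])

rated = toy > 0
# Per-user mean over rated items only
means = toy.sum(axis=1) / rated.sum(axis=1)
# Center the observed ratings; unrated entries stay at 0
centered = (toy - means[:, None]) * rated

norms = np.linalg.norm(centered, axis=1)
sim_toy = centered @ centered.T / np.outer(norms, norms)
# Same truncation as phi(): negative similarities are set to 0
sim_toy = np.maximum(sim_toy, 0)
print(np.round(sim_toy, 2))
```

Users 0 and 1 both prefer the first item to the second and get a similarity of 0.5, whereas user 2 has opposite tastes and gets a similarity of 0 with both after truncation.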

### Prediction
We want to predict all the ratings of a user $u$ from the ratings given by the users who are similar to $u$.

A first approach is to define, for every item $i$,
$$
\hat r_{u,i} = \frac{\sum_{v} S_{u,v} r_{v,i}}{\sum_{v:r_{v,i}\neq 0} S_{u,v}}
$$
where the sum is restricted to the users $v$ who actually rated item $i$.
<br>

<font color="red">
Compute the predicted ratings
</font>

In [None]:
sim = similarity(ratings)
numerator = sim.dot(ratings)
denominator = sim.dot(ratings>0)
# out= sets the prediction to 0 where no similar user rated the item
pred_ratings = np.divide(numerator, denominator, out=np.zeros_like(numerator), where=denominator!=0)

<font color="red">
Display the predictions of the first user for the first ten items.</font>

In [None]:
print(pred_ratings[0,:10])

We can assess the error. The metric traditionally used is the RMSE:
$$
RMSE = \sqrt{\frac 1N\sum_{(u,i)\text{ observed}}(R_{u,i}-\hat R_{u,i})^2}
$$
where $N$ is the number of observed ratings. Another common metric is the MAE:
$$
MAE = \frac 1N\sum_{(u,i)\text{ observed}}|R_{u,i}-\hat R_{u,i}|\,.
$$
<font color="red">
Calculate the RMSE (root mean square error) and the MAE (mean absolute error).</font>

In [None]:
RMSE = np.sqrt(np.sum(((ratings - pred_ratings) * (ratings>0))**2) / np.sum(ratings>0))
MAE = np.sum(np.abs((ratings - pred_ratings) * (ratings>0))) / np.sum(ratings>0)
RMSE,MAE

### Validation
<font color="red">Make a train test split</font>

In [None]:
from sklearn.model_selection import train_test_split

train_ratings, test_ratings = train_test_split(all_ratings, test_size=0.2, random_state=0)

user_id_train = train_ratings['user_id']
item_id_train = train_ratings['item_id']
rating_train = train_ratings['rating']

user_id_test = test_ratings['user_id']
item_id_test = test_ratings['item_id']
rating_test = test_ratings['rating']

We generate the train and the test rating matrices.

In [None]:
from scipy.sparse import coo_matrix
train = coo_matrix((rating_train.values,(user_id_train.values-1,item_id_train.values-1)),
                   shape=(n_user,n_item)).toarray()
test = coo_matrix((rating_test.values,(user_id_test.values-1,item_id_test.values-1)),
                   shape=(n_user,n_item)).toarray()

We define the functions needed for prediction and evaluation.

In [None]:
def predict_ratings(ratings,sim):

    wsum_sim = np.abs(sim).dot(ratings>0)
    # out= sets the prediction to 0 where no similar user rated the item
    return np.divide(sim.dot(ratings), wsum_sim, out=np.zeros_like(wsum_sim), where=wsum_sim!=0)

def rmse(ratings,pred):
    return np.sqrt(np.sum(((ratings - pred) * (ratings>0))**2) / np.sum(ratings>0))

<font color="red">
- With similarity(), evaluate the similarity matrix on the train set.
- Predict ratings
- Calculate the RMSE on the test set.
</font>

In [None]:
sim = similarity(train)
pred_ratings = predict_ratings(train,sim)
rmse(test,pred_ratings)

### Comparison
<font color="red">
Predict each rating by the average user rating.</font>

In [None]:
av_ratings = train.sum(axis=1) / (train>0).sum(axis=1)
rmse(test,av_ratings.reshape(train.shape[0],1))

### Bias-subtracted Collaborative Filtering
Some users tend to give ratings that are always quite high, or always quite low: each user has their own bias. The difference between a rating and the user's average is therefore more informative than the absolute value of the rating.

So we'll subtract each user's rating average before summing over all similar users, then we'll re-add the subtracted average at the end:
$$
\hat r_{u,i} = \bar r_u + \frac{\sum_{v} S_{u,v} (r_{v,i}- \bar r_v)}{\sum_{v}S_{u, v}1_{r_{v,i}>0}}
$$
where $\bar r_u$ is the average rating of user $u$.
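As a minimal illustration, consider two hypothetical users with the same tastes but different rating scales. The values and the similarity of 1 are assumptions chosen to keep the arithmetic readable.

```python
import numpy as np

# Hypothetical users with the same tastes but different rating scales:
# user A is generous, user B is strict; B has not rated the third item (0)
r_a = np.array([5., 4., 3.])
r_b = np.array([3., 2., 0.])

mean_a = r_a[r_a > 0].mean()   # average rating of A
mean_b = r_b[r_b > 0].mean()   # average rating of B
s_ba = 1.0                     # assumed similarity between B and A

# Plain weighted average: copies A's rating, ignoring that B rates lower
plain = s_ba * r_a[2] / s_ba
# Bias-subtracted: B's mean plus A's deviation from A's own mean
bias_sub = mean_b + s_ba * (r_a[2] - mean_a) / s_ba

print(plain, bias_sub)
```

The plain average predicts 3.0, which would be B's best rating so far, while the bias-subtracted prediction of 1.5 reflects that A rated this item one point below A's own average.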
<br>
<font color="red">
Observe the difference between the predict_ratings_bias_sub function below and the previous predict_ratings function.
</font>

In [None]:
def predict_ratings_bias_sub(ratings,sim):

    r_user = (ratings>0).sum(axis=1)
    m_user = np.divide(ratings.sum(axis=1), r_user, out=np.zeros(len(r_user)), where=(r_user!=0))
    # Matrix whose row u is filled with the average rating of user u
    ratings_moyens = np.dot(m_user.reshape(len(m_user),1), np.ones((1,ratings.shape[1])))

    wsum_sim = np.abs(sim).dot(ratings>0)
    pred = ratings_moyens + np.divide(sim.dot(ratings-(ratings>0)*ratings_moyens), wsum_sim,
                                      out=np.zeros_like(wsum_sim), where=wsum_sim!=0)

    # Clip the predictions to the valid rating range [1, 5]
    return np.minimum(5,np.maximum(1,pred))

<font color="red">
Evaluate the performance of the new method.</font>

In [None]:
sim = similarity(train)
pred_ratings = predict_ratings_bias_sub(train,sim)
rmse(test,pred_ratings)

## 4. Matrix factorization approach

Rating prediction can be cast as a regression problem. We seek to solve the following optimization problem:
$$
\min_{U,V} \sum_{(u,i)\,\text{observed}} (R_{u,i}- (UV)_{u,i})^2 + \lambda \|U\|^2_F+ \lambda \|V\|^2_F
$$
where
- $U$ is a matrix of size $n_{\text{user}}\times K$,
- $V$ is a matrix of size $K\times n_{\text{item}}$,
- $K$ is an integer, $\lambda>0$ is a regularization parameter,
- $\|\,.\,\|_F$ is the Frobenius norm (the square root of the sum of the squares of the coefficients).

<img src="https://bianchi.wp.imt.fr/files/2019/01/rec_archi_1.jpg" style="width: 600px;" />

The optimization problem is non-convex. We can seek to obtain a local minimum of the above criterion. At least two methods are commonly used.

Method 1: ALS (alternating least squares). At each iteration $t$, we have an estimate $(U_t,V_t)$ of the solution. We successively solve the following subproblems:
\begin{align}
& U_{t+1} = \arg\min_{U} \sum_{(u,i)\,\text{observed}} (R_{u,i}- (UV_t)_{u,i})^2 + \lambda \|U\|^2_F \\
& V_{t+1} = \arg\min_{V} \sum_{(u,i)\,\text{observed}} (R_{u,i}- (U_{t+1}V)_{u,i})^2 + \lambda \|V\|^2_F
\end{align}
Each of these two problems is quadratic and convex (a ridge regression) and can be solved easily, for example with a conjugate gradient algorithm. The objective function decreases at each step, and in practice the iterates approach a local minimum of the objective function described at the beginning of this section.
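The alternating scheme can be sketched in a few lines of NumPy. This is an illustrative sketch on random toy data, not the method used in the rest of the notebook: the helper names `als_step` and `objective` are hypothetical, and each subproblem is solved row by row with a direct linear solve instead of a conjugate gradient algorithm.

```python
import numpy as np

def als_step(R, mask, fixed, lam):
    """Solve the ridge problem for each row of R, the other factor being fixed.

    R: (n_rows, n_cols) ratings, mask: boolean array of observed entries,
    fixed: (n_cols, K) factor kept constant. Returns an (n_rows, K) array.
    """
    K = fixed.shape[1]
    out = np.zeros((R.shape[0], K))
    for r in range(R.shape[0]):
        F = fixed[mask[r]]                 # factors of the observed columns
        A = F.T @ F + lam * np.eye(K)      # regularized normal equations
        out[r] = np.linalg.solve(A, F.T @ R[r, mask[r]])
    return out

def objective(R, mask, U, V, lam):
    err = (R - U @ V)[mask]
    return np.sum(err ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))

rng = np.random.default_rng(0)
# Hypothetical toy data: 6 users x 5 items, 0 means unobserved
R = rng.integers(0, 6, size=(6, 5)).astype(float)
mask = R > 0
K, lam = 2, 0.1
U = rng.normal(scale=0.1, size=(6, K))
V = rng.normal(scale=0.1, size=(K, 5))

losses = [objective(R, mask, U, V, lam)]
for t in range(10):
    U = als_step(R, mask, V.T, lam)        # update U with V fixed
    V = als_step(R.T, mask.T, U, lam).T    # update V with U fixed
    losses.append(objective(R, mask, U, V, lam))
print(np.round(losses, 3))
```

Because each half-step minimizes the objective exactly over one factor, the printed sequence of losses is non-increasing.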

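Method 2: SGD (stochastic gradient descent). At each step, the parameters are moved along the negative gradient of the loss computed on a random mini-batch of observed ratings. This is the approach used below, with Keras embeddings and the Adam optimizer (a variant of SGD).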
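Before moving to Keras, a bare-bones NumPy sketch of stochastic gradient steps on the regularized objective may help. The triples, the learning rate `lr` and the other hyperparameters are assumptions chosen for a toy example, and the regularization is applied per sample for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical observed triples (user, item, rating)
triples = [(0, 0, 5.), (0, 1, 3.), (1, 0, 4.),
           (1, 2, 1.), (2, 1, 2.), (2, 2, 5.)]
n_user_toy, n_item_toy, K = 3, 3, 2
lam, lr = 0.01, 0.02

U = rng.normal(scale=0.1, size=(n_user_toy, K))
V = rng.normal(scale=0.1, size=(K, n_item_toy))

def mse(U, V):
    return np.mean([(r - U[u] @ V[:, i]) ** 2 for u, i, r in triples])

mse_before = mse(U, V)
for epoch in range(200):
    # Visit the observed ratings in a random order at each epoch
    for idx in rng.permutation(len(triples)):
        u, i, r = triples[idx]
        err = r - U[u] @ V[:, i]
        # Gradients of (r - <U_u, V_i>)^2 + lam * (|U_u|^2 + |V_i|^2)
        grad_u = -2 * err * V[:, i] + 2 * lam * U[u]
        grad_v = -2 * err * U[u] + 2 * lam * V[:, i]
        U[u] -= lr * grad_u
        V[:, i] -= lr * grad_v
mse_after = mse(U, V)
print(mse_before, mse_after)
```

Each update only touches one row of $U$ and one column of $V$, which is exactly what an embedding lookup followed by a dot product does in the Keras model.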


In [None]:
from keras.layers import Input, Embedding, Flatten, Dot
from keras.models import Model

In [None]:
# For each sample we input the integer identifiers
# of a single user and a single item
user_id_input = Input(shape=[1],name='user')
item_id_input = Input(shape=[1], name='item')

embedding_size = 30
user_embedding = Embedding(output_dim=embedding_size, input_dim=n_user + 1,
                           input_length=1, name='user_embedding')(user_id_input)

item_embedding = Embedding(output_dim=embedding_size, input_dim=n_item + 1,
                           input_length=1, name='item_embedding')(item_id_input)

# reshape from shape: (batch_size, input_length, embedding_size)
# to shape: (batch_size, input_length * embedding_size) which is
# equal to shape: (batch_size, embedding_size)
user_vecs = Flatten()(user_embedding)
item_vecs = Flatten()(item_embedding)

y = Dot(axes=1)([user_vecs, item_vecs])

model = Model(inputs=[user_id_input, item_id_input], outputs=y)


In [None]:
print(model.summary())

In [None]:
from keras.utils import plot_model
plot_model(model, to_file='model.png')
from IPython.display import Image
Image(filename='model.png')

In [None]:
model.compile(optimizer='adam', loss='mae')

<font color="red">How are the parameters of the hidden layer of embeddings initialized?
<br>
In order to verify that the code does not contain errors, predict the ratings when the input is the train set [user_id_train, item_id_train].
</font>

In [None]:
# Useful for debugging the output shape of model
initial_train_preds = model.predict([user_id_train, item_id_train])
initial_train_preds.shape

<font color="red">Calculate the MSE and MAE on the train set. Is it satisfactory? Why?</font>

In [None]:
squared_differences = np.square(initial_train_preds[:,0] - rating_train.values)
absolute_differences = np.abs(initial_train_preds[:,0] - rating_train.values)

print("Random init MSE: %0.3f" % np.mean(squared_differences))
print("Random init MAE: %0.3f" % np.mean(absolute_differences))

# You may also compute these metrics with scikit-learn:

from sklearn.metrics import mean_squared_error, mean_absolute_error

print("Random init MSE: %0.3f" % mean_squared_error(initial_train_preds, rating_train))
print("Random init MAE: %0.3f" % mean_absolute_error(initial_train_preds, rating_train))


The following command saves the model parameters.

In [None]:
model.save_weights('initial_weights.h5')

### Model training

`model.fit` returns a `History` object whose `history` attribute is a dictionary containing the 'loss' and the 'val_loss' (the validation loss) after each epoch (one epoch = one pass over the training data).
<br>
<font color="red">Explain what the arguments to model.fit() are below.</font>

In [None]:
%%time

# Training the model
history = model.fit([user_id_train, item_id_train], rating_train,
                    batch_size=64, epochs=20, validation_split=0.1,
                    shuffle=True)

<font color="red">Plot a graph representing the train loss and the validation loss.</font>

In [None]:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.ylim(0, 2)
plt.legend(loc='best')
plt.title('Loss');

<font color="red">Why is train loss greater than validation loss in early iterations?
<br>
Compute predictions on test set [user_id_test, item_id_test] using model.predict(..)<br>
Evaluate MAE and MSE.</font>

In [None]:
test_preds = model.predict([user_id_test, item_id_test])
print("Final test MSE: %0.3f" % mean_squared_error(test_preds, rating_test))
print("Final test MAE: %0.3f" % mean_absolute_error(test_preds, rating_test))

### Early stopping
<br>
<font color="red">How does the validation loss behave in the last iterations? How to explain this phenomenon?</font>
<br>
We want to reproduce the previous experiment by adding a stopping criterion when the validation loss increases.
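The stopping rule can be sketched in plain Python. This is a simplified version of a patience-based criterion with made-up validation losses; the function name `stopping_epoch` is hypothetical, and the actual Keras callback offers more options.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Simplified patience rule: stop once the validation loss has not improved
    on its best value for `patience` consecutive epochs.
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation losses: improvement until epoch 3, then overfitting
val_losses = [1.00, 0.85, 0.78, 0.76, 0.77, 0.79, 0.82]
stop = stopping_epoch(val_losses)
print(stop)
```

With a patience of 2, training stops at epoch 5, two epochs after the best validation loss was reached at epoch 3.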
<br>
<font color="red">With model.load_weights(...), reset the model to the starting parameters.</font>

In [None]:
model.load_weights('initial_weights.h5')

In [None]:
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit([user_id_train, item_id_train], rating_train,
                    batch_size=64, epochs=20, validation_split=0.1,
                    callbacks=[early_stopping], shuffle=True)

In [None]:
test_preds = model.predict([user_id_test, item_id_test])
print("Final test MSE: %0.3f" % mean_squared_error(test_preds, rating_test))
print("Final test MAE: %0.3f" % mean_absolute_error(test_preds, rating_test))

ANSWER THIS QUESTION:

<font color="red">Why did we reset the model with model.load_weights('initial_weights.h5')? What would have happened otherwise?</font>


## 5. Deep recommender model

Here is a more complex structure:

<img src="https://bianchi.wp.imt.fr/files/2019/01/rec_archi_2.jpg" style="width: 600px;" />


In [None]:
from keras.layers import Concatenate, Dropout, Dense

<font color="red">Comment on the code below. In particular:<br>
- What are the Concatenate, Dropout, Dense functions for?<br>
- What non-linearity is used, and where in the network?</font>

In [None]:
user_id_input = Input(shape=[1], name='user')
item_id_input = Input(shape=[1], name='item')

embedding_size = 30
user_embedding = Embedding(output_dim=embedding_size, input_dim=n_user + 1,
                           input_length=1, name='user_embedding')(user_id_input)
item_embedding = Embedding(output_dim=embedding_size, input_dim=n_item + 1,
                           input_length=1, name='item_embedding')(item_id_input)

user_vecs = Flatten()(user_embedding)
item_vecs = Flatten()(item_embedding)

input_vecs = Concatenate()([user_vecs, item_vecs])
# Careful: a dropout rate that is too high prevents any training
input_vecs = Dropout(0.5)(input_vecs)

x = Dense(64, activation='relu')(input_vecs)

y = Dense(1)(x)

model = Model(inputs=[user_id_input, item_id_input], outputs=y)
model.compile(optimizer='adam', loss='mae')

initial_train_preds = model.predict([user_id_train, item_id_train])


<font color="red">Draw the network, graphically</font>

In [None]:
plot_model(model, to_file='model.png')
Image(filename='model.png')

<font color="red">What is the total number of parameters?</font>

In [None]:
print(model.summary())

<font color="red">Fit the model on the train set and evaluate performance
on the test.</font>

In [None]:
%%time
history = model.fit([user_id_train, item_id_train], rating_train,
                    batch_size=64, epochs=20, validation_split=0.1,
                    callbacks = [early_stopping], shuffle=True)

In [None]:
test_preds = model.predict([user_id_test, item_id_test])
print("Final test MSE: %0.3f" % mean_squared_error(test_preds, rating_test))
print("Final test MAE: %0.3f" % mean_absolute_error(test_preds, rating_test))

### Exercise (at home)
  - Add an extra layer and compare the performance
  - Try adding dropout while changing the sizes of the different layers

## 6. Viewing embeddings
The following command extracts the coefficients of the different layers.

In [None]:
weights = model.get_weights()
[w.shape for w in weights]

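<font color="red">Extract the matrix of user embeddings, and that of items.<br>
Display the embedding of the first item (item_id 1; row 0 of the embedding matrix corresponds to the unused id 0).</font>

In [None]:
user_embeddings = weights[0]
item_embeddings = weights[1]

In [None]:
first_item = items.iloc[0]
print("First item name from metadata:", first_item["title"])
print("Embedding vector for the first item:")
# Rows of item_embeddings are indexed by item_id (starting at 1), not by DataFrame position
print(item_embeddings[first_item["item_id"]])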

### Visualizing the embeddings with t-SNE <br>
<font color="red">Create a TSNE transformer. You may choose a perplexity equal to 30. Transform "item_embeddings" with fit_transform.</font>

In [None]:
from sklearn.manifold import TSNE
tsne = TSNE(perplexity=30)

item_tsne = tsne.fit_transform(item_embeddings)

In [None]:
plt.figure(figsize=(10, 10))
plt.scatter(item_tsne[:, 0], item_tsne[:, 1]);
plt.xticks(()); plt.yticks(());
plt.show()

Clusters seem (vaguely) to appear. We want to know which films the points correspond to.<br>

In [None]:
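# Rows of item_tsne are indexed by item_id, so we select by item_id rather than by DataFrame position
most_popular = items[items['popularity']>200]
index_most_popular = most_popular['item_id'].values
title_most_popular = most_popular['title']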

In [None]:
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.plotting import figure, show, output_file
from bokeh.io import output_notebook
output_notebook()

p = figure(tools="pan,wheel_zoom,reset,save",
           toolbar_location="above",
           title="T-SNE for most popular movies")

source = ColumnDataSource(data=dict(x1=item_tsne[index_most_popular,0],
                                    x2=item_tsne[index_most_popular,1],
                                    names=title_most_popular))

p.scatter(x="x1", y="x2", size=8, source=source)

labels = LabelSet(x="x1", y="x2", text="names", y_offset=6,
                  text_font_size="8pt", text_color="#555555",
                  source=source, text_align='center')
p.add_layout(labels)

show(p)

## 7. Incorporate metadata into the model

Using a framework similar to the one used previously, we will build another deep model that can also leverage additional metadata. The resulting system is therefore a **hybrid recommender system** that performs both **collaborative filtering** and **content-based recommendations**.
<img src="images/rec_archi_3.svg" style="width: 600px;" />

We want to add the columns ['popularity', 'release_year'] as input to our regressor, in addition to user_id and item_id. We pre-process these columns.

`QuantileTransformer` transforms a feature so that its output follows a uniform distribution. The "fit" therefore consists in estimating the (empirical) cumulative distribution function of the input values, and the "transform" in applying this function sample by sample.<br>
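The idea can be sketched with NumPy alone. The helpers `fit_ecdf` and `transform_ecdf` below are hypothetical and only approximate what scikit-learn does (the library interpolates between a fixed number of quantiles), and the popularity counts are made up.

```python
import numpy as np

def fit_ecdf(train_values):
    """'fit': store the sorted training sample."""
    return np.sort(np.asarray(train_values, dtype=float))

def transform_ecdf(sorted_train, values):
    """'transform': fraction of training points <= each value, in [0, 1]."""
    ranks = np.searchsorted(sorted_train, values, side='right')
    return ranks / len(sorted_train)

# Hypothetical, heavily skewed popularity counts
popularity_train = [1, 2, 2, 3, 5, 8, 40, 500]
sorted_train = fit_ecdf(popularity_train)
transformed = transform_ecdf(sorted_train, [1, 5, 500, 1000])
print(transformed)
```

The raw values span three orders of magnitude, whereas the transformed values are spread over [0, 1], and a value larger than anything seen during the fit saturates at 1.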

<font color="red">What can be the point of this preliminary transformation?</font>

In [None]:
from sklearn.preprocessing import QuantileTransformer

meta_columns = ['popularity', 'release_year']

scaler = QuantileTransformer()
item_meta_train = scaler.fit_transform(train_ratings[meta_columns])
item_meta_test = scaler.transform(test_ratings[meta_columns])

We want to create the following architecture:

<img src="https://bianchi.wp.imt.fr/files/2019/01/model-metadata.png" style="width: 500px;" />

<font color="red">- What are the main differences with the previous network?<br>
- It's up to you to create this model, compile it, train it, and evaluate its performance.</font>

In [None]:
user_id_input = Input(shape=[1], name='user')
item_id_input = Input(shape=[1], name='item')
meta_input = Input(shape=[2], name='meta_item')

embedding_size = 32
user_embedding = Embedding(output_dim=embedding_size, input_dim=n_user+1,
                           input_length=1, name='user_embedding')(user_id_input)
item_embedding = Embedding(output_dim=embedding_size, input_dim=n_item+1,
                           input_length=1, name='item_embedding')(item_id_input)


user_vecs = Flatten()(user_embedding)
item_vecs = Flatten()(item_embedding)

input_vecs = Concatenate()([user_vecs, item_vecs, meta_input])

x = Dense(64, activation='relu')(input_vecs)
x = Dropout(0.5)(x)
x = Dense(32, activation='relu')(x)
y = Dense(1)(x)

model = Model(inputs=[user_id_input, item_id_input, meta_input], outputs=y)
model.compile(optimizer='adam', loss='mae')

initial_train_preds = model.predict([user_id_train, item_id_train, item_meta_train])

In [None]:
plot_model(model, to_file='model.png')
Image(filename='model.png')

In [None]:
history = model.fit([user_id_train, item_id_train, item_meta_train], rating_train,
                    batch_size=64, epochs=25, validation_split=0.1,
                    callbacks = [early_stopping],shuffle=True)

In [None]:
test_preds = model.predict([user_id_test, item_id_test, item_meta_test])
print("Final test MSE: %0.3f" % mean_squared_error(test_preds, rating_test))
print("Final test MAE: %0.3f" % mean_absolute_error(test_preds, rating_test))