# Explicit matrix factorization based on Keras 2

The idea is based on embeddings, as explained by Jeremy Howard of [fast.ai](http://course.fast.ai/lessons/lesson4.html).

The main idea is to apply embeddings to a problem like user-movie ratings (collaborative filtering) by factorizing the ratings matrix.  The method has to handle missing values in the matrix, so a plain SVD or PCA, which needs a fully observed matrix, cannot be applied directly.
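To make the "missing values" point concrete, here is a minimal NumPy sketch of a factorization that only looks at observed cells. It is not the Keras model used in this notebook; it uses the same toy matrix that appears later, and the names `masked_mse`, `n_factors`, and `learning_rate` as well as the step count are assumptions chosen for illustration.

```python
import numpy as np

# Toy ratings matrix; np.nan marks a missing rating.
R = np.array([
    [5, 3, np.nan, 1],
    [4, np.nan, np.nan, 1],
    [1, 1, np.nan, 5],
    [1, np.nan, np.nan, np.nan],
    [np.nan, 1, 5, 4],
], dtype=float)

mask = ~np.isnan(R)          # True where a rating was observed
R_filled = np.nan_to_num(R)  # NaN -> 0; those cells are masked out below


def masked_mse(U, V):
    """Mean squared error over the observed cells only."""
    err = (U @ V.T - R_filled) * mask
    return (err ** 2).sum() / mask.sum()


rng = np.random.default_rng(0)
n_factors = 2  # assumed embedding size, matching the notebook's choice of 2
U = 0.1 * rng.standard_normal((R.shape[0], n_factors))
V = 0.1 * rng.standard_normal((R.shape[1], n_factors))

initial_loss = masked_mse(U, V)
learning_rate = 0.02  # assumed value, small enough for stable full-batch steps
for _ in range(5000):
    err = (U @ V.T - R_filled) * mask      # zero wherever the rating is missing
    grad_U = 2 * err @ V / mask.sum()
    grad_V = 2 * err.T @ U / mask.sum()
    U -= learning_rate * grad_U
    V -= learning_rate * grad_V
final_loss = masked_mse(U, V)

print(initial_loss, final_loss)
```

Because the error is multiplied by `mask`, missing cells contribute neither to the loss nor to the gradients, yet `U @ V.T` still yields a value for every cell.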

The code for the final application is taken from the [3rd place winner](https://github.com/entron/entity-embedding-rossmann) of a similar [competition](https://www.kaggle.com/c/rossmann-store-sales/) on Kaggle.

The example code below works through a toy problem with a tiny matrix, just to get a handle on the options.

In [1]:
import pandas as pd
import numpy as np

%matplotlib inline
%config Completer.use_jedi = False


In [2]:
from keras.models import Sequential
from keras.models import Model as KerasModel
from keras.layers import Input, Dense, Activation, Reshape, Dropout, Dot
from keras.layers import Concatenate
from keras.layers.embeddings import Embedding
from keras.callbacks import ModelCheckpoint
from keras import optimizers


Using TensorFlow backend.


I use a sample matrix, with `0` representing a missing value.

In [34]:
ds = pd.DataFrame([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 0],
    [0, 1, 5, 4],
])
ds.replace(to_replace=0,value=np.nan,inplace=True)
ds

,0,1,2,3
0,5.0,3.0,,1.0
1,4.0,,,1.0
2,1.0,1.0,,5.0
3,1.0,,,
4,,1.0,5.0,4.0


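Per Andrew Ng's suggestion, I could add a "mean normalization" step, but I will not use it for now.  The cell below only computes the demeaned matrix; the rest of the notebook keeps working with `ds`.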

In [4]:
# axis=1 averages across the columns, giving one mean per row (NaN is skipped)
row_means=ds.mean(axis=1)
ds_demeaned=ds.subtract(row_means,axis=0)
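As a minimal sketch of the full mean-normalization round trip, assuming per-row means as in the cell above: subtract each row's mean over its observed entries before training, then add the same mean back to the predictions. The array names here are illustrative, and the "prediction" is simply the demeaned data, so the example only demonstrates the bookkeeping.

```python
import numpy as np

R = np.array([
    [5, 3, np.nan, 1],
    [4, np.nan, np.nan, 1],
    [1, 1, np.nan, 5],
    [1, np.nan, np.nan, np.nan],
    [np.nan, 1, 5, 4],
], dtype=float)

# Per-row mean over observed ratings only (NaN cells are ignored).
row_means = np.nanmean(R, axis=1, keepdims=True)

# Centered targets; missing cells stay NaN.
R_demeaned = R - row_means

# A model would be trained on R_demeaned. Its predictions are shifted back
# by the same per-row mean; here the demeaned data stands in for predictions.
R_restored = R_demeaned + row_means

print(row_means.ravel())
```

A row with a single observed rating, like the fourth one, ends up with a demeaned value of exactly zero, so its prediction falls back to that row's mean.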


In [35]:
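# unstack() puts the original column label in level_0 and the original row index
# in level_1, so 'row' below indexes the matrix columns and 'column' the matrix rows
ds_unstacked=ds.unstack().reset_index()
ds_unstacked.rename(axis=1,mapper={'level_0':'row','level_1':'column',0:'truth'},inplace=True)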

train=ds_unstacked.dropna()
train

,row,column,truth
0,0,0,5.0
1,0,1,4.0
2,0,2,1.0
3,0,3,1.0
5,1,0,3.0
7,1,2,1.0
9,1,4,1.0
14,2,4,5.0
15,3,0,1.0
16,3,1,1.0
17,3,2,5.0
19,3,4,4.0


I will train 2-dimensional embeddings, meaning each level of `row` and `column` will be represented by 2 floating point numbers.  This is lower than the cardinality of either `row` or `column`.

In [36]:
n_embed = 2
n_levels=train.nunique()
n_levels

row       4
column    5
truth     4
dtype: int64

The format of the Keras model matches that of the Rossmann competition winner's, so that I can reuse the weights later in a larger model.

Note: I want to add regularization to the `Embedding()` layers later.
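For reference, one common way to write the regularized objective is sketched below. The symbols are assumptions, not notation from the original code: $\Omega$ is the set of observed (row, column) pairs, $r_{ij}$ the observed value, $u_i$ and $v_j$ the vectors stored in `embedding_0` and `embedding_1`, and $\lambda$ a penalty strength that would have to be tuned.

$$
\min_{U,V} \; \sum_{(i,j)\in\Omega} \left(r_{ij} - u_i^\top v_j\right)^2 \;+\; \lambda \left( \sum_i \lVert u_i \rVert_2^2 + \sum_j \lVert v_j \rVert_2^2 \right)
$$

In Keras 2 the penalty term can be attached through the `embeddings_regularizer` argument of `Embedding()`, for example with `keras.regularizers.l2`. Keras averages the data term over each batch rather than summing it, so the effective $\lambda$ will not match this formula one-to-one.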

In [37]:
input_0 = Input(shape=(1,))
output_0 = Embedding(n_levels.row, n_embed, name='embedding_0')(input_0)
output_0 = Reshape(target_shape=(n_embed,))(output_0)

input_1 = Input(shape=(1,))
output_1 = Embedding(n_levels.column, n_embed, name='embedding_1')(input_1)
output_1 = Reshape(target_shape=(n_embed,))(output_1)

This is the core of the model: a single Keras `Dot()` layer that takes the dot product of the two embedding vectors.

In [39]:
output_model = Dot(axes=(1,1), normalize=False)([output_0,output_1])

model = KerasModel(inputs=[input_0,input_1], outputs=output_model)
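The forward pass of this small model can be mimicked in plain NumPy, which may help to see what `Embedding()`, `Reshape()` and `Dot()` do together. This is a sketch only: the weight arrays are random stand-ins rather than trained values, and the batch of index pairs is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_embed = 2
emb_0 = rng.standard_normal((4, n_embed))  # stand-in for embedding_0 (4 row levels)
emb_1 = rng.standard_normal((5, n_embed))  # stand-in for embedding_1 (5 column levels)

# A made-up batch of three (row, column) index pairs.
rows = np.array([0, 1, 3])
cols = np.array([0, 4, 2])

# An embedding lookup is integer indexing into the weight matrix.
vec_0 = emb_0[rows]  # shape (3, n_embed)
vec_1 = emb_1[cols]  # shape (3, n_embed)

# Dot(axes=(1, 1)) takes the dot product over the embedding axis, per sample.
pred = np.sum(vec_0 * vec_1, axis=1, keepdims=True)  # shape (3, 1)

print(pred.shape)
```

There are no further weights after the lookup, so the two embedding matrices are the only trainable parameters of the model.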

This model expects a list of arrays as input, one array per `Input()`.  I am not sure this format is very efficient, but I don't know how to improve it right now.

In [40]:
x=list(train.iloc[:,:2].values.T)
y=train.iloc[:,2]
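The transposed-and-listed form above is equivalent to passing one 1-D index array per input, which can be written more explicitly by column name. The sketch below uses a hypothetical three-row stand-in for `train`; it makes no claim about speed, only about what the list contains.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the `train` frame.
train = pd.DataFrame({
    "row": [0, 0, 1],
    "column": [0, 1, 4],
    "truth": [5.0, 4.0, 1.0],
})

# Notebook form: transpose the (n, 2) index block and split it into a list.
x_transposed = list(train.iloc[:, :2].values.T)

# Equivalent, more explicit form: one 1-D array per model input.
x_by_name = [train["row"].values, train["column"].values]
y = train["truth"].values

print([a.shape for a in x_by_name])
```

Selecting by name has the side benefit that the order of inputs no longer depends on the column order of the frame.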


I had to restart `model.fit()` several times to get a good result; the final loss can usually go down to about `0.01` with 2 dimensions.

In [43]:
sgd = optimizers.SGD(lr=0.4, decay=1e-6, momentum=0.5, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(x,y,epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x121552f28>

I can look at the weights; they don't seem to get excessively large, even without regularization.

In [48]:
model.get_layer('embedding_0').get_weights()[0]

array([[ 1.6158514 , -1.49063504],
       [ 1.04540849, -0.77206522],
       [ 2.73615122,  0.3235051 ],
       [ 1.55788863,  1.51632965]], dtype=float32)

In [49]:
model.get_layer('embedding_1').get_weights()[0]

array([[ 1.90138924, -1.29507613],
       [ 1.58038354, -0.96578097],
       [ 1.88554716,  1.35683239],
       [ 0.50729549, -0.11831354],
       [ 1.709705  ,  0.89370465]], dtype=float32)

I will predict and compare to the true values.  Note that the predicted matrix has no missing values, which is what allows us to achieve the goal of the competition.

In [52]:
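# predict() returns an (n, 1) array; flatten it before storing it as a column
ds_unstacked['predicted']=model.predict(list(ds_unstacked.iloc[:,:2].values.T)).ravel()
ds_unstacked['predicted']=ds_unstacked['predicted'].round(2)
# pivot back to the original orientation to compare predictions with the truth
ds_unstacked.pivot_table(values=['predicted','truth'],index='column',columns='row')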

,predicted,predicted,predicted,predicted,truth,truth,truth,truth
row,0,1,2,3,0,1,2,3
column,,,,,,,,
0,5.0,2.99,4.78,1.0,5.0,3.0,,1.0
1,3.99,2.4,4.01,1.0,4.0,,,1.0
2,1.02,0.92,5.6,4.99,1.0,1.0,,5.0
3,1.0,0.62,1.35,0.61,1.0,,,
4,1.43,1.1,4.97,4.02,,1.0,5.0,4.0
