# DeepCTR's SVD

________________________

SVD is a matrix factorization algorithm widely used in recommendation systems, for example for movie and product recommendation.

Libraries such as surpriselib make this factorization available for training and testing on recommendation datasets: they fit the factorization themselves and produce the decomposed matrices.
DeepCTR, by contrast, packages several factorization machine (FM) techniques implemented as deep neural networks. Here one of DeepCTR's methods, DeepFM, is used to build an equivalent of SVD. Because the SVD results are obtained through an underlying deep neural network, the model is called DeepCTR's SVD.
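
To make the claimed equivalence concrete, here is a minimal sketch, assuming the recommender-style "SVD" is the usual biased matrix factorization (global mean + user bias + item bias + dot product of latent vectors). It shows that a generic second-order factorization machine reduces to exactly that form when the input is a one-hot user id concatenated with a one-hot item id. All parameter values and the names `fm_predict` and `biased_mf_predict` are illustrative, not part of DeepCTR or mlsquare.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 3

# Hypothetical FM parameters (random, for illustration only)
w0 = 3.5                                      # global bias
w = rng.normal(size=n_users + n_items)        # one linear weight per one-hot column
V = rng.normal(size=(n_users + n_items, k))   # one latent vector per one-hot column

def fm_predict(x):
    """Second-order FM: w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j."""
    linear = w @ x
    # Standard identity that computes the pairwise term in O(k * n)
    pairwise = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return w0 + linear + pairwise

def biased_mf_predict(u, i):
    """Biased matrix factorization: mu + b_u + b_i + <p_u, q_i>."""
    return w0 + w[u] + w[n_users + i] + V[u] @ V[n_users + i]

# One-hot encode a single (user, item) pair
u, i = 2, 4
x = np.zeros(n_users + n_items)
x[u] = 1.0
x[n_users + i] = 1.0

fm_value = fm_predict(x)
mf_value = biased_mf_predict(u, i)
print(np.isclose(fm_value, mf_value))
```

With only two active one-hot columns, the pairwise term contains a single product, the user vector times the item vector, which is why the two predictions agree.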

**The following notebook serves as a usage guide**
_______________________________________________
* The SVD module takes a `feature_columns` argument (a list of `SparseFeat` instances, one per sparse input feature) and returns a TensorFlow model.
* Towards the end, the obtained model is evaluated against sample test values.

## Step 1. Load sample dataset as pandas dataframe
___________________________________

* List `sparse_features` & label encode input dataframe.
* Perform `train_test_split` to output training/test data and labels for model training.

In [1]:
import os
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from deepctr.models import DeepFM
from deepctr.inputs import SparseFeat


data_path = os.path.expanduser('u.data')
df = pd.read_csv(data_path, sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])  # the file has no header row, so column names are supplied here

DeepCTR version 0.7.0 detected. Your version is 0.6.3.
Use `pip install -U deepctr` to upgrade.Changelog: https://github.com/shenweichen/DeepCTR/releases/tag/v0.7.0


* List **sparse features** from input dataframe
________________________________________________

In [2]:
sparse_features = ["user_id", "movie_id"]
y = ['rating']
print('feature names:', sparse_features, '\nlabel name:', y)

feature names: ['user_id', 'movie_id'] 
label name: ['rating']


 * Label encoding features of input dataframe
 __________________________________

In [3]:
for feat in sparse_features:
    lbe = LabelEncoder()
    df[feat] = lbe.fit_transform(df[feat])  # map raw ids to contiguous integers 0..n-1

df.head(3)

|   | user_id | movie_id | rating | timestamp |
|---|---------|----------|--------|-----------|
| 0 | 195 | 241 | 3 | 881250949 |
| 1 | 185 | 301 | 3 | 891717742 |
| 2 | 21 | 376 | 1 | 878887116 |
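
The loop above reuses a single `lbe` variable, so the fitted encoders are not kept. If predictions later need to be reported against the raw ids, one option is to keep one encoder per feature. The following is a minimal sketch on a toy dataframe; the `encoders` dict and the toy values are illustrative assumptions, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy ratings with raw, non-contiguous ids (illustrative values only)
toy = pd.DataFrame({
    "user_id": [196, 186, 22, 196],
    "movie_id": [242, 302, 377, 302],
    "rating": [3, 3, 1, 4],
})

encoders = {}  # hypothetical helper: feature name -> fitted LabelEncoder
for feat in ["user_id", "movie_id"]:
    enc = LabelEncoder()
    toy[feat] = enc.fit_transform(toy[feat])  # ids become 0..n-1, ordered by raw value
    encoders[feat] = enc

print(toy)

# Map encoded ids back to the raw ids when reporting results
raw_users = encoders["user_id"].inverse_transform(toy["user_id"])
print(raw_users)
```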


**Preparing training input data & target labels.**
_____________________________________________
* Training & test input data should be a list of numpy arrays of `user_ids` & `movie_ids`.
* Labels as numpy array of target values.

In [4]:
train, test = train_test_split(df, test_size=0.2)

train_model_input = [train[name].values for name in sparse_features]  # one array per feature: user_id, movie_id
train_lbl = train[y].values

test_model_input = [test[name].values for name in sparse_features]
test_lbl = test[y].values

In [5]:
print('training data:\n', train_model_input, '\n\ntraining labels:\n', train_lbl)

training data:
 [array([415, 605, 739, ..., 845, 325, 515]), array([400, 759, 327, ..., 191, 674, 285])] 

training labels:
 [[2]
 [3]
 [3]
 ...
 [5]
 [4]
 [5]]
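
The split above is random, so the arrays printed here will differ from run to run. A minimal sketch of a reproducible split on toy data follows, assuming a fixed `random_state` is acceptable for your experiment (the seed value 42 is arbitrary). It also shows why the labels print as a column: selecting with a list of column names yields a 2-D array of shape `(n, 1)`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy dataframe with the same column names as the notebook (values are made up)
toy = pd.DataFrame({
    "user_id": np.arange(10) % 3,
    "movie_id": np.arange(10) % 4,
    "rating": np.arange(10) % 5 + 1,
})

# The same seed gives the same partition on every run
train_a, test_a = train_test_split(toy, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.2, random_state=42)
same_split = train_a.index.equals(train_b.index)

# Model input: a list with one 1-D array per sparse feature
inputs = [train_a[name].values for name in ["user_id", "movie_id"]]
# Labels: a list selector keeps two dimensions, giving shape (n, 1)
labels = train_a[["rating"]].values

print(same_split, [a.shape for a in inputs], labels.shape)
```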


## Step 2. Obtain feature columns
________________________________________________
* Perform the required data preparation as described in the DeepCTR docs (see https://deepctr-doc.readthedocs.io/en/latest/Quick-Start.html).

* Define the **feature columns** as a list of `SparseFeat` instances, one per sparse feature (here `user_id` and `movie_id`), passing the feature name and the number of unique feature values as arguments.

In [6]:
feature_columns = [SparseFeat(feat, df[feat].nunique()) for feat in sparse_features]
feature_columns

[SparseFeat:user_id, SparseFeat:movie_id]
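
Using `nunique()` as the vocabulary size works here because the ids were label encoded first. The sketch below illustrates the point on toy ids with pandas only; `pd.factorize(..., sort=True)` stands in for the label encoding step and is an illustrative substitute, not what the notebook calls.

```python
import pandas as pd

raw_ids = pd.Series([196, 186, 22, 196])            # toy raw ids with gaps
codes, uniques = pd.factorize(raw_ids, sort=True)   # same idea as label encoding
encoded = pd.Series(codes)

# After encoding, ids run from 0 to nunique - 1, so nunique is a safe table size
vocab_size = encoded.nunique()
print(vocab_size, encoded.max() + 1)

# On raw ids, nunique would be too small to index an embedding table by id
print(raw_ids.nunique(), raw_ids.max() + 1)
```

If the encoding step were skipped, an id such as 196 would fall outside an embedding table sized by `nunique()`.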

## Step 3. Import `SVD` from `mlsquare.layers.deepctr`
____________________________________________
* Instantiate the model.
* Train the model & evaluate results.

In [7]:
from mlsquare.layers.deepctr import SVD



* Now instantiate the model by passing in the arguments `feature_columns` and `embedding_size`.

In [8]:
model = SVD(feature_columns, embedding_size=100)
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
user_id (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
movie_id (InputLayer)           (None, 1)            0                                            
__________________________________________________________________________________________________
sparse_emb_user_id (Embedding)  (None, 1, 100)       94300       user_id[0][0]                    
__________________________________________________________________________________________________
sparse_emb_movie_id (Embedding) (None, 1, 100)       168200      movie_id[0][0]                   
__________________________________________________________________________________________________
no_mask (N
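
The parameter counts in the summary can be sanity-checked by hand: an embedding layer holds one vector of length `embedding_size` per vocabulary entry and has no bias. The short sketch below recovers the vocabulary sizes implied by the two counts shown above.

```python
embedding_size = 100

# Parameter counts copied from the model summary
param_counts = {"sparse_emb_user_id": 94300, "sparse_emb_movie_id": 168200}

# An embedding table has vocabulary_size * embedding_size weights
vocab_sizes = {}
for name, count in param_counts.items():
    assert count % embedding_size == 0
    vocab_sizes[name] = count // embedding_size

print(vocab_sizes)
```

The implied sizes should match `df['user_id'].nunique()` and `df['movie_id'].nunique()` in the notebook.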

* Compile the model & fit on train data

In [9]:
model.compile("adam", "mse", metrics=['mse'])
history = model.fit(train_model_input, train_lbl, batch_size=64, epochs=8, verbose=2, validation_split=0.2)

Train on 64000 samples, validate on 16000 samples
Epoch 1/8
 - 3s - loss: 6.1732 - mean_squared_error: 6.1471 - val_loss: 1.5249 - val_mean_squared_error: 1.4720
Epoch 2/8
 - 3s - loss: 1.1949 - mean_squared_error: 1.1329 - val_loss: 1.1028 - val_mean_squared_error: 1.0346
Epoch 3/8
 - 3s - loss: 1.0318 - mean_squared_error: 0.9602 - val_loss: 1.0583 - val_mean_squared_error: 0.9840
Epoch 4/8
 - 3s - loss: 1.0028 - mean_squared_error: 0.9270 - val_loss: 1.0338 - val_mean_squared_error: 0.9565
Epoch 5/8
 - 3s - loss: 0.9831 - mean_squared_error: 0.9048 - val_loss: 1.0326 - val_mean_squared_error: 0.9534
Epoch 6/8
 - 3s - loss: 0.9621 - mean_squared_error: 0.8818 - val_loss: 1.0174 - val_mean_squared_error: 0.9364
Epoch 7/8
 - 3s - loss: 0.9319 - mean_squared_error: 0.8499 - val_loss: 1.0030 - val_mean_squared_error: 0.9197
Epoch 8/8
 - 3s - loss: 0.8938 - mean_squared_error: 0.8097 - val_loss: 0.9979 - val_mean_squared_error: 0.9128
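
Two things in this log are easy to check numerically. First, ratings are usually compared by RMSE, which is the square root of the logged MSE. Second, `loss` is slightly larger than `mean_squared_error` in every epoch. That would be consistent with a regularization penalty being added to the loss but not to the metric, though this is an assumption about how the model is built, not something the log confirms. The values below are copied from the output above.

```python
import math

# Values copied from the training log above
loss = [6.1732, 1.1949, 1.0318, 1.0028, 0.9831, 0.9621, 0.9319, 0.8938]
mse = [6.1471, 1.1329, 0.9602, 0.9270, 0.9048, 0.8818, 0.8499, 0.8097]
val_mse = [1.4720, 1.0346, 0.9840, 0.9565, 0.9534, 0.9364, 0.9197, 0.9128]

# RMSE per epoch on the validation split
val_rmse = [math.sqrt(m) for m in val_mse]

# Gap between the optimized loss and the plain MSE metric
gap = [round(l - m, 4) for l, m in zip(loss, mse)]

print([round(r, 4) for r in val_rmse])
print(gap)
```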


* Evaluating model prediction on test data.

In [11]:
user_id = test_model_input[0][1]
item_id = test_model_input[1][1]
true_y = test_lbl[1]
pred_y = model.predict(test_model_input)[1]
print('For test user id: {} & item id : {} \nTrue rating: {} \nModel prediction is: {}'.format(user_id, item_id, true_y, pred_y))

For test user id: 822 & item id : 650 
True rating: [5] 
Model prediction is: [4.842552]
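
A single prediction says little about overall quality, and `mean_squared_error` is imported at the top of the notebook but never used. Below is a minimal sketch of scoring a whole test set. The `evaluate` helper is hypothetical, and the arrays are stand-in values so the block runs on its own. In the notebook you would pass `test_lbl` and `model.predict(test_model_input)` instead.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def evaluate(y_true, y_pred):
    """Return (MSE, RMSE) for two arrays of ratings; (n, 1) inputs are flattened."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mse = mean_squared_error(y_true, y_pred)
    return mse, float(np.sqrt(mse))

# Stand-in values shaped like test_lbl and model.predict(...) output
y_true = np.array([[5], [3], [4]])
y_pred = np.array([[4.5], [3.5], [4.0]])

mse, rmse = evaluate(y_true, y_pred)
print(mse, rmse)
```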
