# Recommendation Systems with Deep Neural Networks

### **2019/3/22 Seongnam-KAIST Intensive AI Training Course**

***Tip> Shortcuts for Jupyter Notebook***
* Shift + Enter : run cell and select below

#### Objective> Train a deep neural network (autoencoder) to complete the movie rating matrix
<img src="img/fig1.png" alt="fig1" width="700"/>
<center>Fig. 1 Item-based autoencoder </center>

- #### Loss function

$$L(M, \hat{M})=\sum_{(i,j)\in E}(M_{ij}-\hat{M}_{ij})^2 + \lambda\sum_{i=1}^{3}\lVert W_i\rVert^2_2$$
<br/>
- #### Update weight and bias
$$\underset{W, b}{\text{argmin}}\hspace{0.2em} L(M, \hat{M})$$
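
As a rough illustration of the loss above, the following NumPy sketch evaluates the squared error over the observed entries $E$ plus the L2 penalty on a toy matrix. The function name `masked_loss`, the toy matrices, and the value of `lam` are assumptions made for this example and are not part of the notebook.

```python
import numpy as np

def masked_loss(M, M_hat, observed, weights, lam):
    """Squared error over observed entries plus an L2 penalty on the weights."""
    rows, cols = observed[:, 0], observed[:, 1]
    squared_error = np.sum((M[rows, cols] - M_hat[rows, cols]) ** 2)
    l2_penalty = lam * sum(np.sum(W ** 2) for W in weights)
    return squared_error + l2_penalty

# Toy example: a 2 x 3 rating matrix with three observed entries (the set E).
M = np.array([[5.0, 0.0, 3.0],
              [0.0, 4.0, 0.0]])
M_hat = np.array([[4.0, 2.0, 3.0],
                  [1.0, 4.5, 2.0]])
E = np.array([[0, 0], [0, 2], [1, 1]])

# Three hypothetical weight matrices standing in for W_1, W_2, W_3.
weights = [np.ones((2, 2)), np.zeros((2, 2)), np.ones((2, 1))]

loss = masked_loss(M, M_hat, E, weights, lam=0.1)
print(loss)
```

Only the entries listed in `E` contribute to the error term, so the predictions at unobserved positions are free to take any value.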

In [1]:
import numpy as np
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import time


## 1. Prepare data
### MovieLens Dataset (<a href=https://grouplens.org/datasets/movielens/>ref.</a>)
We use "MovieLens Latest Datasets" consisting of 100,000 ratings applied to 9,000 movies by 600 users. Last updated 9/2018.
### Upload the data to the Google Colab server

In [None]:
from google.colab import files
uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

### Fetch MovieLens data

In [None]:
rating = pd.read_csv('ratings.csv')
rating.head(10)

### Ratings statistics
Count the number of ratings given at each rating value.

In [None]:
# Group the rows by rating value and count the rows in each group
rating.groupby('rating')[['movieId']].count().rename({'movieId': 'The number of movies'}, axis='columns')

Count the number of users and movies, and check how sparse the rating matrix is.

In [None]:
n_user = len(rating['userId'].unique())
n_movie = len(rating['movieId'].unique())
n_rating = len(rating['rating'])
density = n_rating / (n_user * n_movie)  # fraction of the user-movie matrix that is observed
print("[*] %d users & %d movies" % (n_user, n_movie))
print("[*] Density: %.2f%% (sparsity: %.2f%%)" % (density * 100, (1 - density) * 100))
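
The ratio `n_rating / (n_user * n_movie)` is the fraction of the rating matrix that is actually observed; the fraction that is missing is one minus that value. A minimal sketch on made-up data, where the `pairs` array and the `_toy` names are hypothetical:

```python
import numpy as np

# Hypothetical toy ratings: (userId, movieId) pairs for 3 users and 4 movies.
pairs = np.array([[0, 0], [0, 2], [1, 1], [2, 3], [2, 0]])
n_user_toy = len(np.unique(pairs[:, 0]))
n_movie_toy = len(np.unique(pairs[:, 1]))

density = len(pairs) / (n_user_toy * n_movie_toy)  # fraction of observed entries
sparsity = 1.0 - density                            # fraction of missing entries
print("density: %.2f%%, sparsity: %.2f%%" % (density * 100, sparsity * 100))
```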

### Movie list
Inspect the movie list, which includes each movie's title and genres.

In [None]:
movielist = pd.read_csv('movies.csv')
movielist.head(10)

Drop the **"timestamp"** column, which the model does not use.

In [None]:
rating.drop(['timestamp'], axis=1, inplace=True)
rating.tail()

Re-index **"movieId"** and **"userId"** as contiguous integers starting at 0, so that they run from 0 to `n_movie - 1` and from 0 to `n_user - 1` (0 to 609 for users).

In [None]:
rating['movieId'], _ = pd.factorize(rating['movieId'])
rating['userId'], _ = pd.factorize(rating['userId'])
rating.tail()
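
The effect of re-indexing can be seen on a small array of raw IDs. This sketch uses `np.unique(..., return_inverse=True)` as a stand-in for `pd.factorize`: both produce contiguous integer codes, but `np.unique` numbers the IDs in sorted order whereas `pd.factorize` numbers them in order of first appearance. The raw IDs below are made up for illustration.

```python
import numpy as np

# Hypothetical raw movie IDs with gaps and repeats.
raw_movie_ids = np.array([1, 3, 6, 47, 3, 1, 193609])

# unique_ids holds the distinct IDs; codes holds the contiguous index of each entry.
unique_ids, codes = np.unique(raw_movie_ids, return_inverse=True)

print(codes)              # contiguous indices usable as matrix row positions
print(unique_ids[codes])  # maps the indices back to the raw IDs
```

Keeping `unique_ids` around makes it possible to translate a predicted row index back to the original movie ID.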

### Item-based autoencoder
Reorder the columns so that movies index the rows and users index the columns, which transposes the rating matrix.

In [None]:
rating = rating[['movieId', 'userId', 'rating']]
rating.head()

### Split the ratings for training and test
Training : Test = 9 : 1

In [None]:
trainIdx = np.random.choice(range(n_rating), int(n_rating * 0.9), replace=False)
dataTrain = rating.iloc[trainIdx]

testIdx = np.setdiff1d(range(n_rating), trainIdx)
dataTest = rating.iloc[testIdx]
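
For reference, a 9 : 1 split of this kind can also be written with a single shuffled permutation. The sketch below is an alternative formulation on a toy index range, not the notebook's code; the fixed seed and the `_toy`-style names are assumptions added so the split is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an addition for reproducibility)
n_toy = 20                      # hypothetical number of ratings

# Shuffle all indices once, then cut the permutation at the 90% mark.
perm = rng.permutation(n_toy)
n_train = int(n_toy * 0.9)
train_idx, test_idx = perm[:n_train], perm[n_train:]

print(len(train_idx), len(test_idx))
```

Because both parts come from one permutation, the training and test indices are disjoint and together cover every rating.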

In [None]:
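ratingTrain = np.asarray(dataTrain)
ratingTest = np.asarray(dataTest)
# Use the full matrix shape (movies x users) as integers, so that indices which
# appear only in the test split still fall inside the prediction matrix
d1, d2 = n_movie, n_user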

## 2. Build a Graph
We use the "tf.sparse_tensor_dense_matmul()" function instead of the "tf.layers.dense( )" function, because the input is a sparse tensor and the weight variables must be accessible for L2 regularization.

In [None]:
def autoencoder(_X, _units, _l2_lambda, _n_ratings):
    w_init = tf.contrib.layers.variance_scaling_initializer()
    b_init = tf.constant_initializer(0.)
    
    ## Encoder
    # 1st hidden layer
    w1 = tf.get_variable('weight1', [d2, _units[0]], initializer=w_init)
    b1 = tf.get_variable('biases1', [_units[0]], initializer=b_init)
    h1 = tf.sparse_tensor_dense_matmul(_X, w1) + b1
    h1 = tf.nn.relu(h1)

    # 2nd hidden layer
    w2 = tf.get_variable('weight2', [_units[0], _units[1]], initializer=w_init)
    b2 = tf.get_variable('biases2', [_units[1]], initializer=b_init)
    h2 = tf.matmul(h1, w2) + b2
    h2 = tf.nn.sigmoid(h2)
    
    ## Decoder
    w3 = tf.get_variable('weight3', [_units[1], d2], initializer=w_init)
    
    yhat = tf.matmul(h2, w3)
    out = tf.gather_nd(yhat, _X.indices)

    loss = tf.reduce_sum(tf.pow(out - _X.values, 2)) / _n_ratings
    
    
    # L2 regularization: penalize every variable whose name starts with 'weight'
    l2_losses = [tf.nn.l2_loss(var) for var in tf.trainable_variables()
                 if var.op.name.startswith('weight')]
    
    losses = loss + _l2_lambda * tf.reduce_sum(l2_losses)
    
    return yhat, losses
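
To make the data flow of the graph easier to follow, here is a NumPy sketch of the same forward pass on toy data: ReLU on the first hidden layer, sigmoid on the second, a linear decoder, and an error computed only at the observed entries. The sizes, the random weights, and the dense stand-in for the sparse input are assumptions for illustration; no training or regularization is performed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_movies, n_users, units = 4, 5, (3, 2)   # toy sizes, not the notebook's

# Observed ratings as (movie, user, rating) triples (made-up values).
triples = np.array([[0, 1, 4.0], [1, 0, 3.5], [2, 4, 5.0], [3, 2, 2.0]])
rows, cols = triples[:, 0].astype(int), triples[:, 1].astype(int)

# Dense stand-in for the sparse input: unobserved entries are 0.
X = np.zeros((n_movies, n_users))
X[rows, cols] = triples[:, 2]

# Randomly initialized parameters with the same shapes as in the graph.
w1 = rng.normal(size=(n_users, units[0]))
b1 = np.zeros(units[0])
w2 = rng.normal(size=(units[0], units[1]))
b2 = np.zeros(units[1])
w3 = rng.normal(size=(units[1], n_users))

h1 = np.maximum(X @ w1 + b1, 0.0)            # encoder layer 1: ReLU
h2 = 1.0 / (1.0 + np.exp(-(h1 @ w2 + b2)))   # encoder layer 2: sigmoid
yhat = h2 @ w3                               # decoder: full reconstruction

out = yhat[rows, cols]                       # predictions at observed entries only
mse = np.sum((out - triples[:, 2]) ** 2) / len(triples)
print(yhat.shape, mse)
```

The fancy indexing `yhat[rows, cols]` plays the role of `tf.gather_nd(yhat, _X.indices)`: it picks out one prediction per observed rating.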

### Set hyperparameters
- ***n_epochs*** : The number of epochs
- ***lr*** : Learning rate for gradient descent
- ***l2_lambda*** : L2 regularization coefficient ($\lambda$ in the loss function)
- ***n_units*** : The number of units for each hidden layer

In [None]:
"""parameters"""
n_epochs = 1000
lr = 0.1
l2_lambda = 0.003
n_units = [100, 50]
n_ratings = len(ratingTrain)
display_step = n_epochs // 10  # integer number of epochs between progress reports

### Placeholder for sparse input data

In [None]:
# tf Graph input
X = tf.sparse_placeholder(dtype=tf.float32)

### Use the GradientDescentOptimizer

In [None]:
pred, cost = autoencoder(X, n_units, l2_lambda, n_ratings)
global_step = tf.Variable(0, trainable=False)
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(cost, global_step=global_step)

### Create a tensorflow session
TensorFlow operations must be executed within a session. Only one session is active in this notebook.

In [None]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

## 3. Training

In [None]:
print("START OPTIMIZATION\n")
start_time = time.time()
losses = []
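for epoch in range(n_epochs + 1):
    feed = {X: (ratingTrain[:, 0:2], ratingTrain[:, 2], [d1, d2])}
    _, avg_cost = sess.run((optimizer, cost), feed_dict=feed)
    # Square root of the regularized cost; it matches the training RMSE only
    # to the extent that the L2 term is small
    losses.append(np.sqrt(avg_cost))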

    # DISPLAY
    if epoch % display_step == 0:
        duration = float(time.time() - start_time)
        print(" [*] Epoch: %05d/%05d cost: %2e (duration: %.3fs)" % (epoch, n_epochs, np.sqrt(avg_cost), duration))
        start_time = time.time()
print("\nOptimization Finished!")

In [None]:
plt.plot(losses)
plt.title("Learning curve", fontsize=14, fontweight='bold')
plt.xlabel("Epochs", fontsize=14, fontweight='bold')
plt.ylabel("RMSE of training set", fontsize=14, fontweight='bold')
plt.show()

## 4. Test

In [None]:
feed = {X: (ratingTrain[:, 0:2], ratingTrain[:, 2], [d1, d2])}
Pred = sess.run(pred, feed_dict=feed)

idxTest = (ratingTest[:, 0].astype(int), ratingTest[:, 1].astype(int))
idxTrain = (ratingTrain[:, 0].astype(int), ratingTrain[:, 1].astype(int))

RMSE_Test = np.sqrt(np.mean((Pred[idxTest] - ratingTest[:, 2]) ** 2))
RMSE_Train = np.sqrt(np.mean((Pred[idxTrain] - ratingTrain[:, 2]) ** 2))

print("[*] RMSE Test: %.4e" % RMSE_Test)
print("[*] RMSE Train: %.4e" % RMSE_Train)
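
The evaluation relies on indexing the dense prediction matrix with a tuple of row and column arrays, which returns one prediction per held-out rating. A small sketch with made-up numbers (`pred_toy` and `held_out` are hypothetical):

```python
import numpy as np

# Hypothetical 2 x 2 prediction matrix (movies x users).
pred_toy = np.array([[4.0, 3.0],
                     [2.0, 5.0]])

# Held-out (movie, user, rating) triples with made-up values.
held_out = np.array([[0, 1, 4.0],
                     [1, 0, 2.0]])

# A tuple of index arrays selects pred_toy[0, 1] and pred_toy[1, 0].
idx = (held_out[:, 0].astype(int), held_out[:, 1].astype(int))
rmse = np.sqrt(np.mean((pred_toy[idx] - held_out[:, 2]) ** 2))
print(rmse)
```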

## Report

### 1. Momentum Optimizer
Use "MomentumOptimizer( )" instead of the GradientDescentOptimizer and compare the RMSE learning curves of the two optimizers. When you use MomentumOptimizer, set the momentum to 0.9 and adjust the learning rate.
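
As background for this exercise, the sketch below contrasts the two update rules on a toy quadratic objective in plain NumPy. It assumes the common (non-Nesterov) momentum formulation, `accum = momentum * accum + grad` followed by `w -= lr * accum`; the objective, the step count, and the learning rate are illustrative choices, and this is not a solution to the TensorFlow task itself.

```python
import numpy as np

def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return np.array([1.0, 10.0]) * w

def run_gd(w0, lr, steps):
    """Plain gradient descent: step directly along the negative gradient."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_momentum(w0, lr, momentum, steps):
    """Momentum: accumulate a decaying sum of past gradients, then step along it."""
    w, accum = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        accum = momentum * accum + grad(w)
        w -= lr * accum
    return w

w0 = np.array([1.0, 1.0])
w_gd = run_gd(w0, lr=0.01, steps=200)
w_mom = run_momentum(w0, lr=0.01, momentum=0.9, steps=200)

# Distance from the minimum at the origin for each optimizer.
print(np.linalg.norm(w_gd), np.linalg.norm(w_mom))
```

With momentum 0.9 the accumulated step is roughly ten times larger than a single gradient step at the same learning rate, which is why the learning rate usually has to be re-tuned when switching optimizers.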

### 2. Batch normalization
Apply "batch normalization" to the 1st and 2nd hidden layers, and compare the resulting RMSE learning curves with those obtained above.<br/>
*Hint:* tf.layers.batch_normalization( )
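
For intuition about what batch normalization computes during training, here is a NumPy sketch of the forward pass: each unit is normalized over the batch and then scaled and shifted by learnable parameters. The function name `batch_norm_train`, the toy activations, and the value of `eps` are assumptions for this example; the sketch omits the moving averages that are used at inference time.

```python
import numpy as np

def batch_norm_train(h, gamma, beta, eps=1e-3):
    """Normalize each column (unit) of h over the batch, then scale and shift."""
    mean = h.mean(axis=0)
    var = h.var(axis=0)
    h_norm = (h - mean) / np.sqrt(var + eps)
    return gamma * h_norm + beta

rng = np.random.default_rng(0)
h = rng.normal(loc=3.0, scale=2.0, size=(8, 4))   # toy pre-activations: 8 rows, 4 units
out = batch_norm_train(h, gamma=np.ones(4), beta=np.zeros(4))

# After normalization each unit has mean close to 0 and standard deviation close to 1.
print(out.mean(axis=0), out.std(axis=0))
```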