# Introduction to Recommendation Systems

In this tutorial we are going to use a [deep autoencoder](https://arxiv.org/abs/1708.01715) to perform collaborative filtering on the [Netflix dataset](https://netflixprize.com/).

[Collaborative filtering](https://en.wikipedia.org/wiki/Collaborative_filtering) is one of the most popular techniques in recommendation systems. It is based on inferring the missing entries of an `m x n` matrix, `R`, whose `(i, j)` entry holds the rating given by the `i`-th user to the `j`-th item. Performance is then measured using the Root Mean Squared Error (RMSE).
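
To make the matrix view concrete, here is a minimal sketch that builds a small `R` from rating triples and scores a naive baseline with RMSE over the observed entries only. The triples, the 0-based indices and the global-mean baseline are made up for illustration and are not part of the Netflix data or of the model used later.

```python
import numpy as np

# Hypothetical (user, item, rating) triples; indices are 0-based for simplicity.
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 2, 2.0)]
m, n = 3, 3  # number of users and items

# R holds the known ratings; NaN marks the missing entries we want to infer.
R = np.full((m, n), np.nan)
for user, item, rating in ratings:
    R[user, item] = rating

def rmse(predicted, actual):
    """RMSE computed only over the entries of `actual` that are observed."""
    observed = ~np.isnan(actual)
    return float(np.sqrt(np.mean((predicted[observed] - actual[observed]) ** 2)))

# Naive baseline: predict the global mean rating for every (user, item) pair.
global_mean = np.nanmean(R)
baseline = np.full((m, n), global_mean)
print(rmse(baseline, R))  # sqrt(2), about 1.414
```

Any collaborative filtering model can be scored the same way: fill in a dense prediction matrix and compare it with `R` only where ratings exist.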

<p align="center">
    <img src="https://upload.wikimedia.org/wikipedia/commons/5/52/Collaborative_filtering.gif" width=300px/>
</p>

The code in this tutorial is written in [PyTorch](http://pytorch.org/) and is based on [this repo](https://github.com/NVIDIA/DeepRecommender) by NVIDIA.

In [22]:
import sys
import os
import numpy as np
import pandas as pd
import torch
from utils import get_gpu_name, get_number_processors, get_gpu_memory, get_cuda_version
from parameters import *

print("OS: ", sys.platform)
print("Python: ", sys.version)
print("PyTorch: ", torch.__version__)
print("Numpy: ", np.__version__)
print("Number of CPU processors: ", get_number_processors())
print("GPU: ", get_gpu_name())
print("GPU memory: ", get_gpu_memory())
print("CUDA: ", get_cuda_version())

%matplotlib inline
%load_ext autoreload
%autoreload 2

OS:  linux
Python:  3.5.4 | packaged by conda-forge | (default, Nov  4 2017, 10:11:29) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
PyTorch:  0.3.0.post4
Numpy:  1.14.0
Number of CPU processors:  24
GPU:  ['Tesla M60', 'Tesla M60', 'Tesla M60', 'Tesla M60']
GPU memory:  ['8123 MiB', '8123 MiB', '8123 MiB', '8123 MiB']
CUDA:  CUDA Version 8.0.61
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Dataset: Netflix

This dataset was constructed to support participants in the [Netflix Prize](http://www.netflixprize.com). The movie rating files contain over 100 million ratings from 480 thousand randomly-chosen, anonymous Netflix customers over 17 thousand movie titles.  The data were collected between October, 1998 and December, 2005 and reflect the distribution of all ratings received during this period.  The ratings are on a scale from 1 to 5 (integral) stars.

The dataset can be [downloaded here](http://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a). To uncompress it:

```bash
tar -xvf nf_prize_dataset.tar.gz
tar -xf download/training_set.tar
```

When we download the data, there are two important files:

1) The file `training_set.tar` is a tar of a directory containing 17770 files, one per movie.  The first line of each file contains the movie id followed by a colon.  Each subsequent line in the file corresponds to a rating from a customer and its date in the following format:

CustomerID,Rating,Date
- MovieIDs range from 1 to 17770 sequentially.
- CustomerIDs range from 1 to 2649429, with gaps. There are 480189 users.
- Ratings are on a five star (integral) scale from 1 to 5.
- Dates have the format YYYY-MM-DD.
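
The per-movie layout described above can be read with a few lines of Python. This is only a sketch: `parse_movie_file` is a hypothetical helper, and the sample rows are invented to match the documented format rather than taken from the dataset.

```python
import io
from datetime import datetime

def parse_movie_file(lines):
    """Parse one per-movie file into (movie_id, [(customer_id, rating, date), ...])."""
    lines = iter(lines)
    # The first line holds the movie id followed by a colon, e.g. "1:".
    movie_id = int(next(lines).strip().rstrip(":"))
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        customer_id, rating, day = line.split(",")
        parsed_day = datetime.strptime(day, "%Y-%m-%d").date()
        ratings.append((int(customer_id), int(rating), parsed_day))
    return movie_id, ratings

# Invented sample in the documented layout (not real data).
sample = io.StringIO("1:\n1234,3,2005-09-06\n56789,5,2005-05-13\n")
movie_id, ratings = parse_movie_file(sample)
print(movie_id, ratings[0])
```

Passing an open file object instead of the `StringIO` sample should work the same way, since the function only iterates over lines.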

2) Movie information in `movie_titles.txt` is in the following format:

MovieID,YearOfRelease,Title

- MovieIDs do not correspond to actual Netflix movie ids or IMDB movie ids.
- YearOfRelease can range from 1890 to 2005 and may correspond to the release of the corresponding DVD, not necessarily its theatrical release.
- Title is the Netflix movie title, in English.
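
A line in this format can be split by hand as sketched below. Two assumptions are made here and are not confirmed by the text above: that a title may itself contain commas, and that a missing year appears as a non-numeric token. The helper name `parse_title_line` and the second sample row are hypothetical.

```python
def parse_title_line(line):
    """Split a `MovieID,YearOfRelease,Title` line into its three fields."""
    # maxsplit=2 keeps any commas inside the title intact.
    movie_id, year, title = line.rstrip("\n").split(",", 2)
    # Assumption: a missing year is encoded as a non-numeric token.
    year = int(year) if year.isdigit() else None
    return int(movie_id), year, title

print(parse_title_line("1,2003,Dinosaur Planet"))
print(parse_title_line("99,NULL,A Title, With a Comma"))  # hypothetical row
```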

### Data prep

The first step is to convert the data to the format that the autoencoder expects. This can take between 1 and 2 hours.

In [None]:
%%time
%run ./DeepRecommender/data_utils/netflix_data_convert.py $NF_PRIZE_DATASET $NF_DATA

The script split the data into train, validation and test sets, creating files with three columns: `CustomerID,MovieID,Rating`. The data is split over time, generating 4 datasets: Netflix 3 months, Netflix 6 months, Netflix 1 year and Netflix full. The following table gives some details of each dataset:

| Dataset | Netflix 3 months | Netflix 6 months | Netflix 1 year | Netflix full |
| ------- | ---------------- | ---------------- | -------------- | ------------ |
| Ratings train | 13,675,402 | 29,179,009 | 41,451,832 | 98,074,901 |
| Users train | 311,315 | 390,795 | 345,855 | 477,412 |
| Items train | 17,736 | 17,757 | 16,907 | 17,768 |
| Time range train | 2005-09-01 to 2005-11-31 | 2005-06-01 to 2005-11-31 | 2004-06-01 to 2005-05-31 | 1999-12-01 to 2005-11-31 |
|  |  |  |  |  |
| Ratings test | 2,082,559 | 2,175,535 | 3,888,684 | 2,250,481 |
| Users test | 160,906 | 169,541 | 197,951 | 173,482 |
| Items test | 17,261 | 17,290 | 16,506 | 17,305 |
| Time range test | 2005-12-01 to 2005-12-31 | 2005-12-01 to 2005-12-31 | 2005-06-01 to 2005-06-31 | 2005-12-01 to 2005-12-31 |
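
The idea of a time-based split can be sketched in a few lines. This is not the conversion script itself, which may apply further filtering; `split_by_time` is a hypothetical helper, the ratings are invented, and the training window is taken to end on the last day of November.

```python
from datetime import date

def split_by_time(ratings, train_range, test_range):
    """Split (customer, movie, rating, day) tuples into train and test by date."""
    train = [r for r in ratings if train_range[0] <= r[3] <= train_range[1]]
    test = [r for r in ratings if test_range[0] <= r[3] <= test_range[1]]
    return train, test

# Invented ratings, only for illustration.
ratings = [
    (1, 10, 4, date(2005, 8, 15)),   # before the training window: dropped
    (1, 11, 5, date(2005, 9, 20)),
    (2, 10, 3, date(2005, 11, 2)),
    (2, 12, 1, date(2005, 12, 24)),
]

# Windows modeled on the "Netflix 3 months" column of the table.
train, test = split_by_time(
    ratings,
    train_range=(date(2005, 9, 1), date(2005, 11, 30)),
    test_range=(date(2005, 12, 1), date(2005, 12, 31)),
)
print(len(train), len(test))  # 2 1
```

Splitting by time rather than at random means the model is always evaluated on ratings given after the ones it was trained on.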

Let's take a look at one of the files.

In [3]:
nf_3m_valid = os.path.join(NF_DATA, 'N3M_VALID', 'n3m.valid.txt')
df = pd.read_csv(nf_3m_valid, names=['CustomerID','MovieID','Rating'], sep='\t')
print(df.shape)
df.head()

(1041739, 3)


Unnamed: 0,CustomerID,MovieID,Rating
0,0,1549,1.0
1,0,5144,2.0
2,0,7716,3.0
3,0,8348,3.0
4,0,4635,2.0


In [4]:
nf_3m_test = os.path.join(NF_DATA, 'N3M_TEST', 'n3m.test.txt')
df2 = pd.read_csv(nf_3m_test, names=['CustomerID','MovieID','Rating'], sep='\t')
print(df2.shape)
df2.head()

(1040820, 3)


Unnamed: 0,CustomerID,MovieID,Rating
0,0,159,4.0
1,0,4830,1.0
2,0,1261,3.0
3,0,12058,3.0
4,0,13412,2.0


## Deep Autoencoder for Collaborative Filtering

Now that we have the data, let's look in some detail at the model we are going to use. The [model](https://arxiv.org/abs/1708.01715) developed by NVIDIA is a deep autoencoder with 6 layers that uses the SELU (scaled exponential linear unit) non-linear activation function, dropout, and iterative dense re-feeding.

An autoencoder is a network that implements two transformations: $encode(x) : R^n → R^d$ and $decode(z) : R^d → R^n$. The goal of the autoencoder is to obtain a $d$-dimensional representation of the data such that an error measure between $x$ and $f(x) = decode(encode(x))$ is minimized. The next figure shows the autoencoder architecture proposed in the [paper](https://arxiv.org/abs/1708.01715). The encoder has 2 layers, $e_1$ and $e_2$, and the decoder has 2 layers, $d_1$ and $d_2$. Dropout may be applied to the coding layer $z$. In the paper, the authors report experiments with different numbers of layers, from 2 to 12 (see Table 2 in the original paper).

<p align="center">
    <img src="./data/AutoEncoder.png" width=350px/>
</p>
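
The encode and decode transformations can be sketched as a plain forward pass. To keep the example light, it uses NumPy instead of PyTorch, toy layer sizes, untrained random weights, and SELU after every layer; all of these are assumptions for illustration and not the configuration of the NVIDIA model.

```python
import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit."""
    negative_part = SELU_ALPHA * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return SELU_SCALE * np.where(x > 0, x, negative_part)

def init_layers(sizes, rng):
    """Create (weight, bias) pairs; each weight has shape (n_out, n_in)."""
    return [(rng.normal(0.0, 0.01, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Apply each affine layer followed by SELU."""
    for weight, bias in layers:
        x = selu(x @ weight.T + bias)
    return x

rng = np.random.default_rng(0)
n, d = 8, 3                            # toy sizes: n items, d-dimensional code
encoder = init_layers([n, 5, d], rng)  # layers e_1 and e_2
decoder = init_layers([d, 5, n], rng)  # layers d_1 and d_2

x = np.zeros((1, n))
x[0, [1, 4]] = [5.0, 2.0]              # sparse rating vector for one user
z = forward(x, encoder)                # encode(x): R^n -> R^d
y = forward(z, decoder)                # decode(z): R^d -> R^n
print(z.shape, y.shape)                # (1, 3) (1, 8)
```

Even though `x` has only two non-zero entries, `y` has a value for every item, which is what allows the decoder output to be read as a full set of predictions.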

During the forward pass, the model takes a user represented by their vector of ratings from the training set, $x \in R^n$, where $n$ is the number of items. Note that $x$ is very sparse, while the output of the decoder, $y=f(x) \in R^n$, is dense and contains the rating predictions for all items in the corpus. The loss is a masked mean squared error, computed only over the ratings the user actually gave; its square root is the root mean squared error (RMSE) reported at evaluation time.

One of the key ideas of the paper is dense re-feeding. Let's consider an idealized scenario with a perfect $f$. Then $f(x)_i = x_i, \forall i : x_i \ne 0$ and $f(x)_i$ accurately predicts all of the user's future ratings. This means that if a user rates a new item $k$ (thereby creating a new vector $x'$), then $f(x)_k = x'_k$ and $f(x) = f(x')$. In other words, $y = f(x)$ should be a fixed point of the autoencoder: $f(y) = y$. Therefore, the authors feed the dense output $f(x)$ back into the autoencoder as a new input to augment the dataset. The method consists of the following steps:

1. Given a sparse $x$, compute the forward pass to get $f(x)$ and the loss.

2. Backpropagate the loss and update the weights.

3. Treat $f(x)$ as a new example and compute $f(f(x))$.

4. Compute a second backward pass.

Steps 3 and 4 can be repeated several times.
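
The four steps can be sketched with a deliberately tiny stand-in model. Here a single linear map `f(x) = x @ W` replaces the deep autoencoder so that the gradients can be written by hand in NumPy; the data, learning rate and step counts are made up, and this is not the training loop from the NVIDIA repository.

```python
import numpy as np

# Toy ratings: 4 users x 6 items, 0 marks "not rated".
X = np.array([
    [5, 4, 0, 0, 1, 0],
    [4, 0, 5, 0, 0, 1],
    [0, 1, 0, 5, 4, 0],
    [1, 0, 0, 4, 0, 5],
], dtype=float)
mask = X != 0

def f(x, W):
    """Stand-in model: a single linear map instead of a deep autoencoder."""
    return x @ W

def masked_mse(pred, target, mask):
    """Mean squared error over the observed entries only."""
    return float(np.sum(mask * (pred - target) ** 2) / mask.sum())

def train(X, mask, steps=300, lr=0.01, refeed_steps=1):
    W = np.zeros((X.shape[1], X.shape[1]))
    for _ in range(steps):
        # Steps 1-2: forward pass on the sparse input, masked loss, update.
        y = f(X, W)
        grad = X.T @ (2 * mask * (y - X)) / mask.sum()
        W -= lr * grad
        # Steps 3-4: treat the dense output as a new example (held constant)
        # and update on the unmasked loss between f(f(x)) and f(x).
        for _ in range(refeed_steps):
            inputs = y
            y = f(inputs, W)
            grad = inputs.T @ (2 * (y - inputs)) / inputs.size
            W -= lr * grad
    return W

initial_loss = masked_mse(f(X, np.zeros((6, 6))), X, mask)
W = train(X, mask)
final_loss = masked_mse(f(X, W), X, mask)
print(initial_loss, final_loss)
```

The re-feeding update needs no mask because both its input and its target are dense. Setting `refeed_steps` to a larger value repeats steps 3 and 4, each time feeding the previous output back in.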

Finally, the authors explore different non-linear [activation functions](https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py). They found that on this task ELU, SELU and LRELU, which have non-zero negative parts, perform much better than SIGMOID, RELU, RELU6, and TANH.
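
To see what "non-zero negative parts" means, the sketch below evaluates four of these activations on a few inputs. The functions are written in NumPy from their standard definitions; the leaky ReLU slope of 0.01 and the ELU `alpha` of 1.0 are common defaults chosen here for illustration, not values taken from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    # Clamp the exponent input so large positive values cannot overflow.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    return scale * elu(x, alpha)

x = np.array([-2.0, -0.5, 0.0, 1.0])
for name, fn in [("relu", relu), ("lrelu", leaky_relu), ("elu", elu), ("selu", selu)]:
    print(name, fn(x))
```

RELU maps every negative input to zero, while LRELU, ELU and SELU keep a negative output for negative inputs.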

In [20]:
%run ./DeepRecommender/run.py --gpu_ids $GPUS \
    --path_to_train_data $TRAIN \
    --path_to_eval_data $EVAL \
    --hidden_layers $HIDDEN \
    --non_linearity_type $ACTIVATION \
    --batch_size $BATCH_SIZE \
    --logdir $MODEL_OUTPUT_DIR \
    --drop_prob $DROPOUT \
    --optimizer $OPTIMIZER \
    --lr $LR \
    --weight_decay $WD \
    --aug_step $AUG_STEP \
    --num_epochs $EPOCHS 

Namespace(aug_step=1, batch_size=128, constrained=False, drop_prob=0.8, gpu_ids='0', hidden_layers='512,512,1024', logdir='model_save', lr=0.005, noise_prob=0.0, non_linearity_type='selu', num_epochs=10, optimizer='momentum', path_to_eval_data='Netflix/N3M_VALID', path_to_train_data='Netflix/N3M_TRAIN', skip_last_layer_nl=False, weight_decay=0.0)
Loading training data from Netflix/N3M_TRAIN
Data loaded
Total items found: 311315
Vector dim: 17736
Loading eval data from Netflix/N3M_VALID
******************************
******************************
[17736, 512, 512, 1024]
Dropout drop probability: 0.8
Encoder pass:
torch.Size([512, 17736])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([1024, 512])
torch.Size([1024])
Decoder pass:
torch.Size([512, 1024])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([17736, 512])
torch.Size([17736])
******************************
******************************
Using GPUs: [0]
Doing epoch 0 of 10
Total epoch 

## Evaluation
Now we are going to evaluate the model on the test set and compute the final loss.

In [30]:
%run ./DeepRecommender/infer.py \
--path_to_train_data $TRAIN \
--path_to_eval_data $TEST \
--hidden_layers $HIDDEN \
--non_linearity_type $ACTIVATION \
--save_path  $MODEL_PATH \
--drop_prob $DROPOUT \
--predictions_path $INFER_OUTPUT

Namespace(constrained=False, drop_prob=0.8, hidden_layers='512,512,1024', non_linearity_type='selu', path_to_eval_data='Netflix/N3M_TEST', path_to_train_data='Netflix/N3M_TRAIN', predictions_path='preds.txt', save_path='model_save/model.epoch_9', skip_last_layer_nl=False)
Loading training data
Data loaded
Total items found: 311315
Vector dim: 17736
Loading eval data
******************************
******************************
[17736, 512, 512, 1024]
Dropout drop probability: 0.8
Encoder pass:
torch.Size([512, 17736])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([1024, 512])
torch.Size([1024])
Decoder pass:
torch.Size([512, 1024])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([17736, 512])
torch.Size([17736])
******************************
******************************
Loading model from: model_save/model.epoch_9
Done: 0
Done: 10000
Done: 20000
Done: 30000
Done: 40000
Done: 50000
Done: 60000
Done: 70000
Done: 80000
Done: 90000
Done: 100

In [31]:
%run ./DeepRecommender/compute_RMSE.py --path_to_predictions=$INFER_OUTPUT

Namespace(path_to_predictions='preds.txt', round=False)
####################
RMSE: 0.9746437597050387
####################


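Let's try something more realistic: we define a hypothetical user from a handful of liked and disliked movies and ask the model for predictions.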

In [9]:
titles = pd.read_csv(MOVIE_TITLES, names=['MovieID','Year','Title'], encoding = "latin")
titles.head()

Unnamed: 0,MovieID,Year,Title
0,1,2003.0,Dinosaur Planet
1,2,2004.0,Isle of Man TT 2004 Review
2,3,1997.0,Character
3,4,1994.0,Paula Abdul's Get Up & Dance
4,5,2004.0,The Rise and Fall of ECW


In [10]:
me_gusta = [13, 191, 209, 316, 345, 468, 560, 752, 1066, 1551, 1601, 1905, 2189, 2252, 5507] 
me_gusta_pred = [2452, 2532, 2689, 3012, 3287]
booo = [148, 270, 571, 559, 1195, 1400, 1387, 1947, 1962,  2282, 2457, 2803, 3106, 3414, 4308]
booo_pred = [4783, 4894, 5088, 5107, 5184]

In [11]:
df_me_gusta = titles[titles['MovieID'].isin(me_gusta)]
df_me_gusta

Unnamed: 0,MovieID,Year,Title
12,13,2003.0,Lord of the Rings: The Return of the King: Ext...
190,191,2003.0,X2: X-Men United
208,209,1996.0,Star Trek: Deep Space Nine: Season 5
315,316,1999.0,Futurama: Monster Robot Maniac Fun Collection
344,345,1998.0,Star Trek: Voyager: Season 5
467,468,2003.0,The Matrix: Revolutions
559,560,2003.0,Star Trek: Enterprise: Season 3
751,752,1993.0,Star Trek: The Next Generation: Season 7
1065,1066,1978.0,Superman: The Movie
1550,1551,2004.0,Spider-Man 2: Bonus Material


In [12]:
df_me_gusta_pred = titles[titles['MovieID'].isin(me_gusta_pred)]
df_me_gusta_pred

Unnamed: 0,MovieID,Year,Title
2451,2452,2001.0,Lord of the Rings: The Fellowship of the Ring
2531,2532,1999.0,Futurama: Vol. 1
2688,2689,2002.0,Minority Report: Bonus Material
3011,3012,2001.0,Dragon Ball Z: World Tournament
3286,3287,2003.0,Terminator 3: Rise of the Machines: Bonus Mate...


In [13]:
df_booo = titles[titles['MovieID'].isin(booo)]
df_booo

Unnamed: 0,MovieID,Year,Title
147,148,2001.0,Sweet November
269,270,2001.0,Sex and the City: Season 4
558,559,1940.0,Rebecca: Bonus Material
570,571,1999.0,American Beauty
1194,1195,1988.0,Madonna: The Girlie Show: Live Down Under
1386,1387,1999.0,The Girl Next Door
1399,1400,2000.0,Britney Spears: Britney in Hawaii: Live and More
1946,1947,2002.0,Gilmore Girls: Season 3
1961,1962,2004.0,50 First Dates
2281,2282,2005.0,Disney Princess Stories: Vol. 2: Tales of Frie...


In [14]:
df_booo_pred = titles[titles['MovieID'].isin(booo_pred)]
df_booo_pred

Unnamed: 0,MovieID,Year,Title
4782,4783,1999.0,Felicity: Season 2
4893,4894,1998.0,Celia Cruz: Fania Allstars in Africa
5087,5088,1987.0,Dirty Dancing: Bonus Material
5106,5107,2001.0,Barbra Streisand: Timeless: Live in Concert
5183,5184,2000.0,'N Sync: Making of the Tour


In [15]:
USER_MOVIES = './user/miguel.txt'
USER_PRED = 'miguel_pred.txt'
CUSTOMER_ID = 0  # using 500000 here raises a KeyError

In [16]:
df_user = pd.DataFrame({'CustomerID':CUSTOMER_ID, 'MovieId':df_me_gusta['MovieID'], 'Ratings':5.0})
df_user = pd.concat([df_user, pd.DataFrame({'CustomerID':CUSTOMER_ID, 'MovieId':df_booo['MovieID'], 'Ratings':1.0})])
df_user_pred = pd.DataFrame({'CustomerID':CUSTOMER_ID, 'MovieId':df_me_gusta_pred['MovieID'], 'Ratings':5.0})
df_user_pred = pd.concat([df_user_pred, pd.DataFrame({'CustomerID':CUSTOMER_ID, 'MovieId':df_booo_pred['MovieID'], 'Ratings':1.0})])

In [17]:
df_user.to_csv(USER_MOVIES, sep='\t',header=False, index=False)
df_user_pred.to_csv(USER_PRED, sep='\t',header=False, index=False)

In [125]:
%run ./DeepRecommender/infer.py \
--path_to_train_data $TRAIN \
--path_to_eval_data ./Netflix/test1 \
--hidden_layers $HIDDEN \
--non_linearity_type $ACTIVATION \
--save_path  $MODEL_PATH \
--drop_prob $DROPOUT \
--predictions_path pred2.txt

Namespace(constrained=False, drop_prob=0.8, hidden_layers='512,512,1024', non_linearity_type='selu', path_to_eval_data='./Netflix/test1', path_to_train_data='Netflix/N3M_TRAIN', predictions_path='pred2.txt', save_path='model_save/model.epoch_9', skip_last_layer_nl=False)
Loading training data
Data loaded
Total items found: 311315
Vector dim: 17736
Loading eval data
******************************
******************************
[17736, 512, 512, 1024]
Dropout drop probability: 0.8
Encoder pass:
torch.Size([512, 17736])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([1024, 512])
torch.Size([1024])
Decoder pass:
torch.Size([512, 1024])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([17736, 512])
torch.Size([17736])
******************************
******************************
Loading model from: model_save/model.epoch_9
Done: 0
Routine finished. Process time 54.71354603767395 s


In [126]:
%run ./DeepRecommender/infer.py \
--path_to_train_data $TRAIN \
--path_to_eval_data ./Netflix/test2 \
--hidden_layers $HIDDEN \
--non_linearity_type $ACTIVATION \
--save_path  $MODEL_PATH \
--drop_prob $DROPOUT \
--predictions_path pred_test2.txt

Namespace(constrained=False, drop_prob=0.8, hidden_layers='512,512,1024', non_linearity_type='selu', path_to_eval_data='./Netflix/test2', path_to_train_data='Netflix/N3M_TRAIN', predictions_path='pred_test2.txt', save_path='model_save/model.epoch_9', skip_last_layer_nl=False)
Loading training data
Data loaded
Total items found: 311315
Vector dim: 17736
Loading eval data
******************************
******************************
[17736, 512, 512, 1024]
Dropout drop probability: 0.8
Encoder pass:
torch.Size([512, 17736])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([1024, 512])
torch.Size([1024])
Decoder pass:
torch.Size([512, 1024])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([17736, 512])
torch.Size([17736])
******************************
******************************
Loading model from: model_save/model.epoch_9
Done: 0
Routine finished. Process time 55.23368763923645 s


In [31]:
%run ./DeepRecommender/infer.py \
--path_to_train_data $TRAIN \
--path_to_eval_data ./user_short \
--hidden_layers $HIDDEN \
--non_linearity_type $ACTIVATION \
--save_path  $MODEL_PATH \
--drop_prob $DROPOUT \
--predictions_path pred_user3.txt

Namespace(constrained=False, drop_prob=0.8, hidden_layers='512,512,1024', non_linearity_type='selu', path_to_eval_data='./user_short', path_to_train_data='Netflix/N3M_TRAIN', predictions_path='pred_user3.txt', save_path='model_save/model.epoch_9', skip_last_layer_nl=False)
Loading training data
Data loaded
Total items found: 311315
Vector dim: 17736
Loading eval data
******************************
******************************
[17736, 512, 512, 1024]
Dropout drop probability: 0.8
Encoder pass:
torch.Size([512, 17736])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([1024, 512])
torch.Size([1024])
Decoder pass:
torch.Size([512, 1024])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([17736, 512])
torch.Size([17736])
******************************
******************************
Loading model from: model_save/model.epoch_9
len= 17736
non_zeros  [4766, 9777, 15311]
13
191
209
Done: 0
Routine finished. Process time 57.151479721069336 s


In [18]:
from DeepRecommender.reco_encoder.data import input_layer
params = dict()
params['batch_size'] = 1
params['data_dir'] = TRAIN
params['major'] = 'users'
params['itemIdInd'] = 1
params['userIdInd'] = 0
print("Loading training data")
data_layer = input_layer.UserItemRecDataProvider(params=params)

Loading training data


In [20]:
print(data_layer.vector_dim)
params

17736


{'batch_size': 1,
 'data_dir': 'Netflix/N3M_TRAIN',
 'itemIdInd': 1,
 'major': 'users',
 'userIdInd': 0}

In [22]:
import copy
eval_params = copy.deepcopy(params)
# must set eval batch size to 1 to make sure no examples are missed
eval_params['batch_size'] = 1
eval_params['data_dir'] = TEST
eval_data_layer = input_layer.UserItemRecDataProvider(params=eval_params,
                                                      user_id_map=data_layer.userIdMap,
                                                      item_id_map=data_layer.itemIdMap)

In [23]:
print(eval_data_layer.vector_dim)
eval_params

17736


{'batch_size': 1,
 'data_dir': 'Netflix/N3M_TEST',
 'itemIdInd': 1,
 'major': 'users',
 'userIdInd': 0}

In [30]:
# Count the distinct movies that appear in the evaluation data
eval_movies = set()
for user_ratings in eval_data_layer.data.values():
    for movie, rating in user_ratings:
        eval_movies.add(movie)

print(len(eval_movies))

16243


In [31]:
df_query = df_user.drop(['CustomerID'], axis=1).set_index('MovieId')
dict_query = df_query.to_dict()['Ratings']
dict_query

{13: 5.0,
 148: 1.0,
 191: 5.0,
 209: 5.0,
 270: 1.0,
 316: 5.0,
 345: 5.0,
 468: 5.0,
 559: 1.0,
 560: 5.0,
 571: 1.0,
 752: 5.0,
 1066: 5.0,
 1195: 1.0,
 1387: 1.0,
 1400: 1.0,
 1551: 5.0,
 1601: 5.0,
 1905: 5.0,
 1947: 1.0,
 1962: 1.0,
 2189: 5.0,
 2252: 5.0,
 2282: 1.0,
 2457: 1.0,
 2803: 1.0,
 3106: 1.0,
 3414: 1.0,
 4308: 1.0,
 5507: 5.0}

In [35]:
from DeepRecommender.reco_encoder.data import input_layer_api
params_api = dict()
params_api['batch_size'] = 1
params_api['data_dict'] = dict_query
params_api['major'] = 'users'
params_api['itemIdInd'] = 1
params_api['userIdInd'] = 0
data_api = input_layer_api.UserItemRecDataProviderAPI(params=params_api,
                                                      user_id_map=data_layer.userIdMap,
                                                      item_id_map=data_layer.itemIdMap)


In [36]:
data_api.data

{0: [(1601, 5.0),
  (5507, 5.0),
  (2252, 5.0),
  (13, 5.0),
  (270, 1.0),
  (1551, 5.0),
  (2189, 5.0),
  (209, 5.0),
  (148, 1.0),
  (3414, 1.0),
  (2457, 1.0),
  (345, 5.0),
  (1947, 1.0),
  (2282, 1.0),
  (560, 5.0),
  (3106, 1.0),
  (4308, 1.0),
  (1066, 5.0),
  (1387, 1.0),
  (559, 1.0),
  (752, 5.0),
  (1905, 5.0),
  (2803, 1.0),
  (1195, 1.0),
  (1400, 1.0),
  (468, 5.0),
  (571, 1.0),
  (316, 5.0),
  (1962, 1.0),
  (191, 5.0)]}

In [37]:
data_api.vector_dim

17736

In [39]:
from DeepRecommender.reco_encoder.model import model
rencoder_api = model.AutoEncoder(layer_sizes=[data_layer.vector_dim] + [int(l) for l in HIDDEN.split(',')],
                               nl_type=ACTIVATION,
                               is_constrained=False,
                               dp_drop_prob=DROPOUT,
                               last_layer_activations=False)

******************************
******************************
[17736, 512, 512, 1024]
Dropout drop probability: 0.8
Encoder pass:
torch.Size([512, 17736])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([1024, 512])
torch.Size([1024])
Decoder pass:
torch.Size([512, 1024])
torch.Size([512])
torch.Size([512, 512])
torch.Size([512])
torch.Size([17736, 512])
torch.Size([17736])
******************************
******************************


In [40]:
import torch
import os
def load_model_weights(model_architecture, weights_path):
  if os.path.isfile(weights_path):
    print("Loading model from: {}".format(weights_path))
    model_architecture.load_state_dict(torch.load(weights_path))
  else:
    raise ValueError("Path not found {}".format(weights_path))

load_model_weights(rencoder_api, './model_save/model.epoch_9')

Loading model from: ./model_save/model.epoch_9


In [41]:
rencoder_api.state_dict()

OrderedDict([('encode_w.0', 
               1.3762e-02 -6.6482e-03  1.9167e-02  ...   4.8230e-03  1.1178e-02 -7.4413e-03
              -1.2196e-02 -2.8160e-03 -4.9494e-03  ...   1.3009e-02 -5.3862e-03  1.5944e-02
               6.3024e-03 -1.7319e-02  5.8461e-03  ...   7.4736e-04 -1.6535e-02 -9.1369e-03
                              ...                   ⋱                   ...                
               2.0446e-02  3.2055e-03 -3.0206e-02  ...   1.9669e-03 -6.5677e-03 -9.1297e-03
              -4.1579e-02  1.0100e-02 -2.2402e-03  ...   1.7774e-02  5.5699e-04  1.0243e-02
               8.0284e-03 -1.1262e-02 -2.2738e-02  ...   1.2861e-02  1.4607e-02  7.3368e-03
              [torch.FloatTensor of size 512x17736]), ('encode_w.1', 
               2.5071e-02  4.3264e-02  1.2727e-02  ...   8.6099e-02 -1.3217e-02  4.7283e-02
              -1.8742e-02 -1.6810e-02  6.8249e-02  ...   3.6553e-03  2.3814e-02 -2.4263e-02
               7.1702e-02  3.5000e-02  1.1755e-02  ...   7.2973e-02  6.89

In [42]:
rencoder_api.eval()
rencoder_api = rencoder_api.cuda()

In [19]:
from torch.autograd import Variable
data_api.src_data = data_layer.data
for i, ((out, src), majorInd) in enumerate(data_api.iterate_one_epoch_eval(for_inf=True)):
    inputs = Variable(src.cuda().to_dense())
    targets_np = out.to_dense().numpy()[0, :]
    outputs = rencoder_api(inputs).cpu().data.numpy()[0, :]
    non_zeros = targets_np.nonzero()[0].tolist()
    print(non_zeros)

NameError: name 'data_layer' is not defined

In [35]:
!curl -X POST -d '{"13": "5.0","191": "5.0", "209":"5.0"}' -H "Content-type: application/json" http://127.0.0.1:5000/recommend

{
  "10649": "4.1924996", 
  "12331": "3.3941908", 
  "5086": "3.1280162"
}
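
A response like the one above can be turned into a ranked list with the standard library. The sketch assumes that the keys are movie ids and the values are predicted ratings encoded as strings, which is how the output reads but is not documented here; `rank_recommendations` is a hypothetical helper.

```python
import json

# Response body copied from the output above.
response_body = '{"10649": "4.1924996", "12331": "3.3941908", "5086": "3.1280162"}'

def rank_recommendations(body):
    """Return (movie_id, predicted_rating) pairs, highest prediction first."""
    predictions = json.loads(body)
    pairs = [(int(movie_id), float(score)) for movie_id, score in predictions.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

for movie_id, score in rank_recommendations(response_body):
    print(movie_id, round(score, 2))
```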

