## NRMS: Neural News Recommendation with Multi-Head Self-Attention
NRMS [1] is a neural news recommendation approach built on multi-head self-attention. Its core consists of a news encoder and a user encoder. In the news encoder, multi-head self-attention learns news representations from news titles by modeling the interactions between words. In the user encoder, user representations are learned from the news each user has browsed, with multi-head self-attention capturing the relatedness between those news items. In addition, additive attention is applied to learn more informative news and user representations by selecting important words and news.
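
To make "modeling the interactions between words" concrete, here is a minimal single-head sketch in plain Python. It assumes the common scaled dot-product form and leaves out the learned projection matrices and the multiple heads, so it illustrates the idea rather than the layer used in `lightrec` or the exact parameterization in the paper. The function names and the toy embeddings are made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention without learned projections.

    Every vector attends to all vectors (itself included); each output is the
    attention-weighted average of the inputs.
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        # Similarity of this word to every word in the title.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in vectors]
        weights = softmax(scores)
        # Mix all word vectors according to the attention weights.
        outputs.append([sum(w * v[d] for w, v in zip(weights, vectors))
                        for d in range(dim)])
    return outputs


# Three toy 2-dimensional "word embeddings" (made-up values).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(words)
print(contextual)
```

Each output vector now depends on every word in the sequence. A multi-head layer runs several such attentions in parallel on differently projected inputs and concatenates the results.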

### Properties of NRMS:
* NRMS is a content-based neural news recommendation approach.
* It uses multi-head self-attention to learn news representations by modeling the interactions between words, and to learn user representations by capturing the relationships between the news a user has browsed.
* It uses additive attention to learn informative news and user representations by selecting important words and news.
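
The "selecting important words and news" step can be sketched as additive attention pooling: each vector gets a score `query . tanh(W h + b)`, the scores are normalized with a softmax, and the vectors are averaged with those weights. The sketch below is plain Python with hand-picked toy parameters. In a trained model `weight`, `bias` and `query` are learned, and the number of rows in `weight` presumably corresponds to `attention_hidden_dim`, which is an assumption about `lightrec`, not something shown here.

```python
import math


def additive_attention(vectors, weight, bias, query):
    """Pool a list of vectors into a single vector.

    score_i = query . tanh(weight @ h_i + bias)
    alpha   = softmax(score)
    output  = sum_i alpha_i * h_i
    """
    scores = []
    for h in vectors:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
                  for row, b in zip(weight, bias)]
        scores.append(sum(q * z for q, z in zip(query, hidden)))
    # Softmax over the scores (shifted by the maximum for stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    pooled = [sum(a * h[d] for a, h in zip(alphas, vectors))
              for d in range(len(vectors[0]))]
    return pooled, alphas


# Toy parameters (made up): identity projection, zero bias, and a query
# that only looks at the first hidden unit.
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
query = [1.0, 0.0]
vectors = [[2.0, 0.0], [0.0, 2.0]]

pooled, alphas = additive_attention(vectors, weight, bias, query)
print(alphas, pooled)
```

With this query the first vector receives the larger weight, so the pooled representation leans towards it.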

### Set data

Make sure you are in the `LightRec/` directory, then run `mkdir data`.
TODO: describe the download step. After you download and unzip the data,
the `data` folder should look like this:
```
data/
    train
    valid
    utils
```
Note that we are using the `small` version of the [MIND](https://msnews.github.io/) dataset.
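
Before going further it can help to verify the layout. The helper below is a small hypothetical check (it is not part of `lightrec`) that reports which of the three expected sub-folders are missing.

```python
from pathlib import Path


def missing_entries(root, expected=("train", "valid", "utils")):
    """Return the expected sub-folder names that are absent under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


missing = missing_entries("data")
if missing:
    print(f"missing under data/: {missing}")
else:
    print("data folder looks complete")
```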

### Import lightrec
`lightrec.model`
* `lightrec.model.zoo`: the recommender models are stored here
* `lightrec.model.training`: helper functions for model training

`lightrec.data`
* `lightrec.data`: access to a specific dataset and its iterator


In [2]:
from lightrec.model import NRMS
from lightrec.model.training import timer, params, cal_metric
from lightrec.data import MindIterator
from lightrec.data.tools import set_seed
from torch import optim
from tqdm import tqdm
import torch
import numpy as np

set_seed(2020)

In [4]:
param = params(for_model="nrms",
               file="./data/utils/nrms.yaml",
               wordDict_file="./data/utils/word_dict_all.pkl",
               vertDict_file="./data/utils/vert_dict.pkl",
               subvertDict_file="./data/utils/subvert_dict.pkl",
               userDict_file="./data/utils/uid2index.pkl",
               wordEmb_file="./data/utils/embedding_all.npy")
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

print(param)
print(device)

---------------------  --------------------------------------
attention_hidden_dim   200
batch_size             32
data_format            news
dropout                0.2
epochs                 10
head_dim               20
head_num               20
his_size               50
learning_rate          0.0001
loss                   cross_entropy_loss
metrics                ['group_auc', 'mean_mrr', 'ndcg@5;10']
model_type             nrms
npratio                4
optimizer              adam
show_step              100000
subvertDict_file       ./data/utils/subvert_dict.pkl
support_quick_scoring  True
title_size             30
userDict_file          ./data/utils/uid2index.pkl
vertDict_file          ./data/utils/vert_dict.pkl
wordDict_file          ./data/utils/word_dict_all.pkl
wordEmb_file           ./data/utils/embedding_all.npy
word_emb_dim           300
---------------------  --------------------------------------
cpu
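
The `npratio` value of 4 printed above is worth a closer look. In MIND-style training pipelines it usually means that each clicked news item is paired with 4 non-clicked items from the same impression, giving 5 candidates per training sample. The sketch below illustrates that convention under this assumption. `sample_candidates` and `pad_id` are hypothetical names, and `MindIterator` may build its samples differently.

```python
import random


def sample_candidates(clicked, non_clicked, npratio, rng, pad_id=0):
    """Build one training sample: the clicked item followed by `npratio` negatives.

    If the impression has fewer than `npratio` non-clicked items, the list is
    padded with `pad_id` (an assumed placeholder id).
    """
    if len(non_clicked) >= npratio:
        negatives = rng.sample(non_clicked, npratio)
    else:
        negatives = list(non_clicked) + [pad_id] * (npratio - len(non_clicked))
    candidates = [clicked] + negatives
    labels = [1] + [0] * npratio
    return candidates, labels


rng = random.Random(2020)
non_clicked = [3, 8, 21, 34, 55, 89]  # made-up news ids
candidates, labels = sample_candidates(
    clicked=17, non_clicked=non_clicked, npratio=4, rng=rng)
print(candidates, labels)
```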


### Set up model and data

In [5]:
model = NRMS(param).to(device)

#### data for training

In [6]:
news = "./data/train/news.tsv"
user = "./data/train/behaviors.tsv"
iterator = MindIterator(param)
iterator.open(news, user)

#### data for testing

In [None]:
news = "./data/valid/news.tsv"
user = "./data/valid/behaviors.tsv"
test_iterator = MindIterator(param)
test_iterator.open(news, user)

### Define evaluate function

In [2]:
def evaluate(model, test_iterator):
    """Score the test data, group the results by user, and compute the metrics.

    Reads the metric names from the global `param`.
    """
    model.eval()
    label_bag = model.offer_label_bag()
    # Request the user index in addition to the model inputs so that
    # predictions can be grouped per user.
    nrms_bag = model.offer_data_bag()
    nrms_bag.append('user index')
    preds = {}
    labels = {}
    with torch.no_grad():
        for bag in tqdm(
                test_iterator.batch(data_bag=nrms_bag, test=True,
                                    size=250)):
            truth = bag[label_bag].squeeze()
            pred = model(bag, scale=True,
                         by_user=True).cpu().numpy().squeeze()
            for i, tag in enumerate(bag['user index']):
                if tag not in labels:
                    # The first sample seen for each user is expected to be a positive.
                    assert truth[i] == 1
                preds.setdefault(tag, []).append(pred[i])
                labels.setdefault(tag, []).append(truth[i])
            del bag
    # One array of predictions and one array of labels per user.
    group_pred = [np.asarray(preds[name]) for name in preds]
    group_label = [np.asarray(labels[name]) for name in preds]
    return cal_metric(group_label, group_pred, metrics=param.metrics)
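
The internals of `cal_metric` are not shown in this notebook. As a reference for the configured metrics (`group_auc`, `mean_mrr`, `ndcg@5;10`), here is a plain-Python sketch of the definitions commonly used for MIND: each metric is computed per group and then averaged over groups. It assumes binary labels and that every group contains at least one positive and one negative. All function names are made up for this example.

```python
import math


def _ranked_labels(labels, scores):
    """Labels reordered by descending score."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    return [label for _, label in pairs]


def auc_score(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mrr_score(labels, scores):
    """Average reciprocal rank of the positives within one group."""
    ranked = _ranked_labels(labels, scores)
    gains = sum(label / rank for rank, label in enumerate(ranked, start=1))
    return gains / sum(ranked)


def ndcg_score(labels, scores, k):
    """Normalized discounted cumulative gain over the top-k ranked items."""
    def dcg(ordered):
        return sum((2 ** label - 1) / math.log2(i + 2)
                   for i, label in enumerate(ordered[:k]))
    return dcg(_ranked_labels(labels, scores)) / dcg(sorted(labels, reverse=True))


def group_mean(metric, group_labels, group_scores, **kwargs):
    """Average a per-group metric over all groups."""
    values = [metric(l, s, **kwargs) for l, s in zip(group_labels, group_scores)]
    return sum(values) / len(values)


# One made-up group with two clicked and two non-clicked items.
labels = [1, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.2]
print(auc_score(labels, scores),
      mrr_score(labels, scores),
      ndcg_score(labels, scores, k=5))
```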


### Data bag
The data bag lists which parts of the data NRMS needs.

In [8]:
label_bag = model.offer_label_bag()
nrms_bag = model.offer_data_bag()

### One last step

In [9]:
opt = optim.Adam(model.parameters(), lr=param.learning_rate)

## Start training

In [11]:
for epoch in range(param.epochs):
    model.train()
    with timer(name="epoch"):
        batches, loss_epoch = 0, 0.
        bar = tqdm(iterator.batch(data_bag=nrms_bag))
        start_loss = None
        for bag in bar:
            pred = model(bag, by_user=True)
            truth = bag[label_bag]
            loss = model.loss(pred, truth)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if start_loss is None:
                start_loss = loss.item()
            batches += 1
            loss_epoch += loss.item()
            # Show the running mean loss next to the loss of the first batch.
            bar.set_description(
                f"loss: {loss_epoch/batches:.3f}/{start_loss:.3f}")
            del bag
    print()
    # Average over the batches actually processed (guards against an empty iterator).
    loss_epoch /= max(batches, 1)
    report = evaluate(model, test_iterator)
    print(f"[{epoch+1}/{param.epochs}]: {loss_epoch:.3f} - {report}")

# This will take a long time.
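
To judge the loss values in the progress bar, a reference point helps. Assuming the configured `cross_entropy_loss` is a softmax cross-entropy over the candidates of one sample, with a one-hot label marking the clicked item (the `lightrec` implementation is not shown here), a model that scores all 5 candidates equally (`npratio` + 1) has a loss of ln 5, roughly 1.609. The sketch below computes this in plain Python. `softmax_cross_entropy` is a name made up for this example.

```python
import math


def softmax_cross_entropy(scores, labels):
    """Negative log-likelihood of the labelled item under a softmax over scores."""
    peak = max(scores)
    # log(sum(exp(scores))), computed stably.
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return -sum(l * (s - log_norm) for s, l in zip(scores, labels))


one_hot = [1, 0, 0, 0, 0]
# All candidates scored equally: the model has learned nothing yet.
uniform = softmax_cross_entropy([0.0] * 5, one_hot)
# The clicked item scored clearly higher than the negatives.
confident = softmax_cross_entropy([3.0, 0.0, 0.0, 0.0, 0.0], one_hot)
print(f"uniform: {uniform:.3f}, confident: {confident:.3f}")
```

Under these assumptions, a starting loss close to ln 5 suggests the model begins near uniform scoring, and the running mean should fall below that value as training progresses.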

### Another way
Run `python -m lightrec.model._test` to step into the same training process.

### TODO
1. `download` function
2. `annotation`