<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# NRMS: Neural News Recommendation with Multi-Head Self-Attention
NRMS \[1\] is a neural news recommendation approach based on multi-head self-attention. Its core consists of a news encoder and a user encoder. In the news encoder, multi-head self-attention is used to learn news representations from news titles by modeling the interactions between words. In the user encoder, user representations are learned from the news each user has browsed, and multi-head self-attention captures the relatedness between those news articles. In addition, additive attention is applied in both encoders
to learn more informative news and user representations by selecting important words and news.
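
To make the self-attention step concrete, here is a minimal NumPy sketch of multi-head self-attention over the word embeddings of one title. It uses the generic scaled dot-product form, which may differ in its exact parametrization from the one in \[1\] and from the `NRMSModel` implementation; the function names, the random weights, and all dimensions are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(E, Wq, Wk, Wv):
    """E: (seq_len, d) word embeddings; Wq, Wk, Wv: (n_heads, d, d_head)."""
    heads = []
    for q_w, k_w, v_w in zip(Wq, Wk, Wv):
        Q, K, V = E @ q_w, E @ k_w, E @ v_w          # each (seq_len, d_head)
        scores = Q @ K.T / np.sqrt(K.shape[-1])      # word-to-word interactions
        alpha = softmax(scores, axis=-1)             # each row sums to 1
        heads.append(alpha @ V)                      # contextualized words
    return np.concatenate(heads, axis=-1)            # (seq_len, n_heads * d_head)

# Hypothetical sizes: a 10-word title, 16-dim embeddings, 4 heads of 8 dims.
rng = np.random.RandomState(0)
seq_len, d, n_heads, d_head = 10, 16, 4, 8
E = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d, d_head)) for _ in range(3))
H = multi_head_self_attention(E, Wq, Wk, Wv)
print(H.shape)  # (10, 32)
```

Each output row is a representation of one word that has been mixed with the other words in the title. The user encoder applies the same idea with browsed news in place of words.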

## Properties of NRMS:
- NRMS is a content-based neural news recommendation approach.
- It uses multi-head self-attention to learn news representations by modeling the interactions between words, and to learn user representations by capturing the relationships between the news a user has browsed.
- NRMS uses additive attention to learn informative news and user representations by selecting important words and news.
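
The additive attention mentioned in the last property can be sketched as a weighted pooling: each word (or news) vector gets a scalar score, the scores are normalized with a softmax, and the vectors are averaged with those weights. The snippet below is an illustrative NumPy version, not the library's code; the parameter names `V`, `v`, `q` and the random inputs are assumptions, and the attention dimension of 200 simply mirrors the `attention_hidden_dim` value printed later in this notebook.

```python
import numpy as np

def additive_attention(H, V, v, q):
    """Pool the rows of H into one vector.

    H: (n, d) word or news vectors; V: (d, a); v: (a,); q: (a,).
    Returns (pooled vector of shape (d,), attention weights of shape (n,)).
    """
    scores = np.tanh(H @ V + v) @ q        # one scalar score per row of H
    scores = scores - scores.max()         # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ H, weights

# Hypothetical inputs: 10 word vectors of size 32, attention dimension 200.
rng = np.random.RandomState(0)
H = rng.normal(size=(10, 32))
V = rng.normal(size=(32, 200)) * 0.1
v = np.zeros(200)
q = rng.normal(size=200) * 0.1

news_vector, word_weights = additive_attention(H, V, v, q)
print(news_vector.shape, word_weights.shape)  # (32,) (10,)
```

Rows with larger weights contribute more to the pooled vector, which is how important words or news are "selected".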

## Data format:

### train data
One simple example: <br>

`1 0 0 0 0 Impression:0 User:2903 CandidateNews0:27006,11901,21668,9856,16156,21390,1741,2003,16983,8164 CandidateNews1:8377,10423,9960,5485,20494,7553,1251,17232,4745,9178 CandidateNews2:1607,26414,25830,16156,15337,16461,4004,6230,17841,10704 CandidateNews3:17323,20324,27855,16156,2934,14673,551,0,0,0 CandidateNews4:7172,3596,25442,21596,26195,4745,17988,16461,1741,76 ClickedNews0:11362,8205,22501,9349,12911,20324,1238,11362,26422,19185 ...`
<br>

In general, each line in the data file represents one positive instance and n negative instances from the same impression. The format is: <br>

`[label0] ... [labeln] [Impression:i] [User:u] [CandidateNews0:w1,w2,w3,...] ... [CandidateNewsn:w1,w2,w3,...] [ClickedNews0:w1,w2,w3,...] ...`

<br>

Each line contains several parts separated by spaces: the label part, the Impression part `<impression id>`, the User part `<user id>`, the CandidateNews part, and the ClickedHistory part (the `ClickedNewsk` fields). The CandidateNews part describes the target news article to be scored in this instance; it is represented by its (aligned) title words. For example, if a news title is `Trump to deliver State of the Union address next week`, the title words value may be `CandidateNewsi:34,45,334,23,12,987,3456,111,456,432`. <br>
ClickedNewsk describes the k-th news article the user has clicked, in the same format as candidate news. Words are aligned within the news title. Each article is described with a fixed length: if the title is shorter than the fixed length, it is padded with zeros.
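
As an illustration of the layout described above, the following sketch parses one training line into its parts and shows the zero-padding of a short title. The helper names `pad_title` and `parse_train_line` are hypothetical and are not part of `reco_utils`; `NewsIterator` does its own parsing. The sample line is a shortened version of the example above (two candidates, one clicked news).

```python
def pad_title(word_ids, doc_size=10):
    """Truncate or zero-pad a list of word ids to a fixed length."""
    return (list(word_ids) + [0] * doc_size)[:doc_size]

def parse_train_line(line):
    """Split one training line into labels, ids, candidates, and click history."""
    labels, candidates, clicked = [], [], []
    impression = user = None
    for token in line.split():
        if ":" not in token:
            labels.append(int(token))          # leading label tokens
            continue
        key, value = token.split(":", 1)
        if key == "Impression":
            impression = int(value)
        elif key == "User":
            user = int(value)
        elif key.startswith("CandidateNews"):
            candidates.append([int(w) for w in value.split(",")])
        elif key.startswith("ClickedNews"):
            clicked.append([int(w) for w in value.split(",")])
    return {
        "labels": labels,
        "impression": impression,
        "user": user,
        "candidates": candidates,
        "clicked": clicked,
    }

sample = (
    "1 0 Impression:0 User:2903 "
    "CandidateNews0:27006,11901,21668,9856,16156,21390,1741,2003,16983,8164 "
    "CandidateNews1:17323,20324,27855,16156,2934,14673,551,0,0,0 "
    "ClickedNews0:11362,8205,22501,9349,12911,20324,1238,11362,26422,19185"
)
parsed = parse_train_line(sample)
print(parsed["labels"], parsed["user"], len(parsed["candidates"]))  # [1, 0] 2903 2

# A 7-word title padded to the fixed length of 10.
print(pad_title([17323, 20324, 27855, 16156, 2934, 14673, 551]))
```

The number of labels should match the number of `CandidateNews` fields, with the label at position i belonging to `CandidateNewsi`.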

### test data
One simple example: <br>
`1 Impression:0 User:6446 CandidateNews0:18707,23848,13490,10948,21385,11606,1251,16591,827,28081 ClickedNews0:27838,7376,16567,28518,119,21248,7598,9349,20324,9349 ClickedNews1:7969,9783,1741,2549,27104,14669,14777,21343,7667,20324 ...`
<br>

In general, each line in the data file represents one instance. The format is: <br>

`[label] [Impression:i] [User:u] [CandidateNews0:w1,w2,w3,...] [ClickedNews0:w1,w2,w3,...] ...`
<br>

## Global settings and imports

In [1]:
import sys
sys.path.append("../../")
from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources, prepare_hparams 
from reco_utils.recommender.deeprec.models.nrms import NRMSModel
from reco_utils.recommender.deeprec.IO.news_iterator import NewsIterator
import papermill as pm
from tempfile import TemporaryDirectory
import tensorflow as tf
import os

print("System version: {}".format(sys.version))
print("Tensorflow version: {}".format(tf.__version__))

# tmpdir = TemporaryDirectory()
data_path = '../../tests/resources/nrms'



System version: 3.6.10 |Anaconda, Inc.| (default, Jan  7 2020, 21:14:29) 
[GCC 7.3.0]
Tensorflow version: 1.12.0


## Download and load data

In [2]:
#data_path = tmpdir.name
# Paths to the model config, the train/validation data, and the word embeddings.
yaml_file = os.path.join(data_path, r'nrms.yaml')
train_file = os.path.join(data_path, r'train.txt')
valid_file = os.path.join(data_path, r'test.txt')
wordEmb_file = os.path.join(data_path, r'embedding.npy')
# Download the resources only if they are not already present.
if not os.path.exists(yaml_file):
    download_deeprec_resources(r'https://recodatasets.blob.core.windows.net/newsrec/', data_path, 'nrms.zip')

## Create hyper-parameters

In [3]:
epoch = 10

In [4]:
hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=epoch)
print(hparams)

[('DNN_FIELD_NUM', None), ('EARLY_STOP', 100), ('FEATURE_COUNT', None), ('FIELD_COUNT', None), ('L', None), ('MODEL_DIR', None), ('PAIR_NUM', None), ('SUMMARIES_DIR', None), ('T', None), ('activation', None), ('att_fcn_layer_sizes', None), ('attention_activation', None), ('attention_dropout', 0.0), ('attention_hidden_dim', 200), ('attention_layer_sizes', None), ('attention_size', None), ('batch_size', 64), ('body_size', None), ('cate_embedding_dim', None), ('cate_vocab', None), ('cnn_activation', None), ('cross_activation', 'identity'), ('cross_l1', 0.0), ('cross_l2', 0.0), ('cross_layer_sizes', None), ('cross_layers', None), ('data_format', 'news'), ('dense_activation', None), ('dim', None), ('doc_size', 10), ('dropout', [0.2]), ('dtype', 32), ('embed_l1', 0.0), ('embed_l2', 0.0), ('embedding_dropout', 0.3), ('enable_BN', False), ('entityEmb_file', None), ('entity_dim', None), ('entity_embedding_method', None), ('entity_size', None), ('epochs', 10), ('fast_CIN_d', 0), ('filter_num', 2

In [5]:
iterator = NewsIterator

## Train the NRMS model

In [44]:
model = NRMSModel(hparams, iterator)

In [7]:
print(model.run_eval(valid_file))

{'group_auc': 0.5, 'mean_mrr': 0.0662, 'ndcg@5': 0.0436, 'ndcg@10': 0.0769}


In [41]:
model.fit(train_file, valid_file)

at epoch 1
train info: logloss loss:-59113.94261767694
eval info: group_auc:0.4997, mean_mrr:0.0662, ndcg@10:0.0769, ndcg@5:0.0436
at epoch 1 , train time: 10.3 eval time: 8.7
at epoch 2
train info: logloss loss:-757542.4017857143
eval info: group_auc:0.4997, mean_mrr:0.0662, ndcg@10:0.0769, ndcg@5:0.0436
at epoch 2 , train time: 10.2 eval time: 8.8


KeyboardInterrupt: 

In [48]:
load_sess = model.sess
# Take the first training batch and run it with the test-time settings.
for batch_data_input in model.train_iterator.load_data_from_file(train_file):
    batch_data_input[model.layer_keeps] = model.keep_prob_test
    batch_data_input[model.is_train_stage] = False

    # Fetch predictions, labels, and title representations for this batch.
    step_pred, step_labels, tp = load_sess.run(
        [model.train_pred, model.train_iterator.labels, model.train_title_repr], feed_dict=batch_data_input
    )
    step_imprids = batch_data_input[model.train_iterator.impression_index_batch]
    break  # inspect only the first batch

In [49]:
step_pred

array([[0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5, 0.5],
       [0.5, 0

In [50]:
# Fetch predictions, user representations, and title representations for the same batch.
p, u, t = load_sess.run([model.train_pred, model.train_user_repr, model.train_title_repr], feed_dict=batch_data_input)

In [53]:
t

array([[[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       ...,

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

  

In [45]:
load_sess = model.sess
# Take the first validation batch and run it with the test-time settings.
for batch_data_input in model.test_iterator.load_data_from_file(valid_file):
    batch_data_input[model.layer_keeps] = model.keep_prob_test
    batch_data_input[model.is_train_stage] = False

    # Fetch predictions and labels for this batch.
    step_pred, step_labels = load_sess.run(
        [model.test_pred, model.test_iterator.labels], feed_dict=batch_data_input
    )
    step_imprids = batch_data_input[model.test_iterator.impression_index_batch]
    break  # inspect only the first batch

In [46]:
step_labels

array([[1.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.]], dtype=float32)

In [47]:
step_pred

array([[0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5]], dtype=float32)

In [14]:
step_pred

array([[0.5095516 , 0.42863166, 0.5400511 , 0.43195063, 0.4581615 ],
       [0.46077135, 0.3924935 , 0.47708964, 0.48315147, 0.43532807],
       [0.60601646, 0.43371066, 0.19490904, 0.12504771, 0.44610018],
       [0.46701747, 0.4490704 , 0.4490704 , 0.4490704 , 0.4490704 ],
       [0.24380839, 0.19506973, 0.07937005, 0.12532127, 0.09117012],
       [0.40702802, 0.39447728, 0.48193341, 0.34525892, 0.4715624 ],
       [0.5341571 , 0.3536765 , 0.25641757, 0.0541916 , 0.28734046],
       [0.15748194, 0.18015526, 0.32798606, 0.21549517, 0.18771045],
       [0.22869192, 0.17324495, 0.18322118, 0.17340857, 0.23869453],
       [0.45808038, 0.48978978, 0.2356459 , 0.24774429, 0.2815473 ],
       [0.5279312 , 0.10949214, 0.3189701 , 0.74459726, 0.04726722],
       [0.37202296, 0.29995906, 0.0730842 , 0.3266416 , 0.16354729],
       [0.18839952, 0.22919318, 0.21449931, 0.09221247, 0.27051148],
       [0.334669  , 0.15474617, 0.3601978 , 0.42837706, 0.38611224],
       [0.53603286, 0.08429644, 0.

In [11]:
step_labels.shape

(64, 5)

In [13]:
step_imprids.shape

(64, 1)

In [9]:
print(model.run_eval(valid_file))

{'group_auc': 0.5267, 'mean_mrr': 0.1674, 'ndcg@5': 0.1608, 'ndcg@10': 0.219}
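
To help read the numbers above, the sketch below shows one common way to compute per-impression ranking metrics (MRR and nDCG@k) and average them over impressions. It is an assumption about what `mean_mrr` and `ndcg@k` measure, not a copy of what `run_eval` does internally; the helper names and the toy data are hypothetical.

```python
from collections import defaultdict
import numpy as np

def mrr_score(labels, scores):
    """Reciprocal-rank score of the clicked items within one impression."""
    order = np.argsort(scores)[::-1]                  # best score first
    ranked = np.take(labels, order)
    return (ranked / (np.arange(len(ranked)) + 1)).sum() / ranked.sum()

def dcg_score(labels, scores, k):
    """Discounted cumulative gain of the top-k items ranked by score."""
    order = np.argsort(scores)[::-1][:k]
    gains = 2 ** np.take(labels, order) - 1
    discounts = np.log2(np.arange(len(order)) + 2)
    return (gains / discounts).sum()

def ndcg_score(labels, scores, k):
    """DCG normalized by the DCG of the ideal ranking."""
    return dcg_score(labels, scores, k) / dcg_score(labels, labels, k)

def group_by_impression(impr_ids, labels, scores):
    """Collect labels and scores of all candidates sharing an impression id."""
    groups = defaultdict(lambda: ([], []))
    for i, y, s in zip(impr_ids, labels, scores):
        groups[int(i)][0].append(float(y))
        groups[int(i)][1].append(float(s))
    return groups

# Toy data: two impressions with three and two candidates.
impr_ids = [0, 0, 0, 1, 1]
labels = [1, 0, 0, 0, 1]
scores = [0.2, 0.7, 0.1, 0.3, 0.6]

groups = group_by_impression(impr_ids, labels, scores)
mean_mrr = np.mean([mrr_score(y, s) for y, s in groups.values()])
mean_ndcg5 = np.mean([ndcg_score(y, s, 5) for y, s in groups.values()])
print(round(float(mean_mrr), 4))  # 0.75
```

In the toy data, the clicked item is ranked second in impression 0 and first in impression 1, giving reciprocal ranks of 0.5 and 1.0. Arrays such as `step_imprids`, `step_labels`, and `step_pred` from the test batch above could be flattened with `.ravel()` and passed in the same way.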


## Reference
\[1\] Wu et al. "Neural News Recommendation with Multi-Head Self-Attention." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).<br>