# Neural News Recommendation with Multi-Head Self-Attention (NRMS)

Here we show a *PyTorch* implementation of the paper Neural News Recommendation with Multi-Head Self-Attention [(Wu et al., 2019)](https://www.aclweb.org/anthology/D19-1671.pdf).

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Neural-News-Recommendation-with-Multi-Head-Self-Attention-(NRMS)" data-toc-modified-id="Neural-News-Recommendation-with-Multi-Head-Self-Attention-(NRMS)-1">Neural News Recommendation with Multi-Head Self-Attention (NRMS)</a></span></li><li><span><a href="#Dataset-Description" data-toc-modified-id="Dataset-Description-2">Dataset Description</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#behaviors.tsv" data-toc-modified-id="behaviors.tsv-2.0.1">behaviors.tsv</a></span></li><li><span><a href="#news.tsv" data-toc-modified-id="news.tsv-2.0.2">news.tsv</a></span></li></ul></li></ul></li><li><span><a href="#1.-Import-Packages-and-setting-some-Sytle-Guidelines" data-toc-modified-id="1.-Import-Packages-and-setting-some-Sytle-Guidelines-3">1. Import Packages and setting some Sytle Guidelines</a></span><ul class="toc-item"><li><span><a href="#1.1-Import-Packages" data-toc-modified-id="1.1-Import-Packages-3.1">1.1 Import Packages</a></span></li><li><span><a href="#1.2-General-Requirements:-Customized-Python-modules" data-toc-modified-id="1.2-General-Requirements:-Customized-Python-modules-3.2">1.2 General Requirements: Customized Python modules</a></span></li><li><span><a href="#1.3--Let's-setup-some-style!" data-toc-modified-id="1.3--Let's-setup-some-style!-3.3">1.3  Let's setup some style!</a></span></li></ul></li><li><span><a href="#2.-Neural-News-Recommendation-with-Multi-Head-Self-Attention-(NRMS)" data-toc-modified-id="2.-Neural-News-Recommendation-with-Multi-Head-Self-Attention-(NRMS)-4">2. 
Neural News Recommendation with Multi-Head Self-Attention (NRMS)</a></span><ul class="toc-item"><li><span><a href="#2.1-Description" data-toc-modified-id="2.1-Description-4.1">2.1 Description</a></span></li></ul></li><li><span><a href="#2.2-Analysis-at-Inference-Time-for-NRMS-Model" data-toc-modified-id="2.2-Analysis-at-Inference-Time-for-NRMS-Model-5">2.2 Analysis at Inference Time for <code>NRMS Model</code></a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#2.2.1-Analysis-at-Inference-Time:-Example-1" data-toc-modified-id="2.2.1-Analysis-at-Inference-Time:-Example-1-5.0.1">2.2.1 Analysis at Inference Time: <code>Example 1</code></a></span></li><li><span><a href="#2.2.2-Analysis-at-Inference-Time:-Example-2" data-toc-modified-id="2.2.2-Analysis-at-Inference-Time:-Example-2-5.0.2">2.2.2 Analysis at Inference Time: <code>Example 2</code></a></span></li><li><span><a href="#2.2.3-Analysis-at-Inference-Time:-Now-try-your-self" data-toc-modified-id="2.2.3-Analysis-at-Inference-Time:-Now-try-your-self-5.0.3">2.2.3 Analysis at Inference Time: <code>Now try your-self</code></a></span></li></ul></li></ul></li><li><span><a href="#References" data-toc-modified-id="References-6">References</a></span></li></ul></div>

# Dataset Description

The MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. MIND is intended to serve as a benchmark dataset for news recommendation and to facilitate research on news recommendation and recommender systems.

MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Every news article contains rich textual content including title, abstract, body, category and entities. Each impression log contains the click events, non-clicked events and historical news click behaviors of the user before that impression. To protect user privacy, each user was de-linked from the production system when securely hashed into an anonymized ID. For more detailed information about the MIND dataset, you can refer to the paper MIND: A Large-scale Dataset for News Recommendation [(Wu et al., 2020)](https://msnews.github.io/assets/doc/ACL2020_MIND.pdf).

### behaviors.tsv

The behaviors.tsv file contains the impression logs and users’ news click histories. It has 5 columns separated by the tab symbol:

* Impression ID. The ID of an impression.
* User ID. The anonymous ID of a user.
* Time. The impression time with format “MM/DD/YYYY HH:MM:SS AM/PM”.
* History. The news click history (ID list of clicked news) of this user before this impression.
* Impressions. List of news displayed in this impression and user’s click behaviors on them (1 for click and 0 for non-click).

An example is shown in the table below:

Column | Content
------------- | -------------
Impression ID | 91
User ID | U397059
Time | 11/15/2019 10:22:32 AM
History | N106403 N71977 N97080 N102132 N97212 N121652
Impressions | N129416-0 N26703-1 N120089-1 N53018-0 N89764-0 N91737-0 N29160-0
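
The column layout above can be turned into a small parser. The following is a minimal sketch using only the Python standard library; the `Impression` type and the `parse_behavior_line` helper are hypothetical names introduced for illustration and are not part of `newsrecom`.

```python
from typing import NamedTuple


class Impression(NamedTuple):
    """One row of behaviors.tsv (hypothetical container for illustration)."""
    impression_id: str
    user_id: str
    time: str
    history: list      # news IDs clicked before this impression
    candidates: list   # (news_id, clicked) pairs shown in this impression


def parse_behavior_line(line: str) -> Impression:
    """Split one tab-separated behaviors.tsv row into its 5 columns."""
    imp_id, user_id, time, history, impressions = line.rstrip("\n").split("\t")
    candidates = []
    for item in impressions.split():
        # Each entry looks like "N26703-1": news ID, a dash, then the click label.
        news_id, label = item.rsplit("-", 1)
        candidates.append((news_id, int(label)))
    return Impression(imp_id, user_id, time, history.split(), candidates)


# The example row from the table above, joined with tabs.
row = "\t".join([
    "91",
    "U397059",
    "11/15/2019 10:22:32 AM",
    "N106403 N71977 N97080 N102132 N97212 N121652",
    "N129416-0 N26703-1 N120089-1 N53018-0 N89764-0 N91737-0 N29160-0",
])
imp = parse_behavior_line(row)
clicked = [news_id for news_id, label in imp.candidates if label == 1]
print(clicked)  # ['N26703', 'N120089']
```

Keeping the click label next to each news ID makes it easy to separate positives from negatives later on.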
 
### news.tsv

The news.tsv file contains the detailed information of the news articles involved in the behaviors.tsv file.
It has 8 columns, which are separated by the tab symbol:

* News ID 
* Category 
* SubCategory
* Title
* Abstract
* URL
* Title Entities (entities contained in the title of this news)
* Abstract Entities (entities contained in the abstract of this news)

The full content body of the MSN news articles is not made available for download, due to the licensing structure. However, the dataset authors provide a [utility script](https://github.com/msnews/MIND/tree/master/crawler) to help parse the news webpages from the MSN URLs in the dataset. Over time, some URLs have expired and can no longer be accessed successfully; the dataset authors state that they are trying to solve this problem.

An example is shown in the following table:

Column | Content
------------- | -------------
News ID | N37378
Category | sports
SubCategory | golf
Title | PGA Tour winners
Abstract | A gallery of recent winners on the PGA Tour.
URL | https://www.msn.com/en-us/sports/golf/pga-tour-winners/ss-AAjnQjj?ocid=chopendata
Title Entities | [{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [0], "SurfaceForms": ["PGA Tour"]}]	
Abstract Entities | [{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [35], "SurfaceForms": ["PGA Tour"]}]
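
A row of news.tsv can be read in the same spirit. This is a sketch under the assumption that the two entity columns hold JSON lists, as in the example above; the column names in `NEWS_COLUMNS` and the `parse_news_line` helper are hypothetical and chosen for illustration only.

```python
import json

# Hypothetical column names, following the order of the list above.
NEWS_COLUMNS = [
    "news_id", "category", "subcategory", "title",
    "abstract", "url", "title_entities", "abstract_entities",
]


def parse_news_line(line: str) -> dict:
    """Map one tab-separated news.tsv row to a dict and decode entity JSON."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(NEWS_COLUMNS, fields))
    for key in ("title_entities", "abstract_entities"):
        # Assumption: an empty field means "no entities".
        record[key] = json.loads(record[key]) if record.get(key) else []
    return record


# The example row from the table above, joined with tabs.
row = "\t".join([
    "N37378",
    "sports",
    "golf",
    "PGA Tour winners",
    "A gallery of recent winners on the PGA Tour.",
    "https://www.msn.com/en-us/sports/golf/pga-tour-winners/ss-AAjnQjj?ocid=chopendata",
    '[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [0], "SurfaceForms": ["PGA Tour"]}]',
    '[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [35], "SurfaceForms": ["PGA Tour"]}]',
])
news = parse_news_line(row)
print(news["title"], news["title_entities"][0]["WikidataId"])  # PGA Tour winners Q910409
```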



# 1. Import Packages and setting some Style Guidelines

Here is just some stuff we are gonna need!!

## 1.1 Import Packages

In [1]:
# Standard Library
import csv
import os
import random
import re
import string
import sys
import unicodedata
import warnings
from io import open

# Data Science Packages
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot
from tqdm import tqdm

# PyTorch Packages
import torch
from torchtext import data, datasets, vocab

## 1.2 General Requirements: Customized Python modules

The code in notebooks tends to grow and grow to the point of being incomprehensible. The way to overcome this problem is to extract parts of it into Python modules once in a while. Since it only makes sense to extract functions and classes into Python modules, I often start cleaning up a messy notebook by thinking about the actual task a group of cells is accomplishing. This helps me refactor those cells into a proper function, which I can then migrate into a Python module.

In [2]:
# Custom Package reloading
%load_ext autoreload
%autoreload 2
#sys.path.append("../newsrecom/")

In [3]:
from newsrecom import *

## 1.3  Let's setup some style!

This is not a requirement, but you know what they say:
> "Fashions fade, style is eternal." <br/>
_Yves Saint Laurent_

In [4]:
# Seaborn Style
sns.set(style='ticks')
sns.set_style({'font.family': 'Hiragino Maru Gothic Pro'})
sns.set_palette("cool")

# Pandas Style
pd.set_option("display.max_columns", 9999)
pd.set_option("display.max_rows", 9999)
pd.set_option("display.max_colwidth", 250)

# Ignore annoying warning 
warnings.filterwarnings('ignore')

# 2. Neural News Recommendation with Multi-Head Self-Attention (NRMS) 

This section summarizes the approach proposed in the paper Neural News Recommendation with Multi-Head Self-Attention [(Wu et al., 2019)](https://www.aclweb.org/anthology/D19-1671.pdf).

## 2.1 Description

News recommendation can help users find news they are interested in and alleviate information overload. Precisely modeling news and users is critical for news recommendation, and capturing the contexts of words and news is important for learning news and user representations. The paper proposes a neural news recommendation approach with multi-head self-attention (NRMS). The core of the approach is a news encoder and a user encoder. The news encoder uses multi-head self-attention to learn news representations from news titles by modeling the interactions between words. The user encoder learns the representations of users from their browsed news and uses multi-head self-attention to capture the relatedness between the news. In addition, additive attention is applied to learn more informative news and user representations by selecting important words and news.

This approach is motivated by several observations. First, the interactions between words in a news title are important for understanding the news. For example, in Fig. 1, the word “Rockets” has strong relatedness with “Bulls”. Besides, a word may interact with multiple words, e.g., “Rockets” also has semantic interactions with “trade”. Second, different news articles browsed by the same user may also be related. For example, in Fig. 1 the second news is related to the first and the third news. Third, different words may have different importance in representing news. In Fig. 1, the word “NBA” is more informative than “2018”. Besides, different news articles browsed by the same user may also have different importance in representing this user. For example, the first three news articles are more informative than the last one.

In [5]:
import ipyplot
ipyplot.plot_images(['imgs/nrms1.png'], ['Fig. 1'], img_width=550)

To summarize, the authors propose a neural news recommendation approach with multi-head self-attention (NRMS) whose core is a news encoder and a user encoder. The news encoder learns news representations from news titles by using multi-head self-attention to model the interactions between words. The user encoder learns representations of users from the news they browsed by using multi-head self-attention to capture the relatedness between those news articles. Additive attention is applied in both encoders to select important words and news, which yields more informative news and user representations. According to the authors, extensive experiments on a real-world dataset show that the approach can effectively and efficiently improve the performance of news recommendation.

The NRMS approach for news recommendation is shown in Fig. 2. It contains three modules, i.e.:

* news encoder 
* user encoder
* click predictor.
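
To make two of these building blocks concrete, here is a framework-free sketch of additive attention pooling (used in both encoders to weight words or browsed news) and of a dot-product click score between a user vector and a candidate news vector. It is an illustration in plain Python rather than the PyTorch code of `newsrecom`: the multi-head self-attention layers are left out, all parameters are random toy values, and the helper names (`additive_attention`, `click_score`, `random_params`) are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(vectors, V, v, q):
    """Pool a sequence of vectors h_i into a single vector.

    score_i = q . tanh(V h_i + v), weights = softmax(scores),
    pooled = sum_i weights_i * h_i
    """
    scores = []
    for h in vectors:
        hidden = [math.tanh(dot(row, h) + bias) for row, bias in zip(V, v)]
        scores.append(dot(q, hidden))
    weights = softmax(scores)
    pooled = [sum(w * h[d] for w, h in zip(weights, vectors))
              for d in range(len(vectors[0]))]
    return pooled, weights


def click_score(user_vector, news_vector):
    """Click predictor sketch: inner product of user and news vectors."""
    return dot(user_vector, news_vector)


def random_params(rng, dim, att_dim):
    """Toy attention parameters (V, v, q); a real model learns these."""
    V = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(att_dim)]
    v = [rng.uniform(-1, 1) for _ in range(att_dim)]
    q = [rng.uniform(-1, 1) for _ in range(att_dim)]
    return V, v, q


rng = random.Random(0)
DIM, ATT_DIM = 4, 3
word_params = random_params(rng, DIM, ATT_DIM)  # news encoder attention
news_params = random_params(rng, DIM, ATT_DIM)  # user encoder attention

# Stand-ins for the self-attention outputs: 5 word vectors of a candidate
# title and the vectors of 3 browsed news articles.
title_words = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
browsed_news = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]

candidate_vector, word_weights = additive_attention(title_words, *word_params)
user_vector, news_weights = additive_attention(browsed_news, *news_params)
score = click_score(user_vector, candidate_vector)
print(round(sum(word_weights), 6), len(user_vector))  # 1.0 4
```

The attention weights sum to one, so each pooled vector is a weighted average of its inputs; the weights are what let the model emphasize informative words such as “NBA” over “2018”.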

In [6]:
import ipyplot
ipyplot.plot_images(['imgs/nrms2.png'], ['Fig. 2'], img_width=550)

# 2.2 Analysis at Inference Time for `NRMS Model`

We have trained an NRMS model using the following hyperparameters:

* Max title length: 10
* Number of attention heads (multi-head self-attention): 10
* Dimension of pretrained GloVe word vectors: 100
* Negative sampling K: 4
* Maximum number of historical news seen by the user: 50
* Vocabulary size: 40000
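
The sketch below illustrates how some of these hyperparameters could shape a single training example: titles are truncated or padded to 10 tokens, the history is capped at 50 news, and each clicked news is contrasted with K = 4 non-clicked news from the same impression through a softmax loss, as described in the NRMS paper. The reserved indices `PAD` and `UNK`, the choice to keep the most recent history entries, and all function names are assumptions made for illustration; the actual preprocessing in `newsrecom` may differ.

```python
import math
import random

MAX_TITLE_LEN = 10  # "Max title length" above
MAX_HISTORY = 50    # "Maximum number of historical news" above
NEG_K = 4           # "Negative sampling K" above
PAD, UNK = 0, 1     # assumed reserved vocabulary indices


def encode_title(tokens, vocab):
    """Map tokens to IDs, then truncate or pad to MAX_TITLE_LEN."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:MAX_TITLE_LEN]]
    return ids + [PAD] * (MAX_TITLE_LEN - len(ids))


def truncate_history(history):
    """Keep at most MAX_HISTORY news (assumption: the most recent ones)."""
    return history[-MAX_HISTORY:]


def sample_negatives(non_clicked, k=NEG_K, rng=random):
    """Draw k non-clicked news from one impression (non-empty list assumed).

    Falls back to sampling with replacement when fewer than k are available.
    """
    if len(non_clicked) >= k:
        return rng.sample(non_clicked, k)
    return [rng.choice(non_clicked) for _ in range(k)]


def negative_sampling_loss(pos_score, neg_scores):
    """Negative log softmax probability of the clicked news among K+1 candidates."""
    scores = [pos_score] + list(neg_scores)
    top = max(scores)
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum - pos_score


toy_vocab = {"pga": 2, "tour": 3, "winners": 4}
print(encode_title("pga tour winners".split(), toy_vocab))
# [2, 3, 4, 0, 0, 0, 0, 0, 0, 0]

# With identical scores the model is guessing among 5 candidates: loss = log(5).
print(round(negative_sampling_loss(0.0, [0.0] * NEG_K), 4))  # 1.6094
```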

Here we are going to show how, at inference time, the model ranks a pool of 200 random candidate news articles. Let's start by loading the model:

In [10]:
from newsrecom.inference import Model
from newsrecom.inference import get_inference_analysis
nrms = Model.load_from_checkpoint('../models/ranger/v1/epoch=14-auroc=0.71.ckpt')

>**Now, how do we evaluate the performance?**

During training we obtained performance comparable to what was reported in the paper. But a fun way to see the model working is to imagine what type of user we are looking at, based on the news the user has previously seen, and to form hypotheses such as: *the user seems to be interested in Hollywood news and sports but not in political news*. Then, given a pool of candidate news, our human intuition tells us to expect that political news will be ranked lower than sports news, for example.
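
For a quantitative counterpart to this intuition, the checkpoint name above suggests that the model was tracked with AUROC. The following is a minimal sketch of how AUROC can be computed for one impression from click labels and predicted scores; it is a generic pairwise formulation, not necessarily how `newsrecom` computes or aggregates the metric, and the scores are made up for illustration.

```python
def auc(labels, scores):
    """Probability that a clicked item is scored above a non-clicked one.

    Ties count as half a win. Needs at least one item of each class.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Click labels of the example impression in behaviors.tsv, with made-up scores.
labels = [0, 1, 1, 0, 0, 0, 0]
scores = [0.2, 0.9, 0.4, 0.5, 0.1, 0.3, 0.05]
print(auc(labels, scores))  # 0.9
```

A value of 0.5 corresponds to random ranking and 1.0 to every clicked news being ranked above every non-clicked one.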

### 2.2.1 Analysis at Inference Time: `Example 1`

Let's see the news previously seen by the user. What can we guess about the user?

In [11]:
item = 3
result, val, news_ids_reorder, news_reorder, news_viewed = get_inference_analysis(nrms, item)
pd.DataFrame([' '.join(sent) for sent in news_viewed], columns=['News Viewed'])

Index | News Viewed
------------- | -------------
0 | what causes vertigo ? 15 things doctors wish you knew
1 | dow stock crash 1929 : october marks 90 years , and could it fall again ?
2 | jeff goodman 's college basketball coaches on the hot seat
3 | hong kong protesters are burning lebron james jerseys
4 | ryan reynolds shares the first photo of his and blake lively 's newborn daughter
5 | former cowboys pro bowler marion barber arrested on criminal mischief charges
6 | after throwing a punch , dabo swinney made cb andrew booth ride the manager bus back to clemson
7 | freshman georgia southern offensive lineman jordan wiggins dies at 18
8 | local news anchor discovers she has rare form of cancer caused by pregnancy : ' it 's very complicated '
9 | why the patriots made a very un - patriots trade for mohamed sanu


Here we can see that this user is interested in some political news (e.g. '12 things u.s . presidents have to pay for on their own'), relationship tips (e.g. '50 secrets it 's ok to keep from your partner'), health (e.g. 'what causes vertigo ? 15 things doctors wish you knew'), but above all sports. Therefore, we expect the top-ranked candidate news to be related to sports.

In [12]:
df_cadidates = pd.DataFrame(data={'Probability': val, 'Candidate News': [' '.join(sent) for sent in news_reorder]})

Let's see the top ranked candidate news.

In [13]:
df_cadidates.head(5)

Index | Probability | Candidate News
------------- | ------------- | -------------
0 | 0.951134 | browns qb baker mayfield unhappy with ' ridiculous ' fine for criticizing referees
1 | 0.949839 | lapd changing controversial program that uses data to predict where crimes will occur
2 | 0.947032 | chiefs tuesday injury report : sammy watkins remains " limited "
3 | 0.946968 | what is the kansas eliminate revision of census population amendment ?
4 | 0.943013 | source : us - china trade deal signing could be delayed until december


It looks promising: two of the top five ranked news (the Browns and Chiefs items) are related to sports, although the rest cover crime, politics and trade.

But now, what about the lowest ranked news?

In [14]:
df_cadidates.tail(5)

Index | Probability | Candidate News
------------- | ------------- | -------------
195 | 0.53227 | police and protester share an emotional embrace in chile
196 | 0.525045 | carly simon describes her deathbed farewell to jackie kennedy
197 | 0.491012 | how a military family honors the memory of wife 's fallen first husband
198 | 0.481187 | 10 funny christmas gifts to make your friends giggle
199 | 0.467843 | fhp : 7-year - old critically injured in buckman bridge crash


Well, none of these seems to match the user type very much!

### 2.2.2 Analysis at Inference Time: `Example 2`

What about this user? Well, we can see that several news items viewed by the user were related to Hollywood, food, health and travel. Therefore, we can infer that the user is probably not too interested in sports, politics and the economy.

In [15]:
item = 84
result, val, news_ids_reorder, news_reorder, news_viewed = get_inference_analysis(nrms, item)
pd.DataFrame([' '.join(sent) for sent in news_viewed], columns=['News Viewed'])

Index | News Viewed
------------- | -------------
0 | secrets to making perfect cookies
1 | jennifer lawrence hired a food truck for her wedding and the owner had no idea who she was
2 | are delta 's new skymiles membership perks worth the $ 59 annual fee ?
3 | the royal tiaras are out in full force at emperor naruhito 's enthronement celebration
4 | nickelodeon universe -- the largest indoor theme park in north america -- opens this week
5 | nurses reveal 10 things they wish they could tell patients , but ca n't
6 | four flight attendants were arrested in miami 's airport after bringing in thousands in cash , police say
7 | world 's best restaurants in 2019 according to tripadvisor users
8 | ever wake up to a numb , dead arm ? here 's what 's happening .
9 | american dream nickelodeon universe giant indoor theme park sells out for first weekend


In [16]:
df_cadidates = pd.DataFrame(data={'Probability': val, 'Candidate News': [' '.join(sent) for sent in news_reorder]})

Now, among the highest-ranked news we have items about food, health and the royals, in line with the user's history, although the top-ranked item is political.

In [17]:
df_cadidates.head(5)

Index | Probability | Candidate News
------------- | ------------- | -------------
0 | 0.968273 | listen to conference - goers at trump resort chant for " war ! "
1 | 0.952856 | gastonia food pantry needs your help to buy refrigerated truck
2 | 0.93643 | if prince harry was n't so nice , kate 's engagement ring could have been meghan 's
3 | 0.933255 | how a 229-year - old log church inspires new earthquake - resistant high - rise buildings
4 | 0.92386 | your salt shaker may soon come with a warning label


And sports news is ranked low. Cool!

In [18]:
df_cadidates.tail(5)

Index | Probability | Candidate News
------------- | ------------- | -------------
195 | 0.245331 | video : tyrann mathieu , lesean mccoy speak to the media
196 | 0.244164 | celebrity mug shots
197 | 0.227857 | seahawks claim former patriots wr gordon off waivers
198 | 0.22614 | detroit lions keep darius slay , make no trade deadline moves
199 | 0.212098 | ricky gervais hosting the 2020 golden globes : his best burns !


### 2.2.3 Analysis at Inference Time: `Now try it yourself`

Pick a number and try to guess the user type: what do they like or dislike?

In [22]:
item = 44
result, val, news_ids_reorder, news_reorder, news_viewed = get_inference_analysis(nrms, item)
pd.DataFrame([' '.join(sent) for sent in news_viewed], columns=['News Viewed'])

Index | News Viewed
------------- | -------------
0 | milania giudice says she 'll see dad joe ' soon ' after he leaves ice custody
1 | bananas for constipation : do they work ?
2 | jeff bezos : this is the ' smartest thing we ever did ' at amazon
3 | ways to lose weight : 36 fast , easy tips
4 | photo of emotional nurse after ' particularly hard day ' goes viral
5 | here 's exactly how to eat a vegetarian keto diet
6 | dad thought he dressed toddler daughter in a hat , but he was hilariously wrong


In [23]:
df_cadidates = pd.DataFrame(data={'Probability': val, 'Candidate News': [' '.join(sent) for sent in news_reorder]})

Check your predictions... Do they make sense?

In [24]:
df_cadidates.head(5)

Index | Probability | Candidate News
------------- | ------------- | -------------
0 | 0.983803 | queen elizabeth 's favorite beauty products have stood the test of time
1 | 0.969546 | butler co. woman arrested for allegedly having sex with 15-year - old boy
2 | 0.96907 | kate middleton & prince william just took george & charlotte to a soccer match
3 | 0.96649 | man allegedly killed wife , 5-year - old daughter in manhattan double murder - suicide
4 | 0.965922 | these christmas dinner menus will create an unforgettable meal


In [25]:
df_cadidates.tail(5)

Index | Probability | Candidate News
------------- | ------------- | -------------
195 | 0.181717 | norway arrests us white supremacist ahead of far - right conference
196 | 0.171239 | trump campaign scoops up biden 's latino voter web address , trolls his voter outreach
197 | 0.168538 | 1 dead , 2 injured , suspect still at large after shooting at church 's chicken
198 | 0.155085 | report : rockets were shocked at how poorly carmelo anthony performed in their defensive scheme
199 | 0.129373 | the stock market 's 10-year run became the best bull market ever this month


# References
\[1\] Wu et al. "Neural News Recommendation with Multi-Head Self-Attention." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.