# News Feed Recommender System using ALS

This notebook demonstrates a collaborative filtering approach to recommending items to users based on implicit data, i.e. users' behaviour towards items without an explicit rating or action such as like or dislike. In this example, the items are news feeds and the implicit data is whether a user has read a news feed. For illustration purposes, we have 

We implement the Alternating Least Squares (ALS) algorithm using the Implicit library, which performs efficiently on large datasets. Implicit also has built-in functions for recommendations and similar items.



In [1]:
import pandas as pd
import numpy as np
import implicit
import scipy.sparse as sparse
from time import time

## 1. Load and Prepare Data 

First, let us load the dataset. It has about 1M rows of user-news interactions, where read=1 indicates that the user has read the news feed. After cleaning, there are 125,801 users and 39,956 items, as the matrix dimensions printed below show.

Implicit expects data as an item-user matrix, so we create two matrices: one for fitting the model (item-user) and one for recommendations (user-item).

In [3]:
# load data
raw_data = pd.read_csv("data/data.csv")
raw_data.columns = ['user','news','read']

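# drop rows with missing values; copy so the column assignments below
# do not trigger a SettingWithCopyWarning
data = raw_data.dropna().copy()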

# create numeric user_id and news_id
data['user'] = data['user'].astype("category")
data['news'] = data['news'].astype("category")
data['user_id'] = data['user'].cat.codes
data['news_id'] = data['news'].cat.codes

# create item-user and user-item sparse matrices
sparse_item_user = sparse.csr_matrix((data['read'].astype(float), (data['news_id'], data['user_id'])))
sparse_user_item = sparse.csr_matrix((data['read'].astype(float), (data['user_id'], data['news_id'])))

print('Dimensions of user-item matrix: ', sparse_user_item.shape)
data[['user_id','news_id','read']].head()

Dimensions of user-item matrix:  (125801, 39956)


   user_id  news_id  read
0    11085    37758     1
1    89374    37197     1
2     1989    37768     1
3    10535    37755     1
4    55284    37528     1
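
To make the encoding step concrete, here is a minimal sketch on a toy interaction log. The labels (`u_a`, `n_x`, and so on) and the `toy_*` names are hypothetical and not taken from `data/data.csv`. It shows how category codes become row and column indices of the sparse matrices.

```python
import pandas as pd
import scipy.sparse as sparse

# Toy interaction log (hypothetical values, not from data/data.csv)
toy = pd.DataFrame({
    "user": ["u_b", "u_a", "u_b", "u_c"],
    "news": ["n_y", "n_x", "n_x", "n_z"],
    "read": [1, 1, 1, 1],
})

# Category codes map each distinct label to a contiguous integer index
# (categories are sorted, so u_a -> 0, u_b -> 1, u_c -> 2)
toy["user_id"] = toy["user"].astype("category").cat.codes
toy["news_id"] = toy["news"].astype("category").cat.codes

# Rows are items and columns are users in the item-user matrix
values = toy["read"].astype(float)
toy_item_user = sparse.csr_matrix((values, (toy["news_id"], toy["user_id"])))

# The user-item matrix is simply the transpose
toy_user_item = toy_item_user.T.tocsr()

print(toy_item_user.toarray())
```

Note that `csr_matrix` sums duplicate (row, column) pairs, so a user who appears twice for the same item would get a value of 2 rather than 1.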


We also create item_lookup to look up news headlines, so that we can interpret the results later.

In [13]:
# create item_lookup for easier lookup for news headlines
item_lookup = data[['news_id', 'news']].drop_duplicates()

headlines = pd.read_csv("data/headlines.csv")
headlines['news_id'] = [item_lookup[item_lookup.news==x].news_id.iloc[0] for x in headlines.news]
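
The per-row lookup above scans `item_lookup` once for every headline. As a possible alternative, the sketch below does the same mapping with `Series.map` on toy frames. The `toy_*` names and headline strings are hypothetical. Unlike the list comprehension, `map` yields NaN for a missing key instead of raising an error.

```python
import pandas as pd

# Hypothetical stand-ins for item_lookup and headlines
toy_lookup = pd.DataFrame({"news_id": [0, 1, 2], "news": ["n_x", "n_y", "n_z"]})
toy_headlines = pd.DataFrame({
    "news": ["n_z", "n_x"],
    "headline": ["hypothetical headline z", "hypothetical headline x"],
})

# Build a news -> news_id Series once, then map every headline row through it
news_to_id = toy_lookup.set_index("news")["news_id"]
toy_headlines["news_id"] = toy_headlines["news"].map(news_to_id)

print(toy_headlines)
```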

## 2. Train ALS Model

The ALS model takes a few parameters:
* factors: number of latent features we want to have
* regularization: regularization parameter
* iterations: number of times the ALS algorithm alternates between fixing and updating the user and item vectors

In addition, we scale the interaction matrix by alpha_val before fitting. It is not a model argument; it sets the rate at which our confidence in a preference increases with more interactions.
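
To show what these parameters control, here is a small dense NumPy sketch of ALS for implicit feedback. It assumes the common formulation with binary preference `p = (r > 0)` and confidence `c = 1 + alpha * r`. This is an illustration only: the function name `als_implicit` and the toy matrix are hypothetical, and Implicit's internal implementation is sparse, far more optimized, and may weight confidence differently.

```python
import numpy as np

def als_implicit(R, factors=2, regularization=0.1, iterations=20, alpha=40.0, seed=0):
    """Toy dense ALS for implicit feedback; R is a (users x items) array."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)   # binary preference
    C = 1.0 + alpha * R         # confidence grows with the number of interactions
    X = rng.normal(scale=0.1, size=(n_users, factors))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, factors))  # item factors
    reg = regularization * np.eye(factors)

    def solve(fixed, C_rows, P_rows):
        # For each row i, solve (F^T C_i F + reg) x = F^T C_i p_i
        out = np.empty((C_rows.shape[0], factors))
        for i in range(C_rows.shape[0]):
            Fc = fixed * C_rows[i][:, None]   # rows of F scaled by confidence
            A = fixed.T @ Fc + reg
            b = Fc.T @ P_rows[i]
            out[i] = np.linalg.solve(A, b)
        return out

    for _ in range(iterations):
        X = solve(Y, C, P)       # fix item vectors, update user vectors
        Y = solve(X, C.T, P.T)   # fix user vectors, update item vectors
    return X, Y

# Toy user-item matrix (hypothetical): two loose groups of readers
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

X, Y = als_implicit(R)
scores = X @ Y.T   # predicted preference for every user-item pair
print(np.round(scores, 2))
```

A larger `alpha` makes the fit concentrate on observed interactions, while a larger `regularization` shrinks the factor vectors.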

In [7]:
# initialize the ALS model
model = implicit.als.AlternatingLeastSquares(factors=20, regularization=0.1, iterations=20)

# scale the item-user matrix by alpha to obtain confidence values
alpha_val = 40
data_conf = (sparse_item_user * alpha_val).astype('double')

# fit the model
start_time = time()
model.fit(data_conf)
print("--- %s seconds ---" % (time() - start_time))

100%|██████████| 20.0/20 [00:03<00:00,  5.91it/s]

--- 3.4426791667938232 seconds ---





## 3. Find Similar Items

After we have trained the ALS model, we can make recommendations. First, let us find news feeds similar to the headline 'formula one cars arrive in melbourne'. Implicit has a built-in function for similar items. It scores similarity by taking the dot product of the item vectors with the vector of the chosen news feed, normalized by the vector lengths, which is why the query item itself scores 1.00 in the output.

In [18]:
# find top 5 most similar news
item_id = 37758
n_similar = 5

print('Searching for news feed similar to: ', headlines[headlines.news_id==item_id].headline.iloc[0])
similar = model.similar_items(item_id, n_similar)

print('\nnews_id | score | headline')
for item in similar:
    idx, score = item
    if (idx in headlines.news_id.values):
        print(idx, format(score,'.2f'), headlines[headlines.news_id==idx].headline.iloc[0])
    else:
        print(idx, format(score,'.2f'), '[HEADLINES NOT FOUND]')
#         print('Debug msg :', item_lookup[item_lookup.news_id==idx].news.iloc[0])

Searching for news feed similar to:  formula one cars arrive in melbourne

news_id | score | headline
37758 1.00 formula one cars arrive in melbourne
37770 0.92 formula one too expensive schumacher
37726 0.91 iceman is formula ones hot property
33877 0.88 [HEADLINES NOT FOUND]
37750 0.88 formula one heats up
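
The similarity scores above can be approximated by hand from the item factors. The sketch below assumes cosine similarity, which is consistent with the query item scoring 1.00. The helper `similar_items_cosine` and the toy factor matrix are hypothetical and not part of Implicit's API.

```python
import numpy as np

def similar_items_cosine(item_factors, item_id, n=5):
    """Return the n items with the highest cosine similarity to item_id."""
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1e-10   # guard against division by zero
    scores = item_factors @ item_factors[item_id] / (norms * norms[item_id])
    top = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical 2-factor item matrix; with a fitted model this could be
# the learned item factors instead
toy_factors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [1.0, 1.0],
])

result = similar_items_cosine(toy_factors, item_id=0, n=3)
for idx, score in result:
    print(idx, format(score, '.2f'))
```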


## 4. Make Recommendations to Users

To recommend news feeds to users, we can also use Implicit's built-in recommendation function.

In [33]:
user_id = 1

# get users' reading history
consumed_idx = sparse_user_item[user_id,:].nonzero()[1].astype(str)
print('Reading History of User', user_id)
print( pd.DataFrame({'news_id': consumed_idx}))

# make recommendation
recommended = model.recommend(user_id, sparse_user_item)

# collect the (news_id, score) pairs into a DataFrame
recommendations = pd.DataFrame(recommended, columns=['news_id', 'score'])
print('\nRecommendation to User', user_id)
print(recommendations)

Reading History of User 1
   news_id
0     8288
1    12237
2    15921
3    16109
4    35636
5    36819
6    37038
7    37768
8    38052
9    38130
10   38277
11   38338
12   38522

Recommendation to User 1
   news_id     score
0    38199  1.091851
1    38311  1.047194
2    38118  1.016983
3    37232  0.870030
4    39555  0.817655
5    38549  0.810417
6    36818  0.806976
7    38152  0.790003
8    38540  0.783468
9    38180  0.750013
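
Conceptually, a recommendation for one user is the dot product of that user's vector with every item vector, with already-read items filtered out. The sketch below illustrates this on toy factors. The function `recommend_for_user` and all values are hypothetical, and it is not a description of Implicit's exact implementation.

```python
import numpy as np

def recommend_for_user(user_factors, item_factors, consumed, user_id, n=10):
    """Score all items for one user and return the top n unread items."""
    scores = (item_factors @ user_factors[user_id]).astype(float)
    consumed_idx = np.array(sorted(consumed), dtype=int)
    scores[consumed_idx] = -np.inf            # never recommend items already read
    n = min(n, len(scores) - len(consumed_idx))
    top = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical learned factors for 2 users and 4 items
toy_user_factors = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_item_factors = np.array([[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.5, 0.5]])

# User 0 has already read item 0
recs = recommend_for_user(toy_user_factors, toy_item_factors,
                          consumed={0}, user_id=0, n=2)
print(recs)
```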
