## Set Up the Most Popular Recommender

For the Most Popular Recommender we need two things:
* An **Index Map** that maps each `item_id` to an integer index (e.g. 1, 2, 7, 45)
* The **Item Scores**, which define which items are the most popular

This notebook builds both. The recommendation itself happens in `most_popular.py`, which answers requests made through the BentoML API.
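
To make the role of the two dictionaries concrete, here is a minimal sketch of how a ranking step could combine them. The contents of `most_popular.py` are not shown in this notebook, so the function name `rank_by_popularity` and the toy IDs and scores are assumptions for illustration only.

```python
def rank_by_popularity(article_list, index_map, item_score):
    """Return article_list ordered from highest to lowest popularity score."""
    def score(article_id):
        # Unknown articles fall back to index 0, which carries the lowest score
        idx = index_map.get(article_id, 0)
        return item_score.get(idx, -1)
    return sorted(article_list, key=score, reverse=True)

# Toy data (hypothetical IDs and scores)
index_map = {101: 1, 102: 2, 103: 3}
item_score = {0: -1, 1: 5, 2: 12, 3: 7}

ranked = rank_by_popularity([101, 999, 103, 102], index_map, item_score)
print(ranked)  # [102, 103, 101, 999]
```

Article `999` is not in the index map, so it resolves to index 0 and ends up last.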


### Importing Libraries

In [1]:
import pandas as pd
import numpy as np
from most_popular import MostPopularRecommender
from preprocessing import preprocess, read_sample
import ast




### Acquire Preprocessed Data

In [2]:
df = read_sample("/media/backup/datasets/yahoo/yahoo_dataset_clicked.csv", p=1)
df.head()

Unnamed: 0.1,Unnamed: 0,Timestamp,Clicked_Article,Click,User_Features,Article_List
0,7,1317513293,563938,1,[1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1...,[552077 555224 555528 559744 559855 560290 560...
1,13,1317513293,564335,1,[1 0 0 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 0 1 1 1...,[552077 555224 555528 559744 559855 560290 560...
2,39,1317513295,564335,1,[1 0 0 0 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1...,[552077 555224 555528 559744 559855 560290 560...
3,144,1317513299,565747,1,[1 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1...,[552077 555224 555528 559744 559855 560290 560...
4,176,1317513300,563115,1,[1 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1...,[552077 555224 555528 559744 559855 560290 560...


In [3]:
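import re

def literal_eval(element):
    """Parse a string such as '[1 0 1]' into a Python list; return other values unchanged."""
    if isinstance(element, str):
        # Replace each run of whitespace with a comma so the string becomes a valid list literal
        return ast.literal_eval(re.sub(r'\s+', ',', element))
    return element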

df['User_Features'] = df['User_Features'].apply(literal_eval)
df['Article_List'] = df['Article_List'].apply(literal_eval)

In [4]:
df.head()

Unnamed: 0.1,Unnamed: 0,Timestamp,Clicked_Article,Click,User_Features,Article_List
0,7,1317513293,563938,1,"[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, ...","[552077, 555224, 555528, 559744, 559855, 56029..."
1,13,1317513293,564335,1,"[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, ...","[552077, 555224, 555528, 559744, 559855, 56029..."
2,39,1317513295,564335,1,"[1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, ...","[552077, 555224, 555528, 559744, 559855, 56029..."
3,144,1317513299,565747,1,"[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, ...","[552077, 555224, 555528, 559744, 559855, 56029..."
4,176,1317513300,563115,1,"[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, ...","[552077, 555224, 555528, 559744, 559855, 56029..."
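
The string-to-list conversion used above can be tried on a single value. The sketch below uses a hypothetical helper, `parse_array_string`, which also trims whitespace next to the brackets. That extra step is an assumption: it only matters if the stored strings contain padding such as `[ 1 0]`, which a plain whitespace-to-comma substitution would turn into the invalid literal `[,1,0]`.

```python
import ast
import re

def parse_array_string(element):
    """Parse a space-separated array string such as '[1 0 1]' into a list."""
    if not isinstance(element, str):
        return element  # already parsed, pass through unchanged
    text = element.strip()
    text = re.sub(r"\[\s+", "[", text)   # drop padding after the opening bracket
    text = re.sub(r"\s+\]", "]", text)   # drop padding before the closing bracket
    return ast.literal_eval(re.sub(r"\s+", ",", text))

features = parse_array_string("[1 0 0 1]")
padded = parse_array_string("[ 552077  555224\n 555528]")
print(features)  # [1, 0, 0, 1]
print(padded)    # [552077, 555224, 555528]
```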


### Index Map

First, we collect the unique IDs of all clicked articles.

In [5]:
articles = df['Clicked_Article'].unique()

Then we iterate over them, assigning each article a consecutive index in a dictionary.

In [6]:
index_map = {}
# Indexes start at 1; index 0 is reserved for articles not found in the index map
idx = 1
for art in articles:
    index_map[art] = idx
    idx += 1
# index_map

{563938: 1,
 564335: 2,
 565747: 3,
 563115: 4,
 565533: 5,
 563787: 6,
 563582: 7,
 565589: 8,
 563643: 9,
 565648: 10,
 560518: 11,
 564418: 12,
 565364: 13,
 560290: 14,
 564604: 15,
 559744: 16,
 555224: 17,
 565561: 18,
 565822: 19,
 565515: 20,
 555528: 21,
 563846: 22,
 559855: 23,
 560620: 24,
 552077: 25,
 565479: 26,
 565930: 27,
 566013: 28,
 566022: 29,
 566092: 30,
 560805: 31,
 564371: 32,
 562265: 33,
 565980: 34,
 566431: 35,
 566439: 36,
 559833: 37,
 566541: 38,
 562374: 39,
 566587: 40,
 566478: 41,
 566573: 42,
 566602: 43,
 562637: 44,
 566631: 45,
 566689: 46,
 566726: 47,
 566825: 48,
 566838: 49,
 566767: 50,
 563204: 51,
 566997: 52,
 567110: 53,
 567145: 54,
 567169: 55,
 567334: 56,
 490956: 57,
 563819: 58,
 563642: 59,
 566888: 60,
 567079: 61,
 567654: 62,
 567768: 63,
 560591: 64,
 568030: 65,
 568045: 66,
 568217: 67,
 568362: 68,
 568439: 69,
 568479: 70,
 568445: 71,
 568271: 72,
 568524: 73,
 568610: 74,
 568538: 75,
 568470: 76,
 568669: 77,
 568734:
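
The same mapping can be built more compactly with `enumerate`. The sketch below uses only the first three article IDs from the output above and shows how a lookup with `dict.get` could return the reserved index 0 for an article that was never seen. The ID `999999` is a made-up example, and whether `most_popular.py` performs its lookup this way is an assumption.

```python
articles = [563938, 564335, 565747]

# start=1 keeps index 0 free for unknown articles
index_map = {art: idx for idx, art in enumerate(articles, start=1)}
print(index_map)  # {563938: 1, 564335: 2, 565747: 3}

# An unseen article falls back to the reserved index 0
unknown_idx = index_map.get(999999, 0)
print(unknown_idx)  # 0
```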

### Item Score

For each article, we count how many times it has been clicked.

In [7]:
popular = df.loc[(df['Click']==1)].groupby('Clicked_Article').size().sort_values(ascending=False)
popular.head(5)

Clicked_Article
566587    7128
579837    6862
579435    6060
567169    5509
595770    5509
dtype: int64
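
As a cross-check of the counting logic, the same kind of result can be obtained without pandas. The sketch below uses `collections.Counter` on a toy list of clicked article IDs; the IDs and counts are invented for illustration and are not taken from the dataset.

```python
from collections import Counter

# Each entry is one click on the given (hypothetical) article ID
clicked_articles = [11, 22, 11, 33, 22, 11]

click_counts = Counter(clicked_articles)
# most_common() returns (article, count) pairs sorted by count, descending
most_popular = click_counts.most_common()
print(most_popular)  # [(11, 3), (22, 2), (33, 1)]
```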

Now, using the Index Map, we associate each index with its article's click count.

In [8]:
# Index 0 stands for articles missing from the index map, so it gets the lowest score
item_score = {0: -1}
for art in articles:
    item_score[index_map[art]] = popular[art]
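
Note that `popular[art]` raises a `KeyError` if an article in the index map has no rows with `Click == 1`. That may never happen with this dataset, but a defensive variant is sketched below on toy data. Plain dictionaries stand in for the pandas objects, and defaulting to a score of 0 is an assumption, not the notebook's behavior.

```python
# Toy stand-ins (hypothetical IDs and counts)
index_map = {11: 1, 22: 2, 33: 3}
click_counts = {11: 40, 22: 15}  # article 33 has no recorded clicks

item_score = {0: -1}  # reserved index for unknown articles, lowest score
for art, idx in index_map.items():
    # Default to 0 clicks instead of failing on a missing article
    item_score[idx] = int(click_counts.get(art, 0))

print(item_score)  # {0: -1, 1: 40, 2: 15, 3: 0}
```

Even an article with zero clicks still scores above the reserved index 0, so unknown articles keep sorting last.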

### Saving Dictionaries

We use BentoML to pass the **Index Map** and **Item Score** dictionaries to the model; the recommender then loads them to make its recommendations.

The `pack()` function takes care of saving our dictionaries.
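
How `most_popular.py` declares and serializes these artifacts is not shown in this notebook. Purely as an illustration of why the serialization format matters for dictionaries with integer keys, the sketch below compares a `pickle` round trip with a JSON round trip using the standard library. It is not a description of BentoML's internals.

```python
import json
import pickle

index_map = {563938: 1, 564335: 2}

# pickle preserves the integer keys
restored_pickle = pickle.loads(pickle.dumps(index_map))
print(restored_pickle)  # {563938: 1, 564335: 2}

# JSON object keys are always strings, so integer keys come back as str
restored_json = json.loads(json.dumps(index_map))
print(restored_json)  # {'563938': 1, '564335': 2}
```

With a JSON-style format, a lookup such as `index_map[563938]` would fail after loading unless the keys are converted back to integers.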

In [9]:
model = MostPopularRecommender()

In [10]:
model.pack("item_score", item_score)

<most_popular.MostPopularRecommender at 0x7f21f06ecfd0>

In [11]:
model.pack("index_map", index_map)

<most_popular.MostPopularRecommender at 0x7f21f06ecfd0>

After packing everything the recommender needs, we can test it on a small sample.

In [12]:
test_articles = [565648, 563115, 552077, 564335, 565589, 563938, 560290, 563643, 560620, 565822, 563787, 555528, 565364, 559855, 560518]

In [13]:
model.rank({'Timestamp': 123456789, 'Clicked_Article': 565822, 'Click': 1, 'User_Features': np.asarray([True,False,False,False,True]), 'Article_List': np.asarray(test_articles)})

[565822,
 563643,
 563115,
 565589,
 559855,
 565648,
 560290,
 555528,
 564335,
 560518,
 565364,
 563938,
 560620,
 552077,
 563787]

To check that the ranking is correct, we can reproduce it by hand.

First, convert the article IDs into indexes:

In [14]:
indexes = [index_map[art] for art in test_articles]
indexes

[10, 4, 25, 2, 8, 1, 14, 9, 24, 19, 6, 21, 13, 23, 11]

Then gather the score for each index:

In [15]:
scores = [item_score[idx] for idx in indexes]
scores

[1161,
 2140,
 47,
 606,
 1782,
 164,
 1119,
 3528,
 143,
 3868,
 21,
 1014,
 457,
 1232,
 550]

Finally, sort the articles by score in descending order:

In [16]:
sorted(zip(scores, test_articles),reverse=True)

[(3868, 565822),
 (3528, 563643),
 (2140, 563115),
 (1782, 565589),
 (1232, 559855),
 (1161, 565648),
 (1119, 560290),
 (1014, 555528),
 (606, 564335),
 (550, 560518),
 (457, 565364),
 (164, 563938),
 (143, 560620),
 (47, 552077),
 (21, 563787)]
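
Instead of comparing the two listings by eye, the check can be turned into an assertion. The sketch below copies the scores and the ranking returned by `model.rank(...)` from the outputs above as literals, so it runs without the model. The variable names `model_ranking` and `manual_ranking` are introduced here for illustration.

```python
test_articles = [565648, 563115, 552077, 564335, 565589, 563938, 560290, 563643,
                 560620, 565822, 563787, 555528, 565364, 559855, 560518]
scores = [1161, 2140, 47, 606, 1782, 164, 1119, 3528,
          143, 3868, 21, 1014, 457, 1232, 550]

# Ranking returned by model.rank(...) earlier in the notebook
model_ranking = [565822, 563643, 563115, 565589, 559855, 565648, 560290, 555528,
                 564335, 560518, 565364, 563938, 560620, 552077, 563787]

# Sort (score, article) pairs by score, descending, and keep only the article IDs
manual_ranking = [art for _, art in sorted(zip(scores, test_articles), reverse=True)]

assert manual_ranking == model_ranking
print("manual ranking matches the model output")
```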

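Once the ranking is confirmed, we save the packed service; the log below shows where the bundle is stored.

In [17]: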
model.save()

[2020-07-06 14:10:18,791] INFO - BentoService bundle 'MostPopularRecommender:1.0.20200706141003_DD900D' saved to: /home/marlesson/bentoml/repository/MostPopularRecommender/1.0.20200706141003_DD900D


'/home/marlesson/bentoml/repository/MostPopularRecommender/1.0.20200706141003_DD900D'