# Recommenders 3 -- Sequence Recommenders (45m) 

## Goals of this practical:

- Understand the sequence recommendation framework (~5min)
- Load/Format dataset (~5min)
- Understand/train the prod2vec model (~10min)
- Evaluate (~10min)
- Visualize (~10min)
- Fiddle (~5min)



In [1]:
# !pip install gensim --upgrade



In [141]:
import pandas as pd
import numpy as np

# Sequence Recommenders:

> What will you click next ?

The sequence recommendation setting is a particular case of the implicit collaborative filtering setting. Given a sequence of items $i_0,i_1,...,i_n$ the goal is to predict the $i_{(n+1)},...$ items the user will consume. Playlist continuation is a neat use case of sequence recommenders. You've been listening to those songs, what can you listen to now ?


This setting differs from the classical collaborative filtering because the history is the recent trace and not the full saved interactions. Also, it's possible to do sequence recommendation without any specific latent user profile. 

#### Here we propose to explore this unpersonalized sequence recommandation

## Data used : [smallest movie-lens dataset](https://grouplens.org/datasets/movielens/)

Here we'll use the same data as before but instead of seeing $(user,item,rating)$ triplets or a $(user,item)$ interaction , we'll see item sequences: $user: [item, item,...]$

## Loading Data (same as before but in chronological order):

In [142]:
ratings = pd.read_csv("dataset/ratings.csv")
ratings = ratings.sort_values("timestamp",ascending=True)
print(ratings.iloc[0]["timestamp"] < ratings.iloc[-1]["timestamp"] ) # just checking 
print(ratings.shape)

True
(100836, 4)


In [186]:
print(len(set(ratings["userId"])))
print(len(set(ratings["movieId"])))

610
9724


In [143]:
ratings.head(5)

Unnamed: 0,userId,movieId,rating,timestamp
66719,429,595,5.0,828124615
66716,429,588,5.0,828124615
66717,429,590,5.0,828124615
66718,429,592,5.0,828124615
66712,429,432,3.0,828124615


In [144]:
# we also load titles and create an id2title dictionnary
titleCSV = pd.read_csv("dataset/movies.csv")
id2title = titleCSV[["movieId","title"]].set_index("movieId").to_dict()["title"]
id2title[1]

'Toy Story (1995)'

In [145]:
titleCSV.head(5)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


## (a) Create sequence datasets:
For this task, we need sequences of items as data:

## (Todo): extract all movie sequences (in chronological order) from the dataset:


In this dataset, each user has seen at least 20 movies.


- We need to extract all movie rating sequences (there is one per user) from the dataset:

`sequence_of_movies = [[movieid,...],[movieid,...],...]`

In [187]:
sequences_of_movies = [i[1].tolist() for i in ratings.groupby(["userId"])["movieId"]]
len(sequences_of_movies)

610

## (Todo): Create a train/test dataset

Here, we propose as task to predict the last 5 items of each sequence.

In [147]:
train_seq,test_seq = [],[]

for seq in sequences_of_movies:
    train_seq.append(seq[:-5])
    test_seq.append(seq[-5:])
    
last_consumed_item = [seq[-1] for seq in train_seq] # We save the last consumed item for each list
                                                    # We'll use it as a starting point

## (Todo): Create the list of the most popular movies

- Here, popular is the number of times the movie appears in a list

In [148]:
from collections import Counter

counts = Counter( [id for list_id in train_seq for id in list_id ])
most_popular = np.array(counts.most_common())[:,0]
num_items = len(most_popular)
most_popular[:10]

array([ 356,  318,  296, 2571,  593,  260,  480,  110,  589,    1])

``` python
#Most popular looks like this:
[356,318,296,2571,593,260,480,110,589,...]
 ```

## Word2Vec skip-gram <=> Prod2Vec


### Word2Vec

The MAIN idea of word2vec is to maximise the similarity (dot product) between the vectors for words which appear close together (in the context of each other) in text, and minimise the similarity of words that do not. 

This can be applied to products instead of words: it clusters similar products together.


#### Paper Abstract:
> In recent years online advertising has become increasingly ubiquitous and effective. Advertisements shown to visitors fund sites and apps that publish digital content, manage social networks, and operate e-mail services. Given such large variety of internet resources, determining an appropriate type of advertising for a given platform has become critical to financial success. Native advertisements, namely ads that are similar in look and feel to content, have had great success in news and social feeds. However, to date there has not been a winning formula for ads in e-mail clients. In this paper we describe a system that leverages user purchase history determined from e-mail receipts to deliver highly personalized product ads to Yahoo Mail users. We propose to use a novel neural language-based algorithm specifically tailored for delivering effective product recommendations, which was evaluated against baselines that included showing popular products and products predicted based on co-occurrence. We conducted rigorous offline testing using a large-scale product purchase data set, covering purchases of more than 29 million users from 172 e-commerce websites. Ads in the form of product recommendations were successfully tested on online traffic, where we observed a steady 9% lift in click-through rates over other ad formats in mail, as well as comparable lift in conversion rates. Following successful tests, the system was launched into production during the holiday season of 2014

[Prod2Vec Model](https://arxiv.org/abs/1606.07154)



## Gensim has the best python implementation of word2vec's algorithms:

We can just use these raw implementations. The only thing to do is to consider items as words:

In [149]:
import gensim
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

train_seq_str = [list(map(str,seq)) for seq in train_seq] # we just say that our items id's are strings..
    

# the following configuration is the default configuration
w2v = gensim.models.word2vec.Word2Vec(sentences=train_seq_str,
                                vector_size=50, window=10,               ### here we train a cbow model 
                                min_count=0,                      
                                sample=0.001, ns_exponent=0.75, workers=10,
                                sg=1, hs=0, negative=15,          ### set sg to 1 to train a sg model => Prod2Vec
                                cbow_mean=0,
                                epochs=50)

2022-02-23 15:21:55,337 : INFO : collecting all words and their counts
2022-02-23 15:21:55,340 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-02-23 15:21:55,376 : INFO : collected 9616 word types from a corpus of 97786 raw words and 610 sentences
2022-02-23 15:21:55,376 : INFO : Creating a fresh vocabulary
2022-02-23 15:21:55,423 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=0 retains 9616 unique words (100.0%% of original 9616, drops 0)', 'datetime': '2022-02-23T15:21:55.423200', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.22000-SP0', 'event': 'prepare_vocab'}
2022-02-23 15:21:55,424 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=0 leaves 97786 word corpus (100.0%% of original 97786, drops 0)', 'datetime': '2022-02-23T15:21:55.424145', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'pla

2022-02-23 15:21:59,008 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 15:21:59,043 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 15:21:59,049 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 15:21:59,082 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 15:21:59,083 : INFO : EPOCH - 6 : training on 97786 raw words (97654 effective words) took 0.6s, 158317 effective words/s
2022-02-23 15:21:59,556 : INFO : worker thread finished; awaiting finish of 9 more threads
2022-02-23 15:21:59,556 : INFO : worker thread finished; awaiting finish of 8 more threads
2022-02-23 15:21:59,586 : INFO : worker thread finished; awaiting finish of 7 more threads
2022-02-23 15:21:59,599 : INFO : worker thread finished; awaiting finish of 6 more threads
2022-02-23 15:21:59,606 : INFO : worker thread finished; awaiting finish of 5 more threads
2022-02-23 15:21:59,615 : INFO : worker thread f

2022-02-23 15:22:03,584 : INFO : worker thread finished; awaiting finish of 4 more threads
2022-02-23 15:22:03,597 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 15:22:03,599 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 15:22:03,622 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 15:22:03,625 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 15:22:03,625 : INFO : EPOCH - 14 : training on 97786 raw words (97634 effective words) took 0.6s, 171454 effective words/s
2022-02-23 15:22:04,079 : INFO : worker thread finished; awaiting finish of 9 more threads
2022-02-23 15:22:04,133 : INFO : worker thread finished; awaiting finish of 8 more threads
2022-02-23 15:22:04,141 : INFO : worker thread finished; awaiting finish of 7 more threads
2022-02-23 15:22:04,151 : INFO : worker thread finished; awaiting finish of 6 more threads
2022-02-23 15:22:04,170 : INFO : worker thread 

2022-02-23 15:22:08,134 : INFO : worker thread finished; awaiting finish of 5 more threads
2022-02-23 15:22:08,138 : INFO : worker thread finished; awaiting finish of 4 more threads
2022-02-23 15:22:08,156 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 15:22:08,164 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 15:22:08,167 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 15:22:08,176 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 15:22:08,176 : INFO : EPOCH - 22 : training on 97786 raw words (97643 effective words) took 0.6s, 173835 effective words/s
2022-02-23 15:22:08,618 : INFO : worker thread finished; awaiting finish of 9 more threads
2022-02-23 15:22:08,657 : INFO : worker thread finished; awaiting finish of 8 more threads
2022-02-23 15:22:08,676 : INFO : worker thread finished; awaiting finish of 7 more threads
2022-02-23 15:22:08,686 : INFO : worker thread 

2022-02-23 15:22:12,489 : INFO : worker thread finished; awaiting finish of 6 more threads
2022-02-23 15:22:12,557 : INFO : worker thread finished; awaiting finish of 5 more threads
2022-02-23 15:22:12,574 : INFO : worker thread finished; awaiting finish of 4 more threads
2022-02-23 15:22:12,581 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 15:22:12,591 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 15:22:12,597 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 15:22:12,599 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 15:22:12,600 : INFO : EPOCH - 30 : training on 97786 raw words (97646 effective words) took 0.6s, 175719 effective words/s
2022-02-23 15:22:13,020 : INFO : worker thread finished; awaiting finish of 9 more threads
2022-02-23 15:22:13,060 : INFO : worker thread finished; awaiting finish of 8 more threads
2022-02-23 15:22:13,083 : INFO : worker thread 

2022-02-23 15:22:17,035 : INFO : worker thread finished; awaiting finish of 7 more threads
2022-02-23 15:22:17,059 : INFO : worker thread finished; awaiting finish of 6 more threads
2022-02-23 15:22:17,116 : INFO : worker thread finished; awaiting finish of 5 more threads
2022-02-23 15:22:17,118 : INFO : worker thread finished; awaiting finish of 4 more threads
2022-02-23 15:22:17,123 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 15:22:17,128 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 15:22:17,131 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 15:22:17,140 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 15:22:17,140 : INFO : EPOCH - 38 : training on 97786 raw words (97657 effective words) took 0.6s, 156583 effective words/s
2022-02-23 15:22:17,650 : INFO : worker thread finished; awaiting finish of 9 more threads
2022-02-23 15:22:17,664 : INFO : worker thread 

2022-02-23 15:22:21,684 : INFO : worker thread finished; awaiting finish of 8 more threads
2022-02-23 15:22:21,697 : INFO : worker thread finished; awaiting finish of 7 more threads
2022-02-23 15:22:21,722 : INFO : worker thread finished; awaiting finish of 6 more threads
2022-02-23 15:22:21,723 : INFO : worker thread finished; awaiting finish of 5 more threads
2022-02-23 15:22:21,726 : INFO : worker thread finished; awaiting finish of 4 more threads
2022-02-23 15:22:21,733 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 15:22:21,736 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 15:22:21,738 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 15:22:21,748 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 15:22:21,748 : INFO : EPOCH - 46 : training on 97786 raw words (97640 effective words) took 0.5s, 179203 effective words/s
2022-02-23 15:22:22,225 : INFO : worker thread 

### A few things:

In [150]:
print(w2v.wv.vectors[0])              # The vector of index 0
print(w2v.wv.index_to_key[0])           # codes for the movieId 356  : index2word -> index_to_key
print(w2v.wv.key_to_index["356"] )      # Inverse mapping :  vocab -> key_to_index

[-0.27825177  0.05073303 -0.30350322  0.49897218 -0.43219474 -0.3556385
  0.28310165  0.5208147  -0.39110366 -0.52834207 -0.15466134 -0.75271803
 -0.26825935  0.42898378 -0.01368484  0.39651856  0.4827199  -0.19683474
 -0.3760517  -0.7158907   0.1067467   0.02812896  0.962306   -0.4048893
  0.18964584  0.17758189 -0.6085433   0.70119095 -0.13911028 -0.12139854
  0.09713151 -0.03532421  0.22046252  0.32695717 -0.23656611  0.05698903
 -0.13464738  0.4866654   0.07209812 -0.51379585  0.37555242 -0.17435944
 -0.38007522  0.5301092   0.05459659  0.22382212 -0.08039364 -0.28165382
  0.49696276  0.27662408]
356
0


## Getting similar items:

The heart of the algorithm is in the similar item search. As in word2vec, we simply use cosine distance between items to find "similar items"

### We can search by id's

In [151]:
def get_similar_ids(w2vmodel,iid,num=5):
    
    if str(iid) in w2vmodel.wv.key_to_index:
        return [int(iid) for iid,_ in w2vmodel.wv.most_similar(str(iid),topn=num)] 
    else:
        return []

get_similar_ids(w2v,last_consumed_item[0],num=5)

[26142, 66915, 27829, 26195, 228]

### Or by vector

In [152]:
w2v.wv[str(last_consumed_item[0])]

array([ 0.09873178,  1.0231442 , -0.07677509,  0.21820693,  0.19420545,
       -0.99418104, -0.11817683,  0.9886847 , -0.10364438,  0.721734  ,
        0.01292138, -0.22700045, -0.34948814,  0.4857323 ,  0.6686215 ,
       -0.7163373 ,  0.6335093 ,  0.41135943, -1.2004654 , -0.80307186,
        0.20577149, -0.49734357,  0.5995932 , -0.5055137 , -1.2243613 ,
        0.29595345,  0.12054698,  0.6420872 , -1.0629041 ,  0.18982516,
       -0.4757336 ,  0.66419184,  0.27802587,  1.2994084 , -0.28180864,
        0.30063114, -0.05716378, -0.06702942, -0.8282801 , -0.4500681 ,
        0.2749277 , -0.43798903, -0.35724545,  1.1578708 ,  1.0002444 ,
        0.04804122,  0.6924802 , -1.4123019 ,  0.0496449 , -0.03420306],
      dtype=float32)

In [153]:
def get_similar_vectors(w2vmodel,vec,num=5):
        return [int(iid) for iid,_ in w2vmodel.wv.most_similar(positive=[vec],topn=num)] 

get_similar_vectors(w2v,w2v.wv[str(last_consumed_item[0])],num=5) # items are strings

[157, 26142, 66915, 27829, 26195]

### Let's see if this works

We can query by id

In [154]:
ID = 1
NUM_SIM = 3

print("Movies similar to: ", id2title[ID])
print("")
for x in get_similar_ids(w2v,ID,NUM_SIM):
    print("--> ",id2title[x])

Movies similar to:  Toy Story (1995)

-->  Forrest Gump (1994)
-->  Aladdin (1992)
-->  Toy Story 2 (1999)


We can also query by vector

**NOTE:** the 1st results can be the item(s) you've used to query

In [155]:
ID = 1
NUM_SIM = 4
print("Movies similar to: ", id2title[ID])
print("")
for x in get_similar_vectors(w2v,w2v.wv[str(ID)],NUM_SIM):
    print("--> ",id2title[x])

Movies similar to:  Toy Story (1995)

-->  Toy Story (1995)
-->  Forrest Gump (1994)
-->  Aladdin (1992)
-->  Toy Story 2 (1999)


Our result was the following (for ID = 1 & NUM_SIM = 3)

>Movies similar to:  Toy Story (1995)
>-  Beauty and the Beast (1991)
-   Toy Story 2 (1999)
-   Lion King, The (1994)


Using vectors enables operations like additions to be made

In [156]:
ID1 = 2571
ID2 = 589
NUM_SIM = 10

vec = np.max([w2v.wv[str(ID1)],w2v.wv[str(ID2)]],axis=0)# + w2v.wv[str(318)]

print("Movies similar to: ", id2title[ID1] , "+",  id2title[ID2] )
print("")
for x in get_similar_vectors(w2v,vec ,NUM_SIM):
    print("--> ",id2title[x])

Movies similar to:  Matrix, The (1999) + Terminator 2: Judgment Day (1991)

-->  Matrix, The (1999)
-->  Saving Private Ryan (1998)
-->  Terminator 2: Judgment Day (1991)
-->  Terminator, The (1984)
-->  Star Wars: Episode V - The Empire Strikes Back (1980)
-->  Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
-->  Seven (a.k.a. Se7en) (1995)
-->  Braveheart (1995)
-->  Fuzz (1972)
-->  Speed (1994)


Ok, we now have a good base for our sequence recommendation algorithm, let's write something to evaluate our predictions

## (Todo) write a `get_relevance_list(proposed_ids,real_ids)` function:

This function will be used to compare proposed items w/ real items:


- A relevant item is an item which is in the ground truth
- It returns a list which length is the number of proposed items filled of 0's and 1's : 0 means the item is not relevant, 1 means it's relevant.

- get_relevance_list([1,2,3,4],[1,4,5,6]) should returns [1,0,0,1]  because items 1 and 4 are relevant.


In [157]:
def get_relevance_list(proposed_ids,real_ids):
    real_ids = set(real_ids)
    return [1 if i in real_ids else 0 for i in proposed_ids]

get_relevance_list([1,2,3,4],[1,4,5,6]) #returns [1,0,0,1]

[1, 0, 0, 1]

### Let's test our function on our data

In [158]:
get_relevance_list(most_popular[:25],test_seq[1])

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [159]:
get_relevance_list(get_similar_ids(w2v,last_consumed_item[0],25),test_seq[0])

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

## Ok, now, let's write prediction funtions:

- `predict_pop` will recommend the k's most popular items
- `predict_w2v` will recommend the k's most similar items to the last one consumed

#### (TODO) : complete those functions

In [185]:
def predict_pop(last_seen,k,popular=most_popular):
    return popular[:k]

def predict_w2v(last_seen,k,w2vmodel=w2v):
    return  get_similar_ids(w2vmodel,last_seen,num=k)

#data is list of last_consumed:
def get_predictions(predict_func,data,truth,k=5):
    if k == -1 or k == 0:
        k = num_items
    return [get_relevance_list(predict_func(last_seen,k),will_see) for last_seen,will_see in zip(data,truth)]

**Note**: The `get_predictions(...)` function returns the relevant list associated to predictions

### The following cells should return list of lists

In [177]:
last_consumed_item[:5]

[157, 80906, 688, 4641, 475]

In [178]:
test_seq[:5]

[[1445, 553, 2478, 2012, 2492],
 [89774, 1704, 122882, 114060, 80489],
 [3949, 2090, 527, 5048, 2424],
 [4273, 4381, 4741, 4896, 4246],
 [266, 534, 300, 247, 474]]

In [179]:
print(get_predictions(predict_pop,last_consumed_item[:5],test_seq[:5],3))
print(get_predictions(predict_w2v,last_consumed_item[:5],test_seq[:5],3))

157
80906
688
4641
475
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
157
80906
688
4641
475
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]


expected output: 
```
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

```

## The return of the MRR and nDCG functions

In [160]:
test_list = [[0,0,1],[0,1,0],[1,0,0],[0,0,0]]

In [161]:
def rr(list_items):
    relevant_indexes = np.asarray(list_items).nonzero()[0]
    
    if len(relevant_indexes) > 0:
        return 1/(relevant_indexes[0]+1) # arrays are indexed from 0
    else:
        return 0

def mrr(list_list_items):
    return np.mean([rr(list_item) for list_item in list_list_items])

mrr(test_list) #0.4583333333333333

0.4583333333333333

In [162]:
# The dcg@k is the sum of the relevance, penalized gradually
def dcg_at_k(r, k):
    """Score is discounted cumulative gain (dcg)
        r: Relevance scores (list or numpy) in rank order
            (first element is the first item)
        k: Number of results to consider
        
    """
    r = np.asfarray(r)[:k]
    if r.size:
        return np.sum(r / np.log2(np.arange(2, r.size + 2)))
        
    return 0.

# test values
# r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
# dcg_at_k(r, 1) => 3.0
# dcg_at_k(r, 2) => 4.2618595071429155
r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print(dcg_at_k(r, 1))
print(dcg_at_k(r, 2))

3.0
4.2618595071429155


In [163]:
# And it's normalized version
def ndcg_at_k(r, k):
    """
        r: Relevance scores (list or numpy) in rank order
            (first element is the first item)
        k: Number of results to consider
    """
    dcg_max =  dcg_at_k(sorted(r)[::-1],k) 
    if not dcg_max:
        return 0.
    return dcg_at_k(r, k) / dcg_max

# test values
# r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
# ndcg_at_k(r, 1) => 1.0
# ndcg_at_k(r, 4) => 0.794285
    
r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]    
ndcg_at_k(r, 4)

0.7942854176010882

In [164]:
def mean_dcg(rel_lists,k):
    return np.mean([dcg_at_k(rel_list,k) for rel_list in rel_lists])

def mean_ndcg(rel_lists,k):
    return np.mean([ndcg_at_k(rel_list,k) for rel_list in rel_lists])

## Let's see how this naïve way of predicting items to show works

In [165]:
pop_preds = get_predictions(predict_pop,last_consumed_item,test_seq,-1)
w2v_preds = get_predictions(predict_w2v,last_consumed_item,test_seq,-1)

print("1/MRR")
print(1/mrr(pop_preds))
print(1/mrr(w2v_preds))
print("")
print("DCG")
print(mean_dcg(pop_preds,5))
print(mean_dcg(w2v_preds,5))
print("")
print("nDCG")
print(mean_ndcg(pop_preds,5))
print(mean_ndcg(w2v_preds,5))

1/MRR
18.089018857528632
11.60834039203516

DCG
0.0507673354188168
0.08805951910386073

nDCG
0.01745016484461114
0.03017633559831179


### (TODO) Can we do better ?

Now, try a different strategy: 


- History should be discarded from prediction
- Instead of basing the prediction on the last seen item, we'll take all the `seen[-n:]` ones (horizon) into account
- To aggregate all items, we'll simply take the min rank to take into account the history offset.
- Equal scores can be handled using the history offset.
Example: 

> Let's say you chose to use the two last seen items `[item 44, Item 398]` to predict the following items

Therefore, using `get_similar_ids` method on both items will yield two lists of **similar** ranked item id's:
 - Similar to item 44: `[item 1, item 33, item 5]`
 - Similar to item 398: `[item 25, item 1, item 5]`
 scores (rank,offset):
 ```
 scores := {item 1: (0,0), item 33: (1,0), item 5: (2,0) , item 25: (0,1)}```
 
 
Then, aggregation by best rank should yield: `[1,25,33,5]`


array([], dtype=int64)

In [192]:
def predict_max_w2v(seen,k,horizon=2,w2vmodel=w2v):
    
   
    dico = dict()
    id_items = seen[-horizon:]

    for id in id_items :
        list_item = get_similar_ids(w2vmodel,id,num=k)
        
        for i,item in enumerate(list_item) :
            if item not in dico.keys() :
                dico[item] = i
    
    return [i[1] for i in sorted(dico.items(),key=lambda x : x[1])]

w2v_best_preds = get_predictions(predict_max_w2v,train_seq,test_seq,-1)



In [194]:
print(1/mrr(w2v_best_preds))
print(mean_dcg(w2v_best_preds,5))
print(mean_ndcg(w2v_best_preds,5))

73.72668807608646
0.00869195918810561
0.0029479666624677926


#### => Not really better

## Let's visualize learned embeddings

Just like in the 1st practical, we propose to visualize learnt items embeddings with the [Tensorflow projector](https://projector.tensorflow.org/).

In [195]:
# This function saves embeddings (a numpy array) and associated labels into tsv files.

def save_embeddings(embs,dict_label,path="saved_word_vectors"):
    """
    embs is Numpy.array(N,size)
    dict_label is {str(word)->int(idx)} or {int(idx)->str(word)}
    """
    def int_first(k,v):
        if type(k) == int:
            return (k,v)
        else:
            return (v,k)

    np.savetxt(f"{path}_vectors.tsv", embs, delimiter="\t")

    #labels 
    if dict_label:
        sorted_labs = np.array([lab for idx,lab in sorted([int_first(k,v) for k,v in dict_label.items()])])
        print(sorted_labs)
        with open(f"{path}_metadata.tsv","w") as metadata_file:
            for x in sorted_labs: #hack for space
                if len(x.strip()) == 0:
                    x = f"space-{len(x)}"
                    
                metadata_file.write(f"{x}\n")

In [197]:
vec2title = {i:id2title[int(mid)] for i,mid in enumerate(w2v.wv.index_to_key)}

In [198]:
save_embeddings(w2v.wv.vectors,vec2title)

['Forrest Gump (1994)' 'Shawshank Redemption, The (1994)'
 'Pulp Fiction (1994)' ...
 'Weekend (a.k.a. Le Week-end) (Week End) (1967)' 'Mischief (1985)'
 'Splinter (2008)']


## How to:

- Now, [open this link](https://projector.tensorflow.org/), and select "load".
- look for saved_word_vectors_vectors.tsv and saved_word_vectors_metadata.tsv. 

=> These are respectively, the items latent representations and their labels

## Hyperparameters matter when using Word2Vec for Item recommendation:


> Skip-gram with negative sampling, a popular variant of Word2vec originally designed and tuned to create word embeddings for Natural Language Processing, has been used to create item embeddings with successful applications in recommendation. While these fields do not share the same type of data, neither evaluate on the same tasks, recommendation applications tend to use the same already tuned hyperparameters values, even if optimal hyperparameters values are often known to be data and task dependent. We thus investigate the marginal importance of each hyperparameter in a recommendation setting through large hyperparameter grid searches on various datasets. Results reveal that optimizing neglected hyperparameters, namely negative sampling distribution, number of epochs, subsampling parameter and window-size, significantly improves performance on a recommendation task, and can increase it by an order of magnitude. Importantly, we find that optimal hyperparameters configurations for Natural Language Processing tasks and Recommendation tasks are noticeably different. 

[Hyperparameters matter](https://arxiv.org/abs/1804.04212)

#### It turns out that  hyperparameters are really important for this task: especially the sampling parameter.  Try and learn multiple models to see how the ns_exponent parameter modifies the results:


In [199]:
# the following configuration is the default configuration
w2v = gensim.models.word2vec.Word2Vec(sentences=train_seq_str,
                                vector_size=50, window=3,               ### here we train a cbow model 
                                min_count=0,                      
                                sample=0.001, ns_exponent=-0.4, workers=10,
                                sg=1, hs=0, negative=15,          ### set sg to 1 to train a sg model => Prod2Vec
                                cbow_mean=0,
                                epochs=25)



2022-02-23 19:10:07,099 : INFO : collecting all words and their counts
2022-02-23 19:10:07,101 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-02-23 19:10:07,120 : INFO : collected 9616 word types from a corpus of 97786 raw words and 610 sentences
2022-02-23 19:10:07,121 : INFO : Creating a fresh vocabulary
2022-02-23 19:10:07,159 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=0 retains 9616 unique words (100.0%% of original 9616, drops 0)', 'datetime': '2022-02-23T19:10:07.159296', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.22000-SP0', 'event': 'prepare_vocab'}
2022-02-23 19:10:07,161 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=0 leaves 97786 word corpus (100.0%% of original 97786, drops 0)', 'datetime': '2022-02-23T19:10:07.161248', 'gensim': '4.1.2', 'python': '3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]', 'pla

2022-02-23 19:10:08,669 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 19:10:08,679 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 19:10:08,679 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 19:10:08,683 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 19:10:08,684 : INFO : EPOCH - 6 : training on 97786 raw words (97646 effective words) took 0.2s, 447722 effective words/s
2022-02-23 19:10:08,832 : INFO : worker thread finished; awaiting finish of 9 more threads
2022-02-23 19:10:08,850 : INFO : worker thread finished; awaiting finish of 8 more threads
2022-02-23 19:10:08,851 : INFO : worker thread finished; awaiting finish of 7 more threads
2022-02-23 19:10:08,861 : INFO : worker thread finished; awaiting finish of 6 more threads
2022-02-23 19:10:08,863 : INFO : worker thread finished; awaiting finish of 5 more threads
2022-02-23 19:10:08,880 : INFO : worker thread f

2022-02-23 19:10:10,352 : INFO : worker thread finished; awaiting finish of 4 more threads
2022-02-23 19:10:10,356 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 19:10:10,358 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 19:10:10,361 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 19:10:10,372 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 19:10:10,373 : INFO : EPOCH - 14 : training on 97786 raw words (97668 effective words) took 0.2s, 463578 effective words/s
2022-02-23 19:10:10,562 : INFO : worker thread finished; awaiting finish of 9 more threads
2022-02-23 19:10:10,567 : INFO : worker thread finished; awaiting finish of 8 more threads
2022-02-23 19:10:10,571 : INFO : worker thread finished; awaiting finish of 7 more threads
2022-02-23 19:10:10,575 : INFO : worker thread finished; awaiting finish of 6 more threads
2022-02-23 19:10:10,581 : INFO : worker thread 

2022-02-23 19:10:12,063 : INFO : worker thread finished; awaiting finish of 5 more threads
2022-02-23 19:10:12,083 : INFO : worker thread finished; awaiting finish of 4 more threads
2022-02-23 19:10:12,086 : INFO : worker thread finished; awaiting finish of 3 more threads
2022-02-23 19:10:12,089 : INFO : worker thread finished; awaiting finish of 2 more threads
2022-02-23 19:10:12,098 : INFO : worker thread finished; awaiting finish of 1 more threads
2022-02-23 19:10:12,111 : INFO : worker thread finished; awaiting finish of 0 more threads
2022-02-23 19:10:12,112 : INFO : EPOCH - 22 : training on 97786 raw words (97659 effective words) took 0.2s, 460233 effective words/s
2022-02-23 19:10:12,266 : INFO : worker thread finished; awaiting finish of 9 more threads
2022-02-23 19:10:12,285 : INFO : worker thread finished; awaiting finish of 8 more threads
2022-02-23 19:10:12,287 : INFO : worker thread finished; awaiting finish of 7 more threads
2022-02-23 19:10:12,288 : INFO : worker thread 

In [200]:
pop_preds = get_predictions(predict_pop,last_consumed_item,test_seq,-1)
w2v_preds = get_predictions(predict_w2v,last_consumed_item,test_seq,-1)

In [201]:
print(1/mrr(pop_preds))
print(1/mrr(w2v_preds))

print(mean_dcg(pop_preds,5))
print(mean_dcg(w2v_preds,5))

print(mean_ndcg(pop_preds,5))
print(mean_ndcg(w2v_preds,5))

18.089018857528632
11.744708220632175
0.0507673354188168
0.09221702700701417
0.01745016484461114
0.03162255943287431


## Still got time ? Try making a more clever item selection mechanism:

- You could, for example, cluster items in groups (using k-means) and propose the most popular items of the last seen group