<a href="https://colab.research.google.com/github/kluo9/HM-personalized-fashion-recommendation/blob/main/HM_recall.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The purpose of the recall stage is to reduce the number of candidate items from about 100K to a few hundred per user for the next (ranking) stage.
The goal is to keep the candidate set as small as possible without excluding any item the user is likely to buy in the next week.

The recall stage is evaluated with:
1. Mean Average Precision @ 12 (MAP@12)
2. Precision: the number of recalled items that were purchased / total number of items recalled
3. Item Recall Rate: the number of purchased items that were recalled / total number of items the user purchased
4. User Recall Rate: the number of users who purchased at least one of their recalled items / total number of users

The recall strategies:
1. popularity (time-weighted)
2. purchase history (up to 4 weeks)
3. related items to what the user recently purchased (items bought together)
4. popular items among users with the same attributes (age)
5. popular items from the same section_name as the user's purchases
6. embeddings: generate item and user embeddings, then retrieve the items closest to the user embedding.


In [None]:
import numpy as np
import pandas as pd
import os
import glob
from tqdm import tqdm
import datetime
from scipy import stats
from collections import defaultdict
from collections import Counter

# Read data

In [None]:
! pip install -q kaggle
from google.colab import files

In [None]:
uploaded = files.upload() # upload kaggle token downloaded from kaggle personal account page 'kaggle.json'

Saving kaggle.json to kaggle.json


In [None]:
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json


In [None]:
! kaggle competitions download -c h-and-m-personalized-fashion-recommendations -f transactions_train.csv

Downloading transactions_train.csv.zip to /content
100% 584M/584M [00:05<00:00, 109MB/s]


In [None]:
transaction_df = pd.read_csv('/content/transactions_train.csv.zip')
transaction_df.head()

|   | t_dat | customer_id | article_id | price | sales_channel_id |
|---|---|---|---|---|---|
| 0 | 2018-09-20 | 000058a12d5b43e67d225668fa1f8d618c13dc232df0ca... | 663713001 | 0.050831 | 2 |
| 1 | 2018-09-20 | 000058a12d5b43e67d225668fa1f8d618c13dc232df0ca... | 541518023 | 0.030492 | 2 |
| 2 | 2018-09-20 | 00007d2de826758b65a93dd24ce629ed66842531df6699... | 505221004 | 0.015237 | 2 |
| 3 | 2018-09-20 | 00007d2de826758b65a93dd24ce629ed66842531df6699... | 685687003 | 0.016932 | 2 |
| 4 | 2018-09-20 | 00007d2de826758b65a93dd24ce629ed66842531df6699... | 685687004 | 0.016932 | 2 |


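Use the transactions from 2020-08-15 to 2020-09-15 (about four weeks, split into four windows `train1` to `train4`, most recent first) as training data, and the last week, 2020-09-16 to 2020-09-22, as validation.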

In [None]:
print("All Transactions Date Range: {} to {}".format(transaction_df['t_dat'].min(), transaction_df['t_dat'].max()))

transaction_df["t_dat"] = pd.to_datetime(transaction_df["t_dat"])
# Four training windows, most recent first; each keeps start <= t_dat < end
# train1: 2020-09-08 to 2020-09-15 (8 days)
train1 = transaction_df.loc[(transaction_df["t_dat"] >= datetime.datetime(2020,9,8)) & (transaction_df['t_dat'] < datetime.datetime(2020,9,16))]
# train2: 2020-09-01 to 2020-09-07 (7 days)
train2 = transaction_df.loc[(transaction_df["t_dat"] >= datetime.datetime(2020,9,1)) & (transaction_df['t_dat'] < datetime.datetime(2020,9,8))]
# train3: 2020-08-23 to 2020-08-31 (9 days)
train3 = transaction_df.loc[(transaction_df["t_dat"] >= datetime.datetime(2020,8,23)) & (transaction_df['t_dat'] < datetime.datetime(2020,9,1))]
# train4: 2020-08-15 to 2020-08-22 (8 days)
train4 = transaction_df.loc[(transaction_df["t_dat"] >= datetime.datetime(2020,8,15)) & (transaction_df['t_dat'] < datetime.datetime(2020,8,23))]

# Validation: everything from 2020-09-16 onward (the last week, up to 2020-09-22)
val = transaction_df.loc[transaction_df["t_dat"] >= datetime.datetime(2020,9,16)]

All Transactions Date Range: 2018-09-20 to 2020-09-22
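The four training windows above are hard-coded and have slightly different lengths (8, 7, 9 and 8 days). A minimal sketch, assuming equal 7-day windows are wanted, derives the boundaries from the validation cutoff instead; `weekly_windows` is a hypothetical helper, not part of the notebook.

```python
import datetime


def weekly_windows(cutoff, n_weeks=4, days=7):
    """Return [(start, end), ...] half-open windows, most recent first.

    Each window covers start <= t_dat < end, is `days` long, and the first
    one ends at `cutoff`.
    """
    windows = []
    end = cutoff
    for _ in range(n_weeks):
        start = end - datetime.timedelta(days=days)
        windows.append((start, end))
        end = start
    return windows


for start, end in weekly_windows(datetime.datetime(2020, 9, 16)):
    print(start.date(), "<= t_dat <", end.date())
```

Each `(start, end)` pair can be used as `transaction_df.loc[(transaction_df['t_dat'] >= start) & (transaction_df['t_dat'] < end)]`. Note that four equal weeks would start on 2020-08-19 rather than 2020-08-15.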


In [None]:
del transaction_df

In [None]:
# List of all purchases per user (has repetitions)
positive_items_per_user1 = train1.groupby(['customer_id'])['article_id'].apply(list)
positive_items_per_user2 = train2.groupby(['customer_id'])['article_id'].apply(list)
positive_items_per_user3 = train3.groupby(['customer_id'])['article_id'].apply(list)
positive_items_per_user4 = train4.groupby(['customer_id'])['article_id'].apply(list)

In [None]:
train = pd.concat([train1, train2, train3, train4], axis=0)

# popularity (time-weighted)

Next, compute a time-decayed popularity score for each item: each purchase contributes 1 / (number of days between the purchase and 2020-09-16), so recent purchases carry more weight. For example, an item bought 5 times on the first day of the training period (32 days before the cutoff) scores 5/32 ≈ 0.16 and therefore ranks below an item bought 4 times on the last day (1 day before the cutoff), which scores 4.
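To make the example concrete, here is a plain-Python sketch of the same weighting the next cell applies (`1 / days before 2020-09-16`), using toy purchases of hypothetical items A and B; `time_decay_popularity` is a hypothetical helper.

```python
import datetime
from collections import defaultdict

CUTOFF = datetime.datetime(2020, 9, 16)  # first day of the validation week


def time_decay_popularity(purchases, cutoff=CUTOFF):
    """Sum 1 / (days before cutoff) over every purchase of each item."""
    scores = defaultdict(float)
    for item, t_dat in purchases:
        scores[item] += 1 / (cutoff - t_dat).days
    # Highest score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


purchases = ([("A", datetime.datetime(2020, 8, 15))] * 5     # first training day
             + [("B", datetime.datetime(2020, 9, 15))] * 4)  # last training day
print(time_decay_popularity(purchases))
# [('B', 4.0), ('A', 0.15625)]
```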

In [None]:
# Vectorized: days between each purchase and the cutoff, then the reciprocal
train['pop_factor'] = 1 / (datetime.datetime(2020,9,16) - train['t_dat']).dt.days
train['pop_factor'].describe()

count    1.179208e+06
mean     1.182349e-01
std      1.629160e-01
min      3.125000e-02
25%      4.166667e-02
50%      6.250000e-02
75%      1.111111e-01
max      1.000000e+00
Name: pop_factor, dtype: float64

In [None]:
popular_items_group = train.groupby(['article_id'])['pop_factor'].sum()

_, popular_items = zip(*sorted(zip(popular_items_group, popular_items_group.keys()))[::-1])
popular_items = list(popular_items)

# purchase history (up to 4 weeks)

Find the items bought by each user in the past four weeks (the training period).

In [None]:
# List of all purchases per user 
items_per_user = train.groupby(['customer_id'])['article_id'].apply(list) 

In [None]:
def purchase_history(user, purchase_data_group):
  # Distinct items the user bought, most frequent first (ties keep first-seen order)
  return [item for item, _ in Counter(purchase_data_group[user]).most_common()]

In [None]:
purchase_history(items_per_user.keys()[2], items_per_user)

['719530003', '448509014']

# related items to what the user recently purchased (items bought together)

Mine items that are frequently bought together with the Apriori implementation in the mlxtend library.

In [None]:
! pip install -q mlxtend

Create a dataset of baskets: the items a customer bought on the same day are treated as one transaction.

In [None]:
items_per_user_transaction = train.groupby(['customer_id','t_dat'],group_keys=False)['article_id'].apply(list) 

Keep only baskets with more than one item (support is then computed relative to multi-item baskets only).

In [None]:
items_per_user_transaction_more = [l for l in items_per_user_transaction if len(l) > 1]

In [None]:
del items_per_user_transaction

In [None]:
# one-hot encoding
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# fit the TransactionEncoder
te = TransactionEncoder()
items_per_user_transaction_1hot = te.fit(items_per_user_transaction_more).transform(items_per_user_transaction_more)
del items_per_user_transaction_more
items_per_user_transaction_1hot = pd.DataFrame(items_per_user_transaction_1hot, columns=te.columns_)


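Run the Apriori algorithm. Mining is slow, so the results were saved to Google Drive and are reloaded in the next cell. The reloaded itemsets have support around 0.001, below the `min_support=0.01` used here, so they were presumably mined with a lower threshold.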

In [None]:
ar_ap = apriori(items_per_user_transaction_1hot, min_support=0.01, max_len=5,
                use_colnames=True)
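Apriori keeps itemsets whose *support* (the fraction of baskets that contain every item in the set) is at least `min_support`. Since only pairs are used downstream, a brute-force plain-Python sketch of pair support on toy baskets shows what the numbers in `ar_ap` mean. This is direct pair counting, not the Apriori algorithm itself (which prunes candidates level by level), and `frequent_pairs` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {(item_a, item_b): support} for pairs with support >= min_support.

    Support is the fraction of baskets that contain both items.
    """
    pair_counts = Counter()
    for basket in baskets:
        # Count each unordered pair once per basket, ignoring repeated items
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


baskets = [["a", "b"], ["a", "b", "c"], ["b", "c"], ["a", "d"]]
print(frequent_pairs(baskets, min_support=0.5))
# {('a', 'b'): 0.5, ('b', 'c'): 0.5}
```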

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')
ar_ap = pd.read_csv('/content/gdrive/MyDrive/ar_ap.csv')
ar_ap

Mounted at /content/gdrive


|     | support | itemsets |
|---|---|---|
| 0 | 0.001547 | frozenset({156231001}) |
| 1 | 0.002260 | frozenset({158340001}) |
| 2 | 0.002354 | frozenset({160442007}) |
| 3 | 0.001966 | frozenset({160442010}) |
| 4 | 0.001101 | frozenset({253448003}) |
| ... | ... | ... |
| 699 | 0.001344 | frozenset({909911001, 909912001}) |
| 700 | 0.001532 | frozenset({944506001, 909924002}) |
| 701 | 0.001218 | frozenset({915526001, 915526002}) |
| 702 | 0.001191 | frozenset({915529001, 915529003}) |


Parse the itemsets (the CSV stores them as strings such as `frozenset({909911001, 909912001})`) and keep only pairs.

In [None]:
ar_ap1 = ar_ap.copy()
itemsets = ar_ap1['itemsets']
itemsets_new = []
for i in itemsets:
  itemsets_new.append(i[11:-2].split(','))
ar_ap1['itemsets'] = itemsets_new

In [None]:
ar_ap1['length'] = ar_ap1['itemsets'].apply(lambda x: len(x))
ar_ap1 = ar_ap1[ ar_ap1['length'] == 2]
print(ar_ap1.shape[0])
ar_ap1

11


|     | support | itemsets | length |
|---|---|---|---|
| 693 | 0.001383 | [706016001, 706016003] | 2 |
| 694 | 0.00143 | [918292001, 856270002] | 2 |
| 695 | 0.001708 | [918292004, 868823007] | 2 |
| 696 | 0.001238 | [868823008, 918292001] | 2 |
| 697 | 0.001101 | [918835001, 896169002] | 2 |
| 698 | 0.001767 | [918836001, 896169002] | 2 |
| 699 | 0.001344 | [909911001, 909912001] | 2 |
| 700 | 0.001532 | [944506001, 909924002] | 2 |
| 701 | 0.001218 | [915526001, 915526002] | 2 |
| 702 | 0.001191 | [915529001, 915529003] | 2 |


Build a lookup from each item to the items it is frequently bought with, plus a function that queries it.

In [None]:
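ref_map = defaultdict(list)
for item1, item2 in ar_ap1['itemsets']:
  # Strip the spaces left over from splitting "a, b" and store the pair both ways
  item1, item2 = item1.strip(), item2.strip()
  ref_map[item1].append(item2)
  ref_map[item2].append(item1)

def bought_together(item, ref):
  # Return an empty list (rather than None) for items with no frequent partner
  return ref.get(item, [])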

In [None]:
bought_together('936622001', ref_map)

['935892001']

# popular items under user's attributes

This strategy requires joining the transactions with customer demographic data (age).

In [None]:
! kaggle competitions download -c h-and-m-personalized-fashion-recommendations -f customers.csv

Downloading customers.csv.zip to /content
100% 97.9M/97.9M [00:00<00:00, 149MB/s]


In [None]:
customers_df = pd.read_csv('/content/customers.csv.zip')
customers_df.head()

|   | customer_id | FN | Active | club_member_status | fashion_news_frequency | age | postal_code |
|---|---|---|---|---|---|---|---|
| 0 | 00000dbacae5abe5e23885899a1fa44253a17956c6d1c3... |  |  | ACTIVE | NONE | 49.0 | 52043ee2162cf5aa7ee79974281641c6f11a68d276429a... |
| 1 | 0000423b00ade91418cceaf3b26c6af3dd342b51fd051e... |  |  | ACTIVE | NONE | 25.0 | 2973abc54daa8a5f8ccfe9362140c63247c5eee03f1d93... |
| 2 | 000058a12d5b43e67d225668fa1f8d618c13dc232df0ca... |  |  | ACTIVE | NONE | 24.0 | 64f17e6a330a85798e4998f62d0930d14db8db1c054af6... |
| 3 | 00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2... |  |  | ACTIVE | NONE | 54.0 | 5d36574f52495e81f019b680c843c443bd343d5ca5b1c2... |
| 4 | 00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f... | 1.0 | 1.0 | ACTIVE | Regularly | 52.0 | 25fa5ddee9aac01b35208d01736e57942317d756b32ddd... |


Merge customer info to transaction data

In [None]:
train = pd.merge(customers_df, train, how='right', left_on = 'customer_id', right_on = 'customer_id')
val = pd.merge(customers_df, val, how='right', left_on = 'customer_id', right_on = 'customer_id')

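Fill missing ages in both train and validation with the mode of the training ages.

In [None]:
# Series.mode() ignores NaN; assign back instead of chained inplace fillna
age_mode = train["age"].mode()[0]
train["age"] = train["age"].fillna(age_mode)
val["age"] = val["age"].fillna(age_mode)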

In [None]:
age_cat = pd.cut(train["age"], bins = [0,20,30,40,50,60,100], labels = ['1','2','3','4','5','6'])
train['age_cat'] = age_cat

In [None]:
age_cat = pd.cut(val["age"], bins = [0,20,30,40,50,60,100], labels = ['1','2','3','4','5','6'])
val['age_cat'] = age_cat
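`pd.cut` uses right-closed intervals by default, so an age of exactly 20 falls in category `'1'` and 21 in `'2'`. A plain-Python sketch of the same bucketing (a hypothetical `age_category` helper built on `bisect`) makes the boundaries explicit.

```python
from bisect import bisect_left

AGE_BINS = [0, 20, 30, 40, 50, 60, 100]
AGE_LABELS = ['1', '2', '3', '4', '5', '6']


def age_category(age, bins=AGE_BINS, labels=AGE_LABELS):
    """Right-closed bucketing like pd.cut's default: (0, 20] -> '1', (20, 30] -> '2', ...

    Returns None outside (bins[0], bins[-1]], where pd.cut would return NaN.
    """
    i = bisect_left(bins, age)
    return labels[i - 1] if 1 <= i <= len(labels) else None


print([age_category(a) for a in [16, 20, 21, 30.5, 60, 65, 0, 101]])
# ['1', '1', '2', '3', '5', '6', None, None]
```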

In [None]:
train['age_cat'].value_counts()

2    516516
3    195288
4    181010
5    161962
1     80228
6     44204
Name: age_cat, dtype: int64

In [None]:
age_cat_items = train.groupby(['age_cat'])['article_id'].apply(list)

In [None]:
def user_attribute(age_cat, age_cat_items, n):
  # The n items bought most often by customers in this age category
  return [item for item, _ in Counter(age_cat_items[age_cat]).most_common(n)]

# same section_name items

This strategy requires joining the transactions with article metadata (`section_name`).

In [None]:
! kaggle competitions download -c h-and-m-personalized-fashion-recommendations -f articles.csv

Downloading articles.csv.zip to /content
100% 4.26M/4.26M [00:00<00:00, 6.29MB/s]


In [None]:
articles_df = pd.read_csv('/content/articles.csv.zip')
articles_df.head(10)

|   | article_id | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | ... | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | detail_desc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 108775015 | 108775 | Strap top | 253 | Vest top | Garment Upper body | 1010016 | Solid | 9 | Black | ... | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps. |
| 1 | 108775044 | 108775 | Strap top | 253 | Vest top | Garment Upper body | 1010016 | Solid | 10 | White | ... | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps. |
| 2 | 108775051 | 108775 | Strap top (1) | 253 | Vest top | Garment Upper body | 1010017 | Stripe | 11 | Off White | ... | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Jersey top with narrow shoulder straps. |
| 3 | 110065001 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 9 | Black | ... | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulde... |
| 4 | 110065002 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 10 | White | ... | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulde... |
| 5 | 110065011 | 110065 | OP T-shirt (Idro) | 306 | Bra | Underwear | 1010016 | Solid | 12 | Light Beige | ... | Clean Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Microfibre T-shirt bra with underwired, moulde... |
| 6 | 111565001 | 111565 | 20 den 1p Stockings | 304 | Underwear Tights | Socks & Tights | 1010016 | Solid | 9 | Black | ... | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny nylon stockings with a wide, reinfo... |
| 7 | 111565003 | 111565 | 20 den 1p Stockings | 302 | Socks | Socks & Tights | 1010016 | Solid | 13 | Beige | ... | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny nylon stockings with a wide, reinfo... |
| 8 | 111586001 | 111586 | Shape Up 30 den 1p Tights | 273 | Leggings/Tights | Garment Lower body | 1010016 | Solid | 9 | Black | ... | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Tights with built-in support to lift the botto... |
| 9 | 111593001 | 111593 | Support 40 den 1p Tights | 304 | Underwear Tights | Socks & Tights | 1010016 | Solid | 9 | Black | ... | Tights basic | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1021 | Socks and Tights | Semi shiny tights that shape the tummy, thighs... |


In [None]:
articles_df['section_name'].value_counts()

Womens Everyday Collection        7295
Divided Collection                7124
Baby Essentials & Complements     4932
Kids Girl                         4469
Young Girl                        3899
Womens Lingerie                   3598
Girls Underwear & Basics          3490
Womens Tailoring                  3376
Kids Boy                          3328
Womens Small accessories          3270
Womens Casual                     2725
Kids Outerwear                    2665
Womens Trend                      2622
Divided Projects                  2364
Young Boy                         2352
H&M+                              2337
Men Underwear                     2322
Mama                              2266
Kids & Baby Shoes                 2142
Boys Underwear & Basics           2034
Womens Shoes                      2026
Ladies H&M Sport                  1894
Womens Swimwear, beachwear        1839
Contemporary Smart                1778
Baby Girl                         1760
Divided Accessories      

In [None]:
train = pd.merge(articles_df[['article_id','section_name']], train, how='right', left_on = 'article_id', right_on = 'article_id')
val = pd.merge(articles_df[['article_id','section_name']], val, how='right', left_on = 'article_id', right_on = 'article_id')

In [None]:
section_items = train.groupby(['section_name'])['article_id'].apply(list) 

In [None]:
def same_section_items(section, section_items, n):
  # The n items bought most often within this section
  return [item for item, _ in Counter(section_items[section]).most_common(n)]

# Embedding

Item embeddings are learned with gensim's `Word2Vec` (skip-gram), treating each basket (the items a customer bought on the same day, in row order) as a sentence and each article ID as a word.

A user's embedding is the average of the embeddings of the items they purchased.
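With `sg=1` (skip-gram), Word2Vec learns each item's vector by predicting the other items that appear near it in the same basket. A simplified sketch of the (center, context) training pairs this produces; real training additionally shrinks the window at random and downsamples frequent items, and `skipgram_pairs` is a hypothetical helper.

```python
def skipgram_pairs(baskets, window=5):
    """Return (center, context) pairs as a skip-gram model would see them.

    Each basket is a "sentence"; items within `window` positions of each
    other form a pair.
    """
    pairs = []
    for basket in baskets:
        for i, center in enumerate(basket):
            lo, hi = max(0, i - window), min(len(basket), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, basket[j]))
    return pairs


print(skipgram_pairs([["706016001", "706016003", "867966009"]], window=1))
```

With `window=5`, as in the notebook, every item in a typical basket is paired with every other item, so basket order matters little.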

In [None]:
! pip install -q gensim

In [None]:
from gensim.models import Word2Vec

In [None]:
train['article_id'] = train['article_id'].astype("string")

In [None]:
items_per_user_transaction = train.groupby(['customer_id','t_dat'],group_keys=False)['article_id'].apply(list).reset_index()
items_per_user_transaction = list(items_per_user_transaction['article_id'])

In [None]:
# gensim >= 4.0 names the dimension `vector_size`; on gensim 3.x use `size=10` instead
model = Word2Vec(sentences=items_per_user_transaction, vector_size=10, window=5, min_count=1, workers=4, sg=1, hs=0, negative=5)
item_vectors = model.wv



`model.wv` (a gensim `KeyedVectors` object) holds the item embeddings.


In [None]:
vector = model.wv['706016003']
print(vector)
model.wv.most_similar('706016003', topn=10)

[-0.12409377  2.1211483  -0.13372296 -1.0931098   0.7188208  -2.067353
 -0.55426615  0.24494302  0.40380996  0.22232339]


[('867966009', 0.9948374629020691),
 ('882899005', 0.9945924282073975),
 ('706016001', 0.9937094449996948),
 ('706016038', 0.9887040853500366),
 ('757926003', 0.9865995645523071),
 ('835168001', 0.9863295555114746),
 ('867966002', 0.9860538244247437),
 ('798579002', 0.985958993434906),
 ('867966010', 0.9846370220184326),
 ('872266001', 0.9838160276412964)]

Define a function that computes a customer's embedding.

In [None]:
# List of all purchases per user 
items_per_user = train.groupby(['customer_id'])['article_id'].apply(list) 

In [None]:
def customer_embedding(purchase_history, customer, item_vectors):
  # Average the embeddings of every item the customer bought (repeat purchases count each time);
  # the dimension follows the model instead of being hard-coded to 10
  items = purchase_history[customer]
  return np.mean([item_vectors[item] for item in items], axis=0, dtype=np.float64)


In [None]:
customer_embedding(items_per_user, '0000757967448a6cb83efb3ea7a3fb9d418ac7adf2379d8cd0c725276a467a2a', item_vectors)

array([ 0.14475741,  1.92418385, -0.66438076, -0.12911131,  1.10160816,
       -2.02153605, -0.42055298, -0.03544839,  0.19006272,  0.29608285])

Given a customer embedding and the item embeddings, return the `n` items (20 by default) whose embeddings are most similar to it by cosine similarity.

In [None]:
def retrieve_item(customer_embed, item_vectors, n = 20):
  return item_vectors.similar_by_vector(customer_embed,topn=n)
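`similar_by_vector` ranks items by cosine similarity to the query vector. A plain-Python sketch with made-up 2-d vectors (hypothetical item IDs A to D and helper names) shows the two steps: average the purchased items' vectors into a user vector, then rank all items by cosine similarity to it. As in the notebook, items the user already bought are not excluded.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors (the user embedding)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_items(user_vec, vectors, n=2):
    """Rank items by cosine similarity to the user vector, highest first."""
    scored = [(item, cosine(user_vec, vec)) for item, vec in vectors.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:n]


# Made-up 2-d embeddings for hypothetical items A to D
toy_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0], "D": [-1.0, 0.0]}
user_vec = mean_vector([toy_vectors["A"], toy_vectors["C"]])  # the user bought A and C
print(user_vec)                              # [1.0, 0.5]
print(nearest_items(user_vec, toy_vectors))  # C first, then A
```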

# Put together

Now combine the recall items from all strategies for a user (the bought-together lookup is not used here):
1. popularity: 50 items
2. purchase history: up to 50 items
3. popular items under the user's attributes: 50 items
4. same section_name items: the 10 most popular items from the section of each of the user's 20 most recent purchases
5. embedding similarity: 100 items

In [None]:

def recall(user, transaction_data, customer_data, item_data, item_embedding,
           cutoff=datetime.datetime(2020, 9, 16)):
  # Work on a copy sorted by recency so the caller's DataFrame is not modified.
  # Everything is recomputed on every call; precompute outside when looping over many users.
  df = transaction_data.sort_values(by=['t_dat'], ascending=False).copy()
  df['article_id'] = df['article_id'].astype(str)  # embedding keys are strings
  bins, labels = [0, 20, 30, 40, 50, 60, 100], ['1', '2', '3', '4', '5', '6']

  # 1. popularity (time-weighted): top 50
  df['pop_factor'] = 1 / (cutoff - df['t_dat']).dt.days
  popular_items = df.groupby('article_id')['pop_factor'].sum().nlargest(50).index.tolist()

  # 2. purchase history: up to 50 items, most frequent first (ties: most recent first)
  items_per_user = df.groupby('customer_id')['article_id'].apply(list)
  user_items = items_per_user.get(user, [])  # empty for users with no training purchases
  purchase_history = [k for k, _ in Counter(user_items).most_common(50)]

  # 3. popular items under the user's age category: top 50
  if 'age' not in df.columns:  # train already carries customer columns
    df = df.merge(customer_data[['customer_id', 'age']], on='customer_id', how='left')
  age_mode = df['age'].mode()[0]
  df['age'] = df['age'].fillna(age_mode)
  df['age_cat'] = pd.cut(df['age'], bins=bins, labels=labels)
  age_cat_items = df.groupby('age_cat', observed=True)['article_id'].apply(list)
  user_age = customer_data.loc[customer_data['customer_id'] == user, 'age'].dropna()
  user_age = user_age.iloc[0] if len(user_age) else age_mode
  user_age_cat = pd.cut([user_age], bins=bins, labels=labels)[0]
  item_under_user_attribute = [k for k, _ in Counter(age_cat_items.get(user_age_cat, [])).most_common(50)]

  # 4. same section_name items: top 10 of the section of each of the 20 most recent distinct purchases
  section_of = item_data.assign(article_id=item_data['article_id'].astype(str)).set_index('article_id')['section_name']
  if 'section_name' not in df.columns:
    df = df.merge(section_of.reset_index(), on='article_id', how='left')
  section_items = df.groupby('section_name')['article_id'].apply(list)
  same_section_items = []
  for item in list(dict.fromkeys(user_items))[:20]:  # user_items is ordered newest first
    section = section_of.get(item)
    if section in section_items.index:
      same_section_items += [k for k, _ in Counter(section_items[section]).most_common(10)]

  # 5. embedding similarity: 100 items closest to the mean of the user's item vectors
  embedding_similarity_items = []
  known_items = [i for i in user_items if i in item_embedding]
  if known_items:
    user_embedding = np.mean([item_embedding[i] for i in known_items], axis=0)
    embedding_similarity_items = [k for k, _ in item_embedding.similar_by_vector(user_embedding, topn=100)]

  out = (popular_items + purchase_history + item_under_user_attribute
         + same_section_items + embedding_similarity_items)
  # Deduplicate while keeping the first-seen order
  return list(dict.fromkeys(out))

# Validation

Define evaluation metric: Mean Average Precision @ 12

In [None]:
def apk(actual, predicted, k=12):
    if len(predicted)>k:
        predicted = predicted[:k]

    score = 0.0
    num_hits = 0.0

    for i,p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i+1.0)

    if not actual:
        return 0.0

    return score / min(len(actual), k)

def mapk(actual, predicted, k=12):
    return np.mean([apk(a,p,k) for a,p in zip(actual, predicted)])
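A worked example helps check the formula. Suppose a user bought items `a` and `b` and the recall list is `[a, x, b]`. Hits occur at ranks 1 and 3, so the precisions at those ranks are 1/1 and 2/3, and AP@12 = (1 + 2/3) / min(2, 12) ≈ 0.833. The sketch restates the same logic as `apk` under a hypothetical name, `ap_at_k`, so it runs on its own.

```python
def ap_at_k(actual, predicted, k=12):
    """Average precision at k, same logic as apk above."""
    predicted = predicted[:k]
    score, num_hits = 0.0, 0.0
    for i, p in enumerate(predicted):
        # Count a hit only the first time a purchased item appears
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)
    return score / min(len(actual), k) if actual else 0.0


print(ap_at_k(["a", "b"], ["a", "x", "b"]))  # (1/1 + 2/3) / 2
print(ap_at_k(["a", "b"], ["x", "y", "z"]))  # no hits, so 0.0
```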

Precision: the number of recalled items that were purchased / total number of items recalled

In [None]:
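def precision(actual, predicted):
  # Count each distinct recalled item that the user actually bought
  num_hits = 0.0
  for i, p in enumerate(predicted):
    if p in actual and p not in predicted[:i]:
      num_hits += 1.0
  # Guard against empty ground truth and empty recall lists (division by zero)
  if not actual or not predicted:
    return 0.0
  return num_hits / len(predicted)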

Item Recall Rate: the number of distinct purchased items that were recalled / total number of distinct items the user purchased

In [None]:
def recall_rate(actual, predicted):
  # `actual` can repeat items, so compare distinct items on both sides
  if not actual:
    return 0.0
  purchased = set(actual)
  return len(purchased & set(predicted)) / len(purchased)
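The fourth metric from the introduction, user recall rate, is not implemented in this notebook. A minimal sketch, assuming it means the fraction of validation users with at least one purchased item among their recalled items; `user_recall_rate` is a hypothetical helper that takes the same parallel lists as `mapk`.

```python
def user_recall_rate(actual_lists, predicted_lists):
    """Fraction of users for whom at least one recalled item was purchased."""
    if not actual_lists:
        return 0.0
    hits = sum(
        1 for actual, predicted in zip(actual_lists, predicted_lists)
        if set(actual) & set(predicted)
    )
    return hits / len(actual_lists)


actual = [["a", "b"], ["c"], ["d", "e"]]
predicted = [["a", "x"], ["y"], ["e", "d", "z"]]
print(user_recall_rate(actual, predicted))  # 2 of 3 users have a hit
```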

Construct the ground truth: the items each user bought during the validation week.

In [None]:
positive_items_val = val.groupby(['customer_id'])['article_id'].apply(list)

In [None]:
# Ground-truth purchase lists, aligned with val_users for the metric functions
val_users = positive_items_val.index
val_items = [positive_items_val[user] for user in val_users]

print("Total users in validation:", len(val_users))

Total users in validation: 68984


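Evaluate a simple baseline on the validation set: for each user, the most frequently bought items from each training window (most recent window first, up to 12 per window), padded with the time-weighted popular items when the user has fewer than 12. This uses only the purchase-history and popularity strategies, not the full `recall` function.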

In [None]:
outputs = []
windows = [positive_items_per_user1, positive_items_per_user2,
           positive_items_per_user3, positive_items_per_user4]

for user in tqdm(val_users):
    user_output = []
    # Up to 12 most frequently bought items per training window, most recent window first
    for positive_items_per_user in windows:
        if user in positive_items_per_user.index:
            user_output += [k for k, _ in Counter(positive_items_per_user[user]).most_common(12)]
    # Pad with popular items; max(0, ...) avoids a negative slice when the user already has > 12
    user_output += list(popular_items[:max(0, 12 - len(user_output))])
    outputs.append(user_output)

print("mAP Score on Validation set:", mapk(val_items, outputs))

100%|██████████| 68984/68984 [00:04<00:00, 15542.77it/s]


mAP Score on Validation set: 0.023448012511813318
