### Overview

- Deadline: November 13, 23:59
- Target metric: precision@5
- Baseline solution: MainRecommender
- Submit a link to a GitHub repository with the solution. The repository must contain a recommendations.csv file (user_id | [rec_1, rec_2, ...]) with the recommendations, where rec_i are real item ids (from retail_train.csv).

Hints:

#### First, simply try different MainRecommender parameters:

- N in the top-N items used to build the user-item matrix (currently top-5000)
- Different weights in the user-item matrix (0/1, number of purchases, log(number of purchases + 1), purchase amount, ...)
- Different weightings of the matrix (TF-IDF; BM25, which has its own parameters)
- Different blends of recommendations (note the baseline: the user's own past purchases)
- Build an MVP (minimum viable product) first, even if it is just top-popular, and then improve it
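A minimal sketch of the matrix weighting options listed above, on a toy transaction log with the same column names as retail_train.csv. The helpers `user_item_matrix` and `bm25_weight_dense` are hypothetical (MainRecommender builds its own matrix). The BM25 formula follows, as far as I know, the one used by `implicit.nearest_neighbours.bm25_weight`; it is written on a dense array here only for readability, and on real data the matrix should stay sparse.

```python
import numpy as np
import pandas as pd

# Toy transaction log (hypothetical values); column names as in retail_train.csv
tx = pd.DataFrame({
    'user_id':     [1, 1, 1, 2, 2, 3],
    'item_id':     [10, 10, 20, 20, 30, 10],
    'quantity':    [1, 2, 1, 1, 3, 1],
    'sales_value': [1.5, 3.0, 2.0, 2.0, 4.5, 1.5],
})


def user_item_matrix(tx, weighting='count'):
    """Dense user x item matrix with one of several cell weights."""
    if weighting == 'sales':
        # total money the user spent on the item
        m = tx.pivot_table(index='user_id', columns='item_id', values='sales_value',
                           aggfunc='sum', fill_value=0)
    else:
        # number of transaction rows (purchases) per (user, item)
        m = tx.pivot_table(index='user_id', columns='item_id', values='quantity',
                           aggfunc='count', fill_value=0)
    m = m.astype(float)
    if weighting == 'binary':
        m = (m > 0).astype(float)      # 0/1: bought at least once
    elif weighting == 'log':
        m = np.log1p(m)                # log(number of purchases + 1)
    return m


def bm25_weight_dense(m, K1=100.0, B=0.8):
    """BM25 re-weighting: users play the role of documents, items the role of terms."""
    X = np.asarray(m, dtype=float)
    n_users = X.shape[0]
    # items bought by many users get a lower (possibly zero) weight
    idf = np.log(n_users) - np.log1p((X > 0).sum(axis=0))
    row_sums = X.sum(axis=1)
    length_norm = (1.0 - B) + B * row_sums / row_sums.mean()
    # empty cells stay zero because X is a factor of the numerator
    return X * (K1 + 1.0) / (K1 * length_norm[:, None] + X) * idf[None, :]


counts = user_item_matrix(tx, 'count')
print(counts)
print(bm25_weight_dense(counts, K1=100.0, B=0.8).round(3))
```

K1 and B are the BM25 parameters mentioned in the hints; they are worth tuning together with N.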

If you build a two-level model, keep a close eye on validation.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Sparse matrices
from scipy.sparse import csr_matrix

# Matrix factorization
from implicit import als

# Second-level model
from lightgbm import LGBMClassifier

import os, sys
module_path = os.path.abspath(os.path.join(os.pardir))
if module_path not in sys.path:
    sys.path.append(module_path)

# Our own helper functions
from src.metrics import precision_at_k, recall_at_k
from src.utils import prefilter_items
from src.recommenders import MainRecommender

In [2]:
data = pd.read_csv('data/retail_train.csv')
item_features = pd.read_csv('data/product.csv')
user_features = pd.read_csv('data/hh_demographic.csv')

# column processing
item_features.columns = [col.lower() for col in item_features.columns]
user_features.columns = [col.lower() for col in user_features.columns]

item_features.rename(columns={'product_id': 'item_id'}, inplace=True)
user_features.rename(columns={'household_key': 'user_id'}, inplace=True)


# The training/validation scheme matters!
# -- older purchases -- | -- 6 weeks -- | -- 3 weeks --
# tune the size of the 2nd dataset (6 weeks) --> learning curve (recall@k as a function of dataset size)
val_lvl_1_size_weeks = 6
val_lvl_2_size_weeks = 3

data_train_lvl_1 = data[data['week_no'] < data['week_no'].max() - (val_lvl_1_size_weeks + val_lvl_2_size_weeks)]
data_val_lvl_1 = data[(data['week_no'] >= data['week_no'].max() - (val_lvl_1_size_weeks + val_lvl_2_size_weeks)) &
                      (data['week_no'] < data['week_no'].max() - (val_lvl_2_size_weeks))]

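data_train_lvl_2 = data_val_lvl_1.copy()  # A copy for clarity; we will modify it later, so the two will differ
# Note: with `>=`, this window covers the last 4 week numbers (max-3 ... max), one more than val_lvl_2_size_weeks
data_val_lvl_2 = data[data['week_no'] >= data['week_no'].max() - val_lvl_2_size_weeks]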

data_train_lvl_1.head(2)

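| | user_id | basket_id | day | item_id | quantity | sales_value | store_id | retail_disc | trans_time | week_no | coupon_disc | coupon_match_disc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2375 | 26984851472 | 1 | 1004906 | 1 | 1.39 | 364 | -0.6 | 1631 | 1 | 0.0 | 0.0 |
| 1 | 2375 | 26984851472 | 1 | 1033142 | 1 | 0.82 | 364 | 0.0 | 1631 | 1 | 0.0 | 0.0 |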
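To see exactly which weeks each part covers, the same boundary expressions can be run on a toy frame with one row per week (weeks 1..20 are a made-up range; `split_by_weeks` is a hypothetical wrapper). Because the last filter uses `>=`, the level-2 validation window comes out one week longer than `val_lvl_2_size_weeks`.

```python
import pandas as pd


def split_by_weeks(df, val_lvl_1_size_weeks=6, val_lvl_2_size_weeks=3):
    """Same boundary expressions as the cell above, wrapped in a function."""
    max_week = df['week_no'].max()
    border_1 = max_week - (val_lvl_1_size_weeks + val_lvl_2_size_weeks)
    border_2 = max_week - val_lvl_2_size_weeks
    train_lvl_1 = df[df['week_no'] < border_1]
    val_lvl_1 = df[(df['week_no'] >= border_1) & (df['week_no'] < border_2)]
    val_lvl_2 = df[df['week_no'] >= border_2]
    return train_lvl_1, val_lvl_1, val_lvl_2


toy = pd.DataFrame({'week_no': range(1, 21)})   # hypothetical: one row per week, weeks 1..20
parts = dict(zip(['train_lvl_1', 'val_lvl_1', 'val_lvl_2'], split_by_weeks(toy)))
for name, part in parts.items():
    print(f"{name}: weeks {part['week_no'].min()}-{part['week_no'].max()} "
          f"({part['week_no'].nunique()} weeks)")
# train_lvl_1: weeks 1-10 (10 weeks)
# val_lvl_1: weeks 11-16 (6 weeks)
# val_lvl_2: weeks 17-20 (4 weeks)
```

Using `week_no <= border_1` for training, `border_1 < week_no <= border_2` for level 1 and `week_no > border_2` for level 2 would give exactly 6 and 3 weeks.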


In [3]:
n_items_before = data_train_lvl_1['item_id'].nunique()

data_train_lvl_1 = prefilter_items(data_train_lvl_1, item_features=item_features, take_n_popular=5000) #5000

n_items_after = data_train_lvl_1['item_id'].nunique()
print('Decreased # items from {} to {}'.format(n_items_before, n_items_after))

Decreased # items from 83685 to 5001
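The result is 5001 items rather than 5000. `src.utils.prefilter_items` is not shown here, so one plausible explanation is only an assumption: items outside the top-N are not dropped but relabelled as a single placeholder item, which adds one extra id. A sketch of that idea, with a hypothetical helper, a hypothetical dummy id, and popularity assumed to mean units sold:

```python
import pandas as pd

DUMMY_ITEM_ID = 999999  # hypothetical placeholder id for "any other item"


def keep_top_n_popular(data, take_n_popular=5000, dummy_item_id=DUMMY_ITEM_ID):
    """Keep the take_n_popular most bought items; relabel all other items as one dummy id."""
    popularity = data.groupby('item_id')['quantity'].sum()   # assumption: popularity = units sold
    top_items = popularity.nlargest(take_n_popular).index
    data = data.copy()
    data.loc[~data['item_id'].isin(top_items), 'item_id'] = dummy_item_id
    return data


toy = pd.DataFrame({'user_id': [1, 1, 2, 2, 3],
                    'item_id': [10, 20, 10, 30, 40],
                    'quantity': [5, 3, 4, 1, 1]})
filtered = keep_top_n_popular(toy, take_n_popular=2)
print(toy['item_id'].nunique(), '->', filtered['item_id'].nunique())   # 4 -> 3
```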


In [4]:
%%time
# parameters for grid search
params = {'factor': [50, 75, 100, 125, 150, 175, 200], # 10 and 25 are too few
          'l_reg': [0.001, 0.01, 0.05, 0.1],
          'itr': [10, 15, 25, 35]
         }
recommender = MainRecommender(data_train_lvl_1, params)



| Step | factor | l_reg | iterations | Fit time, s | Get recommendation time, s |
|---|---|---|---|---|---|
| 1 | 50 | 0.001 | 10 | 0.25 | 47.84 |
| 2 | 50 | 0.001 | 15 | 0.32 | 47.33 |
| 3 | 50 | 0.001 | 25 | 0.47 | 48.61 |
| 4 | 50 | 0.001 | 35 | 0.62 | 48.93 |
| 5 | 50 | 0.01 | 10 | 0.23 | 48.7 |
| 6 | 50 | 0.01 | 15 | 0.3 | 48.23 |
| 7 | 50 | 0.01 | 25 | 0.47 | 48.0 |
| 8 | 50 | 0.01 | 35 | 0.64 | 49.13 |
| 9 | 50 | 0.05 | 10 | 0.23 | 48.39 |
| 10 | 50 | 0.05 | 15 | 0.3 | 49.15 |
| 11 | 50 | 0.05 | 25 | 0.47 | 49.52 |
| 12 | 50 | 0.05 | 35 | 0.65 | 49.26 |
| 13 | 50 | 0.1 | 10 | 0.23 | 49.06 |
| 14 | 50 | 0.1 | 15 | 0.29 | 48.32 |
| 15 | 50 | 0.1 | 25 | 0.48 | 48.08 |
| 16 | 50 | 0.1 | 35 | 0.64 | 48.7 |
| 17 | 75 | 0.001 | 10 | 0.25 | 55.2 |
| 18 | 75 | 0.001 | 15 | 0.32 | 54.71 |
| 19 | 75 | 0.001 | 25 | 0.5 | 55.01 |
| 20 | 75 | 0.001 | 35 | 0.65 | 54.63 |
| 21 | 75 | 0.01 | 10 | 0.24 | 55.13 |
| 22 | 75 | 0.01 | 15 | 0.31 | 55.03 |
| 23 | 75 | 0.01 | 25 | 0.49 | 55.12 |
| 24 | 75 | 0.01 | 35 | 0.66 | 55.2 |
| 25 | 75 | 0.05 | 10 | 0.23 | 54.77 |
| 26 | 75 | 0.05 | 15 | 0.32 | 54.96 |
| 27 | 75 | 0.05 | 25 | 0.48 | 55.0 |
| 28 | 75 | 0.05 | 35 | 0.66 | 55.3 |
| 29 | 75 | 0.1 | 10 | 0.23 | 55.95 |
| 30 | 75 | 0.1 | 15 | 0.3 | 55.25 |
| 31 | 75 | 0.1 | 25 | 0.47 | 54.49 |
| 32 | 75 | 0.1 | 35 | 0.64 | 54.35 |
| 33 | 100 | 0.001 | 10 | 0.24 | 61.07 |
| 34 | 100 | 0.001 | 15 | 0.32 | 61.2 |
| 35 | 100 | 0.001 | 25 | 0.51 | 60.97 |
| 36 | 100 | 0.001 | 35 | 0.69 | 62.38 |
| 37 | 100 | 0.01 | 10 | 0.23 | 62.11 |
| 38 | 100 | 0.01 | 15 | 0.32 | 61.26 |
| 39 | 100 | 0.01 | 25 | 0.51 | 62.22 |
| 40 | 100 | 0.01 | 35 | 0.65 | 62.44 |
| 41 | 100 | 0.05 | 10 | 0.25 | 63.07 |
| 42 | 100 | 0.05 | 15 | 0.31 | 61.77 |
| 43 | 100 | 0.05 | 25 | 0.49 | 61.13 |
| 44 | 100 | 0.05 | 35 | 0.67 | 61.4 |
| 45 | 100 | 0.1 | 10 | 0.25 | 61.49 |
| 46 | 100 | 0.1 | 15 | 0.32 | 61.72 |
| 47 | 100 | 0.1 | 25 | 0.53 | 62.37 |
| 48 | 100 | 0.1 | 35 | 0.68 | 61.32 |
| 49 | 125 | 0.001 | 10 | 0.28 | 70.69 |
| 50 | 125 | 0.001 | 15 | 0.36 | 69.99 |
| 51 | 125 | 0.001 | 25 | 0.53 | 71.51 |
| 52 | 125 | 0.001 | 35 | 0.76 | 70.05 |
| 53 | 125 | 0.01 | 10 | 0.26 | 70.21 |
| 54 | 125 | 0.01 | 15 | 0.35 | 70.18 |
| 55 | 125 | 0.01 | 25 | 0.52 | 69.56 |
| 56 | 125 | 0.01 | 35 | 0.7 | 70.1 |
| 57 | 125 | 0.05 | 10 | 0.25 | 70.17 |
| 58 | 125 | 0.05 | 15 | 0.34 | 70.43 |
| 59 | 125 | 0.05 | 25 | 0.5 | 70.2 |
| 60 | 125 | 0.05 | 35 | 0.69 | 70.65 |
| 61 | 125 | 0.1 | 10 | 0.25 | 70.77 |
| 62 | 125 | 0.1 | 15 | 0.34 | 70.52 |
| 63 | 125 | 0.1 | 25 | 0.53 | 69.92 |
| 64 | 125 | 0.1 | 35 | 0.69 | 70.11 |
| 65 | 150 | 0.001 | 10 | 0.27 | 81.24 |
| 66 | 150 | 0.001 | 15 | 0.36 | 81.39 |
| 67 | 150 | 0.001 | 25 | 0.54 | 80.04 |
| 68 | 150 | 0.001 | 35 | 0.76 | 81.29 |
| 69 | 150 | 0.01 | 10 | 0.25 | 81.09 |
| 70 | 150 | 0.01 | 15 | 0.37 | 80.67 |
| 71 | 150 | 0.01 | 25 | 0.55 | 81.71 |
| 72 | 150 | 0.01 | 35 | 0.73 | 82.24 |
| 73 | 150 | 0.05 | 10 | 0.25 | 80.4 |
| 74 | 150 | 0.05 | 15 | 0.36 | 82.32 |
| 75 | 150 | 0.05 | 25 | 0.56 | 81.42 |
| 76 | 150 | 0.05 | 35 | 0.77 | 80.3 |
| 77 | 150 | 0.1 | 10 | 0.25 | 81.17 |
| 78 | 150 | 0.1 | 15 | 0.36 | 80.72 |
| 79 | 150 | 0.1 | 25 | 0.53 | 80.6 |
| 80 | 150 | 0.1 | 35 | 0.73 | 81.22 |
| 81 | 175 | 0.001 | 10 | 0.28 | 92.77 |
| 82 | 175 | 0.001 | 15 | 0.38 | 92.69 |
| 83 | 175 | 0.001 | 25 | 0.61 | 93.03 |
| 84 | 175 | 0.001 | 35 | 0.83 | 92.3 |
| 85 | 175 | 0.01 | 10 | 0.28 | 92.18 |
| 86 | 175 | 0.01 | 15 | 0.41 | 91.62 |
| 87 | 175 | 0.01 | 25 | 0.61 | 92.81 |
| 88 | 175 | 0.01 | 35 | 0.82 | 91.49 |
| 89 | 175 | 0.05 | 10 | 0.28 | 91.97 |
| 90 | 175 | 0.05 | 15 | 0.37 | 91.8 |
| 91 | 175 | 0.05 | 25 | 0.59 | 91.69 |
| 92 | 175 | 0.05 | 35 | 0.83 | 95.03 |
| 93 | 175 | 0.1 | 10 | 0.3 | 93.61 |
| 94 | 175 | 0.1 | 15 | 0.38 | 94.18 |
| 95 | 175 | 0.1 | 25 | 0.66 | 97.38 |
| 96 | 175 | 0.1 | 35 | 0.87 | 96.98 |
| 97 | 200 | 0.001 | 10 | 0.32 | 105.34 |
| 98 | 200 | 0.001 | 15 | 0.42 | 108.63 |
| 99 | 200 | 0.001 | 25 | 0.64 | 110.69 |
| 100 | 200 | 0.001 | 35 | 0.83 | 107.39 |
| 101 | 200 | 0.01 | 10 | 0.33 | 107.27 |
| 102 | 200 | 0.01 | 15 | 0.38 | 106.88 |
| 103 | 200 | 0.01 | 25 | 0.64 | 106.5 |
| 104 | 200 | 0.01 | 35 | 0.85 | 106.67 |
| 105 | 200 | 0.05 | 10 | 0.28 | 104.81 |
| 106 | 200 | 0.05 | 15 | 0.41 | 104.79 |
| 107 | 200 | 0.05 | 25 | 0.62 | 105.72 |
| 108 | 200 | 0.05 | 35 | 0.83 | 106.11 |
| 109 | 200 | 0.1 | 10 | 0.28 | 106.03 |
| 110 | 200 | 0.1 | 15 | 0.39 | 106.41 |
| 111 | 200 | 0.1 | 25 | 0.61 | 106.28 |
| 112 | 200 | 0.1 | 35 | 0.85 | 105.23 |

| | params | result_train | result_test | fit_time |
|---|---|---|---|---|
| 106 | als_f-200_lr-0.05_i-25 | 0.939158 | 0.174516 | 0.62 |
| 99 | als_f-200_lr-0.001_i-35 | 0.939158 | 0.175127 | 0.83 |
| 98 | als_f-200_lr-0.001_i-25 | 0.939158 | 0.175127 | 0.64 |
| 109 | als_f-200_lr-0.1_i-15 | 0.939078 | 0.173293 | 0.39 |
| 107 | als_f-200_lr-0.05_i-35 | 0.938998 | 0.173293 | 0.83 |
| ... | ... | ... | ... | ... |
| 13 | als_f-50_lr-0.1_i-15 | 0.652745 | 0.141284 | 0.29 |
| 7 | als_f-50_lr-0.01_i-35 | 0.652665 | 0.137513 | 0.64 |
| 15 | als_f-50_lr-0.1_i-35 | 0.652585 | 0.137513 | 0.64 |
| 3 | als_f-50_lr-0.001_i-35 | 0.651703 | 0.137513 | 0.62 |
| 0 | als_f-50_lr-0.001_i-10 | 0.650020 | 0.142304 | 0.25 |

     get_rec_time                                           cv_model  \
106        105.72  <implicit.als.Alternat


Wall time: 2h 19min 4s
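The 112 steps in the log are the Cartesian product of the three lists in `params` (7 × 4 × 4). A sketch of how such a grid expands, assuming MainRecommender walks it in `itertools.product` order, which matches the order of the steps in the log; with that order, a row index in the results table maps directly to its parameters. Since the recommendation step dominates each step (from about 48 s at 50 factors to about 105 s at 200), shrinking the grid is the most direct way to cut the total run time.

```python
from itertools import product

params = {'factor': [50, 75, 100, 125, 150, 175, 200],
          'l_reg': [0.001, 0.01, 0.05, 0.1],
          'itr': [10, 15, 25, 35]}

# The last key varies fastest, as in the "Step i of 112" log
grid = [dict(zip(params, values)) for values in product(*params.values())]

print(len(grid))    # 112
print(grid[106])    # {'factor': 200, 'l_reg': 0.05, 'itr': 25}, i.e. row 106 of the results
```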


In [5]:
result_lvl_1 = data_val_lvl_1.groupby('user_id')['item_id'].unique().reset_index()
result_lvl_1.columns=['user_id', 'actual']
result_lvl_1.head(2)

| | user_id | actual |
|---|---|---|
| 0 | 1 | [853529, 865456, 867607, 872137, 874905, 87524... |
| 1 | 2 | [15830248, 838136, 839656, 861272, 866211, 870... |


In [6]:
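%%time
# Top-5 recommendations from each first-level strategy for every user of the level-1 validation period.
# Users or items the model does not know raise IndexError (or ValueError); such users get an empty list.
_als_recs = []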
_own_recs = []
_sim_itm_recs = []
_sim_usr_recs = []
d_recs = dict()
for uid in result_lvl_1['user_id']:
    try:
        _als_recs.append([uid, recommender.get_als_recommendations(uid, N=5)])
    except IndexError:
        _als_recs.append([uid, []])
    try:
        _own_recs.append([uid, recommender.get_own_recommendations(uid, N=5)])
    except (IndexError, ValueError):
        _own_recs.append([uid, []])
    try:
        _sim_itm_recs.append([uid, recommender.get_similar_items_recommendation(uid, N=5)])
    except IndexError:
        _sim_itm_recs.append([uid, []])
    try:
        _sim_usr_recs.append([uid, recommender.get_similar_users_recommendation(uid, N=5)])
    except (IndexError, ValueError):
        _sim_usr_recs.append([uid, []])
d_recs['als'] = _als_recs
d_recs['own'] = _own_recs
d_recs['sim_itm'] = _sim_itm_recs
d_recs['sim_usr'] = _sim_usr_recs
als_recs = pd.DataFrame(d_recs['als'], columns=['uid', 'als'])
own_recs = pd.DataFrame(d_recs['own'], columns=['uid', 'own'])
sim_itm_recs = pd.DataFrame(d_recs['sim_itm'], columns=['uid', 'sim_itm'])
sim_usr_recs = pd.DataFrame(d_recs['sim_usr'], columns=['uid', 'sim_usr'])

Wall time: 1min 50s
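The four try/except blocks above follow one pattern, so they can be folded into a small helper. This is a sketch with hypothetical names (`safe_recs`, `collect_recs`, `fake_recs`); unlike the cell above, it catches both `IndexError` and `ValueError` for every method.

```python
import pandas as pd


def safe_recs(get_recs, user_id, N=5, errors=(IndexError, ValueError)):
    """Call one recommendation method; return [] when the model cannot handle this user."""
    try:
        return get_recs(user_id, N=N)
    except errors:
        return []


def collect_recs(methods, user_ids, N=5):
    """Return {name: DataFrame with columns ['uid', name]} for each recommendation method."""
    return {
        name: pd.DataFrame([[uid, safe_recs(method, uid, N=N)] for uid in user_ids],
                           columns=['uid', name])
        for name, method in methods.items()
    }


# Toy stand-in for a MainRecommender method (hypothetical)
def fake_recs(user_id, N=5):
    if user_id == 3:
        raise IndexError('user not in the training matrix')
    return list(range(user_id, user_id + N))


recs = collect_recs({'fake': fake_recs}, [1, 3])
print(recs['fake'])

# In the notebook this could look like:
# recs = collect_recs({'als': recommender.get_als_recommendations,
#                      'own': recommender.get_own_recommendations,
#                      'sim_itm': recommender.get_similar_items_recommendation,
#                      'sim_usr': recommender.get_similar_users_recommendation},
#                     result_lvl_1['user_id'])
```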


In [7]:
result_lvl_1_als = pd.merge(result_lvl_1, als_recs, left_on='user_id', right_on='uid', how='left')
result_lvl_1_als.drop(['uid'], axis=1, inplace=True) 

result_lvl_1_own = pd.merge(result_lvl_1, own_recs, left_on='user_id', right_on='uid', how='left')
result_lvl_1_own.drop(['uid'], axis=1, inplace=True) 

result_lvl_1_sim_itm = pd.merge(result_lvl_1, sim_itm_recs, left_on='user_id', right_on='uid', how='left')
result_lvl_1_sim_itm.drop(['uid'], axis=1, inplace=True) 

result_lvl_1_sim_usr = pd.merge(result_lvl_1, sim_usr_recs, left_on='user_id', right_on='uid', how='left')
result_lvl_1_sim_usr.drop(['uid'], axis=1, inplace=True) 

In [8]:
print(f"recall@k-als: {result_lvl_1_als.apply(lambda row: recall_at_k(row['als'], row['actual']), axis=1).mean()}")
print(f"precision@k-als: {result_lvl_1_als.apply(lambda row: precision_at_k(row['als'], row['actual']), axis=1).mean()}")
print(f"recall@k-own: {result_lvl_1_own.apply(lambda row: recall_at_k(row['own'], row['actual']), axis=1).mean()}")
print(f"recall@k-sim-itm: {result_lvl_1_sim_itm.apply(lambda row: recall_at_k(row['sim_itm'], row['actual']), axis=1).mean()}")
print(f"recall@k-sim-usr: {result_lvl_1_sim_usr.apply(lambda row: recall_at_k(row['sim_usr'], row['actual']), axis=1).mean()}")

recall@k-als: 0.014466172675828613
precision@k-als: 0.13110181311018004
recall@k-own: 0.018176536856402324


recall@k-sim-itm: 0.008023621313273782
recall@k-sim-usr: 0.0003051019269572906
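The `src.metrics` functions are not shown here. A minimal sketch of precision@k and recall@k under the usual definitions, with an explicit choice for empty lists. If the original precision returns NaN for a user with an empty recommendation list (for example by dividing by the list size), pandas `mean()` skips NaN by default, so such users silently drop out of the average instead of counting as misses.

```python
import numpy as np
import pandas as pd


def precision_at_k(recommended_list, bought_list, k=5):
    """Share of the first k recommendations that the user actually bought."""
    recommended = np.asarray(recommended_list)[:k]
    if recommended.size == 0:
        return 0.0          # explicit choice: no recommendations counts as a miss, not NaN
    return np.isin(recommended, bought_list).sum() / recommended.size


def recall_at_k(recommended_list, bought_list, k=5):
    """Share of the bought items that appear among the first k recommendations."""
    bought = np.asarray(bought_list)
    if bought.size == 0:
        return 0.0
    return np.isin(bought, np.asarray(recommended_list)[:k]).sum() / bought.size


# User 267 from the table below: 2 of 5 recommendations hit, 2 of 3 purchases found
actual = [901976, 929373, 1062782]
als = [929373, 901976, 948888, 842423, 935393]
print(precision_at_k(als, actual), recall_at_k(als, actual))   # 0.4 and about 0.667

# pandas skips NaN by default
print(pd.Series([0.2, np.nan]).mean())   # 0.2
```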


In [9]:
result_lvl_1_als['recall@k-als'] = result_lvl_1_als.apply(lambda row: recall_at_k(row['als'], row['actual']), axis=1)
result_lvl_1_als.sort_values('recall@k-als', ascending=False, inplace=True)
#result_lvl_1_als_top = result_lvl_1_als.head(50)

result_lvl_1_als['precision@k-als'] = result_lvl_1_als.apply(lambda row: precision_at_k(row['als'], row['actual']), axis=1)
#result_lvl_1_als.sort_values('precision@k-als', ascending=False, inplace=True)
result_lvl_1_als_top = result_lvl_1_als.head(50)

result_lvl_1_own['recall@k-own'] = result_lvl_1_own.apply(lambda row: recall_at_k(row['own'], row['actual']), axis=1)
result_lvl_1_own.sort_values('recall@k-own', ascending=False, inplace=True)
result_lvl_1_own_top = result_lvl_1_own.head(50)

result_lvl_1_sim_itm['recall@k-sim-itm'] = result_lvl_1_sim_itm.apply(lambda row: recall_at_k(row['sim_itm'], row['actual']), axis=1)
result_lvl_1_sim_itm.sort_values('recall@k-sim-itm', ascending=False, inplace=True)
result_lvl_1_sim_itm_top = result_lvl_1_sim_itm.head(50)

result_lvl_1_sim_usr['recall@k-sim-usr'] = result_lvl_1_sim_usr.apply(lambda row: recall_at_k(row['sim_usr'], row['actual']), axis=1)
result_lvl_1_sim_usr.sort_values('recall@k-sim-usr', ascending=False, inplace=True)
result_lvl_1_sim_usr_top = result_lvl_1_sim_usr.head(50)

In [10]:
result_lvl_1_als_top.head()

| | user_id | actual | als | recall@k-als | precision@k-als |
|---|---|---|---|---|---|
| 1704 | 1970 | [1072086] | [1072086, 7147145, 1025581, 1106523, 854900] | 1.0 | 0.2 |
| 229 | 267 | [901976, 929373, 1062782] | [929373, 901976, 948888, 842423, 935393] | 0.666667 | 0.4 |
| 1270 | 1477 | [961979, 1029743] | [961979, 9570062, 1125278, 986912, 928640] | 0.5 | 0.2 |
| 2051 | 2382 | [1050741, 9297062] | [9297062, 1025650, 6703742, 5570383, 9297574] | 0.5 | 0.2 |
| 2090 | 2428 | [944486, 6533681] | [1020604, 6533681, 851819, 1111986, 977798] | 0.5 | 0.2 |


In [11]:
print(f"recall@k-als-top: {result_lvl_1_als.head(50).apply(lambda row: recall_at_k(row['als'], row['actual']), axis=1).mean()}")
print(f"recall@k-own-top: {result_lvl_1_own.head(50).apply(lambda row: recall_at_k(row['own'], row['actual']), axis=1).mean()}")
print(f"recall@k-sim-itm-top: {result_lvl_1_sim_itm.head(50).apply(lambda row: recall_at_k(row['sim_itm'], row['actual']), axis=1).mean()}")
print(f"recall@k-sim-usr-top: {result_lvl_1_sim_usr.head(50).apply(lambda row: recall_at_k(row['sim_usr'], row['actual']), axis=1).mean()}")

recall@k-als-top: 0.21392568217644953
recall@k-own-top: 0.25352436528792444
recall@k-sim-itm-top: 0.14006960595763032
recall@k-sim-usr-top: 0.013143791013320078


In [12]:
recs_output = result_lvl_1_als.copy()
recs_output.drop(['actual', 'recall@k-als', 'precision@k-als'], axis=1, inplace=True)
recs_output.sort_index(inplace=True)
recs_output.rename(columns={'als': 'recommendations'}, inplace=True)
recs_output.to_csv('recommendations_als.csv', index=False)  # user_id | recommendations, without the DataFrame index
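The task asks for a recommendations.csv with a user_id and a list of real item ids per user. A sketch of writing such a file and reading it back, on toy data with an in-memory buffer standing in for the file: `index=False` keeps the integer index out of the file, and converting ids to plain `int` matters because with NumPy 2 a list of `np.int64` is written as `[np.int64(101), ...]`, which `ast.literal_eval` cannot parse.

```python
import ast
import io

import numpy as np
import pandas as pd

recs = pd.DataFrame({'user_id': [1, 2],
                     'recommendations': [list(np.array([101, 102, 103])), [201, 202]]})

# Plain Python ints, so each list is written as "[101, 102, 103]"
recs['recommendations'] = recs['recommendations'].apply(lambda r: [int(x) for x in r])

buf = io.StringIO()                  # stands in for 'recommendations.csv'
recs.to_csv(buf, index=False)        # columns: user_id, recommendations (no extra index column)

buf.seek(0)
loaded = pd.read_csv(buf)
# read_csv returns the lists as strings; parse them back into Python lists
loaded['recommendations'] = loaded['recommendations'].apply(ast.literal_eval)
print(loaded)
```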

### Training the second-level model

In [13]:
users_lvl_2 = pd.DataFrame(data_train_lvl_2['user_id'].unique())
users_lvl_2.columns = ['user_id']

# Warm start only for now: keep the 50 users with the highest ALS recall@5 (result_lvl_1_als_top)
train_users = result_lvl_1_als_top['user_id'].unique()
users_lvl_2 = users_lvl_2[users_lvl_2['user_id'].isin(train_users)]

users_lvl_2['candidates'] = users_lvl_2['user_id'].apply(lambda x: recommender.get_own_recommendations(x, N=50))

In [14]:
users_lvl_2.head(2)

| | user_id | candidates |
|---|---|---|
| 159 | 930 | [917033, 1016800, 1050741, 854716, 851188, 113... |
| 180 | 975 | [868888, 910151, 856215, 1027835, 1042571, 111... |
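The level-2 model can only re-rank the 50 candidates it receives, so the share of actual purchases that made it into the candidate list is an upper bound on the recall@5 that re-ranking can reach. A sketch of that check with a hypothetical helper and toy ids; in the notebook the actual purchases for this period are in `result_lvl_1`, since `data_train_lvl_2` is a copy of `data_val_lvl_1`.

```python
import pandas as pd


def candidate_recall(candidates, actual):
    """Share of the user's actual purchases that are present among the candidates."""
    actual = set(actual)
    if not actual:
        return 0.0
    return len(actual & set(candidates)) / len(actual)


# Toy frames (hypothetical ids)
toy_cands = pd.DataFrame({'user_id': [930, 975],
                          'candidates': [[11, 12, 13], [21, 22]]})
toy_actual = pd.DataFrame({'user_id': [930, 975],
                           'actual': [[13, 99], [98]]})

check = toy_cands.merge(toy_actual, on='user_id', how='inner')
check['cand_recall'] = check.apply(lambda r: candidate_recall(r['candidates'], r['actual']), axis=1)
print(check[['user_id', 'cand_recall']])
```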


In [15]:
s = users_lvl_2.apply(lambda x: pd.Series(x['candidates']), axis=1).stack().reset_index(level=1, drop=True)
s.name = 'item_id'

users_lvl_2 = users_lvl_2.drop('candidates', axis=1).join(s)
users_lvl_2['flag'] = 1

users_lvl_2.head(4)

| | user_id | item_id | flag |
|---|---|---|---|
| 159 | 930 | 917033 | 1 |
| 159 | 930 | 1016800 | 1 |
| 159 | 930 | 1050741 | 1 |
| 159 | 930 | 854716 | 1 |
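The `apply(pd.Series).stack()` plus `join` step turns each candidate list into one row per (user, item). `DataFrame.explode` does the same in one call and also keeps the original index; a sketch on toy ids follows. Empty lists become NaN rows under `explode`, so they are dropped before casting ids back to integers.

```python
import pandas as pd

users = pd.DataFrame({'user_id': [930, 975, 1001],
                      'candidates': [[11, 12], [21, 22, 23], []]})   # toy ids; user 1001 has none

long = (users.explode('candidates')                     # one row per (user, candidate)
             .rename(columns={'candidates': 'item_id'})
             .dropna(subset=['item_id'])                # empty lists become NaN rows; drop them
             .astype({'item_id': 'int64'}))
long['flag'] = 1
print(long)
```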


In [16]:
users_lvl_2.shape[0]

2500

In [17]:
users_lvl_2['user_id'].nunique()

50

In [18]:
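targets_lvl_2 = data_train_lvl_2[['user_id', 'item_id']].copy()
targets_lvl_2['target'] = 1  # purchases only
# Note: an item bought in several transactions appears several times here, so the merge below duplicates that candidate

targets_lvl_2 = users_lvl_2.merge(targets_lvl_2, on=['user_id', 'item_id'], how='left')

# Candidates that were not bought get target 0 (assign back; chained fillna(inplace=True) on a column is deprecated in recent pandas)
targets_lvl_2['target'] = targets_lvl_2['target'].fillna(0)
targets_lvl_2.drop('flag', axis=1, inplace=True)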

In [19]:
targets_lvl_2.head(2)

| | user_id | item_id | target |
|---|---|---|---|
| 0 | 930 | 917033 | 0.0 |
| 1 | 930 | 1016800 | 0.0 |


In [20]:
targets_lvl_2['target'].mean()

0.0637042862760519
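`users_lvl_2` has 2500 rows (50 users × 50 candidates), yet `X_train` below ends at index 2539, and item 1050741 appears twice for user 930 in the merged table below. The left merge repeats a candidate once for every transaction in which it was bought, so only positives get duplicated, which also pushes the target mean above the true share of bought candidates. A toy illustration, with a hypothetical helper and `drop_duplicates()` as one way to avoid the duplication:

```python
import pandas as pd

# Toy data: user 930 has two candidates and bought item 1050741 in two separate transactions
toy_cands = pd.DataFrame({'user_id': [930, 930], 'item_id': [917033, 1050741]})
toy_purchases = pd.DataFrame({'user_id': [930, 930], 'item_id': [1050741, 1050741]})


def label_candidates(candidates, purchases, dedup=True):
    """Left-join purchases onto candidates; target = 1 if the candidate was bought."""
    bought = purchases[['user_id', 'item_id']]
    if dedup:
        bought = bought.drop_duplicates()   # one row per (user_id, item_id) pair
    labeled = candidates.merge(bought.assign(target=1), on=['user_id', 'item_id'], how='left')
    labeled['target'] = labeled['target'].fillna(0)
    return labeled


naive = label_candidates(toy_cands, toy_purchases, dedup=False)
dedup = label_candidates(toy_cands, toy_purchases, dedup=True)
print(len(naive), naive['target'].mean())   # 3 rows, positive share 2/3
print(len(dedup), dedup['target'].mean())   # 2 rows, positive share 1/2
```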

In [21]:
item_features.head(2)

| | item_id | manufacturer | department | brand | commodity_desc | sub_commodity_desc | curr_size_of_product |
|---|---|---|---|---|---|---|---|
| 0 | 25671 | 2 | GROCERY | National | FRZN ICE | ICE - CRUSHED/CUBED | 22 LB |
| 1 | 26081 | 2 | MISC. TRANS. | National | NO COMMODITY DESCRIPTION | NO SUBCOMMODITY DESCRIPTION | |


In [22]:
user_features.head(2)

| | age_desc | marital_status_code | income_desc | homeowner_desc | hh_comp_desc | household_size_desc | kid_category_desc | user_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 65+ | A | 35-49K | Homeowner | 2 Adults No Kids | 2 | None/Unknown | 1 |
| 1 | 45-54 | A | 50-74K | Homeowner | 2 Adults No Kids | 2 | None/Unknown | 7 |


In [23]:
targets_lvl_2 = targets_lvl_2.merge(item_features, on='item_id', how='left')
targets_lvl_2 = targets_lvl_2.merge(user_features, on='user_id', how='left')

targets_lvl_2.head()

Unnamed: 0,user_id,item_id,target,manufacturer,department,brand,commodity_desc,sub_commodity_desc,curr_size_of_product,age_desc,marital_status_code,income_desc,homeowner_desc,hh_comp_desc,household_size_desc,kid_category_desc
0,930,917033,0.0,103,GROCERY,National,SOFT DRINKS,SOFT DRINKS 12/18&15PK CAN CAR,12 OZ,,,,,,,
1,930,1016800,0.0,103,GROCERY,National,SOFT DRINKS,SOFT DRINKS 12/18&15PK CAN CAR,12 OZ,,,,,,,
2,930,1050741,1.0,103,GROCERY,National,SOFT DRINKS,SOFT DRINKS 20PK&24PK CAN CARB,12 OZ,,,,,,,
3,930,1050741,1.0,103,GROCERY,National,SOFT DRINKS,SOFT DRINKS 20PK&24PK CAN CARB,12 OZ,,,,,,,
4,930,854716,0.0,2,GROCERY,National,SOFT DRINKS,SOFT DRINKS 20PK&24PK CAN CARB,12 OZ,,,,,,,
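Rows 2 and 3 above repeat the same pair (930, 1050741). `data_train_lvl_2` holds one row per purchase line, so an item bought twice matches its candidate twice in the left merge. This is why 2500 candidate rows turn into 2543 training rows. The sketch below de-duplicates the purchased pairs before merging, using toy data.

```python
import pandas as pd

# Toy candidates (one row per user-item pair) and a toy purchase log
candidates = pd.DataFrame({'user_id': [930, 930, 930],
                           'item_id': [917033, 1016800, 1050741]})
purchases = pd.DataFrame({'user_id': [930, 930, 7],
                          'item_id': [1050741, 1050741, 917033]})  # repeated purchase

# Keep one row per bought pair; without drop_duplicates() this merge would yield 4 rows
bought = purchases[['user_id', 'item_id']].drop_duplicates()
bought['target'] = 1

targets = candidates.merge(bought, on=['user_id', 'item_id'], how='left')
targets['target'] = targets['target'].fillna(0).astype(int)
print(targets)
```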


In [24]:
X_train = targets_lvl_2.drop('target', axis=1)
y_train = targets_lvl_2['target']  # 1-D Series, as scikit-learn-style estimators expect

In [25]:
cat_feats = X_train.columns[2:].tolist()
X_train[cat_feats] = X_train[cat_feats].astype('category')

cat_feats

['manufacturer',
 'department',
 'brand',
 'commodity_desc',
 'sub_commodity_desc',
 'curr_size_of_product',
 'age_desc',
 'marital_status_code',
 'income_desc',
 'homeowner_desc',
 'hh_comp_desc',
 'household_size_desc',
 'kid_category_desc']

In [26]:
lgb = LGBMClassifier(objective='binary', max_depth=7)
# Categorical columns are passed to fit(); pandas 'category' dtype is also auto-detected
lgb.fit(X_train, y_train, categorical_feature=cat_feats)

train_preds = lgb.predict(X_train)



In [27]:
X_train

Unnamed: 0,user_id,item_id,manufacturer,department,brand,commodity_desc,sub_commodity_desc,curr_size_of_product,age_desc,marital_status_code,income_desc,homeowner_desc,hh_comp_desc,household_size_desc,kid_category_desc
0,930,917033,103,GROCERY,National,SOFT DRINKS,SOFT DRINKS 12/18&15PK CAN CAR,12 OZ,,,,,,,
1,930,1016800,103,GROCERY,National,SOFT DRINKS,SOFT DRINKS 12/18&15PK CAN CAR,12 OZ,,,,,,,
2,930,1050741,103,GROCERY,National,SOFT DRINKS,SOFT DRINKS 20PK&24PK CAN CARB,12 OZ,,,,,,,
3,930,1050741,103,GROCERY,National,SOFT DRINKS,SOFT DRINKS 20PK&24PK CAN CARB,12 OZ,,,,,,,
4,930,854716,2,GROCERY,National,SOFT DRINKS,SOFT DRINKS 20PK&24PK CAN CARB,12 OZ,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2538,1745,903454,1216,MEAT-PCKGD,National,FROZEN MEAT,OTHER - FULLY COOKED,32 OZ,45-54,A,Under 15K,Unknown,Single Male,2,None/Unknown
2539,1745,9419888,759,GROCERY,National,YOGURT,YOGURT MULTI-PACKS,48 OZ,45-54,A,Under 15K,Unknown,Single Male,2,None/Unknown
2540,1745,1076769,3859,DELI,National,DELI MEATS,MEAT: LUNCHMEAT BULK,,45-54,A,Under 15K,Unknown,Single Male,2,None/Unknown
2541,1745,1092588,709,GROCERY,National,FLUID MILK PRODUCTS,MISCELLANEOUS MILK,32 OZ,45-54,A,Under 15K,Unknown,Single Male,2,None/Unknown


In [28]:
len(train_preds)

2543

In [29]:
result_lvl_2 = data_val_lvl_2.groupby('user_id')['item_id'].unique().reset_index()
result_lvl_2.columns = ['user_id', 'actual']
result_lvl_2.head(2)

Unnamed: 0,user_id,actual
0,1,"[821867, 834484, 856942, 865456, 889248, 90795..."
1,3,"[835476, 851057, 872021, 878302, 879948, 90963..."


In [30]:
%%time
# Level-1 "own purchases" recommendations (top-5) for every validation user
_own_recs_2 = []
d_recs_2 = dict()
for uid in result_lvl_2['user_id']:
    try:
        _own_recs_2.append([uid, recommender.get_own_recommendations(uid, N=5)])
    except (IndexError, ValueError):
        # e.g. a user the level-1 recommender cannot score: keep an empty list
        _own_recs_2.append([uid, []])
d_recs_2['own2'] = _own_recs_2
own_recs_2 = pd.DataFrame(d_recs_2['own2'], columns=['uid', 'own2'])

Wall time: 12.2 s


In [31]:
result_lvl_2_own = pd.merge(result_lvl_2, own_recs_2, left_on='user_id', right_on='uid', how='left')
result_lvl_2_own.drop(['uid'], axis=1, inplace=True)
#result_lvl_2_own.replace(np.NaN, '[]', inplace=True)

In [32]:
result_lvl_2_own

Unnamed: 0,user_id,actual,own2
0,1,"[821867, 834484, 856942, 865456, 889248, 90795...","[856942, 9297615, 5577022, 877391, 9655212]"
1,3,"[835476, 851057, 872021, 878302, 879948, 90963...","[1092937, 1008714, 12132312, 1075979, 998206]"
2,6,"[920308, 926804, 946489, 1006718, 1017061, 107...","[13003092, 972416, 995598, 923600, 1138596]"
3,7,"[840386, 889774, 898068, 909714, 929067, 95347...","[998519, 894360, 7147142, 9338009, 896666]"
4,8,"[835098, 872137, 910439, 924610, 992977, 10412...","[12808385, 981660, 939860, 7410201, 6463874]"
...,...,...,...
2037,2496,[6534178],"[872826, 983665, 991546, 1134296, 7441210]"
2038,2497,"[1016709, 9835695, 1132298, 16809501, 845294, ...","[870515, 1117219, 1102207, 1057168, 1135834]"
2039,2498,"[15716530, 834484, 901776, 914190, 958382, 972...","[1022066, 1076580, 1100379, 5565356, 931579]"
2040,2499,"[867188, 877580, 902396, 914190, 951590, 95813...","[7168055, 1128395, 6904613, 5570048, 889989]"


In [33]:
result_lvl_1_own.apply(lambda row: precision_at_k(row['own'], row['actual']), axis=1).mean()

0.17712691771268907

In [34]:
result_lvl_1_own.head(50).apply(lambda row: precision_at_k(row['own'], row['actual']), axis=1).mean()

0.34

In [35]:
result_lvl_2_own.apply(lambda row: precision_at_k(row['own2'], row['actual']), axis=1).mean()

0.1444117647058813
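These averages depend on how `precision_at_k` treats users with no recommendations. If it divides by the length of the recommended list, an empty list gives 0/0 = NaN. `Series.mean()` silently skips NaN, so those users drop out of the average. The hypothetical variant below is not the `src.metrics` code. It scores an empty list as 0, collapses repeated item ids, and divides by `k` (the usual precision@k definition), which means short lists are penalised.

```python
import numpy as np

def precision_at_k_safe(recommended_list, bought_list, k=5):
    """Share of the top-k unique recommended items that were actually bought.

    Hypothetical helper: an empty recommendation list scores 0 instead of NaN.
    """
    # dict.fromkeys drops repeated ids while keeping the original order
    top_k = list(dict.fromkeys(recommended_list))[:k]
    if not top_k:
        return 0.0
    hits = np.isin(top_k, list(bought_list)).sum()
    return float(hits) / k
```

Usage: `result_lvl_2_own.apply(lambda row: precision_at_k_safe(row['own2'], row['actual']), axis=1).mean()`.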

In [36]:
X_tr_p = X_train.copy()
X_tr_p['preds'] = pd.Series(train_preds, index=X_tr_p.index)
X_tr_p = X_tr_p[X_tr_p.preds != 0]
lv2 = X_tr_p.groupby('user_id')['item_id'].apply(list).reset_index()
lv2

Unnamed: 0,user_id,item_id
0,238,"[1102416, 1102416, 948670, 948670]"
1,259,"[1087605, 1062128]"
2,267,"[929373, 901976]"
3,286,"[1115069, 1120361]"
4,382,[999250]
5,390,"[1058243, 1058243, 1058243, 1072693, 888543, 8..."
6,470,"[908314, 908314, 908314, 908314, 871633]"
7,478,[8203851]
8,505,[962991]
9,720,"[1056212, 1115069, 1115069, 1115069, 1055504, ..."
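`lgb.predict` returns hard 0/1 labels, which for a binary classifier amounts to a 0.5 probability threshold. With a positive rate of about 6%, only 10 of the 50 candidate users receive any items, and their lists contain repeats. For precision@5, the usual approach is to rank each user's candidates by the positive-class probability, `lgb.predict_proba(X)[:, 1]`, and keep the top 5. The sketch below shows that ranking step on a toy `score` column filled with made-up probabilities.

```python
import pandas as pd

def top_k_per_user(df, score_col='score', k=5):
    """Return one row per user with the k highest-scoring unique items."""
    ranked = (
        df.drop_duplicates(['user_id', 'item_id'])  # drop repeated candidate rows
          .sort_values(['user_id', score_col], ascending=[True, False])
    )
    top = ranked.groupby('user_id').head(k)  # keeps the sorted order within each user
    return top.groupby('user_id')['item_id'].apply(list).reset_index()

# Toy candidates with hypothetical probabilities from predict_proba(X)[:, 1]
cands = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2, 2],
    'item_id': [10, 11, 11, 12, 20, 21],
    'score':   [0.10, 0.70, 0.70, 0.30, 0.05, 0.02],
})
recs = top_k_per_user(cands, k=2)
print(recs)
```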


In [37]:
result_lvl_2_preds = pd.merge(result_lvl_2, lv2, on='user_id', how='left')
# NOTE: fills users without level-2 predictions with the string '[]', not an empty list
result_lvl_2_preds.replace(np.nan, '[]', inplace=True)
result_lvl_2_preds.head()

Unnamed: 0,user_id,actual,item_id
0,1,"[821867, 834484, 856942, 865456, 889248, 90795...",[]
1,3,"[835476, 851057, 872021, 878302, 879948, 90963...",[]
2,6,"[920308, 926804, 946489, 1006718, 1017061, 107...",[]
3,7,"[840386, 889774, 898068, 909714, 929067, 95347...",[]
4,8,"[835098, 872137, 910439, 924610, 992977, 10412...",[]
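`replace(np.nan, '[]')` writes the two-character string `'[]'` into the cell, not an empty list. Downstream code therefore receives a `str`. The sketch below fills users missing from the level-2 predictions with real empty lists, using toy data. Pair it with a precision function that scores an empty list as 0; otherwise `mean()` may skip those users.

```python
import pandas as pd

actual = pd.DataFrame({'user_id': [1, 3, 238],
                       'actual': [[821867, 834484], [835476], [1102416]]})
preds = pd.DataFrame({'user_id': [238], 'item_id': [[1102416, 948670]]})

merged = actual.merge(preds, on='user_id', how='left')
# Missing users get a float NaN, so test the type; real lists pass through unchanged
merged['item_id'] = merged['item_id'].apply(lambda x: x if isinstance(x, list) else [])
print(merged)
```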


In [38]:
result_lvl_2_preds.apply(lambda row: precision_at_k(row['item_id'], row['actual']), axis=1).mean()

0.006203068886712374

In [39]:
targets_val_lvl_2 = data_val_lvl_2[['user_id', 'item_id']].copy()
targets_val_lvl_2['target'] = 1  # purchases only

# NOTE: a pair bought more than once in data_val_lvl_2 duplicates its candidate row here
targets_val_lvl_2 = users_lvl_2.merge(targets_val_lvl_2, on=['user_id', 'item_id'], how='left')

targets_val_lvl_2['target'] = targets_val_lvl_2['target'].fillna(0)
targets_val_lvl_2.drop('flag', axis=1, inplace=True)

In [40]:
targets_val_lvl_2 = targets_val_lvl_2.merge(item_features, on='item_id', how='left')
targets_val_lvl_2 = targets_val_lvl_2.merge(user_features, on='user_id', how='left')

In [41]:
X_val = targets_val_lvl_2.drop('target', axis=1)
y_val = targets_val_lvl_2[['target']]

In [42]:
v_cat_feats = X_val.columns[2:].tolist()
X_val[v_cat_feats] = X_val[v_cat_feats].astype('category')
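When train and validation columns are cast to `category` independently, each frame gets its own category set. The same value can then map to different integer codes in the two frames. The sketch below reuses the training categories for the validation frame via `pd.Categorical`, using toy values.

```python
import pandas as pd

train = pd.DataFrame({'department': ['GROCERY', 'DELI', 'GROCERY']})
val = pd.DataFrame({'department': ['MEAT-PCKGD', 'GROCERY']})
cat_cols = ['department']

for col in cat_cols:
    train[col] = train[col].astype('category')
    # Reuse the training categories; values unseen in training become NaN (code -1)
    val[col] = pd.Categorical(val[col], categories=train[col].cat.categories)

print(val['department'].cat.codes.tolist())
```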

In [43]:
val_preds = lgb.predict(X_val)

In [44]:
X_val_p = X_val.copy()
X_val_p['preds'] = pd.Series(val_preds, index=X_val_p.index)
X_val_p = X_val_p[X_val_p.preds != 0]
lv2_v = X_val_p.groupby('user_id')['item_id'].apply(list).reset_index()
lv2_v

Unnamed: 0,user_id,item_id
0,238,"[1102416, 1102416, 948670]"
1,259,"[1087605, 1062128]"
2,267,"[929373, 901976]"
3,286,"[1115069, 1120361]"
4,382,[999250]
5,390,"[1058243, 1072693, 888543, 894439]"
6,470,"[908314, 871633]"
7,478,[8203851]
8,505,[962991]
9,720,"[1056212, 1115069, 1055504, 863447, 854852]"


In [45]:
result_val_lvl_2_preds = pd.merge(result_lvl_2, lv2_v, on='user_id', how='left')
# NOTE: fills users without level-2 predictions with the string '[]', not an empty list
result_val_lvl_2_preds.replace(np.nan, '[]', inplace=True)
result_val_lvl_2_preds.head()

Unnamed: 0,user_id,actual,item_id
0,1,"[821867, 834484, 856942, 865456, 889248, 90795...",[]
1,3,"[835476, 851057, 872021, 878302, 879948, 90963...",[]
2,6,"[920308, 926804, 946489, 1006718, 1017061, 107...",[]
3,7,"[840386, 889774, 898068, 909714, 929067, 95347...",[]
4,8,"[835098, 872137, 910439, 924610, 992977, 10412...",[]


In [46]:
result_val_lvl_2_preds.apply(lambda row: precision_at_k(row['item_id'], row['actual']), axis=1).mean()

0.006782566111655241
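Both level-2 scores (about 0.006) stay far below the `own2` baseline (0.144). This is presumably because most users receive no level-2 items at all. One way to blend the two is to pad each user's model list with fallback items, such as the `own2` recommendations or top-popular items, until it holds exactly 5 unique ids. The helper below is a hypothetical sketch of that idea, and the ids are toy values.

```python
def pad_recommendations(primary, fallback, k=5):
    """Take unique items from `primary` first, then fill up from `fallback`."""
    result = []
    for item in list(primary) + list(fallback):
        if item not in result:
            result.append(item)
        if len(result) == k:
            break
    return result

print(pad_recommendations([1102416, 1102416, 948670],
                          [856942, 948670, 9297615, 5577022, 877391]))
# [1102416, 948670, 856942, 9297615, 5577022]
```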