### Assignment. Tuning ALS hyperparameters

- Try to improve the baseline ALS model by varying the following parameters:
  - regularization, iterations
  - factors
  - weighting (TF-IDF, BM25)

- Compute the metrics (Precision@5, MAP@5) for different hyperparameter sets and choose the best set



_____

Import the required libraries:

In [1]:
import pandas as pd
import numpy as np

# Sparse matrices
from scipy.sparse import csr_matrix

# Matrix factorization
from implicit.als import AlternatingLeastSquares
from implicit.nearest_neighbours import bm25_weight, tfidf_weight

from tqdm import tqdm

import warnings
warnings.filterwarnings('ignore')

Functions for computing the `Precision@5` and `MAP@5` metrics:

- **Precision@5** measures how often the recommendations match what the user actually bought. It is the number of relevant items among the first five recommendations divided by five.

- **MAP@5** (mean average precision at 5) measures how well the system ranks its recommendations. For each user, precision is evaluated at every position up to the fifth that holds a relevant item, where precision at position k is the share of relevant items among the first k recommendations. These values are averaged per user, and the result is then averaged over all users. The `average_precision_at_k` function below uses a simplified normalization: it divides each term by the length of the recommendation list and the sum by k, so its values are never higher than the textbook AP@5.

In their standard form both metrics range from 0 to 1, where 1 means perfect recommendations.

In [2]:
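def precision_at_k(recommended_list, bought_list, k):
    # flags[i] is True if the i-th of the first k recommendations was bought
    flags = np.isin(np.array(recommended_list)[:k], np.array(bought_list))
    # NB: the denominator is the full list length, not k; the two coincide
    # only when exactly k items are recommended (as for Precision@5 with N=5)
    return flags.sum() / len(recommended_list)

def average_precision_at_k(recommended_list, bought_list, k=5):
    flags = np.isin(np.array(recommended_list)[:k], np.array(bought_list))
    if flags.sum() == 0:
        return 0
    sum_ = 0
    # len(flags) <= k, so a list shorter than k no longer raises IndexError
    for i in range(len(flags)):
        if flags[i]:
            # hits in the first i+1 positions, divided by len(recommended_list)
            p_k = precision_at_k(recommended_list, bought_list, k=i+1)
            sum_ += p_k
    return sum_ / k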
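
For comparison, here is a minimal sketch of the more common textbook formulation, in which precision at each hit is divided by the hit's position and the sum is divided by the number of hits in the top k. The names `precision_at_k_std` and `ap_at_k_std` are hypothetical and are not used elsewhere in this notebook. Other conventions exist (for example, dividing by min(k, number of relevant items)), so treat this as one possible definition rather than the definitive one.

```python
def precision_at_k_std(recommended_list, bought_list, k=5):
    """Share of the first k recommendations that were actually bought."""
    bought = set(bought_list)
    hits = sum(1 for item in recommended_list[:k] if item in bought)
    return hits / k


def ap_at_k_std(recommended_list, bought_list, k=5):
    """Mean of precision@position over the positions (<= k) that hold a bought item."""
    bought = set(bought_list)
    hits = 0
    precisions = []
    for position, item in enumerate(recommended_list[:k], start=1):
        if item in bought:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Made-up example: hits at positions 1 and 3
recommended = [10, 20, 30, 40, 50]
bought = [10, 30, 99]
p5 = precision_at_k_std(recommended, bought)
ap5 = ap_at_k_std(recommended, bought)
print(p5, ap5)
```

In this example two of the five recommendations are relevant, so Precision@5 is 2/5, while AP@5 averages the precision at positions 1 and 3, that is (1/1 + 2/3) / 2.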

Load the data and split it into train and test (the test set holds the rows with `week_no >= max(week_no) - 3`):

In [3]:
data = pd.read_csv('retail_train.csv')
  
data.columns = [col.lower() for col in data.columns]
data.rename(columns={'household_key': 'user_id', 'product_id': 'item_id'}, inplace=True)

data_train = data[data['week_no'] < data['week_no'].max() - 3].copy()  # copy: item_id is overwritten later
data_test = data[data['week_no'] >= data['week_no'].max() - 3]

data_train.head(2)

Unnamed: 0,user_id,basket_id,day,item_id,quantity,sales_value,store_id,retail_disc,trans_time,week_no,coupon_disc,coupon_match_disc
0,2375,26984851472,1,1004906,1,1.39,364,-0.6,1631,1,0.0,0.0
1,2375,26984851472,1,1033142,1,0.82,364,0.0,1631,1,0.0,0.0
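
A quick way to see which weeks each condition selects is to apply the same comparisons to a toy list. The week numbers below are hypothetical (1 to 10), not the real `week_no` values. The sketch shows that `>= max - 3` keeps four distinct week numbers in the test set.

```python
# Hypothetical week numbers; in the notebook they come from data['week_no']
weeks = list(range(1, 11))
last_week = max(weeks)

# Same comparisons as in the split above
train_weeks = [w for w in weeks if w < last_week - 3]
test_weeks = [w for w in weeks if w >= last_week - 3]

print(train_weeks)  # [1, 2, 3, 4, 5, 6]
print(test_weeks)   # [7, 8, 9, 10]
```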


Build a dataset with `user_id` and the items each user actually bought. Later we will add recommendations to it and compute the metrics against the `actual` column:

In [4]:
result = data_test.groupby('user_id')['item_id'].unique().reset_index()
result.columns=['user_id', 'actual']
result.head(2)

Unnamed: 0,user_id,actual
0,1,"[821867, 834484, 856942, 865456, 889248, 90795..."
1,3,"[835476, 851057, 872021, 878302, 879948, 90963..."


Rank items by units sold and take the top 5000. Every other item will be replaced by a dummy `item_id` (if a user bought an item outside the top 5000, they "bought" the dummy item):

In [5]:
popularity = data_train.groupby('item_id')['quantity'].sum().reset_index()
popularity.rename(columns={'quantity': 'n_sold'}, inplace=True)

top_5000 = popularity.sort_values('n_sold', ascending=False).head(5000).item_id.tolist()

Keep the top-5000 items and map all other items to the dummy id 999999:

In [6]:
data_train.loc[~data_train['item_id'].isin(top_5000), 'item_id'] = 999999
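
The same idea on a toy example in plain Python: count units sold per item, keep the top N, and map everything else to the dummy id. The purchase data and `top_n = 2` are made up for illustration; the notebook uses pandas and N = 5000.

```python
from collections import Counter

DUMMY_ID = 999999  # same placeholder id as in the notebook

# (item_id, quantity) pairs; the values are made up for illustration
purchases = [(101, 3), (102, 1), (101, 2), (103, 7), (104, 1), (102, 2)]

# Total units sold per item
n_sold = Counter()
for item_id, quantity in purchases:
    n_sold[item_id] += quantity

top_n = 2  # the notebook uses 5000
top_items = {item_id for item_id, _ in n_sold.most_common(top_n)}

# Items outside the top N collapse into a single dummy item
remapped = [item_id if item_id in top_items else DUMMY_ID for item_id, _ in purchases]

print(sorted(top_items))  # [101, 103]
print(remapped)           # [101, 999999, 101, 103, 999999, 999999]
```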

Build the sparse matrix:

In [7]:
user_item_matrix = pd.pivot_table(data_train, 
                                  index='user_id', columns='item_id', 
                                  values='quantity',  # other values can be tried
                                  aggfunc='count', 
                                  fill_value=0
                                 )

user_item_matrix = user_item_matrix.astype(float)  # implicit requires a float matrix

# convert to a sparse CSR matrix
sparse_user_item = csr_matrix(user_item_matrix)

user_item_matrix.head(3)

user_id,202291,397896,420647,480014,545926,707683,731106,818980,819063,819227,...,15778533,15831255,15926712,15926775,15926844,15926886,15927403,15927661,15927850,16809471
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
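
Almost all entries of this matrix are zeros, which is why it is stored in CSR form. A CSR matrix keeps only the non-zero values plus two index arrays, which SciPy exposes as `.data`, `.indices`, and `.indptr`. The sketch below builds these three arrays by hand for a tiny dense matrix; `dense_to_csr` is a hypothetical helper for illustration, not part of SciPy.

```python
def dense_to_csr(rows):
    """Return the (data, indices, indptr) triple of the CSR layout for a dense 2-D list."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value itself
                indices.append(col)   # its column index
        indptr.append(len(data))      # offset where the next row starts
    return data, indices, indptr


# Toy user-item matrix: 3 users x 4 items
dense = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 3.0],
]
data, indices, indptr = dense_to_csr(dense)

print(data)     # [2.0, 1.0, 3.0]
print(indices)  # [1, 0, 3]
print(indptr)   # [0, 1, 1, 3]
```

Row `i` occupies the slice `indptr[i]:indptr[i + 1]` of `data` and `indices`, so an empty row (the second user here) costs no storage beyond one offset.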


Build dictionaries that map the real `user_id` and `item_id` values to matrix row and column indices and back:

In [8]:
userids = user_item_matrix.index.values
itemids = user_item_matrix.columns.values

matrix_userids = np.arange(len(userids))
matrix_itemids = np.arange(len(itemids))

id_to_itemid = dict(zip(matrix_itemids, itemids))
id_to_userid = dict(zip(matrix_userids, userids))

itemid_to_id = dict(zip(itemids, matrix_itemids))
userid_to_id = dict(zip(userids, matrix_userids))
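
A toy round trip through the two item dictionaries, using three hypothetical column labels: the model works with positions 0, 1, 2, ..., and the dictionaries translate between those positions and the real ids.

```python
# Hypothetical column labels; in the notebook they come from user_item_matrix.columns
item_ids = [202291, 397896, 999999]
matrix_item_ids = range(len(item_ids))

id_to_itemid = dict(zip(matrix_item_ids, item_ids))
itemid_to_id = dict(zip(item_ids, matrix_item_ids))

print(itemid_to_id[999999])                # 2
print(id_to_itemid[itemid_to_id[397896]])  # 397896
```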

A function that returns recommendations sorted by score:

In [9]:
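def get_sort_recommendations(user, model, sparse_user_item, N=5):
    """Return N recommended item_ids for `user`, sorted by descending score."""
    
    recs = model.recommend(userid=userid_to_id[user], 
                           user_items=sparse_user_item[userid_to_id[user]],
                           N=N,
                           filter_already_liked_items=False, 
                           filter_items=[itemid_to_id[999999]],  # never recommend the dummy item
                           recalculate_user=True)
    # recs[0] holds matrix item indices, recs[1] the corresponding scores
    mask = recs[1].argsort()[::-1]
    res = [id_to_itemid[rec] for rec in recs[0][mask]]
    
    return res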
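
Conceptually, a matrix-factorization model such as ALS scores an item for a user as the dot product of their latent vectors and then returns the highest-scoring items that are not filtered out. The sketch below illustrates that idea in plain Python with made-up two-dimensional vectors. `top_n_by_dot_product` is a hypothetical helper; the actual `implicit` implementation is optimized and may differ in details.

```python
def top_n_by_dot_product(user_vector, item_vectors, n=5, exclude=()):
    """Rank items by dot product with the user vector, skipping excluded indices."""
    scores = []
    for item_index, item_vector in enumerate(item_vectors):
        if item_index in exclude:
            continue  # plays the role of filter_items
        score = sum(u * v for u, v in zip(user_vector, item_vector))
        scores.append((item_index, score))
    # Highest score first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Made-up latent vectors (2 factors, 4 items)
user_vector = [1.0, 0.5]
item_vectors = [[0.2, 0.1], [0.9, 0.4], [0.5, 0.5], [1.0, 1.0]]

# Item 3 would score highest but is excluded, as the dummy item is in the notebook
top = top_n_by_dot_product(user_vector, item_vectors, n=2, exclude={3})
print([item_index for item_index, _ in top])  # [1, 2]
```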

In [10]:
def fit_(factors=100, regularization=0.001, iterations=15, 
         user_item_matrix=user_item_matrix, tfidf=False, bm25=False):
    """Fit ALS on the raw, TF-IDF-weighted, or BM25-weighted user-item matrix."""
    
    model = AlternatingLeastSquares(factors=factors, 
                                    regularization=regularization,
                                    iterations=iterations, 
                                    calculate_training_loss=True, 
                                    num_threads=4,
                                    random_state=42)
    # Weight the transposed (item-user) matrix, then transpose back to user-item
    if tfidf:
        train_matrix = tfidf_weight(csr_matrix(user_item_matrix).T).T.tocsr()
    elif bm25:
        train_matrix = bm25_weight(csr_matrix(user_item_matrix).T).T.tocsr()
    else:
        train_matrix = csr_matrix(user_item_matrix)
    model.fit(train_matrix)
    return model
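
The intuition behind BM25-style weighting is that raw counts saturate: the tenth purchase of an item says less about a user's taste than the first one. The sketch below shows only the generic term-frequency part of BM25 (the IDF factor is omitted). The function name and the parameter values `k1` and `length_norm` are illustrative assumptions, and the exact formula used by `implicit.nearest_neighbours.bm25_weight` may differ.

```python
def bm25_style_weight(count, k1=1.2, length_norm=1.0):
    """Generic BM25-style saturation of a raw interaction count.

    k1 and length_norm are illustrative values, not implicit's defaults.
    """
    return count * (k1 + 1) / (count + k1 * length_norm)


counts = [1, 2, 5, 20, 100]
weights = [bm25_style_weight(c) for c in counts]

for count, weight in zip(counts, weights):
    print(count, round(weight, 3))
```

The weight grows with the count but stays below `k1 + 1`, so a few very frequent purchases cannot dominate a user's row.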

In [11]:
# empty results table
result_table = pd.DataFrame({'name': [], 'Precision@5': [], 'MAP@5': []})

Let's build a crude homemade grid search: a loop over regularization, iterations, and factors, run for the plain sparse matrix and for the matrices weighted with `TF-IDF` and `BM25`. We collect all results in a single table and look at the 5 best and 5 worst models by `precision_at_k` and `average_precision_at_k`:

In [12]:
for f in tqdm(range(50, 201, 50)):
    for r in np.logspace(-3, 0, num=4):
        for i in range(5, 26, 5):
            for model_type in ['als', 'tfidf', 'bm25']:
                model = fit_(factors=f, regularization=r, iterations=i,
                             tfidf=(model_type == 'tfidf'), bm25=(model_type == 'bm25'))
                col_name = f'{model_type}, factors={f}, regularization={r}, iterations={i}'

                # top-5 recommendations of this model for every test user
                result[col_name] =\
                    result['user_id'].apply(lambda x: get_sort_recommendations(x, model, sparse_user_item, N=5))

                # metrics averaged over users
                result_table.loc[len(result_table)] =\
                    [col_name, 
                     result.apply(lambda row: precision_at_k(row[col_name], row['actual'], k=5), axis=1).mean(), 
                     result.apply(lambda row: average_precision_at_k(row[col_name], row['actual'], k=5), axis=1).mean()]

100%|███████████████████████████████████████████████████████████████████████████████████| 4/4 [45:38<00:00, 684.60s/it]


In [13]:
# sort by Precision@5
display(
    result_table.sort_values('Precision@5', ascending=False).head(),
    result_table.sort_values('Precision@5', ascending=False).tail()
)

Unnamed: 0,name,Precision@5,MAP@5
212,"bm25, factors=200, regularization=0.1, iterati...",0.241234,0.081019
152,"bm25, factors=150, regularization=0.1, iterati...",0.240646,0.079667
137,"bm25, factors=150, regularization=0.01, iterat...",0.238688,0.079001
227,"bm25, factors=200, regularization=1.0, iterati...",0.237806,0.079197
197,"bm25, factors=200, regularization=0.01, iterat...",0.237023,0.078786


Unnamed: 0,name,Precision@5,MAP@5
207,"als, factors=200, regularization=0.01, iterati...",0.150049,0.045054
222,"als, factors=200, regularization=0.1, iteratio...",0.14907,0.044721
186,"als, factors=200, regularization=0.001, iterat...",0.148874,0.04427
189,"als, factors=200, regularization=0.001, iterat...",0.1476,0.044133
192,"als, factors=200, regularization=0.001, iterat...",0.146621,0.04382


In [14]:
# sort by MAP@5
display(
    result_table.sort_values('MAP@5', ascending=False).head(),
    result_table.sort_values('MAP@5', ascending=False).tail()
)

Unnamed: 0,name,Precision@5,MAP@5
212,"bm25, factors=200, regularization=0.1, iterati...",0.241234,0.081019
152,"bm25, factors=150, regularization=0.1, iterati...",0.240646,0.079667
227,"bm25, factors=200, regularization=1.0, iterati...",0.237806,0.079197
137,"bm25, factors=150, regularization=0.01, iterat...",0.238688,0.079001
197,"bm25, factors=200, regularization=0.01, iterat...",0.237023,0.078786


Unnamed: 0,name,Precision@5,MAP@5
207,"als, factors=200, regularization=0.01, iterati...",0.150049,0.045054
222,"als, factors=200, regularization=0.1, iteratio...",0.14907,0.044721
186,"als, factors=200, regularization=0.001, iterat...",0.148874,0.04427
189,"als, factors=200, regularization=0.001, iterat...",0.1476,0.044133
192,"als, factors=200, regularization=0.001, iterat...",0.146621,0.04382


We went through as many as 240 models:

In [15]:
result_table.shape[0]

240
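
The count is simply the size of the grid: 4 values of factors × 4 of regularization × 5 of iterations × 3 weighting variants. Because rows are appended to `result_table` in loop order, the row index also identifies the full parameter set, even where the `name` column is truncated in the output. A small sketch, with `itertools.product` standing in for the nested loops and the regularization values written out as literals:

```python
from itertools import product

factors_grid = range(50, 201, 50)              # 50, 100, 150, 200
regularization_grid = [0.001, 0.01, 0.1, 1.0]  # the values produced by np.logspace(-3, 0, num=4)
iterations_grid = range(5, 26, 5)              # 5, 10, 15, 20, 25
model_types = ['als', 'tfidf', 'bm25']

# product() varies the last argument fastest, like the innermost loop
grid = list(product(factors_grid, regularization_grid, iterations_grid, model_types))

print(len(grid))   # 240
print(grid[212])   # (200, 0.1, 5, 'bm25')
print(grid[192])   # (200, 0.001, 25, 'als')
```

Rows 212 and 192 are the best and worst rows in the tables above.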

The summary table shows that the model trained on the `BM25`-weighted sparse matrix with 200 latent factors, regularization = 0.1, and 5 iterations gives the best result on both `Precision@5` (0.241) and `MAP@5` (0.081). The worst result on both metrics also belongs to a single model: again 200 latent factors, but with regularization = 0.001, the maximum number of iterations we tried (25), and no weighting (`Precision@5` = 0.147, `MAP@5` = 0.044). $\Rightarrow$ Tuning the hyperparameters and weighting the interaction matrix really does matter.