# Straightforward Training and Evaluation of Models using RecTools

Here, I train and evaluate diffrent models on each split, using [RecTools models and metrics implementation](https://rectools.readthedocs.io/en/stable/features.html).

## Data reading

In [10]:
N_USERS = 943
N_ITEMS = 1682
data_interim_dir = '../data/interim/'
data_filenames = [f'u{t}.{split}' for t in ['1', '2', '3', '4', '5', 'a', 'b'] for split in ['base', 'test']]
data_filenames

['u1.base',
 'u1.test',
 'u2.base',
 'u2.test',
 'u3.base',
 'u3.test',
 'u4.base',
 'u4.test',
 'u5.base',
 'u5.test',
 'ua.base',
 'ua.test',
 'ub.base',
 'ub.test']

In [43]:
import pickle


data = {}
for i in range(0, len(data_filenames), 2):
    base_filename, test_filename = data_filenames[i:i+2]
    data_title = base_filename.split('.')[0]
    with open(data_interim_dir + base_filename + '.pickle', 'rb') as base:
        with open(data_interim_dir + test_filename + '.pickle', 'rb') as test:
            with open(data_interim_dir + test_filename + '.df.pickle', 'rb') as base_df:
                with open(data_interim_dir + test_filename + '.df.pickle', 'rb') as test_df:
                    data[data_title] = (pickle.load(base), pickle.load(test), pickle.load(base_df), pickle.load(test_df))

data.keys()

dict_keys(['u1', 'u2', 'u3', 'u4', 'u5', 'ua', 'ub'])

## Fit-recommend-eval

For $k=10$ recommendation per user, I fit models (Random, Popular, PureSVD, ImplicitItemKNN) on each train test split just for the sake of interest. I evaluate models on 2 ranking metrics (MAP, NDCG) and 1 classification metric (Accuracy).

### RandomModel

In [50]:
from rectools.models import RandomModel
from rectools.metrics import NDCG, Accuracy, MAP
from rectools import Columns


k = 10
ndcg = NDCG(k=k, log_base=3)
acc = Accuracy(k=k)
mmap = MAP(k=k)

for name, datasets in data.items():
    base, test, base_df, test_df = datasets
    model = RandomModel()
    model.fit(base)
    
    recos = model.recommend(
        users=base_df[Columns.User].unique(),
        dataset=base,
        k=10,
        filter_viewed=True,
    )
    print(f'RandomModel on {name} split')
    print('MAP: ', mmap.calc(reco=recos, interactions=test_df))
    print("Accuracy: ", acc.calc(reco=recos, interactions=test_df, catalog=base_df[Columns.Item]))
    print("NDCG: ", ndcg.calc(reco=recos, interactions=test_df))
    print()


RandomModel on u1 split
MAP:  0.002092676528567473
Accuracy:  0.9973511982570806
NDCG:  0.030899630188415444

RandomModel on u2 split
MAP:  0.0023602361921541766
Accuracy:  0.9979920367534455
NDCG:  0.023438791876433357

RandomModel on u3 split
MAP:  0.001809859617857784
Accuracy:  0.9983629459148446
NDCG:  0.014003995209910126

RandomModel on u4 split
MAP:  0.002396904927536346
Accuracy:  0.9984312026002167
NDCG:  0.01607806689461031

RandomModel on u5 split
MAP:  0.0023036107356587806
Accuracy:  0.9984380798274002
NDCG:  0.016313677910489613

RandomModel on ua split
MAP:  0.00187875574407918
Accuracy:  0.9978903546700646
NDCG:  0.005815220932128331

RandomModel on ub split
MAP:  0.0020093756838189506
Accuracy:  0.9978928286677864
NDCG:  0.006609109894597713



### PopularModel

In [49]:
from rectools.models import PopularModel
from rectools.metrics import NDCG, Accuracy, MAP
from rectools import Columns


k = 10
ndcg = NDCG(k=k, log_base=3)
acc = Accuracy(k=k)
mmap = MAP(k=k)

for name, datasets in data.items():
    base, test, base_df, test_df = datasets
    model = PopularModel()
    model.fit(base)
    
    recos = model.recommend(
        users=base_df[Columns.User].unique(),
        dataset=base,
        k=10,
        filter_viewed=True,
    )
    print(f'PopularModel on {name} split')
    print('MAP: ', mmap.calc(reco=recos, interactions=test_df))
    print("Accuracy: ", acc.calc(reco=recos, interactions=test_df, catalog=base_df[Columns.Item]))
    print("NDCG: ", ndcg.calc(reco=recos, interactions=test_df))
    print()


PopularModel on u1 split
MAP:  0.0515884623370606
Accuracy:  0.9976265795206971
NDCG:  0.320243894441263

PopularModel on u2 split
MAP:  0.0565423486642172
Accuracy:  0.9982166921898926
NDCG:  0.26440298998043166

PopularModel on u3 split
MAP:  0.05668825595875101
Accuracy:  0.9985456846950519
NDCG:  0.21313056029600302

PopularModel on u4 split
MAP:  0.052612425440435574
Accuracy:  0.9986019501625135
NDCG:  0.19715029155177874

PopularModel on u5 split
MAP:  0.05432205905899851
Accuracy:  0.9985983818770227
NDCG:  0.18935083809348333

PopularModel on ua split
MAP:  0.05441616590078944
Accuracy:  0.998136404988929
NDCG:  0.13295180383112556

PopularModel on ub split
MAP:  0.05081759497719201
Accuracy:  0.9981262840891585
NDCG:  0.12660211783861397



### PureSVDModel

In [48]:
from rectools.models import PureSVDModel
from rectools.metrics import NDCG, Accuracy, MAP
from rectools import Columns


k = 10
ndcg = NDCG(k=k, log_base=3)
acc = Accuracy(k=k)
mmap = MAP(k=k)

for name, datasets in data.items():
    base, test, base_df, test_df = datasets
    model = PureSVDModel()
    model.fit(base)
    
    recos = model.recommend(
        users=base_df[Columns.User].unique(),
        dataset=base,
        k=10,
        filter_viewed=True,
    )
    print(f'PureSVDModel on {name} split')
    print('MAP: ', mmap.calc(reco=recos, interactions=test_df))
    print("Accuracy: ", acc.calc(reco=recos, interactions=test_df, catalog=base_df[Columns.Item]))
    print("NDCG: ", ndcg.calc(reco=recos, interactions=test_df))
    print()


PureSVDModel on u1 split
MAP:  0.12490142060896098
Accuracy:  0.9978093681917211
NDCG:  0.528658583680511

PureSVDModel on u2 split
MAP:  0.13517649746661312
Accuracy:  0.9983811638591118
NDCG:  0.449230375655791

PureSVDModel on u3 split
MAP:  0.12955316430597238
Accuracy:  0.9986985040276178
NDCG:  0.3790986770716973

PureSVDModel on u4 split
MAP:  0.13722123371125586
Accuracy:  0.9987525460455037
NDCG:  0.37142655587903717

PureSVDModel on u5 split
MAP:  0.13531015524298964
Accuracy:  0.9987376483279395
NDCG:  0.35023119831861615

PureSVDModel on ua split
MAP:  0.1495090811156559
Accuracy:  0.9983941505697503
NDCG:  0.2851740747159939

PureSVDModel on ub split
MAP:  0.14198854550657308
Accuracy:  0.9983703102280689
NDCG:  0.27318164556889707



### ImplicitItemKNN

In [57]:
from rectools.models import ImplicitItemKNNWrapperModel
from implicit.nearest_neighbours import TFIDFRecommender, CosineRecommender, BM25Recommender
from rectools.metrics import NDCG, Accuracy, MAP
from rectools import Columns


k = 10
ndcg = NDCG(k=k, log_base=3)
acc = Accuracy(k=k)
mmap = MAP(k=k)

for name, datasets in data.items():
    base, test, base_df, test_df = datasets
    model = ImplicitItemKNNWrapperModel(
                model=TFIDFRecommender(K=10)
                )

    model.fit(base)
    
    recos = model.recommend(
        users=base_df[Columns.User].unique(),
        dataset=base,
        k=10,
        filter_viewed=True,
    )
    print(f'ImplicitItemKNN on {name} split')
    print('MAP: ', mmap.calc(reco=recos, interactions=test_df))
    print("Accuracy: ", acc.calc(reco=recos, interactions=test_df, catalog=base_df[Columns.Item]))
    print("NDCG: ", ndcg.calc(reco=recos, interactions=test_df))
    print()


ImplicitItemKNN on u1 split
MAP:  0.11095594568471655
Accuracy:  0.9977684095860565
NDCG:  0.47923027186404077

ImplicitItemKNN on u2 split
MAP:  0.12252065805277586
Accuracy:  0.9983509954058192
NDCG:  0.41468383065947245

ImplicitItemKNN on u3 split
MAP:  0.1161651980713301
Accuracy:  0.9986631760644419
NDCG:  0.3429152253413324

ImplicitItemKNN on u4 split
MAP:  0.12060826766232134
Accuracy:  0.998718959913326
NDCG:  0.3362310065152404

ImplicitItemKNN on u5 split
MAP:  0.12056638664275297
Accuracy:  0.9987145631067961
NDCG:  0.32096577240299784

ImplicitItemKNN on ua split
MAP:  0.13110555639717889
Accuracy:  0.9983442207975495
NDCG:  0.25640336994380786

ImplicitItemKNN on ub split
MAP:  0.12053935430658654
Accuracy:  0.9983271277223814
NDCG:  0.24071597752491747

