# Testing Models

Testing recommender systems is less intuitive than testing other predictive models. There are many common metrics to use, which can be found [here in this useful Medium article](https://medium.com/swlh/rank-aware-recsys-evaluation-metrics-5191bba16832). The idea of these methods are all similar: generate a list of top ranked items to show the user, and your score is based on how many of the items you recommend are relevant to the user. The word "relevant" is not well-defined, but normally means that the user has interacted with this item in the past. 

In our case, our items are the restaurants and the interactions are orders. One issue with our data is that a plurality of our users have only ordered from a single restaurant, which makes the above methods tricky/impossible to apply. This is because the user's single order is used as the target, and hence they will be considered to have no interactions. 

Recall that our model takes in a user's past five ($k=5$) orders as inputs together with one restaurant $R$ and tries to predict how likely the user will order from $R$ next. We will evaluate our model as follows. Given a customer sequence of vendors, and given the target restaurant $R$ which is next in the sequence, we use the model to generate a ranked list of $k=5$ more restaurants that it thinks is the most likely to be ordered from next. If R is among the $k$ generated restaurants, we add $+1$ to the score, otherwise $+0$. 

Since there are only $100$ restaurants to recommend from, it should be very feasible to simply score each one, sort the rankings, and slice the top $5$ as recommendations. This is a very privillaged position we are in since many recommender systems are deployed in a context where there are millions of items to recommend from. In that case, one could perform clustering based on the customer and vendor embeddings before ranking within the clusters.  

Each user in the test set may possibly geenerate many length $5+1$ sequences, and we shall use all of these during the evaluation process. In the deployed version of the model, we could simply give recommendations based on the last $5$ orders made by the user.

Baselines:
1) Recommend the $5$ most popular restaurants.
2) Given a sequence of $5$ orders, $m$ of which are unique and not null, recommend those restaurants together with the $5-m$ most popular restaurants. 

We will define 'popularity' of a vendor to be equal to its number of orders in the training data. Note that the unordered set of top five most popular vendors would be the same if we changed the definition of popularity of a vendor to be equal to its number of unique customers (although their rankings are slightly different, see the last cell of "Munging.ipynb"). 

In [1]:
import pandas as pd
import pickle
import torch
from PreprocessingHelpers import CustomDataset
from torch.utils.data import Dataset, DataLoader
from Models.Models import Model1, Model2, Model3, Model4, Model5, Model6, Model7, Model8, Model9
from tqdm import tqdm

## Load Test Data

In [2]:
with open("ProcessedData/test_sequences_padded_dataset_6.pkl", "rb") as file:
    test_sequences_padded_dataset_6 = pickle.load(file)

test_loader_6 = DataLoader(test_sequences_padded_dataset_6, batch_size=1, shuffle=True)
num_trials_6 = test_sequences_padded_dataset_6.vendor.shape[0]

In [3]:
with open("ProcessedData/vendors_tensor.pkl", "rb") as file:
    vendors_tensor = pickle.load(file)

In [4]:
with open("ProcessedData/popular_vendors.pkl", "rb") as file:
    popular_vendors = pickle.load(file)

## Define Scoring for Model & Baselines

In [5]:
popular_vendors.head(10)

id
28    3237
25    2717
14    2681
19    2453
18    2184
15    2159
75    1377
21    1274
36    1145
6     1078
Name: num_orders, dtype: int64

In [6]:
def get_most_popular(vendors, k):
    return vendors[:k].index.tolist()

In [7]:
most_popular_5 = get_most_popular(popular_vendors, 5)
most_popular_5

[28, 25, 14, 19, 18]

In [8]:
def baseline1_scoring(target:int, k_most_popular:list):
    if target in k_most_popular:
        return 1
    else:
        return 0

In [9]:
def baseline2_scoring(seq:list, target:int, k_most_popular:list):
    seq = list(set(seq))    # remove duplicates
    try:
        seq.remove(0)       # remove null-token
    except ValueError:
        pass
    m = len(seq)
    seq = seq + k_most_popular[:-m] if (m != 0) else k_most_popular
    if target in seq:
        return 1
    else:
        return 0

In [10]:
def model_topk(seq:torch.tensor, model, k:int=5):
    seq = seq.view(1, -1)
    y = torch.ones([100, 1], dtype=torch.long)      # 100 vendors
    seq_dupe = y @ seq                              # 100 x 5 matrix
    v_ids = torch.arange(start=1, end=101, dtype=torch.long)
    rankings = model.forward(c_seq=seq_dupe, v_id=v_ids).view(-1)    
    top_k = torch.topk(rankings, k)[1][:5]          # Essentially argmax for top k
    top_k = top_k + 1                               # Shift indices to match vendors
    return top_k

In [11]:
def model_scoring(seq:torch.tensor, target:torch.tensor, model, k:int=5):
    top_k = model_topk(seq, model, k)
    if target.item() in top_k:
        return 1
    else:
        return 0

In [13]:
def load_score_model(model, base_path, epochs=201, test_every=20, test_loader=test_loader_6, num_trials=num_trials_6):
    for epoch in range(0, epochs, test_every):
        PATH = base_path + str(epoch) + ".pt"
        checkpoint = torch.load(PATH)
        model.load_state_dict(checkpoint['model_state_dict'])
        model.eval()
        model_score = 0
        for i, (c_seq, v_id) in enumerate(test_loader):
            with torch.no_grad():
                model_score += model_scoring(seq=c_seq, target=v_id, model=model, k=5)
        print(f'Model_{epoch}:\t{model_score} / {num_trials} = {model_score*100/num_trials:.2f}%')
    print("Done!")

# Give 5 Recommendations

## Baselines

In [36]:
# Score baselines

print("SCORING\n=======================================================")
print("Random:\t\t2384 / 47677 = 5.00% ")
baseline1_score = 0
baseline2_score = 0
for i, (c_seq, v_id) in enumerate(test_loader_6):
    target = v_id.item()
    c_seq_list = c_seq.view(-1).tolist()
    baseline1_score += baseline1_scoring(target=target, k_most_popular=most_popular_5)
    baseline2_score += baseline2_scoring(seq=c_seq_list, target=target, k_most_popular=most_popular_5)
print(f'Baseline1:\t{baseline1_score} / {num_trials_6} = {baseline1_score*100/num_trials_6:.2f}%')
print(f'Baseline2:\t{baseline2_score} / {num_trials_6} = {baseline2_score*100/num_trials_6:.2f}%')

SCORING
Random:		2384 / 47677 = 5.00% 
Baseline1:	9233 / 47677 = 19.37%
Baseline2:	24204 / 47677 = 50.77%



## Model1

### Model1_64

In [13]:
# Score model1_64 at different epochs

model = Model1(vendors=vendors_tensor, d_fc=64)
base_path = "Models/give_5/adam_no_scheduler/model1_64_epoch"
load_score_model(model, base_path, test_every=40)

Model1_64_0:	15579 / 47677 = 32.68%
Model1_64_40:	21454 / 47677 = 45.00%
Model1_64_80:	22135 / 47677 = 46.43%
Model1_64_120:	22525 / 47677 = 47.25%
Model1_64_160:	22844 / 47677 = 47.91%
Model1_64_200:	22797 / 47677 = 47.82%
Done!


### Model1_128

In [None]:
# Score model1_128 at different epochs

model1 = Model1(vendors=vendors_tensor, d_fc=128)
base_path = "Models/give_5/adam_no_scheduler/model1_128_epoch"

load_score_model(model, base_path, test_every=40)

Model1_0:	15844 / 47677 = 33.23%
Model1_40:	21924 / 47677 = 45.98%
Model1_80:	22116 / 47677 = 46.39%
Model1_120:	22375 / 47677 = 46.93%
Model1_160:	22801 / 47677 = 47.82%
Model1_200:	22547 / 47677 = 47.29%


### Model1_64s

In [12]:
# Score model1_64s at different epochs

model1 = Model1(vendors=vendors_tensor, d_fc=64)
base_path = "Models/give_5/adamw_scheduler/model1_64s_epoch"
load_score_model(model, base_path)

Model1_64s_0:	15644 / 47677 = 32.81%
Model1_64s_20:	21455 / 47677 = 45.00%
Model1_64s_40:	21628 / 47677 = 45.36%
Model1_64s_60:	22108 / 47677 = 46.37%
Model1_64s_80:	22378 / 47677 = 46.94%
Model1_64s_100:	22836 / 47677 = 47.90%
Model1_64s_120:	22922 / 47677 = 48.08%
Model1_64s_140:	22967 / 47677 = 48.17%
Model1_64s_160:	23027 / 47677 = 48.30%
Model1_64s_180:	23044 / 47677 = 48.33%
Model1_64s_200:	23041 / 47677 = 48.33%
Done!


## Model2

In [None]:
# Score model2s at different epochs

model = Model2(vendors=vendors_tensor, d_fc=64)
base_path = "Models/give_5/adamw_scheduler/model2s_epoch"
load_score_model(model, base_path)

## Model3

In [25]:
# Score model3s at different epochs

model = Model3(vendors=vendors_tensor, d_fc=64)
base_path = "Models/give_5/adamw_scheduler/model3s_epoch"
load_score_model(model, base_path)

Model3_0:	17206 / 47677 = 36.09%
Model3_20:	21731 / 47677 = 45.58%
Model3_40:	21939 / 47677 = 46.02%
Model3_60:	22209 / 47677 = 46.58%
Model3_80:	22862 / 47677 = 47.95%
Model3_100:	23002 / 47677 = 48.25%
Model3_120:	22947 / 47677 = 48.13%
Model3_140:	23112 / 47677 = 48.48%
Model3_160:	23155 / 47677 = 48.57%
Model3_180:	23144 / 47677 = 48.54%
Model3_200:	23141 / 47677 = 48.54%
Done!


## Model4

In [27]:
# Score model4 at different epochs

model = Model4(vendors=vendors_tensor, d_fc=64)
base_path = "Models/give_5/adamw_scheduler/model4_64s_epoch"
load_score_model(model, base_path)

Model4_0:	15939 / 47677 = 33.43%
Model4_20:	20365 / 47677 = 42.71%
Model4_40:	21525 / 47677 = 45.15%
Model4_60:	21288 / 47677 = 44.65%
Model4_80:	21902 / 47677 = 45.94%
Model4_100:	22149 / 47677 = 46.46%
Model4_120:	22739 / 47677 = 47.69%
Model4_140:	22912 / 47677 = 48.06%
Model4_160:	22943 / 47677 = 48.12%
Model4_180:	23068 / 47677 = 48.38%
Done!


## Model5

In [12]:
# Score model5 at different epochs

model = Model5(vendors=vendors_tensor, d_fc=64)
base_path = "Models/give_5/adamw_scheduler/model5_64s_epoch"
load_score_model(model, base_path)

Model5_0:	17857 / 47677 = 37.45%
Model5_20:	23909 / 47677 = 50.15%
Model5_40:	24169 / 47677 = 50.69%
Model5_60:	24585 / 47677 = 51.57%
Model5_80:	24437 / 47677 = 51.26%
Model5_100:	24934 / 47677 = 52.30%
Model5_120:	24973 / 47677 = 52.38%
Model5_140:	24989 / 47677 = 52.41%
Model5_160:	25039 / 47677 = 52.52%
Model5_180:	25000 / 47677 = 52.44%
Model5_200:	25059 / 47677 = 52.56%
Done!


## Model6

In [34]:
# Score model6 at different epochs

model = Model6(vendors=vendors_tensor, d_fc=64)
base_path = "Models/give_5/adamw_scheduler/model6_64s_epoch"
load_score_model(model, base_path, epochs=301)

Model6_0:	16565 / 47677 = 34.74%
Model6_20:	23492 / 47677 = 49.27%
Model6_40:	24497 / 47677 = 51.38%
Model6_60:	24611 / 47677 = 51.62%
Model6_80:	24778 / 47677 = 51.97%
Model6_100:	25080 / 47677 = 52.60%
Model6_120:	25098 / 47677 = 52.64%
Model6_140:	25170 / 47677 = 52.79%
Model6_160:	25168 / 47677 = 52.79%
Model6_180:	25201 / 47677 = 52.86%
Model6_200:	25158 / 47677 = 52.77%
Model6_220:	25150 / 47677 = 52.75%
Model6_240:	25184 / 47677 = 52.82%
Model6_260:	25167 / 47677 = 52.79%
Model6_280:	25173 / 47677 = 52.80%
Model6_300:	25158 / 47677 = 52.77%
Done!


## Model7

### Model7_64

In [12]:
# Score model7 at different epochs

model = Model7(vendors=vendors_tensor, d_fc=64)
base_path = "Models/give_5/adamw_scheduler/model7_64s_epoch"
load_score_model(model, base_path)

Model7_0:	16003 / 47677 = 33.57%
Model7_20:	25734 / 47677 = 53.98%
Model7_40:	25703 / 47677 = 53.91%
Model7_60:	26025 / 47677 = 54.59%
Model7_80:	25891 / 47677 = 54.31%
Model7_100:	26020 / 47677 = 54.58%
Model7_120:	26135 / 47677 = 54.82%
Model7_140:	26127 / 47677 = 54.80%
Model7_160:	26133 / 47677 = 54.81%
Model7_180:	26134 / 47677 = 54.81%
Model7_200:	26145 / 47677 = 54.84%
Done!


### Model7_128

In [13]:
# Score model7 at different epochs

model = Model7(vendors=vendors_tensor, d_fc=128)
base_path = "Models/give_5/adamw_scheduler/model7_128s_epoch"
load_score_model(model, base_path)

Model7_0:	17695 / 47677 = 37.11%
Model7_20:	25597 / 47677 = 53.69%
Model7_40:	26191 / 47677 = 54.93%
Model7_60:	25941 / 47677 = 54.41%
Model7_80:	26117 / 47677 = 54.78%
Model7_100:	25946 / 47677 = 54.42%
Model7_120:	25983 / 47677 = 54.50%
Model7_140:	26130 / 47677 = 54.81%
Model7_160:	26166 / 47677 = 54.88%
Model7_180:	26120 / 47677 = 54.79%
Model7_200:	26199 / 47677 = 54.95%
Done!


### Model7_256

In [16]:
# Score model7 at different epochs

model = Model7(vendors=vendors_tensor, d_fc=256)
base_path = "Models/give_5/adamw_scheduler/model7_256s_epoch"
load_score_model(model, base_path, test_every=10)

Model1_64_0:	16446 / 47677 = 34.49%
Model1_64_10:	25642 / 47677 = 53.78%
Model1_64_20:	25940 / 47677 = 54.41%
Model1_64_30:	25849 / 47677 = 54.22%
Model1_64_40:	25884 / 47677 = 54.29%
Model1_64_50:	26168 / 47677 = 54.89%
Model1_64_60:	25978 / 47677 = 54.49%
Model1_64_70:	26071 / 47677 = 54.68%
Model1_64_80:	26087 / 47677 = 54.72%
Model1_64_90:	26166 / 47677 = 54.88%
Model1_64_100:	26009 / 47677 = 54.55%
Model1_64_110:	26092 / 47677 = 54.73%
Model1_64_120:	25960 / 47677 = 54.45%
Model1_64_130:	26276 / 47677 = 55.11%
Model1_64_140:	26319 / 47677 = 55.20%
Model1_64_150:	26354 / 47677 = 55.28%
Model1_64_160:	26321 / 47677 = 55.21%
Model1_64_170:	26355 / 47677 = 55.28%
Model1_64_180:	26327 / 47677 = 55.22%
Model1_64_190:	26323 / 47677 = 55.21%
Model1_64_200:	26288 / 47677 = 55.14%
Done!


## Model8

In [18]:
# Score model8 at different epochs

model = Model8(vendors=vendors_tensor, d_fc=256)
base_path = "Models/give_5/adamw_scheduler/model8_256s_epoch"
load_score_model(model, base_path, test_every=10, epochs=300)

Model_0:	17935 / 47677 = 37.62%
Model_10:	25130 / 47677 = 52.71%
Model_20:	26079 / 47677 = 54.70%
Model_30:	26113 / 47677 = 54.77%
Model_40:	26112 / 47677 = 54.77%
Model_50:	26166 / 47677 = 54.88%
Model_60:	26058 / 47677 = 54.66%
Model_70:	26257 / 47677 = 55.07%
Model_80:	26055 / 47677 = 54.65%
Model_90:	26071 / 47677 = 54.68%
Model_100:	26099 / 47677 = 54.74%
Model_110:	26063 / 47677 = 54.67%
Model_120:	26077 / 47677 = 54.70%
Model_130:	26052 / 47677 = 54.64%
Model_140:	26190 / 47677 = 54.93%
Model_150:	26066 / 47677 = 54.67%
Model_160:	26273 / 47677 = 55.11%
Model_170:	26299 / 47677 = 55.16%
Model_180:	26326 / 47677 = 55.22%
Model_190:	26348 / 47677 = 55.26%
Model_200:	26308 / 47677 = 55.18%
Model_210:	26306 / 47677 = 55.18%
Model_220:	26324 / 47677 = 55.21%
Model_230:	26332 / 47677 = 55.23%
Model_240:	26333 / 47677 = 55.23%
Model_250:	26334 / 47677 = 55.23%
Model_260:	26334 / 47677 = 55.23%
Model_270:	26334 / 47677 = 55.23%
Model_280:	26334 / 47677 = 55.23%
Model_290:	26334 / 47677 

## Model9

In [17]:
# Score model8 at different epochs

model = Model9(vendors=vendors_tensor, d_fc=256)
base_path = "Models/give_5/adamw_scheduler/model9_256s_epoch"
load_score_model(model, base_path, test_every=20, epochs=200)

Model_0:	17330 / 47677 = 36.35%
Model_20:	25648 / 47677 = 53.80%
Model_40:	25924 / 47677 = 54.37%
Model_60:	26018 / 47677 = 54.57%
Model_80:	26040 / 47677 = 54.62%
Model_100:	25886 / 47677 = 54.29%
Model_120:	26060 / 47677 = 54.66%
Model_140:	26294 / 47677 = 55.15%
Model_160:	26302 / 47677 = 55.17%
Model_180:	26328 / 47677 = 55.22%
Done!


## Results Summary

In [19]:
print("SCORING\n=======================================================")
print("Random:\t\t2384 / 47677 = 5.00% ")
baseline1_score = 0
baseline2_score = 0
for i, (c_seq, v_id) in enumerate(test_loader_6):
    target = v_id.item()
    c_seq_list = c_seq.view(-1).tolist()
    baseline1_score += baseline1_scoring(target=target, k_most_popular=most_popular_5)
    baseline2_score += baseline2_scoring(seq=c_seq_list, target=target, k_most_popular=most_popular_5)
print(f'Baseline1:\t{baseline1_score} / {num_trials_6} = {baseline1_score*100/num_trials_6:.2f}%')
print(f'Baseline2:\t{baseline2_score} / {num_trials_6} = {baseline2_score*100/num_trials_6:.2f}%\n')

x = [(Model1, "model1_64s_epoch200.pt"), (Model2, "model2s_epoch200.pt"), (Model3, "model3s_epoch200.pt"), (Model4, "model4_64s_epoch190.pt"), (Model5, "model5_64s_epoch200.pt"), (Model6, "model6_64s_epoch200.pt"), (Model7, "model7_256s_epoch200.pt"), (Model8, "model8_256s_epoch200.pt"), (Model9, "model9_256s_epoch200.pt")]
for j, p in enumerate(x):
    Model, path = p[0], p[1]
    test_model = Model(vendors=vendors_tensor)
    PATH = "Models/give_5/adamw_scheduler/" + path
    checkpoint = torch.load(PATH)
    test_model.load_state_dict(checkpoint['model_state_dict'])
    test_model.eval()
    model_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        with torch.no_grad():
            model_score += model_scoring(seq=c_seq, target=v_id, model=test_model, k=5)
    print(f'Model{j+1}:\t\t{model_score} / {num_trials_6} = {model_score*100/num_trials_6:.2f}%')
    print(f'\t\tNum parameters: {sum([p.numel() for p in test_model.parameters()])}')
   

SCORING
Random:		2384 / 47677 = 5.00% 
Baseline1:	9233 / 47677 = 19.37%
Baseline2:	24204 / 47677 = 50.77%

Model1:		23041 / 47677 = 48.33%
		Num parameters: 17506
Model2:		16074 / 47677 = 33.71%
		Num parameters: 17506
Model3:		23141 / 47677 = 48.54%
		Num parameters: 16759
Model4:		23069 / 47677 = 48.39%
		Num parameters: 16887
Model5:		25059 / 47677 = 52.56%
		Num parameters: 28698
Model6:		25158 / 47677 = 52.77%
		Num parameters: 28698


# Examine Outputs

In [20]:
# Load best model for testing

PATH = "Models/give_5/adamw_scheduler/model7_256s_epoch150.pt"
checkpoint = torch.load(PATH)
test_model = Model7(vendors_tensor, d_fc=256)
test_model.load_state_dict(checkpoint['model_state_dict'])
test_model.eval()

test_loader_6 = DataLoader(test_sequences_padded_dataset_6, batch_size=1, shuffle=False)
test_iter = iter(test_loader_6)

In [38]:
# Visualize model predictions against baseline predictions

num_tests = 25
print('Test\tInputs\t\t\tTarget\tPreds (Ordered)\t\tModel\tBase2\n=================================================================================')
for i in range(num_tests):
    c_seq, v_id = test_iter.next()

    m_preds = model_topk(c_seq, test_model)
    m_right = "X" if v_id in m_preds == 1 else " "
    
    b_pred = baseline2_scoring(seq=c_seq.view(-1).tolist(), target=v_id, k_most_popular=most_popular_5)
    b_right = "X" if b_pred == 1 else " "
    print(f'{i}:\t{c_seq.tolist()}\t{v_id.item()}\t{m_preds}\t{m_right}\t{b_right}')

Test	Inputs			Target	Preds (Ordered)		Model	Base2
0:	[[0, 0, 0, 0, 0]]	57	[18, 14, 25, 28, 20]	 	 
1:	[[0, 0, 0, 0, 57]]	57	[18, 14, 25, 28, 20]	X	X
2:	[[0, 0, 0, 0, 0]]	62	[18, 14, 25, 28, 20]	 	 
3:	[[0, 0, 0, 0, 0]]	74	[18, 14, 25, 28, 20]	 	 
4:	[[0, 0, 0, 0, 0]]	42	[18, 14, 25, 28, 20]	 	 
5:	[[0, 0, 0, 0, 42]]	60	[18, 14, 25, 28, 20]	 	 
6:	[[0, 0, 0, 42, 60]]	93	[18, 14, 25, 28, 20]	 	 
7:	[[0, 0, 0, 0, 0]]	38	[18, 14, 25, 28, 20]	 	 
8:	[[0, 0, 0, 0, 38]]	83	[18, 14, 25, 28, 20]	X	 
9:	[[0, 0, 0, 0, 0]]	94	[18, 14, 25, 28, 20]	 	 
10:	[[0, 0, 0, 0, 0]]	58	[18, 14, 25, 28, 20]	 	 
11:	[[0, 0, 0, 0, 0]]	45	[18, 14, 25, 28, 20]	 	 
12:	[[0, 0, 0, 0, 45]]	25	[18, 14, 25, 28, 20]	 	X
13:	[[0, 0, 0, 45, 25]]	85	[18, 14, 25, 28, 20]	 	 
14:	[[0, 0, 45, 25, 85]]	31	[18, 14, 25, 28, 20]	 	 
15:	[[0, 45, 25, 85, 31]]	31	[18, 14, 25, 28, 20]	X	X
16:	[[45, 25, 85, 31, 31]]	52	[18, 14, 25, 28, 20]	 	 
17:	[[25, 85, 31, 31, 52]]	31	[18, 14, 25, 28, 20]	X	X
18:	[[85, 31, 31, 52, 31]]	74	[18, 

In [39]:
# The model predicts popular vendors for first time customers (zero tensors)

c_seq = torch.tensor([[0, 0, 0, 0, 0]])
m_preds = model_topk(c_seq, test_model).tolist()

print(f'Most popular:\t{popular_vendors.head(5).index.tolist()}')
print(f'Model(zero):\t{m_preds}')

Most popular:	[28, 25, 14, 19, 18]
Model(zero):	[18, 14, 25, 28, 20]


In [40]:
test_loader_6 = DataLoader(test_sequences_padded_dataset_6, batch_size=1, shuffle=False)
test_iter = iter(test_loader_6)

In [41]:
# Visualize model predictions against baseline predictions

num_tests = 25
print('Test\tInputs\t\t\tTarget\tPreds (Ordered)\t\tModel\tBase2\n=================================================================================')
for i in range(num_tests):
    c_seq, v_id = test_iter.next()

    m_pred = model_scoring(c_seq, v_id, test_model)
    m_right = "X" if m_pred == 1 else " "
    
    b_pred = baseline2_scoring(seq=c_seq.view(-1).tolist(), target=v_id, k_most_popular=most_popular_5)
    b_right = "X" if b_pred == 1 else " "
    print(f'{i}:\t{c_seq.tolist()}\t{v_id.item()}\t{m_preds}\t{m_right}\t{b_right}')

Test	Inputs			Target	Preds (Ordered)		Model	Base2
0:	[[0, 0, 0, 0, 0]]	57	[18, 14, 25, 28, 20]	 	 
1:	[[0, 0, 0, 0, 57]]	57	[18, 14, 25, 28, 20]	X	X
2:	[[0, 0, 0, 0, 0]]	62	[18, 14, 25, 28, 20]	 	 
3:	[[0, 0, 0, 0, 0]]	74	[18, 14, 25, 28, 20]	 	 
4:	[[0, 0, 0, 0, 0]]	42	[18, 14, 25, 28, 20]	 	 
5:	[[0, 0, 0, 0, 42]]	60	[18, 14, 25, 28, 20]	 	 
6:	[[0, 0, 0, 42, 60]]	93	[18, 14, 25, 28, 20]	 	 
7:	[[0, 0, 0, 0, 0]]	38	[18, 14, 25, 28, 20]	 	 
8:	[[0, 0, 0, 0, 38]]	83	[18, 14, 25, 28, 20]	X	 
9:	[[0, 0, 0, 0, 0]]	94	[18, 14, 25, 28, 20]	 	 
10:	[[0, 0, 0, 0, 0]]	58	[18, 14, 25, 28, 20]	 	 
11:	[[0, 0, 0, 0, 0]]	45	[18, 14, 25, 28, 20]	 	 
12:	[[0, 0, 0, 0, 45]]	25	[18, 14, 25, 28, 20]	 	X
13:	[[0, 0, 0, 45, 25]]	85	[18, 14, 25, 28, 20]	 	 
14:	[[0, 0, 45, 25, 85]]	31	[18, 14, 25, 28, 20]	 	 
15:	[[0, 45, 25, 85, 31]]	31	[18, 14, 25, 28, 20]	X	X
16:	[[45, 25, 85, 31, 31]]	52	[18, 14, 25, 28, 20]	 	 
17:	[[25, 85, 31, 31, 52]]	31	[18, 14, 25, 28, 20]	X	X
18:	[[85, 31, 31, 52, 31]]	74	[18, 

In [45]:
test_iter = iter(test_loader_6)

seq_len = 5
hit = torch.zeros([3, seq_len+2], dtype=torch.long)     # Rows: base2_score, model_score, total
                                                        # Cols: 0-5 num_orders, total

for c_seq, v_id in tqdm(test_iter):
    num_orders = torch.count_nonzero(c_seq).item()
    
    b_score = baseline2_scoring(seq=c_seq.view(-1).tolist(), target=v_id, k_most_popular=most_popular_5)
    hit[0, num_orders] += b_score
    hit[0, -1] += b_score
    
    m_score = model_scoring(c_seq, v_id, test_model)
    hit[1, num_orders] += m_score
    hit[1, -1] += m_score

    hit[2, num_orders] += 1
    hit[2, -1] += 1

r0 = ''.join([f'\t{hit[0,i]}' for i in range(seq_len+1)])
r1 = ''.join([f'\t{hit[1,i]}' for i in range(seq_len+1)])
r2 = ''.join([f'\t{hit[2,i]}' for i in range(seq_len+1)])
r3 = ''.join([f'\t{hit[0,i] / hit[2,i] * 100:.1f}%' for i in range(seq_len+1)])
r4 = ''.join([f'\t{hit[1,i] / hit[2,i] * 100:.1f}%' for i in range(seq_len+1)])
r5 = ''.join([f'\t{torch.sum(hit[0,i:]).item() / torch.sum(hit[2,i:]).item() * 100:.1f}%' for i in range(seq_len+1)])
r6 = ''.join([f'\t{torch.sum(hit[1,i:]).item() / torch.sum(hit[2,i:]).item() * 100:.1f}%' for i in range(seq_len+1)])

print('\t\t\t\t# Orders == i')
print('\t\t  0\t  1\t  2\t  3\t  4\t  5')
print('___________________________________________________________________')
print('Base      |' + r0)
print('Model     |' + r1)
print('Total     |' + r2)
print('___________________________________________________________________')
print('=i B/T %     |' + r3)
print('=i M/T %     |' + r4)
print('___________________________________________________________________')
print('>=i B/T % |' + r5)
print('>=i M/T % |' + r6)


100%|██████████| 47677/47677 [03:39<00:00, 216.94it/s]

				# Orders == i
		  0	  1	  2	  3	  4	  5
___________________________________________________________________
Base      |	2070	3246	2648	2252	1887	12101
Model     |	2080	3834	3021	2555	2053	12811
Total     |	10703	6847	4962	3844	3066	18255
___________________________________________________________________
=i B/T %     |	19.3%	47.4%	53.4%	58.6%	61.5%	66.3%
=i M/T %     |	19.4%	56.0%	60.9%	66.5%	67.0%	70.2%
___________________________________________________________________
>=i B/T % |	50.8%	59.9%	62.7%	64.5%	65.6%	66.3%
>=i M/T % |	55.3%	65.7%	67.8%	69.2%	69.7%	70.2%



