# Testing Models

Testing recommender systems is less intuitive than testing other predictive models. There are many common metrics to use, which can be found [here in this useful Medium article](https://medium.com/swlh/rank-aware-recsys-evaluation-metrics-5191bba16832). The idea of these methods are all similar: generate a list of top ranked items to show the user, and your score is based on how many of the items you recommend are relevant to the user. The word "relevant" is not well-defined, but normally means that the user has interacted with this item in the past. 

In our case, our items are the restaurants and the interactions are orders. One issue with our data is that between 30-40% of our users have only ordered from a single restaurant, which makes the above methods tricky/impossible to apply.

Recall that our model takes in a user's past five (5) orders as inputs together with one restaurant R and tries to predict how likely the user will order from R next. We will evaluate our model as follows. Given a user's sequence of five restaurants, and given the target restaurant R which is next in the sequence, we use the model to generate a ranked list of $k=5$ more restaurants that it thinks is the most likely to be ordered from next. If R is among the $k$ generated restaurants, we add $+1$ to the score, otherwise $+0$. 

Since there are only $100$ restaurants to recommend from, it should be very feasible to simply score each one, sort the rankings, and slice the top 5 as recommendations. This is a very privillaged position we are in since many recommender systems are deployed in a context where there are millions of items to recommend from. In that case, one could perform clustering based on the customer and vendor embeddings before ranking within the clusters.  

Each user in the test set may possibly geenerate many length $5+1$ sequences, and we shall use all of these during the evaluation process. In the deployed version of the model, we could simply give recommendations based on the last $5$ orders made by the user.

Baselines:
1) Recommend the $5$ most popular restaurants.
2) Given a sequence of $5$ orders, $m$ of which are unique and not null, recommend those restaurants together with the $5-m$ most popular restaurants. 

We will define 'popularity' of a vendor to be equal to its number of orders in the training data. Note that the unordered set of top five most popular vendors would be the same if we changed the definition of popularity of a vendor to be equal to its number of unique customers (although their rankings are slightly different, see the last cell of "Munging.ipynb"). 

In [1]:
import pandas as pd
import pickle
import torch
from PreprocessingHelpers import CustomDataset
from torch.utils.data import Dataset, DataLoader
from Models.Models import Model1, Model2, Model3, Model4, Model5, Model6, Model7

## Load Test Data

In [2]:
with open("ProcessedData/test_sequences_padded_dataset_6.pkl", "rb") as file:
    test_sequences_padded_dataset_6 = pickle.load(file)
with open("ProcessedData/test_sequences_padded_dataset_4.pkl", "rb") as file:
    test_sequences_padded_dataset_4 = pickle.load(file)

test_loader_6 = DataLoader(test_sequences_padded_dataset_6, batch_size=1, shuffle=True)
test_loader_4 = DataLoader(test_sequences_padded_dataset_4, batch_size=1, shuffle=True)
num_trials_6 = test_sequences_padded_dataset_6.vendor.shape[0]
num_trials_4 = test_sequences_padded_dataset_4.vendor.shape[0]

In [3]:
with open("ProcessedData/vendors_tensor.pkl", "rb") as file:
    vendors_tensor = pickle.load(file)

In [4]:
with open("ProcessedData/popular_vendors.pkl", "rb") as file:
    popular_vendors = pickle.load(file)

## Define Scoring for Model & Baselines

In [5]:
popular_vendors.head(10)

id
28    3237
25    2717
14    2681
19    2453
18    2184
15    2159
75    1377
21    1274
36    1145
6     1078
Name: num_orders, dtype: int64

In [6]:
def get_most_popular(vendors, k):
    return vendors[:k].index.tolist()

In [7]:
most_popular_5 = get_most_popular(popular_vendors, 5)
most_popular_3 = get_most_popular(popular_vendors, 3)

most_popular_5

[28, 25, 14, 19, 18]

In [8]:
def baseline1_scoring(target:int, k_most_popular:list):
    if target in k_most_popular:
        return 1
    else:
        return 0

In [9]:
def baseline2_scoring(seq:list, target:int, k_most_popular:list):
    seq = list(set(seq))    # remove duplicates
    try:
        seq.remove(0)       # remove null-token
    except ValueError:
        pass
    m = len(seq)
    seq = seq + k_most_popular[:-m]
    if target in seq:
        return 1
    else:
        return 0

In [10]:
def model_scoring(seq:torch.tensor, target:int, model, k:int=5):
    seq = seq.view(1, -1)
    y = torch.ones([100, 1], dtype=torch.long)      # 100 vendors
    seq_dupe = y @ seq                              # 100 x 5 matrix
    v_ids = torch.arange(start=1, end=101, dtype=torch.long)
    rankings = model.forward(c_seq=seq_dupe, v_id=v_ids).view(-1)    
    top_k = torch.topk(rankings, k)[1][:5]          # Essentially argmax for top k
    top_k = top_k + 1                               # Shift indices to match vendors
    if target in top_k:
        return 1
    else:
        return 0

# Give 5 Recommendations

## Baselines

In [27]:
# Score baselines

print("SCORING\n=======================================================")
print("Random:\t\t2384 / 47677 = 5.00% ")
baseline1_score = 0
baseline2_score = 0
for i, (c_seq, v_id) in enumerate(test_loader_6):
    target = v_id.item()
    c_seq_list = c_seq.view(-1).tolist()
    baseline1_score += baseline1_scoring(target=target, k_most_popular=most_popular_5)
    baseline2_score += baseline2_scoring(seq=c_seq_list, target=target, k_most_popular=most_popular_5)
print(f'Baseline1:\t{baseline1_score} / {num_trials_6} = {baseline1_score*100/num_trials_6:.2f}%')
print(f'Baseline2:\t{baseline2_score} / {num_trials_6} = {baseline2_score*100/num_trials_6:.2f}%')
print("")

SCORING
Random:		2384 / 47677 = 5.00% 
Baseline1:	9233 / 47677 = 19.37%
Baseline2:	22134 / 47677 = 46.42%



## Model1

In [11]:
# Score model1_64 at different epochs

model = Model1(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 40):
    PATH = "Models/give_5/adam_no_scheduler/model1_64_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    model_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model_score += model_scoring(seq=c_seq, target=target, model=model, k=5)
    print(f'Model1_64_{epoch}:\t{model_score} / {num_trials_6} = {model_score*100/num_trials_6:.2f}%')
    print("Done!")

SCORING
Random:		2384 / 47677 = 5.00% 
Baseline1:	9233 / 47677 = 19.37%
Baseline2:	22134 / 47677 = 46.42%

Model1_0:	15579 / 47677 = 32.68%
Model1_40:	21454 / 47677 = 45.00%
Model1_80:	22135 / 47677 = 46.43%
Model1_120:	22525 / 47677 = 47.25%
Model1_160:	22844 / 47677 = 47.91%
Model1_200:	22797 / 47677 = 47.82%



In [12]:
# Score model1_64s at different epochs

model1 = Model1(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 20):
    PATH = "Models/give_5/adamw_scheduler/model1_64s_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model1.load_state_dict(checkpoint['model_state_dict'])
    model1.eval()
    model1_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model1_score += model_scoring(seq=c_seq, target=target, model=model1, k=5)
    print(f'Model1_64s_{epoch}:\t{model1_score} / {num_trials_6} = {model1_score*100/num_trials_6:.2f}%')
print("Done!")

Model1_64s_0:	15644 / 47677 = 32.81%
Model1_64s_20:	21455 / 47677 = 45.00%
Model1_64s_40:	21628 / 47677 = 45.36%
Model1_64s_60:	22108 / 47677 = 46.37%
Model1_64s_80:	22378 / 47677 = 46.94%
Model1_64s_100:	22836 / 47677 = 47.90%
Model1_64s_120:	22922 / 47677 = 48.08%
Model1_64s_140:	22967 / 47677 = 48.17%
Model1_64s_160:	23027 / 47677 = 48.30%
Model1_64s_180:	23044 / 47677 = 48.33%
Model1_64s_200:	23041 / 47677 = 48.33%
Done!


In [24]:
# Score model1_128 at different epochs

model1 = Model1(vendors=vendors_tensor, d_fc=128)
for epoch in range(0, 201, 40):
    PATH = "Models/give_5/adam_no_scheduler/model1_128_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model1.load_state_dict(checkpoint['model_state_dict'])
    model1.eval()
    model1_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model1_score += model_scoring(seq=c_seq, target=target, model=model1, k=5)
    print(f'Model1_128_{epoch}:\t{model1_score} / {num_trials_6} = {model1_score*100/num_trials_6:.2f}%')

Model1_0:	15844 / 47677 = 33.23%
Model1_40:	21924 / 47677 = 45.98%
Model1_80:	22116 / 47677 = 46.39%
Model1_120:	22375 / 47677 = 46.93%
Model1_160:	22801 / 47677 = 47.82%
Model1_200:	22547 / 47677 = 47.29%


## Model2

In [16]:
# Score model2s at different epochs

model = Model2(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 101, 10):
    PATH = "Models/give_5/adamw_scheduler/model2s_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    model_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model_score += model_scoring(seq=c_seq, target=target, model=model, k=5)
    print(f'Model2_{epoch}:\t{model_score} / {num_trials_6} = {model_score*100/num_trials_6:.2f}%')
print("Done!")

Model2_0:	10297 / 47677 = 21.60%
Model2_10:	14955 / 47677 = 31.37%
Model2_20:	14355 / 47677 = 30.11%
Model2_30:	13962 / 47677 = 29.28%
Model2_40:	13971 / 47677 = 29.30%
Model2_50:	14559 / 47677 = 30.54%
Model2_60:	14728 / 47677 = 30.89%
Model2_70:	14832 / 47677 = 31.11%
Model2_80:	14923 / 47677 = 31.30%
Model2_90:	15288 / 47677 = 32.07%
Model2_100:	15402 / 47677 = 32.30%
Done!


## Model3

In [25]:
# Score model3s at different epochs

model = Model3(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 20):
    PATH = "Models/give_5/adamw_scheduler/model3s_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    model_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model_score += model_scoring(seq=c_seq, target=target, model=model, k=5)
    print(f'Model3_{epoch}:\t{model_score} / {num_trials_6} = {model_score*100/num_trials_6:.2f}%')
print("Done!")

Model3_0:	17206 / 47677 = 36.09%
Model3_20:	21731 / 47677 = 45.58%
Model3_40:	21939 / 47677 = 46.02%
Model3_60:	22209 / 47677 = 46.58%
Model3_80:	22862 / 47677 = 47.95%
Model3_100:	23002 / 47677 = 48.25%
Model3_120:	22947 / 47677 = 48.13%
Model3_140:	23112 / 47677 = 48.48%
Model3_160:	23155 / 47677 = 48.57%
Model3_180:	23144 / 47677 = 48.54%
Model3_200:	23141 / 47677 = 48.54%
Done!


## Model4

In [27]:
# Score model4 at different epochs

model = Model4(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 200, 20):
    PATH = "Models/give_5/adamw_scheduler/model4_64s_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    model_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model_score += model_scoring(seq=c_seq, target=target, model=model, k=5)
    print(f'Model4_{epoch}:\t{model_score} / {num_trials_6} = {model_score*100/num_trials_6:.2f}%')
print("Done!")

Model4_0:	15939 / 47677 = 33.43%
Model4_20:	20365 / 47677 = 42.71%
Model4_40:	21525 / 47677 = 45.15%
Model4_60:	21288 / 47677 = 44.65%
Model4_80:	21902 / 47677 = 45.94%
Model4_100:	22149 / 47677 = 46.46%
Model4_120:	22739 / 47677 = 47.69%
Model4_140:	22912 / 47677 = 48.06%
Model4_160:	22943 / 47677 = 48.12%
Model4_180:	23068 / 47677 = 48.38%
Done!


## Model5

In [12]:
# Score model5 at different epochs

model = Model5(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 20):
    PATH = "Models/give_5/adamw_scheduler/model5_64s_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    model_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model_score += model_scoring(seq=c_seq, target=target, model=model, k=5)
    print(f'Model5_{epoch}:\t{model_score} / {num_trials_6} = {model_score*100/num_trials_6:.2f}%')
print("Done!")

Model5_0:	17857 / 47677 = 37.45%
Model5_20:	23909 / 47677 = 50.15%
Model5_40:	24169 / 47677 = 50.69%
Model5_60:	24585 / 47677 = 51.57%
Model5_80:	24437 / 47677 = 51.26%
Model5_100:	24934 / 47677 = 52.30%
Model5_120:	24973 / 47677 = 52.38%
Model5_140:	24989 / 47677 = 52.41%
Model5_160:	25039 / 47677 = 52.52%
Model5_180:	25000 / 47677 = 52.44%
Model5_200:	25059 / 47677 = 52.56%
Done!


## Model6

In [11]:
# Score model6 at different epochs

model = Model6(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 20):
    PATH = "Models/give_5/adamw_scheduler/model6_64s_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    model_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model_score += model_scoring(seq=c_seq, target=target, model=model, k=5)
    print(f'Model6_{epoch}:\t{model_score} / {num_trials_6} = {model_score*100/num_trials_6:.2f}%')
print("Done!")

RuntimeError: Error(s) in loading state_dict for Model6:
	Missing key(s) in state_dict: "fc5.weight", "fc5.bias". 
	size mismatch for fc4.weight: copying a param with shape torch.Size([1, 16]) from checkpoint, the shape in current model is torch.Size([8, 16]).
	size mismatch for fc4.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([8]).

## Model7

In [None]:
# Score model7 at different epochs

model = Model7(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 20):
    PATH = "Models/give_5/adamw_scheduler/model7_64s_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    model_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_6):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model_score += model_scoring(seq=c_seq, target=target, model=model, k=5)
    print(f'Model7_{epoch}:\t{model_score} / {num_trials_6} = {model_score*100/num_trials_6:.2f}%')
print("Done!")

# Give 3 Recommendations

In [14]:
# Score baselines

print("SCORING\n=======================================================")
print(f"Random:\t\t {num_trials_4 * 3 // 100} / {num_trials_4} = 3.00% ")
baseline1_score = 0
baseline2_score = 0
for i, (c_seq, v_id) in enumerate(test_loader_4):
    target = v_id.item()
    c_seq_list = c_seq.view(-1).tolist()
    baseline1_score += baseline1_scoring(target=target, k_most_popular=most_popular_3)
    baseline2_score += baseline2_scoring(seq=c_seq_list, target=target, k_most_popular=most_popular_3)
print(f'Baseline1:\t{baseline1_score} / {num_trials_4} = {baseline1_score*100/num_trials_4:.2f}%')
print(f'Baseline2:\t{baseline2_score} / {num_trials_4} = {baseline2_score*100/num_trials_4:.2f}%')
print("")

SCORING
Random:		 1430 / 47677 = 3.00% 
Baseline1:	6373 / 47677 = 13.37%
Baseline2:	20131 / 47677 = 42.22%



In [15]:
# Score model1s at different epochs

model1 = Model1(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 20):
    PATH = "Models/give_3/model1_64s_3r_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model1.load_state_dict(checkpoint['model_state_dict'])
    model1.eval()
    model1_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_4):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model1_score += model_scoring(seq=c_seq, target=target, model=model1, k=3)
    print(f'Model1_{epoch}:\t{model1_score} / {num_trials_4} = {model1_score*100/num_trials_4:.2f}%')
print("Done!")

Model1_0:	12129 / 47677 = 25.44%
Model1_20:	16319 / 47677 = 34.23%
Model1_40:	16770 / 47677 = 35.17%
Model1_60:	17771 / 47677 = 37.27%
Model1_80:	17488 / 47677 = 36.68%
Model1_100:	18172 / 47677 = 38.11%
Model1_120:	18705 / 47677 = 39.23%
Model1_140:	18538 / 47677 = 38.88%
Model1_160:	18741 / 47677 = 39.31%
Model1_180:	18876 / 47677 = 39.59%
Model1_200:	18943 / 47677 = 39.73%
Done!


In [17]:
# Score model2s at different epochs

model2 = Model2(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 20):
    PATH = "Models/give_3/model2_64s_3r_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model2.load_state_dict(checkpoint['model_state_dict'])
    model2.eval()
    model2_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_4):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model2_score += model_scoring(seq=c_seq, target=target, model=model2, k=3)
    print(f'Model2_{epoch}:\t{model2_score} / {num_trials_4} = {model2_score*100/num_trials_4:.2f}%')
print("Done!")

Model2_0:	8662 / 47677 = 18.17%
Model2_20:	10778 / 47677 = 22.61%
Model2_40:	10713 / 47677 = 22.47%
Model2_60:	11079 / 47677 = 23.24%
Model2_80:	11153 / 47677 = 23.39%
Model2_100:	11165 / 47677 = 23.42%
Model2_120:	11486 / 47677 = 24.09%
Model2_140:	11900 / 47677 = 24.96%
Model2_160:	11624 / 47677 = 24.38%
Model2_180:	11721 / 47677 = 24.58%
Model2_200:	11639 / 47677 = 24.41%
Done!


In [19]:
# Score model3s at different epochs

model3 = Model3(vendors=vendors_tensor, d_fc=64)
for epoch in range(0, 201, 20):
    PATH = "Models/give_3/model3s_3r_epoch"+str(epoch)+".pt"
    checkpoint = torch.load(PATH)
    model3.load_state_dict(checkpoint['model_state_dict'])
    model3.eval()
    model3_score = 0
    for i, (c_seq, v_id) in enumerate(test_loader_4):
        target = v_id.item()
        c_seq_list = c_seq.tolist()
        with torch.no_grad():
            model3_score += model_scoring(seq=c_seq, target=target, model=model3, k=3)
    print(f'Model3_{epoch}:\t{model3_score} / {num_trials_4} = {model3_score*100/num_trials_4:.2f}%')
print("Done!")

Model3_0:	12293 / 47677 = 25.78%
Model3_20:	17339 / 47677 = 36.37%
Model3_40:	17696 / 47677 = 37.12%
Model3_60:	18295 / 47677 = 38.37%
Model3_80:	18954 / 47677 = 39.76%
Model3_100:	19019 / 47677 = 39.89%
Model3_120:	19049 / 47677 = 39.95%
Model3_140:	19261 / 47677 = 40.40%
Model3_160:	19317 / 47677 = 40.52%
Model3_180:	19281 / 47677 = 40.44%
Model3_200:	19320 / 47677 = 40.52%
Done!


# Check

In [16]:
PATH = "Models/give_5/adamw_scheduler/model5_64s_epoch200.pt"
checkpoint = torch.load(PATH)
test_model = Model5(vendors_tensor)
test_model.load_state_dict(checkpoint['model_state_dict'])
test_model.eval()

Model5(
  (id_lookup): Embedding(101, 101)
  (emb_id): Linear(in_features=101, out_features=7, bias=True)
  (vendor_lookup): Embedding(101, 123)
  (emb_ptag): Linear(in_features=43, out_features=6, bias=True)
  (emb_vtag): Linear(in_features=68, out_features=7, bias=True)
  (c_emb): Linear(in_features=25, out_features=5, bias=True)
  (v_emb): Linear(in_features=25, out_features=5, bias=True)
  (fc1): Linear(in_features=24, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=32, bias=True)
  (fc3): Linear(in_features=32, out_features=16, bias=True)
  (fc4): Linear(in_features=16, out_features=8, bias=True)
  (fc5): Linear(in_features=8, out_features=1, bias=True)
)

In [32]:
num_tests = 25
test_iter = iter(test_loader_6)
print('Test\tInputs\t\t\tTarget\tPreds (Ordered)\t\tModel\tBase2\n=================================================================================')
for i in range(num_tests):
    seq, v_id = test_iter.next()

    y = torch.ones([100, 1], dtype=torch.long)      # 100 vendors
    seq_dupe = y @ seq                              # 100 x 5 matrix
    v_ids = torch.arange(start=1, end=101, dtype=torch.long)
    rankings = test_model.forward(c_seq=seq_dupe, v_id=v_ids).view(-1)    
    top_k = torch.topk(rankings, 5)[1][:5]          # Essentially argmax for top k
    top_k = top_k + 1

    m_preds = top_k.tolist()
    m_right = "X" if v_id in m_preds else " "
    
    b_pred = baseline2_scoring(seq=seq.view(-1).tolist(), target=v_id, k_most_popular=most_popular_5)
    b_right = "X" if b_pred == 1 else " "
    print(f'{i}:\t{seq.tolist()}\t{v_id.item()}\t{m_preds}\t{m_right}\t{b_right}')

Test	Inputs			Target	Preds (Ordered)		Model	Base2
0:	[[0, 0, 0, 0, 62]]	20	[62, 17, 36, 59, 25]	 	 
1:	[[0, 0, 0, 0, 18]]	75	[18, 25, 19, 20, 45]	 	 
2:	[[0, 0, 0, 0, 0]]	8	[18, 14, 28, 20, 25]	 	 
3:	[[22, 32, 22, 19, 32]]	22	[32, 22, 19, 25, 65]	X	X
4:	[[19, 81, 81, 86, 98]]	98	[88, 19, 95, 57, 75]	 	X
5:	[[0, 0, 0, 0, 20]]	8	[20, 18, 25, 19, 21]	 	 
6:	[[15, 15, 15, 15, 15]]	15	[15, 26, 5, 98, 28]	X	X
7:	[[0, 0, 0, 0, 0]]	6	[18, 14, 28, 20, 25]	 	 
8:	[[28, 15, 28, 15, 28]]	15	[28, 15, 90, 26, 86]	X	X
9:	[[0, 15, 73, 15, 26]]	15	[15, 28, 78, 86, 98]	X	X
10:	[[11, 98, 98, 98, 98]]	98	[98, 99, 90, 15, 28]	X	X
11:	[[0, 0, 20, 20, 8]]	21	[18, 20, 25, 19, 21]	X	 
12:	[[0, 0, 0, 0, 0]]	41	[18, 14, 28, 20, 25]	 	 
13:	[[0, 0, 0, 0, 19]]	21	[19, 25, 21, 75, 95]	X	 
14:	[[0, 0, 0, 0, 28]]	14	[28, 14, 26, 15, 5]	X	X
15:	[[0, 0, 21, 8, 34]]	80	[8, 25, 34, 19, 21]	 	 
16:	[[0, 0, 0, 0, 0]]	52	[18, 14, 28, 20, 25]	 	 
17:	[[28, 28, 28, 28, 28]]	28	[28, 14, 26, 15, 5]	X	X
18:	[[0, 0, 0, 49, 7]]	3

In [24]:
popular_vendors.head(20)

id
28    3237
25    2717
14    2681
19    2453
18    2184
15    2159
75    1377
21    1274
36    1145
6     1078
26    1074
20    1042
46     975
95     965
3      960
1      905
22     848
30     845
45     842
24     839
Name: num_orders, dtype: int64