# Intro to Recommender Systems Lab

Complete the exercises below to solidify your knowledge and understanding of recommender systems.

In this lab, we will build a user-similarity-based recommender system step by step. Our data set contains customer grocery purchases, and we will use similar purchase behavior to inform the recommendations. The system will generate 5 recommendations for each customer based on the purchases they have made.

In [1]:
import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist, squareform

In [2]:
data = pd.read_csv('../data/customer_product_sales.csv')

In [3]:
data.head()

,CustomerID,FirstName,LastName,SalesID,ProductID,ProductName,Quantity
0,61288,Rosa,Andersen,134196,229,Bread - Hot Dog Buns,16
1,77352,Myron,Murray,6167892,229,Bread - Hot Dog Buns,20
2,40094,Susan,Stevenson,5970885,229,Bread - Hot Dog Buns,11
3,23548,Tricia,Vincent,6426954,229,Bread - Hot Dog Buns,6
4,78981,Scott,Burch,819094,229,Bread - Hot Dog Buns,20


## Step 1: Create a data frame that contains the total quantity of each product purchased by each customer.

You will need to group by CustomerID and ProductName and then sum the Quantity field.
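A minimal sketch of this aggregation on a tiny made-up frame. The `sales` frame below is an illustration only (it reuses a few values that appear in the lab outputs), and `totals` is a hypothetical name:

```python
import pandas as pd

# Illustrative sales rows: customer 98200 bought the same product twice.
sales = pd.DataFrame({
    'CustomerID': [98200, 98200, 33],
    'ProductName': ['Vol Au Vents', 'Vol Au Vents', 'Apricots - Dried'],
    'Quantity': [25, 25, 1],
})

# One row per (customer, product) pair, with the quantities added up.
totals = sales.groupby(['CustomerID', 'ProductName'], as_index=False)['Quantity'].sum()
print(totals)
```

The two separate sales of 25 collapse into a single row with a quantity of 50, which is the behavior the checks in this step rely on.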

In [4]:
# group by customer and product
data_cus_pro = data.groupby(['CustomerID', 'ProductName'])[['Quantity']].sum().reset_index()
data_cus_pro

,CustomerID,ProductName,Quantity
0,33,Apricots - Dried,1
1,33,Assorted Desserts,1
2,33,Bandage - Flexible Neon,1
3,33,"Bar Mix - Pina Colada, 355 Ml",1
4,33,"Beans - Kidney, Canned",1
...,...,...,...
63623,98200,Vol Au Vents,50
63624,98200,Wasabi Powder,25
63625,98200,Wine - Fume Blanc Fetzer,25
63626,98200,Wine - Hardys Bankside Shiraz,25


In [5]:
# check
data.loc[(data['CustomerID'] == 98200) &
         (data['ProductName'] == 'Vol Au Vents')]

,CustomerID,FirstName,LastName,SalesID,ProductID,ProductName,Quantity
4528,98200,Sammy,Rocha,2426936,136,Vol Au Vents,25
4529,98200,Sammy,Rocha,2733410,136,Vol Au Vents,25


In [6]:
len(data['CustomerID'].unique())

1000

In [7]:
len(data['ProductName'].unique())

452

## Step 2: Use the `pivot_table` method to create a product by customer matrix.

The rows of the matrix should represent the products, the columns should represent the customers, and the values should be the quantities of each product purchased by each customer. You will also need to replace nulls with zeros, which you can do using the `fillna` method.
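As a rough sketch of the two ways to fill the gaps, assuming a small made-up purchases table (the `toy` frame and the variable names are illustrative, not part of the lab data):

```python
import pandas as pd

# Hypothetical toy data: one row per (customer, product) with a total quantity.
toy = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3],
    'ProductName': ['Bread', 'Milk', 'Milk', 'Eggs'],
    'Quantity': [2, 1, 4, 3],
})

# Option A: pivot first, then replace the resulting NaNs with zeros.
matrix_fillna = toy.pivot_table(values='Quantity',
                                index='ProductName',
                                columns='CustomerID',
                                aggfunc='sum').fillna(0)

# Option B: let pivot_table fill the missing combinations directly.
matrix_fill_value = toy.pivot_table(values='Quantity',
                                    index='ProductName',
                                    columns='CustomerID',
                                    aggfunc='sum',
                                    fill_value=0)

# Both options hold the same numbers (dtypes may differ by pandas version).
same_values = (matrix_fillna.to_numpy() == matrix_fill_value.to_numpy()).all()
print(matrix_fill_value)
print(same_values)
```

Either option should work for this lab; the choice is mostly a matter of taste.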

In [9]:
# pivot table: products as rows, customers as columns
data_pivot_pro = pd.pivot_table(data_cus_pro,
                                values='Quantity',
                                index=['ProductName'],
                                columns=['CustomerID'],
                                aggfunc='sum',  # string alias; recent pandas versions warn when given np.sum
                                fill_value=0)  # fill_value argument instead of the fillna method
data_pivot_pro

ProductName,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
Anchovy Paste - 56 G Tube,0,0,0,0,0,0,0,1,0,0,...,0,25,0,0,0,0,0,0,0,0
"Appetizer - Mini Egg Roll, Shrimp",0,0,0,0,0,0,0,0,0,0,...,25,25,0,0,0,0,0,0,0,0
Appetizer - Mushroom Tart,0,0,0,0,0,0,0,1,0,0,...,25,0,0,0,0,0,0,0,25,0
Appetizer - Sausage Rolls,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,25,25,25,0,25,0
Apricots - Dried,1,0,0,0,1,0,0,0,0,0,...,0,25,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Yeast Dry - Fermipan,0,0,0,0,0,0,0,0,0,0,...,0,0,0,25,0,0,0,0,0,0
Yoghurt Tubes,0,0,0,0,0,0,0,0,2,0,...,0,0,0,0,0,0,0,25,0,0
"Yogurt - Blueberry, 175 Gr",0,1,0,0,0,0,0,0,0,0,...,25,0,0,25,0,0,0,0,0,0
Yogurt - French Vanilla,1,0,0,1,0,0,2,0,0,1,...,0,0,25,0,0,0,0,0,0,25


In [10]:
# check
data.loc[(data['CustomerID'] == 33) &
         (data['ProductName'] == 'Apricots - Dried')]

,CustomerID,FirstName,LastName,SalesID,ProductID,ProductName,Quantity
45323,33,Lindsay,Santana,172592,324,Apricots - Dried,1


In [11]:
# check
data.loc[(data['CustomerID'] == 33) &
         (data['ProductName'] == 'Yoghurt Tubes')]

,CustomerID,FirstName,LastName,SalesID,ProductID,ProductName,Quantity


## Step 3: Create a customer similarity matrix using `squareform` and `pdist`. For the distance metric, choose "euclidean."
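`pdist` measures distances between the rows of its input, so the customers need to be the rows, which is why the product-by-customer matrix is transposed first. A small sketch of how `pdist` and `squareform` fit together, using three made-up purchase vectors (the array `X` is an assumption for illustration only):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical purchase vectors: three customers (rows) by two products (columns).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# pdist returns the condensed form: one distance per unordered pair of rows,
# in the order (0,1), (0,2), (1,2).
condensed = pdist(X, metric='euclidean')

# squareform expands it into a symmetric matrix with zeros on the diagonal.
square = squareform(condensed)

print(condensed)
print(square)
```

For `n` customers the condensed vector has `n * (n - 1) / 2` entries, while the square matrix has `n * n`.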

In [12]:
# pivot table for customers
# transpose the previous matrix (equivalent to pivoting the original data with CustomerID as index and ProductName as columns)
data_pivot_cus = data_pivot_pro.transpose() 
data_pivot_cus

CustomerID,Anchovy Paste - 56 G Tube,"Appetizer - Mini Egg Roll, Shrimp",Appetizer - Mushroom Tart,Appetizer - Sausage Rolls,Apricots - Dried,Apricots - Halves,Apricots Fresh,Arizona - Green Tea,Artichokes - Jerusalem,Assorted Desserts,...,"Wine - White, Colubia Cresh","Wine - White, Mosel Gold","Wine - White, Schroder And Schyl",Wine - Wyndham Estate Bin 777,Wonton Wrappers,Yeast Dry - Fermipan,Yoghurt Tubes,"Yogurt - Blueberry, 175 Gr",Yogurt - French Vanilla,Zucchini - Yellow
33,0,0,0,0,1,0,0,0,0,1,...,0,0,0,0,0,0,0,0,1,0
200,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,1,0,0
264,0,0,0,0,0,1,1,0,0,0,...,0,0,0,1,0,0,0,0,0,0
356,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
412,0,0,0,0,1,0,0,0,0,0,...,0,1,1,1,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97928,0,0,0,25,0,50,0,25,0,0,...,0,25,25,0,0,0,0,0,0,0
98069,0,0,0,25,0,25,0,0,0,25,...,0,0,0,0,0,0,0,0,0,0
98159,0,0,0,0,0,0,0,0,0,0,...,0,50,0,0,0,0,25,0,0,0
98185,0,0,25,25,0,25,0,0,0,0,...,0,0,0,25,0,0,0,0,0,0


In [13]:
# dist
data_dist_cus = pdist(X=data_pivot_cus, metric='euclidean')
data_dist_cus

array([ 11.91637529,  10.48808848,  11.22497216, ..., 304.13812651,
       305.16389039, 303.10889132])

In [14]:
# square
data_squa_cus = squareform(data_dist_cus)
data_squa_cus

array([[  0.        ,  11.91637529,  10.48808848, ..., 228.62851966,
        239.        , 229.77380181],
       [ 11.91637529,   0.        ,  11.74734012, ..., 228.01096465,
        239.03765394, 229.70415756],
       [ 10.48808848,  11.74734012,   0.        , ..., 228.08112592,
        238.26665734, 229.77380181],
       ...,
       [228.62851966, 228.01096465, 228.08112592, ...,   0.        ,
        304.13812651, 305.16389039],
       [239.        , 239.03765394, 238.26665734, ..., 304.13812651,
          0.        , 303.10889132],
       [229.77380181, 229.70415756, 229.77380181, ..., 305.16389039,
        303.10889132,   0.        ]])

In [15]:
# data frame
euclid_dist_cus = pd.DataFrame(data_squa_cus,
                               index=data_pivot_cus.index,
                               columns=data_pivot_cus.index)
print(len(euclid_dist_cus))  # the number of rows equals the number of distinct customers
euclid_dist_cus

1000


CustomerID,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
33,0.000000,11.916375,10.488088,11.224972,11.401754,11.090537,12.409674,11.045361,11.269428,11.489125,...,206.871941,213.180675,225.656819,198.232187,230.913404,220.501701,217.188858,228.628520,239.000000,229.773802
200,11.916375,0.000000,11.747340,12.083046,12.569805,12.288206,12.165525,12.083046,11.874342,12.000000,...,206.310446,212.635839,224.697575,197.139544,230.952376,220.202180,215.728997,228.010965,239.037654,229.704158
264,10.488088,11.747340,0.000000,11.489125,11.224972,11.445523,12.000000,11.401754,11.180340,11.747340,...,206.387984,212.946003,225.435135,197.600607,230.371439,219.136943,216.612557,228.081126,238.266657,229.773802
356,11.224972,12.083046,11.489125,0.000000,12.083046,11.789826,12.328828,11.135529,11.958261,12.165525,...,206.649462,213.082144,225.452878,197.494304,231.038958,219.952268,217.437347,228.098663,238.493186,229.464594
412,11.401754,12.569805,11.224972,12.083046,0.000000,11.704700,12.328828,11.135529,11.789826,11.747340,...,206.900942,211.679002,225.572605,197.630969,230.614397,219.733930,217.446545,227.997807,238.396728,228.927936
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97928,220.501701,220.202180,219.136943,219.952268,219.733930,219.599636,219.538152,219.924987,219.827205,220.070443,...,283.945417,283.945417,302.076149,272.717803,278.388218,0.000000,273.861279,291.547595,306.186218,307.205143
98069,217.188858,215.728997,216.612557,217.437347,217.446545,217.425849,216.903204,217.294731,217.080630,216.751009,...,283.945417,283.945417,295.803989,283.945417,285.043856,273.861279,0.000000,287.228132,297.909382,294.745653
98159,228.628520,228.010965,228.081126,228.098663,227.997807,228.197283,228.028507,228.181945,227.868383,228.103047,...,283.945417,279.508497,300.000000,290.473751,300.000000,291.547595,287.228132,0.000000,304.138127,305.163890
98185,239.000000,239.037654,238.266657,238.493186,238.396728,239.006276,238.949786,238.468027,238.692271,239.334494,...,301.039864,315.238005,306.186218,292.617498,314.245127,306.186218,297.909382,304.138127,0.000000,303.108891


In [16]:
# similarity from euclidean distance: 1 / (1 + distance)
def euclid_dist_cus_norm(df_in):
    df_out = pd.DataFrame(1/(1 + squareform(pdist(df_in, 'euclidean'))),
                          index=df_in.index,  # label rows and columns from the input, not a global
                          columns=df_in.index)
    return df_out


df_euclid_dist_cus = euclid_dist_cus_norm(data_pivot_cus)
df_euclid_dist_cus

CustomerID,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
33,1.000000,0.077421,0.087047,0.081800,0.080634,0.082709,0.074573,0.083020,0.081503,0.080070,...,0.004811,0.004669,0.004412,0.005019,0.004312,0.004515,0.004583,0.004355,0.004167,0.004333
200,0.077421,1.000000,0.078448,0.076435,0.073693,0.075255,0.075956,0.076435,0.077674,0.076923,...,0.004824,0.004681,0.004431,0.005047,0.004311,0.004521,0.004614,0.004367,0.004166,0.004335
264,0.087047,0.078448,1.000000,0.080070,0.081800,0.080350,0.076923,0.080634,0.082100,0.078448,...,0.004822,0.004674,0.004416,0.005035,0.004322,0.004543,0.004595,0.004365,0.004179,0.004333
356,0.081800,0.076435,0.080070,1.000000,0.076435,0.078187,0.075025,0.082403,0.077171,0.075956,...,0.004816,0.004671,0.004416,0.005038,0.004310,0.004526,0.004578,0.004365,0.004175,0.004339
412,0.080634,0.073693,0.081800,0.076435,1.000000,0.078711,0.075025,0.082403,0.078187,0.078448,...,0.004810,0.004702,0.004414,0.005034,0.004318,0.004530,0.004578,0.004367,0.004177,0.004349
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97928,0.004515,0.004521,0.004543,0.004526,0.004530,0.004533,0.004534,0.004526,0.004528,0.004523,...,0.003509,0.003509,0.003300,0.003653,0.003579,1.000000,0.003638,0.003418,0.003255,0.003245
98069,0.004583,0.004614,0.004595,0.004578,0.004578,0.004578,0.004589,0.004581,0.004585,0.004592,...,0.003509,0.003509,0.003369,0.003509,0.003496,0.003638,1.000000,0.003469,0.003345,0.003381
98159,0.004355,0.004367,0.004365,0.004365,0.004367,0.004363,0.004366,0.004363,0.004369,0.004365,...,0.003509,0.003565,0.003322,0.003431,0.003322,0.003418,0.003469,1.000000,0.003277,0.003266
98185,0.004167,0.004166,0.004179,0.004175,0.004177,0.004167,0.004168,0.004176,0.004172,0.004161,...,0.003311,0.003162,0.003255,0.003406,0.003172,0.003255,0.003345,0.003277,1.000000,0.003288
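The transformation `1 / (1 + distance)` turns a distance into a similarity score: a distance of 0 maps to 1, and larger distances shrink toward 0. A small sketch on made-up data, where `distance_to_similarity` and the `toy` frame are hypothetical names introduced only for illustration:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def distance_to_similarity(df_in, metric='euclidean'):
    """Return a customer-by-customer frame of 1 / (1 + distance) scores."""
    distances = squareform(pdist(df_in, metric=metric))
    return pd.DataFrame(1 / (1 + distances), index=df_in.index, columns=df_in.index)


# Hypothetical toy matrix: customers as rows, products as columns.
toy = pd.DataFrame([[0, 0], [3, 4], [6, 8]],
                   index=pd.Index([10, 20, 30], name='CustomerID'),
                   columns=['Bread', 'Milk'])

sim = distance_to_similarity(toy)
print(sim)
```

In the table above, customers who buy in quantities of 25 or 50 end up with Euclidean-based scores near 0.004 against everyone, which suggests that raw quantities dominate this metric.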


In [17]:
# similarity from cosine distance: 1 / (1 + distance)
def cosine_dist_cus_norm(df_in):
    df_out = pd.DataFrame(1/(1 + squareform(pdist(df_in, 'cosine'))),
                          index=df_in.index,  # label rows and columns from the input, not a global
                          columns=df_in.index)
    return df_out


df_cosine_dist_cus = cosine_dist_cus_norm(data_pivot_cus)
df_cosine_dist_cus

CustomerID,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
33,1.000000,0.530003,0.563492,0.543546,0.542393,0.541054,0.522016,0.530783,0.533096,0.535293,...,0.530306,0.517987,0.527662,0.511471,0.534181,0.513809,0.528795,0.520208,0.529481,0.527142
200,0.530003,1.000000,0.537166,0.535098,0.521226,0.519193,0.556945,0.516729,0.536350,0.541451,...,0.545224,0.533067,0.555246,0.543554,0.530308,0.521925,0.573177,0.537284,0.526167,0.527280
264,0.563492,0.537166,1.000000,0.531963,0.550168,0.525368,0.538967,0.514932,0.537046,0.524193,...,0.546880,0.525550,0.535062,0.531806,0.553085,0.560225,0.548538,0.538340,0.554929,0.527142
356,0.543546,0.535098,0.531963,1.000000,0.525982,0.523986,0.536792,0.540890,0.516901,0.519482,...,0.536321,0.520565,0.533119,0.534066,0.528874,0.530406,0.520138,0.536202,0.544727,0.535974
412,0.542393,0.521226,0.550168,0.525982,1.000000,0.534073,0.542308,0.547743,0.530459,0.543050,...,0.527903,0.565369,0.528847,0.529278,0.541674,0.536686,0.519628,0.538710,0.546943,0.552636
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97928,0.513809,0.521925,0.560225,0.530406,0.536686,0.542873,0.540381,0.533190,0.535224,0.526495,...,0.532674,0.539309,0.522883,0.542570,0.569539,1.000000,0.562050,0.543657,0.531363,0.519127
98069,0.528795,0.573177,0.548538,0.520138,0.519628,0.520835,0.534221,0.525789,0.531918,0.541402,...,0.529233,0.535979,0.530224,0.518742,0.554911,0.562050,1.000000,0.548011,0.542143,0.536728
98159,0.520208,0.537284,0.538340,0.536202,0.538710,0.533856,0.535601,0.535704,0.544926,0.535721,...,0.542777,0.556893,0.535487,0.521507,0.541404,0.543657,0.548011,1.000000,0.543258,0.531447
98185,0.529481,0.526167,0.554929,0.544727,0.546943,0.528842,0.527943,0.548943,0.539210,0.518002,...,0.526429,0.509322,0.537103,0.531411,0.529780,0.531363,0.542143,0.543258,1.000000,0.546336


In [18]:
# similarity from standardized euclidean distance: 1 / (1 + distance)
def seuclid_dist_cus_norm(df_in):
    df_out = pd.DataFrame(1/(1 + squareform(pdist(df_in, 'seuclidean'))),
                          index=df_in.index,  # label rows and columns from the input, not a global
                          columns=df_in.index)
    return df_out


df_seuclid_dist_cus = seuclid_dist_cus_norm(data_pivot_cus)
df_seuclid_dist_cus

CustomerID,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
33,1.000000,0.329359,0.357140,0.343121,0.340994,0.344184,0.319299,0.346258,0.340948,0.333938,...,0.027534,0.027028,0.025730,0.028987,0.024905,0.026012,0.026731,0.025289,0.024077,0.024867
200,0.329359,1.000000,0.332622,0.327441,0.318138,0.323041,0.324492,0.328357,0.329472,0.326230,...,0.027621,0.027098,0.025835,0.029145,0.024898,0.026053,0.026913,0.025364,0.024073,0.024868
264,0.357140,0.332622,1.000000,0.336855,0.343243,0.336939,0.325941,0.339408,0.342488,0.328753,...,0.027615,0.027056,0.025748,0.029084,0.024958,0.026172,0.026805,0.025354,0.024162,0.024864
356,0.343121,0.327441,0.336855,1.000000,0.326678,0.332436,0.319782,0.344991,0.327185,0.321355,...,0.027573,0.027041,0.025757,0.029094,0.024890,0.026076,0.026698,0.025365,0.024126,0.024906
412,0.340994,0.318138,0.343243,0.326678,1.000000,0.332313,0.320112,0.345912,0.332289,0.330626,...,0.027530,0.027215,0.025736,0.029070,0.024929,0.026092,0.026699,0.025359,0.024138,0.024954
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97928,0.026012,0.026053,0.026172,0.026076,0.026092,0.026114,0.026118,0.026074,0.026095,0.026066,...,0.020359,0.020360,0.019327,0.021130,0.020566,1.000000,0.021195,0.019802,0.018834,0.018737
98069,0.026731,0.026913,0.026805,0.026698,0.026699,0.026701,0.026768,0.026723,0.026744,0.026782,...,0.020438,0.020524,0.019728,0.020497,0.020274,0.021195,1.000000,0.020248,0.019491,0.019565
98159,0.025289,0.025364,0.025354,0.025365,0.025359,0.025342,0.025357,0.025331,0.025378,0.025351,...,0.020298,0.020647,0.019418,0.019984,0.019294,0.019802,0.020248,1.000000,0.019030,0.018993
98185,0.024077,0.024073,0.024162,0.024126,0.024138,0.024083,0.024082,0.024128,0.024110,0.024042,...,0.019093,0.018386,0.018993,0.019770,0.018364,0.018834,0.019491,0.019030,1.000000,0.018921


In [19]:
# similarity from correlation distance: 1 / (1 + distance)
def correlation_dist_cus_norm(df_in):
    df_out = pd.DataFrame(1/(1 + squareform(pdist(df_in, 'correlation'))),
                          index=df_in.index,  # label rows and columns from the input, not a global
                          columns=df_in.index)
    return df_out


df_corr_dist_cus = correlation_dist_cus_norm(data_pivot_cus)
df_corr_dist_cus

CustomerID,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
33,1.000000,0.493776,0.529677,0.510360,0.506951,0.507217,0.487410,0.497932,0.499215,0.502499,...,0.493385,0.484376,0.493486,0.479867,0.499870,0.479764,0.493197,0.486457,0.491384,0.492072
200,0.493776,1.000000,0.499733,0.498575,0.482172,0.481921,0.519139,0.480599,0.499131,0.505459,...,0.504719,0.496170,0.517884,0.508954,0.492581,0.484514,0.534491,0.500253,0.484272,0.488735
264,0.529677,0.499733,1.000000,0.497579,0.513573,0.490305,0.503243,0.480936,0.502028,0.490242,...,0.508803,0.490805,0.499750,0.499162,0.517743,0.525278,0.511845,0.503500,0.515736,0.490875
356,0.510360,0.498575,0.497579,1.000000,0.490143,0.489778,0.501933,0.507803,0.482686,0.486354,...,0.499105,0.486660,0.498662,0.502228,0.494239,0.496086,0.484207,0.502204,0.506401,0.500631
412,0.506951,0.482172,0.513573,0.490143,1.000000,0.497577,0.505111,0.512470,0.493947,0.507777,...,0.488109,0.529502,0.492033,0.495264,0.504752,0.500064,0.481259,0.502418,0.505997,0.515015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97928,0.479764,0.484514,0.525278,0.496086,0.500064,0.507963,0.504739,0.499300,0.500268,0.492619,...,0.494577,0.504692,0.487604,0.510055,0.534474,1.000000,0.525583,0.508923,0.492050,0.482921
98069,0.493197,0.534491,0.511845,0.484207,0.481259,0.484219,0.496892,0.490306,0.495330,0.506038,...,0.489353,0.499742,0.493332,0.484621,0.518025,0.525583,1.000000,0.511707,0.501063,0.498897
98159,0.486457,0.500253,0.503500,0.502204,0.502418,0.499195,0.500240,0.502118,0.510333,0.502173,...,0.505074,0.522726,0.500553,0.489177,0.506341,0.508923,0.511707,1.000000,0.504367,0.495580
98185,0.491384,0.484272,0.515736,0.506401,0.505997,0.489667,0.487961,0.511136,0.500094,0.480046,...,0.483728,0.470481,0.497652,0.494943,0.490103,0.492050,0.501063,0.504367,1.000000,0.505914


## Step 4: Check your results by generating a list of the top 5 most similar customers for a specific CustomerID.

In [20]:
# top 5 most similar customers for id 33, for example (row 0 is the customer itself, so it is skipped)
idcustomer = 33
similar_cus = pd.DataFrame(df_euclid_dist_cus.loc[idcustomer].sort_values(ascending=False))[1:6]
similar_cus

CustomerID,33
3317,0.087047
3535,0.087047
264,0.087047
2503,0.085983
3305,0.085638


In [21]:
# top 5 most similar customers for id 264, for example (33 appears in this list, and 264 appears in the list for 33)
idcustomer = 264
similar_cus = pd.DataFrame(df_euclid_dist_cus.loc[idcustomer].sort_values(ascending=False))[1:6]
similar_cus

CustomerID,264
1008,0.088152
2617,0.088152
1072,0.08741
33,0.087047
3535,0.086333


In [22]:
# top 5 most similar customers for id 3305, for example (33 is not in this list, although 3305 is in the top 5 for 33)
idcustomer = 3305
similar_cus = pd.DataFrame(df_euclid_dist_cus.loc[idcustomer].sort_values(ascending=False))[1:6]
similar_cus

CustomerID,3305
1072,0.088913
3317,0.08853
883,0.08853
3535,0.08853
3074,0.087779
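The `[1:6]` slice used in the cells above assumes the chosen customer always sorts into the first position. If another customer had an identical basket (similarity 1.0), that assumption might not hold. One possible alternative, sketched here with a hypothetical helper `top_similar` and a made-up similarity matrix, is to drop the customer's own label explicitly:

```python
import pandas as pd


def top_similar(sim, customer_id, n=5):
    """Return the n highest similarity scores for customer_id, excluding the customer itself."""
    return sim.loc[customer_id].drop(customer_id).nlargest(n)


# Hypothetical similarity matrix; customers 1 and 2 have identical baskets (similarity 1.0).
ids = pd.Index([1, 2, 3, 4], name='CustomerID')
sim = pd.DataFrame([[1.0, 1.0, 0.5, 0.2],
                    [1.0, 1.0, 0.4, 0.3],
                    [0.5, 0.4, 1.0, 0.1],
                    [0.2, 0.3, 0.1, 1.0]], index=ids, columns=ids)

print(top_similar(sim, 1, n=2))
```

The outputs above also show that the relation is not symmetric: being in someone's top 5 does not guarantee that they are in yours.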


## Step 5: From the data frame you created in Step 1, select the records for the list of similar CustomerIDs you obtained in Step 4.

In [23]:
# similar customers for id 97928 for example
idcustomer = 97928
similar_cus = pd.DataFrame(df_euclid_dist_cus.loc[idcustomer].sort_values(ascending=False))[1:6]
customers = similar_cus.reset_index()[['CustomerID']]
customers

,CustomerID
0,42087
1,18622
2,14012
3,18796
4,13062


In [24]:
data_cus_pro.head()

,CustomerID,ProductName,Quantity
0,33,Apricots - Dried,1
1,33,Assorted Desserts,1
2,33,Bandage - Flexible Neon,1
3,33,"Bar Mix - Pina Colada, 355 Ml",1
4,33,"Beans - Kidney, Canned",1


In [25]:
# inner join on CustomerID: keep only the purchase records of the similar customers
records_cus_pro = data_cus_pro.merge(customers)
records_cus_pro

,CustomerID,ProductName,Quantity
0,13062,"Appetizer - Mini Egg Roll, Shrimp",4
1,13062,Appetizer - Mushroom Tart,4
2,13062,Assorted Desserts,4
3,13062,Bacardi Breezer - Tropical,4
4,13062,Baking Powder,4
...,...,...,...
359,42087,Wine - Two Oceans Cabernet,11
360,42087,Wine - Vineland Estate Semi - Dry,11
361,42087,Yeast Dry - Fermipan,11
362,42087,"Yogurt - Blueberry, 175 Gr",11


In [26]:
# check number of rows for the five customers most similar to customer 33 (see In [20])
data_cus_pro.loc[data_cus_pro['CustomerID'].isin([264, 2503, 3305, 3317, 3535])].groupby('CustomerID')[['ProductName']].count()

CustomerID,ProductName
264,62
2503,47
3305,54
3317,52
3535,55


In [27]:
# check number of rows for the five customers most similar to customer 97928
records_cus_pro.groupby('CustomerID')[['ProductName']].count()

CustomerID,ProductName
13062,67
14012,80
18622,70
18796,77
42087,70
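The selection in this step can also be written as a boolean filter instead of a merge. A minimal sketch, assuming a made-up Step 1 style table (`purchases`) and a made-up list of similar customers (`similar_ids`):

```python
import pandas as pd

# Hypothetical Step 1 style table and a hypothetical list of similar customers.
purchases = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 3],
    'ProductName': ['Bread', 'Milk', 'Milk', 'Eggs', 'Bread'],
    'Quantity': [2, 1, 4, 3, 5],
})
similar_ids = [1, 3]

# Keep only the rows whose CustomerID is in the list of similar customers.
records = purchases[purchases['CustomerID'].isin(similar_ids)]

# Row counts per customer, as a quick sanity check.
print(records.groupby('CustomerID')['ProductName'].count())
```

Unlike the merge, the filter keeps the original row order and index of the purchases table.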


## Step 6: Aggregate those customer purchase records by ProductName, sum the Quantity field, and then rank them in descending order by quantity.

This will give you the total number of each product purchased by the 5 most similar customers to the customer you selected in order from most purchased to least.

In [28]:
# products bought by the 5 similar customers, ranked by total quantity (for the chosen customer)
top_products = records_cus_pro.groupby(['ProductName'])[['Quantity']].sum().sort_values(by='Quantity', 
                                                                                        ascending=False).reset_index()
top_products

,ProductName,Quantity
0,Bouq All Italian - Primerba,35
1,Tea - Jasmin Green,32
2,"Soup - Campbells, Lentil",27
3,Arizona - Green Tea,26
4,"Cheese - Brie,danish",25
...,...,...
251,Cheese Cloth No 100,4
252,Grouper - Fresh,4
253,Bread Crumbs - Japanese Style,4
254,Skirt - 29 Foot,4


## Step 7: Filter the list for products that the chosen customer has not yet purchased and then recommend the top 5 products with the highest quantities that are left.

- Merge the ranked products data frame with the customer product matrix on the ProductName field.
- Filter for records where the chosen customer has not purchased the product.
- Show the top 5 results.
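The bullets above can be sketched on toy data as follows. The frames `ranked` and `matrix`, and the customer ID `10`, are made up for illustration; the lab's own cell below reaches the same result using the Step 1 table and a right join instead of the matrix:

```python
import pandas as pd

# Hypothetical ranked products from the similar customers (already sorted by Quantity).
ranked = pd.DataFrame({
    'ProductName': ['Milk', 'Bread', 'Eggs', 'Tea'],
    'Quantity': [9, 7, 4, 2],
})

# Hypothetical product-by-customer matrix (products as rows, customers as columns).
matrix = pd.DataFrame({10: [0, 3, 0, 1], 20: [2, 0, 0, 0]},
                      index=pd.Index(['Milk', 'Bread', 'Eggs', 'Tea'], name='ProductName'))

chosen = 10

# Attach the chosen customer's quantities to the ranked list via ProductName.
merged = ranked.merge(matrix[[chosen]], left_on='ProductName', right_index=True, how='left')

# Keep products the customer has never bought, then take the top of the ranking.
recommendations = merged.loc[merged[chosen] == 0, 'ProductName'].head(5).tolist()
print(recommendations)  # ['Milk', 'Eggs']
```

A left merge keeps the ranking order of `ranked`, so no re-sorting is needed after the filter.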

In [29]:
# purchases of the customer selected in Step 5 (idcustomer)
customer = data_cus_pro[(data_cus_pro['CustomerID'] == idcustomer)]
# right join keeps every ranked product; CustomerID is null where the customer has not bought it
records_top_pro = customer.merge(top_products, how='right', on='ProductName')
# select the top 5 products not yet purchased
list_products = list(records_top_pro[records_top_pro['CustomerID'].isnull()][:5]['ProductName'])
list_products

['Soup - Campbells, Lentil',
 'Cheese - Brie,danish',
 'Wiberg Super Cure',
 'Pants Custom Dry Clean',
 'Shrimp - 31/40']

## Step 8: Now that we have generated product recommendations for a single user, put the pieces together and iterate over a list of all CustomerIDs.

- Create an empty dictionary that will hold the recommendations for all customers.
- Create a list of unique CustomerIDs to iterate over.
- Iterate over the customer list performing steps 4 through 7 for each and appending the results of each iteration to the dictionary you created.
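One way to package steps 4 through 7 is a single function that is called once per customer, with the results collected in a dictionary. The sketch below runs on toy data; `recommend`, `purchases`, and the parameter names are assumptions made for illustration, not the lab's required solution:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def recommend(purchases, sim, customer_id, n_similar=5, n_products=5):
    """Steps 4-7 for one customer: rank similar customers' products, minus own purchases."""
    similar_ids = sim.loc[customer_id].drop(customer_id).nlargest(n_similar).index
    records = purchases[purchases['CustomerID'].isin(similar_ids)]
    ranked = records.groupby('ProductName')['Quantity'].sum().sort_values(ascending=False)
    owned = purchases.loc[purchases['CustomerID'] == customer_id, 'ProductName'].tolist()
    return ranked.drop(owned, errors='ignore').head(n_products).index.tolist()


# Hypothetical toy purchases table in the shape produced by Step 1.
purchases = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 3, 3],
    'ProductName': ['Bread', 'Milk', 'Bread', 'Eggs', 'Tea', 'Milk'],
    'Quantity': [1, 1, 1, 2, 1, 3],
})
matrix = purchases.pivot_table(values='Quantity', index='CustomerID',
                               columns='ProductName', aggfunc='sum', fill_value=0)
sim = pd.DataFrame(1 / (1 + squareform(pdist(matrix, 'euclidean'))),
                   index=matrix.index, columns=matrix.index)

# Dictionary keyed by CustomerID, holding one list of recommendations per customer.
recommendations = {cid: recommend(purchases, sim, cid, n_similar=1)
                   for cid in matrix.index}
print(recommendations)
```

With only three toy customers, `n_similar=1` is used so that each customer gets recommendations from a single nearest neighbor.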

In [30]:
# recommendations from the top 5 similar customers (euclidean-based similarity)

list_customers = []
# iterate over every customer
### to test on a few customers -> for idcustom in data_cus_pro['CustomerID'].unique()[:3]:
for idcustom in data_cus_pro['CustomerID'].unique():
    # choose the 5 most similar customers (row 0 is the customer itself)
    similar_cus = pd.DataFrame(df_euclid_dist_cus.loc[idcustom].sort_values(ascending=False))[1:6]
    similar_customers = similar_cus.reset_index()[['CustomerID']]
    # keep only the purchase records of the similar customers
    records_cus_pro = data_cus_pro.merge(similar_customers)
    # rank their products by total quantity
    top_products = records_cus_pro.groupby(['ProductName'])[['Quantity']].sum().sort_values(by='Quantity', ascending=False)
    # purchases of the current customer
    customer = data_cus_pro[(data_cus_pro['CustomerID'] == idcustom)]
    # right join keeps every ranked product; CustomerID is null where not purchased
    records_top_pro = customer.merge(top_products, how='right', left_on='ProductName', right_on='ProductName')
    # select the top 5 products not yet purchased
    list_products = list(records_top_pro[records_top_pro['CustomerID'].isnull()][:5]['ProductName'])
    # one dictionary per customer
    dict_customers = {}
    dict_customers['CustomerID'] = idcustom
    dict_customers['Products'] = list_products
    # add the customer to the results
    list_customers.append(dict_customers)
print(len(list_customers))  # should equal the total number of customers (1000)
list_customers

1000


[{'CustomerID': 33,
  'Products': ['Butter - Unsalted',
   'Wine - Ej Gallo Sierra Valley',
   'Soup - Campbells Bean Medley',
   'Wine - Blue Nun Qualitatswein',
   'Chicken - Soup Base']},
 {'CustomerID': 200,
  'Products': ['Soup - Campbells Bean Medley',
   'Muffin - Carrot Individual Wrap',
   'Bay Leaf',
   'Pork - Kidney',
   'Wanton Wrap']},
 {'CustomerID': 264,
  'Products': ['Soupfoamcont12oz 112con',
   'Wine - Two Oceans Cabernet',
   'Bread - Italian Roll With Herbs',
   'Veal - Inside, Choice',
   'Fish - Scallops, Cold Smoked']},
 {'CustomerID': 356,
  'Products': ['Butter - Unsalted',
   'Veal - Inside, Choice',
   'Beets - Candy Cane, Organic',
   'Nut - Chestnuts, Whole',
   'Lamb - Ground']},
 {'CustomerID': 412,
  'Products': ['Olive - Spread Tapenade',
   'Sprouts - Baby Pea Tendrils',
   'Wine - Blue Nun Qualitatswein',
   'Pepper - Black, Whole',
   'Soup - Campbells Bean Medley']},
 {'CustomerID': 464,
  'Products': ['Butter - Unsalted',
   'Sauce - Gravy, Au Ju

## Step 9: Store the results in a Pandas data frame. The data frame should have a column for CustomerID and then a column for each of the 5 product recommendations for each customer.

In [31]:
# to dataframe
df_cus_top_pro = pd.DataFrame(list_customers)
df_cus_top_pro

,CustomerID,Products
0,33,"[Butter - Unsalted, Wine - Ej Gallo Sierra Val..."
1,200,"[Soup - Campbells Bean Medley, Muffin - Carrot..."
2,264,"[Soupfoamcont12oz 112con, Wine - Two Oceans Ca..."
3,356,"[Butter - Unsalted, Veal - Inside, Choice, Bee..."
4,412,"[Olive - Spread Tapenade, Sprouts - Baby Pea T..."
...,...,...
995,97928,"[Soup - Campbells, Lentil, Cheese - Brie,danis..."
996,98069,"[Skirt - 29 Foot, Beans - Kidney White, Milk -..."
997,98159,"[Chips Potato Salt Vinegar 43g, Tea - English ..."
998,98185,"[Crackers - Trio, Pernod, Tea - Jasmin Green, ..."


In [32]:
# list products to columns
df_split = pd.DataFrame(df_cus_top_pro['Products'].to_list(), columns=['ProductName1', 
                                                                       'ProductName2', 
                                                                       'ProductName3', 
                                                                       'ProductName4', 
                                                                       'ProductName5'])
df_cus_top_pro1 = pd.concat([df_cus_top_pro, df_split], axis=1)
df_cus_top_pro1

,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
0,33,"[Butter - Unsalted, Wine - Ej Gallo Sierra Val...",Butter - Unsalted,Wine - Ej Gallo Sierra Valley,Soup - Campbells Bean Medley,Wine - Blue Nun Qualitatswein,Chicken - Soup Base
1,200,"[Soup - Campbells Bean Medley, Muffin - Carrot...",Soup - Campbells Bean Medley,Muffin - Carrot Individual Wrap,Bay Leaf,Pork - Kidney,Wanton Wrap
2,264,"[Soupfoamcont12oz 112con, Wine - Two Oceans Ca...",Soupfoamcont12oz 112con,Wine - Two Oceans Cabernet,Bread - Italian Roll With Herbs,"Veal - Inside, Choice","Fish - Scallops, Cold Smoked"
3,356,"[Butter - Unsalted, Veal - Inside, Choice, Bee...",Butter - Unsalted,"Veal - Inside, Choice","Beets - Candy Cane, Organic","Nut - Chestnuts, Whole",Lamb - Ground
4,412,"[Olive - Spread Tapenade, Sprouts - Baby Pea T...",Olive - Spread Tapenade,Sprouts - Baby Pea Tendrils,Wine - Blue Nun Qualitatswein,"Pepper - Black, Whole",Soup - Campbells Bean Medley
...,...,...,...,...,...,...,...
995,97928,"[Soup - Campbells, Lentil, Cheese - Brie,danis...","Soup - Campbells, Lentil","Cheese - Brie,danish",Wiberg Super Cure,Pants Custom Dry Clean,Shrimp - 31/40
996,98069,"[Skirt - 29 Foot, Beans - Kidney White, Milk -...",Skirt - 29 Foot,Beans - Kidney White,Milk - 1%,Cheese - Taleggio D.o.p.,Sprouts - Baby Pea Tendrils
997,98159,"[Chips Potato Salt Vinegar 43g, Tea - English ...",Chips Potato Salt Vinegar 43g,Tea - English Breakfast,"Wine - Red, Harrow Estates, Cab",Bread - Raisin Walnut Oval,Wine - Redchard Merritt
998,98185,"[Crackers - Trio, Pernod, Tea - Jasmin Green, ...",Crackers - Trio,Pernod,Tea - Jasmin Green,Pastry - Choclate Baked,"Peas - Pigeon, Dry"
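If some customer ends up with fewer than five products left to recommend, the recommendation lists become ragged. A possible way to still get five fixed columns, sketched on a made-up `recommendations` dictionary (all names and values here are illustrative):

```python
import pandas as pd

# Hypothetical recommendations; customer 20 has fewer than five products left to suggest.
recommendations = {
    10: ['Milk', 'Eggs', 'Tea', 'Jam', 'Rice'],
    20: ['Bread', 'Tea'],
}

columns = [f'ProductName{i}' for i in range(1, 6)]

# Building from a list of lists pads the shorter lists with missing values.
wide = pd.DataFrame(list(recommendations.values()),
                    index=pd.Index(list(recommendations.keys()), name='CustomerID'))
wide.columns = [f'ProductName{i + 1}' for i in wide.columns]

# reindex guarantees that all five columns exist even if every list is short.
wide = wide.reindex(columns=columns).reset_index()
print(wide)
```

Missing recommendations then show up as empty cells rather than causing a column-count mismatch.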


## Step 10: Change the distance metric used in Step 3 to something other than euclidean (correlation, cityblock, cosine, jaccard, etc.). Regenerate the recommendations for all customers and note the differences.

In [33]:
# recommendations from the top 5 similar customers (cosine-based similarity)

list_customers = []
# iterate over every customer
### to test on a few customers -> for idcustom in data_cus_pro['CustomerID'].unique()[:3]:
for idcustom in data_cus_pro['CustomerID'].unique():
    # choose the 5 most similar customers (row 0 is the customer itself)
    similar_cus = pd.DataFrame(df_cosine_dist_cus.loc[idcustom].sort_values(ascending=False))[1:6]
    similar_customers = similar_cus.reset_index()[['CustomerID']]
    # keep only the purchase records of the similar customers
    records_cus_pro = data_cus_pro.merge(similar_customers)
    # rank their products by total quantity
    top_products = records_cus_pro.groupby(['ProductName'])[['Quantity']].sum().sort_values(by='Quantity', ascending=False)
    # purchases of the current customer
    customer = data_cus_pro[(data_cus_pro['CustomerID'] == idcustom)]
    # right join keeps every ranked product; CustomerID is null where not purchased
    records_top_pro = customer.merge(top_products, how='right', left_on='ProductName', right_on='ProductName')
    # select the top 5 products not yet purchased
    list_products = list(records_top_pro[records_top_pro['CustomerID'].isnull()][:5]['ProductName'])
    # one dictionary per customer
    dict_customers = {}
    dict_customers['CustomerID'] = idcustom
    dict_customers['Products'] = list_products
    # add the customer to the results
    list_customers.append(dict_customers)

# to dataframe
df_cus_top_pro = pd.DataFrame(list_customers)
# list products to columns
df_split = pd.DataFrame(df_cus_top_pro['Products'].to_list(), columns=['ProductName1', 
                                                                       'ProductName2', 
                                                                       'ProductName3', 
                                                                       'ProductName4', 
                                                                       'ProductName5'])
df_cus_top_pro2 = pd.concat([df_cus_top_pro, df_split], axis=1)
df_cus_top_pro2

Unnamed: 0,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
0,33,"[Knife Plastic - White, Soup - Campbells, Beef...",Knife Plastic - White,"Soup - Campbells, Beef Barley",Onions - Cippolini,Tea - Herbal Sweet Dreams,Muffin - Zero Transfat
1,200,"[Longos - Grilled Salmon With Bbq, Snapple Lem...",Longos - Grilled Salmon With Bbq,Snapple Lemon Tea,General Purpose Trigger,"Thyme - Lemon, Fresh",Tomatoes Tear Drop
2,264,"[Pickerel - Fillets, Water - Mineral, Natural,...",Pickerel - Fillets,"Water - Mineral, Natural",Snapple - Iced Tea Peach,Wine - Ej Gallo Sierra Valley,French Pastry - Mini Chocolate
3,356,"[Bread - English Muffin, Olive - Spread Tapena...",Bread - English Muffin,Olive - Spread Tapenade,Bagel - Plain,Pork - Hock And Feet Attached,Wiberg Super Cure
4,412,"[Salsify, Organic, Durian Fruit, Wine - Hardys...","Salsify, Organic",Durian Fruit,Wine - Hardys Bankside Shiraz,Bread - Raisin Walnut Oval,Gatorade - Xfactor Berry
...,...,...,...,...,...,...,...
995,97928,"[Spoon - Soup, Plastic, Beef - Montreal Smoked...","Spoon - Soup, Plastic",Beef - Montreal Smoked Brisket,Blackberries,Cheese Cloth No 100,Cake - Mini Cheesecake
996,98069,"[Sprouts - Baby Pea Tendrils, Juice - Orange, ...",Sprouts - Baby Pea Tendrils,Juice - Orange,Muffin Batt - Blueberry Passion,Beer - Original Organic Lager,"Nut - Chestnuts, Whole"
997,98159,"[Water, Tap, Bananas, Wine - Redchard Merritt,...","Water, Tap",Bananas,Wine - Redchard Merritt,Chocolate - Compound Coating,Longos - Grilled Salmon With Bbq
998,98185,"[Peas - Pigeon, Dry, Squid U5 - Thailand, Chee...","Peas - Pigeon, Dry",Squid U5 - Thailand,Cheese - Taleggio D.o.p.,"Cheese - Brie, Triple Creme",Crackers - Trio


In [34]:
# top 5 recommendations per customer, using correlation distance

list_customers = []
# loop over every customer
### to test a few customers -> for idcustom in data_cus_pro['CustomerID'].unique()[:3]:
for idcustom in data_cus_pro['CustomerID'].unique():
    # take the 5 top-ranked customers; position 0 is skipped because it is expected to be the customer itself
    similar_cus = pd.DataFrame(df_corr_dist_cus.loc[idcustom].sort_values(ascending=False))[1:6]
    similar_customers = similar_cus.reset_index()[['CustomerID']]
    # keep only the purchase records of the similar customers
    records_cus_pro = data_cus_pro.merge(similar_customers)
    # rank products by the total quantity the similar customers bought
    top_products = records_cus_pro.groupby(['ProductName'])[['Quantity']].sum().sort_values(by='Quantity', ascending=False)
    # products this customer has already purchased
    customer = data_cus_pro[(data_cus_pro['CustomerID'] == idcustom)]
    # right join: ranked products the customer has not bought get a null CustomerID
    records_top_pro = customer.merge(top_products, how='right', left_on='ProductName', right_on='ProductName')
    # select the first 5 products not yet purchased
    list_products = list(records_top_pro[records_top_pro['CustomerID'].isnull()][:5]['ProductName'])
    # dictionary
    dict_customers = {}
    dict_customers['CustomerID'] = idcustom
    dict_customers['Products'] = list_products
    # add customer
    list_customers.append(dict_customers)

# to dataframe
df_cus_top_pro = pd.DataFrame(list_customers)
# list products to columns
df_split = pd.DataFrame(df_cus_top_pro['Products'].to_list(), columns=['ProductName1', 
                                                                       'ProductName2', 
                                                                       'ProductName3', 
                                                                       'ProductName4', 
                                                                       'ProductName5'])
df_cus_top_pro3 = pd.concat([df_cus_top_pro, df_split], axis=1)
df_cus_top_pro3

Unnamed: 0,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
0,33,"[Knife Plastic - White, Muffin - Zero Transfat...",Knife Plastic - White,Muffin - Zero Transfat,Banana Turning,Crush - Cream Soda,Veal - Osso Bucco
1,200,"[Otomegusa Dashi Konbu, Milk Powder, Potatoes ...",Otomegusa Dashi Konbu,Milk Powder,Potatoes - Idaho 100 Count,Crackers - Trio,Pail With Metal Handle 16l White
2,264,"[Water - Mineral, Natural, Wine - Toasted Head...","Water - Mineral, Natural",Wine - Toasted Head,Snapple - Iced Tea Peach,Pickerel - Fillets,Garbag Bags - Black
3,356,"[Cheese - Taleggio D.o.p., Coconut - Shredded,...",Cheese - Taleggio D.o.p.,"Coconut - Shredded, Sweet",Cheese - Cheddarsliced,Ocean Spray - Kiwi Strawberry,Olives - Kalamata
4,412,"[Cake - Mini Cheesecake, Butter - Unsalted, Sa...",Cake - Mini Cheesecake,Butter - Unsalted,"Salmon - Atlantic, Skin On",Wine - Hardys Bankside Shiraz,Gloves - Goldtouch Disposable
...,...,...,...,...,...,...,...
995,97928,"[Lettuce - Treviso, Yogurt - Blueberry, 175 Gr...",Lettuce - Treviso,"Yogurt - Blueberry, 175 Gr",Bread - Calabrese Baguette,Extract - Lemon,Coffee - Irish Cream
996,98069,"[Wine - Red, Colio Cabernet, Veal - Inside, So...","Wine - Red, Colio Cabernet",Veal - Inside,Soupfoamcont12oz 112con,Peas - Frozen,Sprouts - Baby Pea Tendrils
997,98159,"[Water, Tap, Bananas, Wine - Redchard Merritt,...","Water, Tap",Bananas,Wine - Redchard Merritt,Chocolate - Compound Coating,Longos - Grilled Salmon With Bbq
998,98185,"[Cheese - Taleggio D.o.p., Squid U5 - Thailand...",Cheese - Taleggio D.o.p.,Squid U5 - Thailand,"Peas - Pigeon, Dry",Salmon Steak - Cohoe 8 Oz,Veal - Osso Bucco
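
The cosine and correlation cells repeat the same loop and differ only in the matrix they read from. One way to avoid the repetition is to wrap the loop in a function that takes the matrix as an argument. The sketch below is a hypothetical refactoring, not the lab's reference solution: the name `recommend`, its parameters, and the toy data are assumptions. It expects a square matrix where larger values mean closer customers, and a long-format purchases table with `CustomerID`, `ProductName` and `Quantity` columns.

```python
import pandas as pd

def recommend(similarity, purchases, n_similar=5, n_products=5):
    """Return one row per customer with up to n_products recommended product names."""
    rows = []
    for customer_id in similarity.index:
        # Most similar customers, with the customer itself dropped explicitly.
        neighbours = (similarity.loc[customer_id]
                      .drop(customer_id)
                      .sort_values(ascending=False)
                      .head(n_similar)
                      .index)
        # Total quantity the neighbours bought of each product, largest first.
        neighbour_totals = (purchases[purchases["CustomerID"].isin(neighbours)]
                            .groupby("ProductName")["Quantity"].sum()
                            .sort_values(ascending=False))
        # Keep only products this customer has not bought yet.
        bought = set(purchases.loc[purchases["CustomerID"] == customer_id, "ProductName"])
        products = [p for p in neighbour_totals.index if p not in bought][:n_products]
        rows.append({"CustomerID": customer_id, "Products": products})
    return pd.DataFrame(rows)

# Toy inputs (hypothetical data).
purchases = pd.DataFrame({
    "CustomerID":  [1, 1, 2, 2, 3, 3, 3],
    "ProductName": ["A", "B", "A", "C", "B", "D", "C"],
    "Quantity":    [2, 1, 1, 5, 1, 3, 1],
})
similarity = pd.DataFrame(
    [[1.0, 0.9, 0.2],
     [0.9, 1.0, 0.4],
     [0.2, 0.4, 1.0]],
    index=[1, 2, 3], columns=[1, 2, 3],
)

result = recommend(similarity, purchases, n_similar=1)
print(result)
```

Dropping the customer by label, instead of slicing `[1:6]`, avoids relying on the customer always sorting into position 0, which may not hold when another customer ties for the top score.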


In [35]:
display(df_cus_top_pro1.loc[df_cus_top_pro1['CustomerID'] == 33]) # euclidean
display(df_cus_top_pro2.loc[df_cus_top_pro2['CustomerID'] == 33]) # cosine
display(df_cus_top_pro3.loc[df_cus_top_pro3['CustomerID'] == 33]) # correlation

Unnamed: 0,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
0,33,"[Butter - Unsalted, Wine - Ej Gallo Sierra Val...",Butter - Unsalted,Wine - Ej Gallo Sierra Valley,Soup - Campbells Bean Medley,Wine - Blue Nun Qualitatswein,Chicken - Soup Base


Unnamed: 0,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
0,33,"[Knife Plastic - White, Soup - Campbells, Beef...",Knife Plastic - White,"Soup - Campbells, Beef Barley",Onions - Cippolini,Tea - Herbal Sweet Dreams,Muffin - Zero Transfat


Unnamed: 0,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
0,33,"[Knife Plastic - White, Muffin - Zero Transfat...",Knife Plastic - White,Muffin - Zero Transfat,Banana Turning,Crush - Cream Soda,Veal - Osso Bucco


In [36]:
display(df_cus_top_pro1.loc[df_cus_top_pro1['CustomerID'] == 98069]) # euclidean
display(df_cus_top_pro2.loc[df_cus_top_pro2['CustomerID'] == 98069]) # cosine
display(df_cus_top_pro3.loc[df_cus_top_pro3['CustomerID'] == 98069]) # correlation

Unnamed: 0,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
996,98069,"[Skirt - 29 Foot, Beans - Kidney White, Milk -...",Skirt - 29 Foot,Beans - Kidney White,Milk - 1%,Cheese - Taleggio D.o.p.,Sprouts - Baby Pea Tendrils


Unnamed: 0,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
996,98069,"[Sprouts - Baby Pea Tendrils, Juice - Orange, ...",Sprouts - Baby Pea Tendrils,Juice - Orange,Muffin Batt - Blueberry Passion,Beer - Original Organic Lager,"Nut - Chestnuts, Whole"


Unnamed: 0,CustomerID,Products,ProductName1,ProductName2,ProductName3,ProductName4,ProductName5
996,98069,"[Wine - Red, Colio Cabernet, Veal - Inside, So...","Wine - Red, Colio Cabernet",Veal - Inside,Soupfoamcont12oz 112con,Peas - Frozen,Sprouts - Baby Pea Tendrils


### Notes
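- The same customer does not receive the same suggested products under every metric: the recommendations depend on the method used to calculate distance. For customer 33, the first suggestion is Butter - Unsalted with euclidean distance but Knife Plastic - White with both cosine and correlation distance.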
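
To go beyond eyeballing individual customers, the difference between two metrics could be summarized by counting how many recommended products each customer has in common across two result tables. The sketch below uses small hypothetical tables shaped like `df_cus_top_pro1` and `df_cus_top_pro2` (shortened to two products per customer), and the helper name `shared_products` is an assumption.

```python
import pandas as pd

# Hypothetical, shortened recommendation tables with the same columns as the lab's results.
recs_euclidean = pd.DataFrame({
    "CustomerID": [33, 98069],
    "Products": [["Butter - Unsalted", "Chicken - Soup Base"],
                 ["Milk - 1%", "Sprouts - Baby Pea Tendrils"]],
})
recs_cosine = pd.DataFrame({
    "CustomerID": [33, 98069],
    "Products": [["Knife Plastic - White", "Onions - Cippolini"],
                 ["Sprouts - Baby Pea Tendrils", "Juice - Orange"]],
})

def shared_products(left, right):
    """Count, per customer, how many recommended products two tables have in common."""
    merged = left.merge(right, on="CustomerID", suffixes=("_left", "_right"))
    overlap = [len(set(a) & set(b))
               for a, b in zip(merged["Products_left"], merged["Products_right"])]
    return pd.Series(overlap, index=merged["CustomerID"], name="shared")

shared = shared_products(recs_euclidean, recs_cosine)
print(shared)
```

Applied to the full tables, the mean of this series would give a single number for how much two metrics agree, with 5 meaning identical recommendation sets and 0 meaning no overlap.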