# Olist Product Recommendation System
### Part 3 - Comparing Granularities using Market Basket 
#### Author: Olabisi Sunmon | 10th April 2023

### Problem Statement

How can we build a customised product recommendation system, using data analysis and machine learning techniques, that helps Olist customers discover new products and find relevant items, and in turn boosts revenue and customer purchase rates?


During the exploratory data analysis I found that around 55% of customers have made only one purchase. This poses a challenge for collaborative filtering and market basket analysis, as both rely heavily on purchase history to predict future behaviour and preferences.

### Approach
The data frame will be split in two: customers who made a single purchase (First-timers) and customers who made multiple purchases (Returners).
The unpartitioned dataset will be used to create recommendations for new customers with empty online carts, and the returners dataframe will be used to build recommendation systems for returning customers and for new customers with items in their online cart.

-------
In this notebook, I assess different levels of granularity for a product recommendation system for returning customers using market basket analysis. I investigate the feasibility of recommending products based on:
- Product Category per State
- Items in a Product Category
- All products

and analyse the effectiveness of each approach.


Market basket analysis is a way to find out what items customers tend to buy together. By analysing transaction data, we can identify patterns and relationships between products. 
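
For reference, the three metrics reported in the rule tables in this notebook follow the standard association-rule definitions. The symbols are introduced here for illustration only: $N$ is the number of baskets, and $A$ and $C$ are the antecedent and consequent itemsets.

```latex
\begin{aligned}
\operatorname{support}(A) &= \frac{\lvert\{\text{baskets containing } A\}\rvert}{N} \\
\operatorname{confidence}(A \Rightarrow C) &= \frac{\operatorname{support}(A \cup C)}{\operatorname{support}(A)} \\
\operatorname{lift}(A \Rightarrow C) &= \frac{\operatorname{support}(A \cup C)}{\operatorname{support}(A)\,\operatorname{support}(C)}
\end{aligned}
```

A lift above 1 means the items appear together more often than expected if they were bought independently. Confidence is a proportion, so it always lies between 0 and 1.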

-------

### Procedure:

One order is treated as one unit (a basket).

Because of limited computational capacity, I use three different approaches to market basket analysis, one for each granularity level:

- One-hot encoding with Mlxtend
- TransactionEncoder with Mlxtend
- Manual computation

In [24]:
#Import Package
# data manipulation
import numpy as np
import pandas as pd
import joblib
from sklearn.model_selection import StratifiedShuffleSplit

#Plotting
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

#Modeling
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth, association_rules
from mlxtend.preprocessing import TransactionEncoder
from surprise import Dataset
from surprise.reader import Reader
from surprise.prediction_algorithms.matrix_factorization import SVD as FunkSVD
from surprise.model_selection import train_test_split, GridSearchCV, cross_validate
from surprise import accuracy
from surprise.accuracy import rmse
from surprise import NormalPredictor
from itertools import combinations, groupby
from collections import Counter

# Ignore all warnings (this silences more than just FutureWarning)
import warnings
warnings.filterwarnings('ignore')

#### Import data

In [25]:
df = joblib.load('/Users/labisi/Desktop/capstone-project-osunmon1/src/data/Processed/df_processed.pkl')


In [26]:
df.shape

(110750, 19)

In [3]:
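# Split the dataframe into two dataframes: returners and first-timers.
# Note: len(x) counts rows per customer, so if df has one row per order item,
# a customer with a single multi-item order is also classed as a returner.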

returners = df.groupby('customer_unique_id').filter(lambda x: len(x) > 1)
first_timers = df.groupby('customer_unique_id').filter(lambda x: len(x) == 1)


# Save datasets for easier accessing
joblib.dump(returners,'/Users/labisi/Desktop/capstone-project-osunmon1/src/data/processed/returners_data.pkl')
joblib.dump(first_timers,'/Users/labisi/Desktop/capstone-project-osunmon1/src/data/processed/first_timer_data.pkl')

    
print("Shape of returners dataset:", returners.shape)
print("Shape of first timer dataset:", first_timers.shape)


Shape of returners dataset: (28965, 19)
Shape of first timer dataset: (81785, 19)
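
The split above counts rows per customer. If "multiple purchases" is meant as "multiple orders", one alternative is to count distinct `order_id` values instead. The sketch below illustrates this on a toy frame; the data and the names `toy`, `orders_per_customer`, `toy_returners` and `toy_first_timers` are hypothetical, and only the column names come from this notebook.

```python
import pandas as pd

# Toy data (hypothetical): one row per order item.
toy = pd.DataFrame({
    "customer_unique_id": ["c1", "c1", "c2", "c2", "c3"],
    "order_id":           ["o1", "o1", "o2", "o3", "o4"],
    "product_id":         ["p1", "p2", "p1", "p3", "p2"],
})

# Number of distinct orders per customer, broadcast back to every row.
orders_per_customer = toy.groupby("customer_unique_id")["order_id"].transform("nunique")

toy_returners = toy[orders_per_customer > 1]      # c2 placed two separate orders
toy_first_timers = toy[orders_per_customer == 1]  # c1 (one two-item order) and c3

print(sorted(toy_returners["customer_unique_id"].unique()))
print(sorted(toy_first_timers["customer_unique_id"].unique()))
```

With a row-count rule, `c1` would be classed as a returner because its single order has two items. With the distinct-order rule it is a first-timer.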


## Market Basket

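A min_support value of 0.001 is used in all the market basket models of this analysis. It is based on the support distribution at the "all product" granularity level, where the median support is about 0.001 when expressed as a percentage (see the All Products section). Keeping the value constant is intended to make the comparison between granularity levels fair. Note that Mlxtend interprets min_support as a fraction of baskets, whereas the manual calculation expresses support as a percentage, so the same number is a 100 times lower threshold in the manual method.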
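
To see what this threshold means in practice, the sketch below converts a fractional min_support into the minimum number of baskets an itemset must appear in. The helper `min_order_count` is hypothetical. The figure 6363 is the number of SP baskets shown later in this notebook; 456 and 68 are inferred from the supports reported for BA and drinks (1/456 ≈ 0.002193, 1/68 ≈ 0.014706), on the assumption that those supports correspond to a single order.

```python
import math

def min_order_count(min_support, n_baskets):
    """Smallest number of baskets an itemset must appear in to reach min_support."""
    return max(1, math.ceil(min_support * n_baskets))

# Basket counts: 6363 is shown in the notebook; 456 and 68 are inferred (assumption).
for name, n in [("SP", 6363), ("BA", 456), ("drinks", 68)]:
    print(name, min_order_count(0.001, n))
```

With only a few hundred baskets the threshold is already met by one order, so at that scale it filters nothing and every co-occurrence becomes a candidate rule.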

### Product Category per State
One-hot encoding (Mlxtend)

In [4]:
# Create a basket per state using product category
basket_RR = (returners[returners['customer_state'] == 'RR'] 
          .groupby(['order_id','product_category_name_english'])['order_item_id'] 
          .sum().unstack().reset_index().fillna(0) 
          .set_index('order_id'))
basket_BA = (returners[returners['customer_state'] == 'BA'] 
          .groupby(['order_id','product_category_name_english'])['order_item_id'] 
          .sum().unstack().reset_index().fillna(0) 
          .set_index('order_id'))
basket_SP = (returners[returners['customer_state'] == 'SP'] 
          .groupby(['order_id','product_category_name_english'])['order_item_id'] 
          .sum().unstack().reset_index().fillna(0) 
          .set_index('order_id'))
basket_SP.shape

(6363, 69)

In [5]:
# Data encoding
# Map item counts to 0/1 so the baskets are suitable for Mlxtend's apriori
def hot_encode(x):
    return 1 if x >= 1 else 0
basket_RR = basket_RR.applymap(hot_encode) 
basket_BA = basket_BA.applymap(hot_encode)
basket_SP = basket_SP.applymap(hot_encode) 
print('Basket SP')
basket_SP


Basket SP


order_id,agro_industry_and_commerce,air_conditioning,art,arts_and_craftmanship,audio,auto,baby,bed_bath_table,books_general_interest,books_imported,...,pet_shop,signaling_and_security,small_appliances,small_appliances_home_oven_and_coffee,sports_leisure,stationery,tablets_printing_image,telephony,toys,watches_gifts
00018f77f2f0320c557190d7a144bdd3,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
0008288aa423d2a3f00fcb17cd7d8719,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
001d8f0e34a38c37f7dba2a37d4eba8b,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
002c9def9c9b951b1bec6d50753c9891,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
00337fe25a3780b3424d9ad7c5a4b35e,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ffb47bf24e3f64dc0a2059a9181b976a,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
ffc16cecff8dc037f60458f28d1c1ba5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
ffd543c2b60842e148a86870dc60e212,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
ffd84ab39cd5e873d8dba24342e65c01,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [6]:
print("For RR:")
frq_items = apriori(basket_RR, min_support = 0.001, use_colnames = True) # Building the model
rules = association_rules(frq_items, metric ="lift", min_threshold = 1)
# Collecting the inferred rules in a dataframe
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False]) 
display(rules.head())

# Sorting by confidence, then lift, puts the strongest association rules at the top of the table.

print("For BA:")
frq_items = apriori(basket_BA, min_support = 0.001, use_colnames = True) 
rules = association_rules(frq_items, metric ="lift", min_threshold = 1) 
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False]) 
display(rules.head())

print("For SP:")
frq_items = apriori(basket_SP, min_support = 0.001, use_colnames = True) 
rules = association_rules(frq_items, metric ="lift", min_threshold = 1) 
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False]) 
display(rules.head())

For RR:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction


For BA:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
14,(fashion_sport),(fashio_female_clothing),0.002193,0.002193,0.002193,1.0,456.0,0.002188,inf
15,(fashio_female_clothing),(fashion_sport),0.002193,0.002193,0.002193,1.0,456.0,0.002188,inf
24,"(furniture_decor, sports_leisure)",(pet_shop),0.002193,0.019737,0.002193,1.0,50.666667,0.00215,inf
25,"(furniture_decor, pet_shop)",(sports_leisure),0.002193,0.109649,0.002193,1.0,9.12,0.001953,inf
26,"(sports_leisure, pet_shop)",(furniture_decor),0.002193,0.118421,0.002193,1.0,8.444444,0.001933,inf


For SP:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
2,(home_confort),(bed_bath_table),0.007229,0.150086,0.002672,0.369565,2.462349,0.001587,1.348139
1,(baby),(toys),0.02436,0.02766,0.001572,0.064516,2.332478,0.000898,1.039398
0,(toys),(baby),0.02766,0.02436,0.001572,0.056818,2.332478,0.000898,1.034414
3,(bed_bath_table),(home_confort),0.150086,0.007229,0.002672,0.017801,2.462349,0.001587,1.010763


#### Analysis

For BA:

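If a customer buys from the fashion_sport category, the rule suggests they are very likely to also buy from fashio_female_clothing (confidence 1.0). Lift measures how much more often the antecedents and consequents appear in the same order than would be expected if they were bought independently, so values well above 1 indicate a positive association. The lift for this combination is very high at 456. However, the support of 0.002193 equals 1/456, which is consistent with the rule resting on a single order out of about 456 BA baskets, so it should be treated with caution.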

For RR: 

The market basket analysis generated no rules for the state "RR", indicating that a recommendation system at the level of individual states may not be a viable strategy at this time.



#### Method
The hot-encode approach and the Mlxtend library ran without any computational challenges. Although the hot-encode function produces a sparse matrix of integers (which takes up more memory than boolean values), Mlxtend's apriori algorithm still ran efficiently. This is likely because the state partition and the high-level grouping by product category keep the tables small.
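
As a possible alternative to building an integer matrix and mapping it cell by cell, the basket can be built directly as a boolean matrix. The sketch below shows the idea on toy order lines; the data and the names `lines` and `basket_bool` are hypothetical, and only the column names follow this notebook.

```python
import pandas as pd

# Toy order lines (hypothetical values; column names follow the notebook).
lines = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3", "o3", "o3"],
    "product_category_name_english": ["toys", "baby", "toys", "baby", "baby", "pet_shop"],
})

# Count lines per (order, category), pivot categories into columns,
# then compare against zero to get a True/False matrix.
basket_bool = (
    lines.groupby(["order_id", "product_category_name_english"])
         .size()
         .unstack(fill_value=0)
         .gt(0)
)
print(basket_bool)
```

The result has one row per order and one boolean column per category, which is the same shape of input that was passed to apriori above, without the separate encoding step.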

### Items in a Product Category
TransactionEncoder (Mlxtend)

In [7]:
# Create baskets for items in a product category
my_basket_drinks = returners[returners['product_category_name_english'] == 'drinks'].groupby('order_id')['product_id'].apply(list)
my_basket_flowers = returners[returners['product_category_name_english'] == 'flowers'].groupby('order_id')['product_id'].apply(list)



In [8]:
# Create boolean matrix

te = TransactionEncoder()
drinks = te.fit_transform(my_basket_drinks)
drinks_df = pd.DataFrame(drinks, columns=te.columns_)

flowers = te.fit_transform(my_basket_flowers)
flowers_df = pd.DataFrame(flowers, columns=te.columns_)

print('Flowers')
flowers_df

Flowers


Unnamed: 0,065554bfe0244b9f5f6414f332106a21,5fd35bd0069ce2a404716901326b1336,7620a27f1d6747511f1c6f0ddb63c0ef,be0e6c61c2bcdd9a4d022ba67fd66189,e89607ddfcf953bc7a85adaca52e122a
0,True,False,False,False,False
1,False,True,False,False,False
2,False,False,True,False,False
3,False,False,False,True,False
4,False,False,False,False,True


In [9]:
print("For Drinks:")
# Building the model 
frq_items = apriori(drinks_df,  min_support = 0.001 ,use_colnames = True) 
# Collecting the inferred rules in a dataframe 
rules_d = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules_d = rules_d.sort_values(['confidence', 'lift'], ascending =[False, False]) 
display(rules_d.head())

print("For Flowers:")
# Building the model 
frq_items = apriori(flowers_df,  min_support = 0.001 ,use_colnames = True) 
# Collecting the inferred rules in a dataframe 
rules_f = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules_f = rules_f.sort_values(['confidence', 'lift'], ascending =[False, False]) 
display(rules_f.head())



For Drinks:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
4,(7d873ae7ba4e637167c1a8d544bd6af8),(262855d4dd5b0d39f786a3c86c285c6a),0.014706,0.014706,0.014706,1.0,68.0,0.01449,inf
5,(262855d4dd5b0d39f786a3c86c285c6a),(7d873ae7ba4e637167c1a8d544bd6af8),0.014706,0.014706,0.014706,1.0,68.0,0.01449,inf
6,(c0a0b5aa4507363e601eb90082c9c008),(262855d4dd5b0d39f786a3c86c285c6a),0.014706,0.014706,0.014706,1.0,68.0,0.01449,inf
7,(262855d4dd5b0d39f786a3c86c285c6a),(c0a0b5aa4507363e601eb90082c9c008),0.014706,0.014706,0.014706,1.0,68.0,0.01449,inf
8,(c0e452663c284f3f8e578f390dc3ab21),(262855d4dd5b0d39f786a3c86c285c6a),0.014706,0.014706,0.014706,1.0,68.0,0.01449,inf


For Flowers:


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction


#### Analysis
For Drinks:

Based on the market basket analysis, there appears to be a strong positive association between the items '262855d4dd5b0d39f786a3c86c285c6a' and '7d873ae7ba4e637167c1a8d544bd6af8', suggesting that customers who purchase one of these items are likely to be interested in the other.

With a lift of 68 and a confidence of 1.0, this association looks particularly strong, although the support of 0.014706 (1/68) is consistent with the items co-occurring in a single order, so the evidence is thin. Recommending a product bundle containing both items could be an effective way to encourage customers to purchase them together. Olist may want to consider this, as a convenient and attractive offer could help increase sales and improve customer satisfaction.

For Flowers:

The market basket analysis produced no rules for the "flowers" product category, suggesting that a recommendation system at the level of items within a product category may be premature. This granularity should be revisited when Olist has a larger amount of order history data.


#### Method

To avoid overburdening the computational resources for this model, the TransactionEncoder class was used. Some product categories contain many items, which results in large sparse matrices. TransactionEncoder() creates a boolean matrix that occupies much less memory than an integer matrix of the same shape. An integer matrix was tried first, but apriori could not be run on it with the available computational power at this level of granularity.
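
To make the encoding step concrete, the sketch below reproduces the idea behind a transaction encoder in plain Python. It is not Mlxtend's implementation; the function `encode_transactions` and the toy baskets are hypothetical.

```python
def encode_transactions(transactions):
    """Return (columns, rows): sorted unique items and one boolean row per transaction."""
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)  # repeated items in a basket collapse to one flag
        rows.append([item in present for item in columns])
    return columns, rows

# Hypothetical baskets of product ids.
baskets = [["p2", "p1"], ["p3"], ["p1", "p1", "p3"]]
columns, rows = encode_transactions(baskets)
print(columns)
for row in rows:
    print(row)
```

Each row only records whether an item is present, not how many times, which is why a boolean type is sufficient for this kind of analysis.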

### All Products
Manual

In [10]:
all_products = df.set_index('order_id')['product_id']
all_products.head()

order_id
e481f51cbdc54678b7cc49136f2d6af7    87285b34884572647811a353c7ac498a
53cdb2fc8bc7dce0b6741e2150273451    595fac2a385ac33a80bd5114aec74eb8
47770eb9100c2d0c44946d9cf07ec65d    aa4383b373c6aca5d8797843e5594415
949d5b44dbf5de918fe9c16f97b45f8a    d0b61bfb1de832b15ba9d266ca96e5b0
ad21c59c0840e6cb83a9ceb5573f8159    65266b2da20d04dbe00c5c2d3bb7859e
Name: product_id, dtype: object

In [11]:
stats = all_products.value_counts().to_frame('frequency')
stats.head()

Unnamed: 0,frequency
aca2eb7d00ea1a7b8ebd4e68314663af,524
422879e10f46682990de24d770e7f83d,486
99a4788cb24856965c36a24e339b6058,482
389d119b48cf3043d311335e499d9c6b,391
368c6c730842d78016ad823897a372db,388


In [12]:
# Calculate support as a percentage of all orders (note the * 100)
stats['support'] = stats['frequency'] / all_products.index.nunique() * 100
stats.head()
     

Unnamed: 0,frequency,support
aca2eb7d00ea1a7b8ebd4e68314663af,524,0.54291
422879e10f46682990de24d770e7f83d,486,0.503538
99a4788cb24856965c36a24e339b6058,482,0.499394
389d119b48cf3043d311335e499d9c6b,391,0.40511
368c6c730842d78016ad823897a372db,388,0.402002


In [13]:
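# Use descriptive names rather than max/min, which would shadow Python built-ins
mean_support = np.mean(stats['support'])
print(f'mean: {mean_support}')

max_support = np.max(stats['support'])
print(f'max: {max_support}')

min_support_observed = np.min(stats['support'])
print(f'min: {min_support_observed}')

median_support = np.median(stats['support'])
print(f'median: {median_support}')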

mean: 0.003566772091228142
max: 0.5429095392521525
min: 0.0010360869069697568
median: 0.0010360869069697568


In [14]:

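# min_support level (in percent, matching stats['support'])
min_support = 0.001  # just below the median (and minimum) support, so every product passes this filter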
product_over_support = stats[stats['support'] >= min_support].index
orders_over_support = all_products[all_products.isin(product_over_support)]
orders_over_support
     

order_id
e481f51cbdc54678b7cc49136f2d6af7    87285b34884572647811a353c7ac498a
53cdb2fc8bc7dce0b6741e2150273451    595fac2a385ac33a80bd5114aec74eb8
47770eb9100c2d0c44946d9cf07ec65d    aa4383b373c6aca5d8797843e5594415
949d5b44dbf5de918fe9c16f97b45f8a    d0b61bfb1de832b15ba9d266ca96e5b0
ad21c59c0840e6cb83a9ceb5573f8159    65266b2da20d04dbe00c5c2d3bb7859e
                                                  ...               
63943bddc261676b46f01ca7ac2f7bd8    f1d4ce8c6dd66c47bbaa8c6781c2a923
83c1379a015df1e13d02aae0204711ab    b80910977a37536adeddd63663f916ad
11c177c8e97725db2631073c19f07b62    d1c427060a0f73f6b889a5c7c61f2ac4
11c177c8e97725db2631073c19f07b62    d1c427060a0f73f6b889a5c7c61f2ac4
66dea50a8b16d9b4dee7af250b4be1a5    006619bbed68b000c8ba3f8725d5409e
Name: product_id, Length: 110750, dtype: object

In [15]:
# Count the number of items in each order (no orders are filtered out here)
order_count = all_products.index.value_counts()
order_count.head()

5a3b1c29a49756e75f1ef513383c0c12    22
8272b63d03f5f79c56e9e4120aec44ef    21
1b15974a0141d54e36626dca3fdc731a    20
428a2f660dc84138d969ccd69a0ab6d5    15
9ef13efd6949e4573a18964dd1bbe7f5    15
Name: order_id, dtype: int64

In [16]:

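# Generator that yields every pair of items within each order of the dataset passed in.
# Caveats: itertools.groupby only groups consecutive rows with the same order_id,
# and repeated products within an order yield self-pairs such as (A, A).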
def generator(order):
    order = order.reset_index().values
    for order_id, product in groupby(order, lambda x: x[0]):
        product_list = [item[1] for item in product]
        for item_pair in combinations(product_list, 2):
            yield item_pair

itempair_gen = generator(orders_over_support)
itempair_gen

<generator object generator at 0x7fd5828c9c80>
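
The generator above relies on rows of the same order being adjacent and pairs every order line with every other, including repeats of the same product. A possible variant that is independent of row order and counts each distinct pair once per order is sketched below; the function `unique_item_pairs` and the toy data are hypothetical, not part of the original analysis.

```python
from collections import Counter, defaultdict
from itertools import combinations

def unique_item_pairs(order_lines):
    """Yield each unordered product pair once per order.

    order_lines: iterable of (order_id, product_id) tuples in any row order.
    """
    items_by_order = defaultdict(set)
    for order_id, product_id in order_lines:
        items_by_order[order_id].add(product_id)  # a set drops repeated products
    for items in items_by_order.values():
        # Sorting gives every pair a stable orientation,
        # so (a, b) and (b, a) are not counted separately.
        yield from combinations(sorted(items), 2)

# Hypothetical order lines; o1 is split across non-adjacent rows and repeats p1.
toy_lines = [("o1", "p1"), ("o2", "p2"), ("o1", "p1"), ("o1", "p2"), ("o2", "p1")]
pair_counts = Counter(unique_item_pairs(toy_lines))
print(pair_counts)
```

Because each pair is counted at most once per order, the resulting counts can be divided by the number of orders to give supports that never exceed the support of either item.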

In [17]:
itempair = pd.Series(Counter(itempair_gen)).to_frame('freqAC')
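# Pair support in percent; note the denominator is the number of order lines, not the number of orders
itempair['supportAC'] = itempair['freqAC'] / len(orders_over_support) *100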
itempair = itempair[itempair['supportAC'] >= min_support]
itempair.head()

Unnamed: 0,Unnamed: 1,freqAC,supportAC
08574b074924071f4e201e151b152b4e,08574b074924071f4e201e151b152b4e,17,0.01535
f48eb5c2fde13ca63664f0bb05f55346,f48eb5c2fde13ca63664f0bb05f55346,2,0.001806
a659cb33082b851fb87a33af8f0fff29,a659cb33082b851fb87a33af8f0fff29,21,0.018962
a5a0e71a81ae65aa335e71c06261e260,a5a0e71a81ae65aa335e71c06261e260,10,0.009029
75d6b6963340c6063f7f4cfcccfe6a30,75d6b6963340c6063f7f4cfcccfe6a30,3,0.002709


In [18]:
# create table for association rules and compute relevant metrics
itempair = itempair.reset_index().rename(columns={'level_0' : 'antecedents', 'level_1': 'consequents'})
itempair = (itempair
     .merge(stats.rename(columns={'frequency': 'freqA', 'support': 'antecedent support'}), left_on='antecedents', right_index=True)
     .merge(stats.rename(columns={'frequency': 'freqC', 'support': 'consequents support'}), left_on='consequents', right_index=True))

In [19]:
itempair['confidenceAtoC'] = itempair['supportAC'] / itempair['antecedent support']
itempair['confidenceCtoA'] = itempair['supportAC'] / itempair['consequents support']
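# Supports are percentages here, so this ratio is 100 times smaller than a lift computed from fractions
itempair['lift'] = itempair['supportAC'] / (itempair['antecedent support'] * itempair['consequents support'])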
itempair = itempair[['antecedents', 'consequents','antecedent support', 'consequents support', 'confidenceAtoC','lift']]   

In [20]:
itempair.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequents support,confidenceAtoC,lift
0,08574b074924071f4e201e151b152b4e,08574b074924071f4e201e151b152b4e,0.116042,0.116042,0.132279,1.139926
1,f48eb5c2fde13ca63664f0bb05f55346,f48eb5c2fde13ca63664f0bb05f55346,0.007253,0.007253,0.248996,34.331898
2,a659cb33082b851fb87a33af8f0fff29,a659cb33082b851fb87a33af8f0fff29,0.019686,0.019686,0.963221,48.930087
3,a5a0e71a81ae65aa335e71c06261e260,a5a0e71a81ae65aa335e71c06261e260,0.009325,0.009325,0.968317,103.843394
4,75d6b6963340c6063f7f4cfcccfe6a30,75d6b6963340c6063f7f4cfcccfe6a30,0.097392,0.097392,0.027813,0.285581
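
For comparison, the sketch below computes the same rule metrics from raw counts, using fractions and a single denominator (the number of orders) throughout. The function `rule_metrics` and the counts are hypothetical and only illustrate the arithmetic.

```python
def rule_metrics(n_orders, count_a, count_c, count_ac):
    """Support, confidence and lift for A -> C, all as fractions of n_orders.

    count_a, count_c: number of orders containing A and C respectively.
    count_ac: number of orders containing both A and C.
    """
    support_a = count_a / n_orders
    support_c = count_c / n_orders
    support_ac = count_ac / n_orders
    confidence = support_ac / support_a
    lift = support_ac / (support_a * support_c)
    return {"support": support_ac, "confidence": confidence, "lift": lift}

# Hypothetical counts: 1,000 orders, A in 20 of them, C in 10, both in 5.
m = rule_metrics(n_orders=1000, count_a=20, count_c=10, count_ac=5)
print(m)
```

Here the confidence is 0.25 and the lift is 25. If the three supports were expressed as percentages instead, the same lift ratio would come out 100 times smaller (0.5 / (2 × 1) = 0.25), which is worth keeping in mind when reading the lift column above.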


In [21]:
# Keep rules with confidence > 0.0001 and rank them by lift
market_basket = (itempair[itempair['confidenceAtoC'] > 0.0001]
                 .reset_index(drop=True)
                 .sort_values('lift', ascending=False))

In [22]:
market_basket

Unnamed: 0,antecedents,consequents,antecedent support,consequents support,confidenceAtoC,lift
2378,270516a3f41dc035aa87d220228f844c,05b515fdc76e888aada3c6d66c201dff,0.010361,0.010361,8.714853,841.131493
1278,c2b534c5a4a6cbfc41aeaf362fb0c060,7a5b821fca01c5a75fa33c06f249e0f5,0.005180,0.003108,2.614456,841.131493
2235,710b7c26b7a742f497bba45fab91a25f,a9d9db064d4afd4458eb3e139fe29167,0.006217,0.006217,5.228912,841.131493
2397,62995b7e571f5760017991632bbfd311,ac1ad58efc1ebf66bfadc09f29bdedc0,0.006217,0.006217,5.228912,841.131493
630,4006da5107400e5ac48dbcc829a36c42,3c7c75671b25b927f05e68b233263e5f,0.001036,0.003108,2.614456,841.131493
...,...,...,...,...,...,...
231,7c1bd920dbdf22470b68bde975dd3ccf,154e7e31ebfa092203795c972e5804a6,0.243480,0.302537,0.011125,0.036773
50,368c6c730842d78016ad823897a372db,b0961721fd839e9982420e807758a2a6,0.402002,0.130547,0.004492,0.034411
51,389d119b48cf3043d311335e499d9c6b,b0961721fd839e9982420e807758a2a6,0.405110,0.130547,0.004458,0.034147
63,368c6c730842d78016ad823897a372db,389d119b48cf3043d311335e499d9c6b,0.402002,0.405110,0.013477,0.033266


#### Analysis

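Based on the market basket analysis, the pair '270516a3f41dc035aa87d220228f844c' and '05b515fdc76e888aada3c6d66c201dff' ranks highest by lift, suggesting that customers who purchase one of these items are likely to be interested in the other. Note that confidenceAtoC exceeds 1 for the top rows, which a true confidence cannot. This is most likely because the pair counter counts every combination of order lines, including repeated products within an order, so these metrics are better read as ranking scores than as probabilities.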

#### Method

The dataset was not partitioned further, and because of computational limitations with the Mlxtend library, the rules were computed manually for this level.
Source: https://pythondata.com/market-basket-analysis-with-python-and-pandas/

Based on the results so far, I plan to develop a recommendation function that uses the "all product" level of granularity. The narrower levels produced no rules for some partitions (RR and flowers), so the unpartitioned level is more likely to return a recommendation.

In [23]:
# Recommend a product based on the items in the basket, the market basket table and a metric of interest
# Input basket
mybasket = ['d1c427060a0f73f6b889a5c7c61f2ac4']

# metric
metric = "lift"

def product_recs(basket, df, metric):

    # Randomly select an item from the basket
    random_item = np.random.choice(basket, 1)[0]
    print(f"Since you like {random_item}, you might like: ")

    # Find rules where the item is the antecedent
    rule_filter = df["antecedents"] == random_item

    # Filter the dataframe using rule_filter and sort by the selected metric
    filtered = df[rule_filter].sort_values(by=metric, ascending=False)

    # No rule for this item: return an empty result instead of failing in sample()
    if filtered.empty:
        return filtered["consequents"]

    # Randomly return one of the top 10 items from the filtered dataframe
    recommendation = filtered.head(10).sample()["consequents"]

    return recommendation


product_recs(mybasket, market_basket, metric)


Since you like d1c427060a0f73f6b889a5c7c61f2ac4, you might like: 


122    52c80cedd4e90108bf4fa6a206ef6b03
Name: consequents, dtype: object
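
The function above picks one basket item and one rule at random. A deterministic variant that considers every item in the basket and skips products already in it could look like the sketch below. The function `recommend`, its parameters and the toy rules are hypothetical; in the notebook the rules could come from something like `market_basket.to_dict("records")`.

```python
def recommend(basket, rules, metric="lift", top_n=3):
    """Return up to top_n consequents for any item in the basket, best metric first.

    rules: list of dicts with 'antecedents', 'consequents' and the metric key.
    Items already in the basket are never recommended.
    """
    in_basket = set(basket)
    best = {}
    for rule in rules:
        target = rule["consequents"]
        if rule["antecedents"] in in_basket and target not in in_basket:
            # Keep the highest score seen for each candidate product.
            best[target] = max(best.get(target, float("-inf")), rule[metric])
    return sorted(best, key=best.get, reverse=True)[:top_n]

# Hypothetical rules table.
toy_rules = [
    {"antecedents": "p1", "consequents": "p2", "lift": 4.0},
    {"antecedents": "p1", "consequents": "p3", "lift": 9.0},
    {"antecedents": "p2", "consequents": "p3", "lift": 2.0},
    {"antecedents": "p4", "consequents": "p5", "lift": 50.0},
]
print(recommend(["p1", "p2"], toy_rules))  # p2 is already in the basket, so only p3 remains
print(recommend(["p9"], toy_rules))        # no matching rule, so the list is empty
```

Returning an empty list when nothing matches lets the caller fall back to another source, such as best-sellers, instead of raising an error.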

To ensure that the best-fitted recommendations are given, I suggest that Olist uses the product recommendation function (product_recs) to generate recommendations for customers at the online checkout. The recommendation should be presented as a bundle deal to encourage the customer to purchase it, as market basket analysis is most effective when applied to the final combination of products. This approach is suitable for both new and existing customers, provided they have items in their online shopping cart.