# Market Basket Analysis in Python

Market basket analysis is used by retailers to identify items that are frequently purchased together. In a grocery store, for example, baby formula and diapers are often stocked in the same aisle, and bread, butter, and jam are placed near each other so that customers can easily buy them together. The technique looks for such combinations systematically. It scans the transaction history and scores item combinations with association-rule metrics such as support, confidence, and lift, which can surface associations that are hard to spot by inspection.

The data comes from the [Groceries dataset on Kaggle](https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset).

## Reading the Dataset

```python
import pandas as pd

# /content is the Google Colab working directory; adjust the path for your environment
data = pd.read_csv("/content/Groceries_dataset.csv")
data
```

| | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |
| 2 | 2300 | 19-09-2015 | pip fruit |
| 3 | 1187 | 12-12-2015 | other vegetables |
| 4 | 3037 | 01-02-2015 | whole milk |
| ... | ... | ... | ... |
| 38760 | 4471 | 08-10-2014 | sliced cheese |
| 38761 | 2022 | 23-02-2014 | candy |
| 38762 | 1097 | 16-04-2014 | cake bar |
| 38763 | 1510 | 03-12-2014 | fruit/vegetable juice |


## Data Preparation for Market Basket Analysis

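```python
# The dataset has no receipt ID, so treat all items bought by one member
# on one date as a single transaction (basket)
data['single_transaction'] = data['Member_number'].astype(str) + '_' + data['Date'].astype(str)

data.head()
```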

| | Member_number | Date | itemDescription | single_transaction |
|---|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit | 1808_21-07-2015 |
| 1 | 2552 | 05-01-2015 | whole milk | 2552_05-01-2015 |
| 2 | 2300 | 19-09-2015 | pip fruit | 2300_19-09-2015 |
| 3 | 1187 | 12-12-2015 | other vegetables | 1187_12-12-2015 |
| 4 | 3037 | 01-02-2015 | whole milk | 3037_01-02-2015 |
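Each `single_transaction` value stands for one shopping trip. Because the dataset has no basket ID column, this treats everything a member bought on a given date as one basket, which is an assumption rather than a recorded fact. A minimal sketch on a few made-up rows (not taken from the dataset) shows how the member/date key groups item rows into baskets:

```python
import pandas as pd

# Made-up rows in the same layout as Groceries_dataset.csv (illustration only)
toy = pd.DataFrame({
    "Member_number": [1808, 1808, 2552, 1808],
    "Date": ["21-07-2015", "21-07-2015", "05-01-2015", "22-07-2015"],
    "itemDescription": ["tropical fruit", "whole milk", "whole milk", "yogurt"],
})

# Same key as in the notebook: member number + date identifies one basket
toy["single_transaction"] = toy["Member_number"].astype(str) + "_" + toy["Date"].astype(str)

# Collect the items of each basket into an alphabetically sorted list
baskets = (
    toy.sort_values("itemDescription")
       .groupby("single_transaction")["itemDescription"]
       .apply(list)
)
print(baskets)
```

Member 1808 appears in two baskets because the purchases fall on different dates.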


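```python
# One row per transaction, one column per item; each cell counts how many
# times that item appears in that transaction
df2 = pd.crosstab(data['single_transaction'], data['itemDescription'])
df2.tail()
```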

| single_transaction | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4999_24-01-2015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4999_26-12-2015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5000_09-03-2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5000_10-02-2015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5000_16-11-2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


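```python
# crosstab stores counts, but Apriori only needs to know whether an item is
# in a transaction at all. Comparing with 0 yields a True/False matrix.
# An element-wise helper passed to DataFrame.applymap also works, but
# applymap is deprecated since pandas 2.1, and mlxtend's apriori prefers
# boolean input.
basket_input = df2 > 0
```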
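`pd.crosstab` counts occurrences, so an item bought twice in the same basket gets a 2, while Apriori only cares whether the item is present. A small sketch on hypothetical baskets (made-up labels `b1` and `b2`) shows the count matrix next to its presence/absence version:

```python
import pandas as pd

# Hypothetical baskets: "b1" contains whole milk twice
rows = pd.DataFrame({
    "single_transaction": ["b1", "b1", "b1", "b2", "b2"],
    "itemDescription": ["whole milk", "whole milk", "yogurt", "yogurt", "soda"],
})

counts = pd.crosstab(rows["single_transaction"], rows["itemDescription"])
onehot = counts > 0  # True if the item occurs in the basket at least once

print(counts)
print(onehot)
```

The count of 2 for whole milk in `b1` becomes a plain `True`, which is all the support calculation needs.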


## Running the Apriori Algorithm for Market Basket Analysis

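```python
from mlxtend.frequent_patterns import apriori, association_rules

# Find every itemset that appears in at least 0.1% of all transactions
frequent_itemsets = apriori(basket_input, min_support=0.001, use_colnames=True)

# Turn frequent itemsets into rules, keeping those with lift >= 0.8
# (0.8 is association_rules' default min_threshold, written out here)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.8)

rules.head()
```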
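mlxtend's `apriori` performs the itemset search. To illustrate what it computes, here is a minimal pure-Python sketch of the level-wise Apriori idea (not mlxtend's implementation; `apriori_sketch` is a hypothetical name): count single items, keep those meeting `min_support`, then build size-k candidates only from frequent size-(k-1) itemsets and prune any candidate with an infrequent subset. The transactions are made up.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    current = {}
    for item in sorted({i for b in baskets for i in b}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

toy = [
    {"whole milk", "rolls/buns"},
    {"whole milk", "yogurt"},
    {"whole milk", "rolls/buns", "yogurt"},
    {"soda"},
]
result = apriori_sketch(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

The three-item candidate is pruned without being counted because its subset {rolls/buns, yogurt} appears in only one of four baskets. This pruning is what keeps Apriori tractable when `min_support` is set sensibly.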




| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (bottled water) | (UHT-milk) | 0.060683 | 0.021386 | 0.001069 | 0.017621 | 0.823954 | 1.0 | -0.000228 | 0.996168 | -0.185312 | 0.013201 | -0.003847 | 0.033811 |
| 1 | (UHT-milk) | (bottled water) | 0.021386 | 0.060683 | 0.001069 | 0.05 | 0.823954 | 1.0 | -0.000228 | 0.988755 | -0.179204 | 0.013201 | -0.011373 | 0.033811 |
| 2 | (other vegetables) | (UHT-milk) | 0.122101 | 0.021386 | 0.002139 | 0.017515 | 0.818993 | 1.0 | -0.000473 | 0.99606 | -0.201119 | 0.01513 | -0.003956 | 0.058758 |
| 3 | (UHT-milk) | (other vegetables) | 0.021386 | 0.122101 | 0.002139 | 0.1 | 0.818993 | 1.0 | -0.000473 | 0.975443 | -0.184234 | 0.01513 | -0.025175 | 0.058758 |
| 4 | (sausage) | (UHT-milk) | 0.060349 | 0.021386 | 0.001136 | 0.018826 | 0.880298 | 1.0 | -0.000154 | 0.997391 | -0.126418 | 0.014096 | -0.002616 | 0.035976 |


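Each row is a rule of the form antecedents → consequents: transactions that contain the antecedent item(s) also contain the consequent item(s). `support` is the fraction of all transactions that contain both sides; `confidence` is support divided by antecedent support (how often the consequent appears when the antecedent does); `lift` is confidence divided by consequent support. A lift above 1 means the items appear together more often than they would if bought independently, and a lift below 1 means less often.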
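The metric columns are linked by simple formulas. As a sanity check, this sketch recomputes confidence, lift, leverage and conviction for rule 1, (UHT-milk) → (bottled water), from the three support values printed in the `rules.head()` output; small differences come from the rounding of the displayed table.

```python
# Support values copied from row 1 of rules.head()
antecedent_support = 0.021386   # share of transactions with UHT-milk
consequent_support = 0.060683   # share of transactions with bottled water
support = 0.001069              # share of transactions with both

confidence = support / antecedent_support        # share of UHT-milk baskets that also have bottled water
lift = confidence / consequent_support           # > 1 would mean a positive association
leverage = support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.4f} lift={lift:.4f} "
      f"leverage={leverage:.6f} conviction={conviction:.4f}")
```

The recomputed lift is below 1, matching the table: UHT-milk and bottled water occur together slightly less often than independence would predict.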

```python
# Rank rules by support first, then by confidence and lift to break ties
rules.sort_values(["support", "confidence", "lift"], ascending=False).head(8)
```

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 622 | (rolls/buns) | (whole milk) | 0.110005 | 0.157923 | 0.013968 | 0.126974 | 0.804028 | 1.0 | -0.003404 | 0.96455 | -0.214986 | 0.055 | -0.036752 | 0.107711 |
| 623 | (whole milk) | (rolls/buns) | 0.157923 | 0.110005 | 0.013968 | 0.088447 | 0.804028 | 1.0 | -0.003404 | 0.97635 | -0.224474 | 0.055 | -0.024222 | 0.107711 |
| 694 | (yogurt) | (whole milk) | 0.085879 | 0.157923 | 0.011161 | 0.129961 | 0.82294 | 1.0 | -0.002401 | 0.967861 | -0.190525 | 0.047975 | -0.033206 | 0.100317 |
| 695 | (whole milk) | (yogurt) | 0.157923 | 0.085879 | 0.011161 | 0.070673 | 0.82294 | 1.0 | -0.002401 | 0.983638 | -0.203508 | 0.047975 | -0.016634 | 0.100317 |
| 550 | (soda) | (other vegetables) | 0.097106 | 0.122101 | 0.009691 | 0.099794 | 0.817302 | 1.0 | -0.002166 | 0.975219 | -0.198448 | 0.046252 | -0.02541 | 0.089579 |
| 551 | (other vegetables) | (soda) | 0.122101 | 0.097106 | 0.009691 | 0.079365 | 0.817302 | 1.0 | -0.002166 | 0.980729 | -0.202951 | 0.046252 | -0.019649 | 0.089579 |
| 648 | (sausage) | (whole milk) | 0.060349 | 0.157923 | 0.008955 | 0.148394 | 0.939663 | 1.0 | -0.000575 | 0.988811 | -0.063965 | 0.042784 | -0.011316 | 0.102551 |
| 649 | (whole milk) | (sausage) | 0.157923 | 0.060349 | 0.008955 | 0.056708 | 0.939663 | 1.0 | -0.000575 | 0.99614 | -0.070851 | 0.042784 | -0.003875 | 0.102551 |


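The item pairs with the highest support are:

- Rolls/buns and whole milk
- Yogurt and whole milk
- Sausage and whole milk
- Soda and other vegetables

All four pairs have a lift below 1 (about 0.80 to 0.94), so they appear together often mainly because each item is individually popular; they actually co-occur slightly less often than independent purchases would predict. Rules with a lift above 1 are the ones that indicate a positive association.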
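The pairs above were ranked by support, so a natural next step is to look only at pairs whose lift exceeds 1; on the notebook's output that is a plain pandas filter such as `rules[rules["lift"] > 1]`. The minimal sketch below computes pairwise support and lift directly with pandas on a small made-up basket matrix (hypothetical data, same presence/absence layout as `basket_input`) to show what such a filter keeps.

```python
import pandas as pd
from itertools import combinations

# Hypothetical baskets in a True/False presence layout
basket = pd.DataFrame(
    {
        "whole milk": [True, True, True, False, True],
        "yogurt":     [True, True, False, False, False],
        "soda":       [False, False, True, True, False],
    },
    index=["t1", "t2", "t3", "t4", "t5"],
)

item_support = basket.mean()  # fraction of baskets containing each item

rows = []
for a, b in combinations(basket.columns, 2):
    pair_support = (basket[a] & basket[b]).mean()
    if pair_support == 0:
        continue  # never bought together, so no rule to score
    lift = pair_support / (item_support[a] * item_support[b])
    rows.append({"pair": (a, b), "support": pair_support, "lift": lift})

pairs = pd.DataFrame(rows)
positive = pairs[pairs["lift"] > 1].sort_values("lift", ascending=False)
print(positive)
```

Here whole milk and soda share a basket (support 0.2), but their lift is 0.625, so only whole milk and yogurt (lift 1.25) survive the filter.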