# Market Basket Analysis

The main goal of market basket analysis in marketing is to give the retailer the information needed to understand buyers' purchasing behaviour, which helps the retailer make better decisions.

[Source link](https://thecleverprogrammer.com/2020/11/16/apriori-algorithm-using-python/)


## Importing libraries

```python
import pandas as pd
import plotly.express as px
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
```

## Loading the dataset

```python
pd.read_csv("data/Groceries_dataset.csv").head(2)
```

| | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |


```python
# Dates are stored as DD-MM-YYYY strings, so parse them day-first
data = pd.read_csv("data/Groceries_dataset.csv", parse_dates=["Date"], dayfirst=True)
data.head()
```

| | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 2015-07-21 | tropical fruit |
| 1 | 2552 | 2015-01-05 | whole milk |
| 2 | 2300 | 2015-09-19 | pip fruit |
| 3 | 1187 | 2015-12-12 | other vegetables |
| 4 | 3037 | 2015-02-01 | whole milk |


```python
data.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38765 entries, 0 to 38764
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Member_number    38765 non-null  int64
 1   Date             38765 non-null  datetime64[ns]
 2   itemDescription  38765 non-null  object
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 908.7+ KB
```


## Data Exploration

### Top 10 best-selling products

```python
# value_counts() already sorts in descending order, so the first 10 rows are the top 10
top_ten = data["itemDescription"].value_counts().head(10)
fig = px.bar(x=top_ten.index, y=top_ten.values)
fig.update_layout(
    title_text="Top 10 best-selling products",
    xaxis_title="Products",
    yaxis_title="Count",
    width=800,
)
fig.show()
```

### Sales by month

```python
# Count items sold per month (each row of `data` is one item bought)
data["Month-Year"] = data["Date"].dt.strftime("%m-%Y")
monthly = (
    data["Month-Year"]
    .value_counts()
    .rename_axis("Month-Year")
    .reset_index(name="Count")
)
fig = px.bar(monthly, x="Month-Year", y="Count", color="Count")
fig.update_layout(title_text="Items sold per month")
fig.show()
```

**Observations:**

From the visualizations above:

- Whole milk is the best-selling item, followed by other vegetables.
- The most items were sold in 08-2015 and 01-2015, while 02-2014 and 03-2014 had the fewest.

## Apriori Algorithm

### Transactions

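Each row of the basket table pools everything one member bought over the whole period, and each cell counts how many times that member bought that item.

```python
# Count purchases per (member, item) pair, then pivot the items into columns
basket = data.groupby(["Member_number", "itemDescription"])["Date"].agg("count")
transactions = basket.unstack(fill_value=0)
transactions
```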

| Member_number | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 |
| 1001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 2 | 0 | 0 |
| 1002 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4996 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4997 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 4998 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4999 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |


```python
# apriori expects booleans: True if the member bought the item at least once
transactions = transactions > 0
transactions
```

| Member_number | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | bags | baking powder | bathroom cleaner | beef | berries | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | True | False |
| 1001 | False | False | False | False | False | False | False | False | True | False | ... | False | False | False | True | False | True | False | True | False | False |
| 1002 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| 1003 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1004 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4996 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4997 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | True | True | False | False |
| 4998 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4999 | False | False | False | False | False | False | False | False | False | True | ... | False | False | False | True | False | False | False | False | True | False |
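
The counting and boolean-conversion steps above can be reproduced on a tiny purchase log. The members, dates and items below are made up for illustration and are not taken from `Groceries_dataset.csv`; only the column names follow the notebook's. This sketch assumes a vectorised `> 0` comparison for the boolean step.

```python
import pandas as pd

# Hypothetical long-format purchase log: one row per item bought
log = pd.DataFrame(
    {
        "Member_number": [1, 1, 1, 2, 2, 3],
        "Date": pd.to_datetime(
            ["2015-01-05", "2015-01-05", "2015-03-02", "2015-01-07", "2015-01-07", "2015-02-11"]
        ),
        "itemDescription": ["whole milk", "yogurt", "whole milk", "soda", "whole milk", "soda"],
    }
)

# Count how often each member bought each item, then pivot items into columns
counts = (
    log.groupby(["Member_number", "itemDescription"])["Date"]
    .agg("count")
    .unstack(fill_value=0)
)

# apriori only needs to know whether an item is in a basket, not how often
baskets = counts > 0

print(counts)
print(baskets)
```

Member 1 bought whole milk twice, so its count is 2, but the boolean table only records that whole milk was in that member's basket; that presence/absence information is all `apriori` works with.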


### Frequent itemsets

```python
# Keep itemsets that appear in at least 10% of the member baskets
frequent_itemsets = apriori(
    df=transactions,
    min_support=0.1,
    use_colnames=True,
)
frequent_itemsets.sort_values(by="support", ascending=False)
```

| | support | itemsets |
|---|---|---|
| 27 | 0.458184 | (whole milk) |
| 16 | 0.376603 | (other vegetables) |
| 20 | 0.349666 | (rolls/buns) |
| 24 | 0.313494 | (soda) |
| 28 | 0.282966 | (yogurt) |
| 25 | 0.23371 | (tropical fruit) |
| 21 | 0.230631 | (root vegetables) |
| 2 | 0.213699 | (bottled water) |
| 22 | 0.206003 | (sausage) |
| 32 | 0.19138 | (other vegetables, whole milk) |
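
The Apriori algorithm builds frequent itemsets level by level and relies on downward closure: if an itemset is frequent, every subset of it is frequent too, so a candidate with an infrequent subset can be discarded without counting it. The sketch below illustrates that idea on a made-up boolean basket table. It is a simplified teaching version, not mlxtend's implementation, and `support` and `simple_apriori` are hypothetical helper names. Support is taken as the fraction of rows (baskets) containing every item in the itemset, which is what the `support` column above reports.

```python
from itertools import combinations

import pandas as pd


def support(baskets: pd.DataFrame, itemset: frozenset) -> float:
    """Fraction of rows in which every item of `itemset` is True."""
    return baskets[list(itemset)].all(axis=1).mean()


def simple_apriori(baskets: pd.DataFrame, min_support: float) -> dict:
    """Level-wise frequent-itemset search (illustrative, not optimised)."""
    # Level 1: single items that meet the support threshold
    current = [
        frozenset([item])
        for item in baskets.columns
        if support(baskets, frozenset([item])) >= min_support
    ]
    frequent = {s: support(baskets, s) for s in current}
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        current = [c for c in candidates if support(baskets, c) >= min_support]
        frequent.update({c: support(baskets, c) for c in current})
        k += 1
    return frequent


# Hypothetical one-hot baskets: 5 customers, 4 items
baskets = pd.DataFrame(
    {
        "whole milk": [True, True, True, False, True],
        "yogurt": [True, False, True, False, False],
        "soda": [False, True, False, True, True],
        "rolls/buns": [True, False, True, False, False],
    }
)

result = simple_apriori(baskets, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(sup, 2))
```

With `min_support=0.4`, yogurt and soda never appear in the same basket, so the three-item candidate (whole milk, yogurt, soda) is removed in the prune step before its support is ever computed.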


## Association rules

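```python
# Keep rules whose lift is at least 1, i.e. the items co-occur
# at least as often as they would if bought independently
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules
```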

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (bottled water) | (whole milk) | 0.213699 | 0.458184 | 0.112365 | 0.52581 | 1.147597 | 0.014452 | 1.142615 | 0.163569 |
| 1 | (whole milk) | (bottled water) | 0.458184 | 0.213699 | 0.112365 | 0.245241 | 1.147597 | 0.014452 | 1.04179 | 0.237376 |
| 2 | (rolls/buns) | (other vegetables) | 0.349666 | 0.376603 | 0.146742 | 0.419663 | 1.114335 | 0.015056 | 1.074197 | 0.157772 |
| 3 | (other vegetables) | (rolls/buns) | 0.376603 | 0.349666 | 0.146742 | 0.389646 | 1.114335 | 0.015056 | 1.065502 | 0.164589 |
| 4 | (soda) | (other vegetables) | 0.313494 | 0.376603 | 0.124166 | 0.396072 | 1.051695 | 0.006103 | 1.032237 | 0.071601 |
| 5 | (other vegetables) | (soda) | 0.376603 | 0.313494 | 0.124166 | 0.3297 | 1.051695 | 0.006103 | 1.024178 | 0.078849 |
| 6 | (other vegetables) | (whole milk) | 0.376603 | 0.458184 | 0.19138 | 0.508174 | 1.109106 | 0.018827 | 1.101643 | 0.157802 |
| 7 | (whole milk) | (other vegetables) | 0.458184 | 0.376603 | 0.19138 | 0.417693 | 1.109106 | 0.018827 | 1.070564 | 0.181562 |
| 8 | (yogurt) | (other vegetables) | 0.282966 | 0.376603 | 0.120318 | 0.425204 | 1.12905 | 0.013752 | 1.084553 | 0.159406 |
| 9 | (other vegetables) | (yogurt) | 0.376603 | 0.282966 | 0.120318 | 0.319482 | 1.12905 | 0.013752 | 1.05366 | 0.18335 |
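
Each metric column in the rules table follows from three supports: the antecedent's, the consequent's, and that of the two together. The sketch below recomputes rule 0, (bottled water) → (whole milk), from the rounded values printed in the table, using the standard definitions of confidence, lift, leverage and conviction. `rule_metrics` is a hypothetical helper for illustration, not part of mlxtend, and its results agree with the table only up to rounding in the last digits.

```python
def rule_metrics(support_a: float, support_c: float, support_ac: float) -> dict:
    """Standard association-rule metrics for a rule A -> C."""
    confidence = support_ac / support_a            # P(C | A)
    lift = confidence / support_c                  # P(C | A) / P(C)
    leverage = support_ac - support_a * support_c  # P(A and C) - P(A) * P(C)
    conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Supports for rule 0, (bottled water) -> (whole milk), copied from the rules table
metrics = rule_metrics(support_a=0.213699, support_c=0.458184, support_ac=0.112365)
for name, value in metrics.items():
    print(f"{name:>10}: {value:.6f}")
```

Lift and leverage are symmetric in A and C, which is why rules 0 and 1 share those values while their confidence and conviction differ.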
