In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori,association_rules

# Association Rules for a Store Dataset

In this case study, we will explore how association rules can be used to analyze the items that are usually purchased together.

You can refer to these articles to learn more about Apriori and association rules:
https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/
https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/

## Load Data

We will use a dataset of transactions from a retail store. You can get the dataset here:
https://gist.githubusercontent.com/Harsh-Git-Hub/2979ec48043928ad9033d8469928e751/raw/72de943e040b8bd0d087624b154d41b2ba9d9b60/retail_dataset.csv

In [2]:
df = pd.read_csv("https://gist.githubusercontent.com/Harsh-Git-Hub/2979ec48043928ad9033d8469928e751/raw/72de943e040b8bd0d087624b154d41b2ba9d9b60/retail_dataset.csv")
df.head()

Unnamed: 0,0,1,2,3,4,5,6
0,Bread,Wine,Eggs,Meat,Cheese,Pencil,Diaper
1,Bread,Cheese,Meat,Diaper,Wine,Milk,Pencil
2,Cheese,Meat,Eggs,Milk,Wine,,
3,Cheese,Meat,Eggs,Milk,Wine,,
4,Meat,Pencil,Wine,,,,


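## Get the Set of Purchased Products

List the unique products that have been purchased. Note that `df['0']` only looks at the first item of each transaction; in this dataset every product happens to appear in that column at least once.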

In [3]:
df['0'].unique()

array(['Bread', 'Cheese', 'Meat', 'Eggs', 'Wine', 'Bagel', 'Pencil',
       'Diaper', 'Milk'], dtype=object)
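
Because `df['0']` inspects a single column, it could miss products that never appear first in a transaction. A minimal sketch of collecting the products from every column is shown below; the `toy` dataframe is a made-up stand-in for `df`, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for df: one row per transaction, padded with missing values
toy = pd.DataFrame({
    "0": ["Bread", "Cheese", "Meat"],
    "1": ["Wine", "Meat", "Pencil"],
    "2": ["Eggs", None, None],
})

# Flatten all cells into one Series, drop the padding, and keep the distinct values
all_items = pd.Series(toy.to_numpy().ravel()).dropna().unique()
unique_products = sorted(all_items)
print(unique_products)
```

The same expression should work on the real `df`, assuming its only missing values are the NaN padding at the end of shorter transactions.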

## Preprocess Data

In this step, we will transform the dataset into a one-hot encoding in which each row is a transaction and each column indicates whether a product was purchased.

In [48]:
# build a list of transactions, dropping the NaN padding from each row
itemset = df.values.tolist()
itemset = [[item for item in row if pd.notna(item)] for row in itemset]

# encoding the feature
encoder = TransactionEncoder().fit(itemset)
te = encoder.transform(itemset)
te = te.astype(int)

In [50]:
# create new dataframe from the encoded features
df_encoded = pd.DataFrame(te, columns=encoder.columns_)

# show the new dataframe
df_encoded

Unnamed: 0,Bagel,Bread,Cheese,Diaper,Eggs,Meat,Milk,Pencil,Wine
0,0,1,1,1,1,1,0,1,1
1,0,1,1,1,0,1,1,1,1
2,0,0,1,0,1,1,1,0,1
3,0,0,1,0,1,1,1,0,1
4,0,0,0,0,0,1,0,1,1
...,...,...,...,...,...,...,...,...,...
310,0,1,1,0,1,0,0,0,0
311,0,0,0,0,0,1,1,1,0
312,0,1,1,1,1,1,0,1,1
313,0,0,1,0,0,1,0,0,0


Because the NaN padding was filtered out when the itemsets were built, the encoded dataframe should not contain a NaN column. We can confirm this by checking the column names.

In [55]:
df_encoded.columns

Index(['Bagel', 'Bread', 'Cheese', 'Diaper', 'Eggs', 'Meat', 'Milk', 'Pencil',
       'Wine'],
      dtype='object')
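
To make the encoding step less of a black box, here is a minimal pure-Python sketch of one-hot encoding a list of transactions. It is an illustration of the idea, not the `TransactionEncoder` implementation; the `transactions` list is made up, and sorting the column names is an assumption chosen to mirror the alphabetical column order seen above.

```python
# Hypothetical transactions with the NaN padding already removed
transactions = [
    ["Bread", "Wine"],
    ["Cheese", "Wine", "Milk"],
    ["Bread"],
]

# One column per distinct product, in alphabetical order
columns = sorted({item for row in transactions for item in row})

# 1 if the product is in the transaction, 0 otherwise
encoded = [[int(item in row) for item in columns] for row in transactions]

print(columns)
for row in encoded:
    print(row)
```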

## Apriori Algorithm

We will use the Apriori algorithm to find the frequently purchased itemsets.
For this case study, we set `min_support=0.2`.

In [56]:
apr = apriori(df_encoded, min_support=0.2, use_colnames=True)
apr



Unnamed: 0,support,itemsets
0,0.425397,(Bagel)
1,0.504762,(Bread)
2,0.501587,(Cheese)
3,0.406349,(Diaper)
4,0.438095,(Eggs)
5,0.47619,(Meat)
6,0.501587,(Milk)
7,0.361905,(Pencil)
8,0.438095,(Wine)
9,0.279365,"(Bread, Bagel)"


Then, we will generate association rules from the frequent itemsets, keeping only the rules whose confidence is at least 0.6.

In [57]:
ruleset = association_rules(apr, metric="confidence", min_threshold=0.6)
ruleset

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Bagel),(Bread),0.425397,0.504762,0.279365,0.656716,1.301042,0.064641,1.44265,0.402687
1,(Eggs),(Cheese),0.438095,0.501587,0.298413,0.681159,1.358008,0.07867,1.563203,0.469167
2,(Cheese),(Meat),0.501587,0.47619,0.32381,0.64557,1.355696,0.084958,1.477891,0.526414
3,(Meat),(Cheese),0.47619,0.501587,0.32381,0.68,1.355696,0.084958,1.55754,0.500891
4,(Cheese),(Milk),0.501587,0.501587,0.304762,0.607595,1.211344,0.053172,1.270148,0.350053
5,(Milk),(Cheese),0.501587,0.501587,0.304762,0.607595,1.211344,0.053172,1.270148,0.350053
6,(Wine),(Cheese),0.438095,0.501587,0.269841,0.615942,1.227986,0.050098,1.297754,0.330409
7,(Eggs),(Meat),0.438095,0.47619,0.266667,0.608696,1.278261,0.05805,1.338624,0.387409
8,"(Cheese, Meat)",(Eggs),0.32381,0.438095,0.215873,0.666667,1.521739,0.074014,1.685714,0.507042
9,"(Cheese, Eggs)",(Meat),0.298413,0.47619,0.215873,0.723404,1.519149,0.073772,1.893773,0.487091


Provide an explanation of __antecedent support__, __consequent support__, __support__, __confidence__, __lift__, __leverage__ and __conviction__.

#### Antecedent Support:
The fraction of transactions that contain the items in the "if" part of the association rule (the antecedent). If the antecedent is "Bagel", the antecedent support is the share of all transactions that contain Bagel.

#### Consequent Support: 
The fraction of transactions that contain the items in the "then" part of the association rule (the consequent). If the consequent is "Bread", the consequent support is the share of all transactions that contain Bread.

#### Support: 
The fraction of transactions that contain both the antecedent and the consequent. For the rule "Bagel -> Bread", support is the share of all transactions that contain both items.

#### Confidence:
How often the rule holds, measured as the fraction of transactions containing the antecedent that also contain the consequent. The confidence of the rule "Bagel -> Bread" tells us how often Bread appears in the transactions that contain Bagel.
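
As a quick check, confidence can be recomputed from the two support values. The numbers below are taken from the Bagel -> Bread row of the rule table; the variable names are only illustrative.

```python
# Values from the Bagel -> Bread row of the rule table
antecedent_support = 0.425397  # share of transactions containing Bagel
rule_support = 0.279365        # share containing both Bagel and Bread

# confidence(A -> C) = support(A and C) / support(A)
confidence = rule_support / antecedent_support
print(round(confidence, 4))
```

The result agrees with the `confidence` column for that rule up to rounding.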

#### Lift: 
How much more often the antecedent and consequent occur together than would be expected if they were independent. It equals the confidence divided by the consequent support. If lift > 1, transactions that contain the antecedent are more likely than average to contain the consequent.
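
A small sketch of the lift calculation, using the Bagel -> Bread values from the rule table (the variable names are illustrative):

```python
# Values from the Bagel -> Bread row of the rule table
antecedent_support = 0.425397
consequent_support = 0.504762
rule_support = 0.279365

# lift(A -> C) = support(A and C) / (support(A) * support(C))
lift = rule_support / (antecedent_support * consequent_support)
print(round(lift, 3))
```

A lift of about 1.3 means Bagel and Bread appear together roughly 30% more often than they would if the two purchases were independent.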

#### Leverage: 
The difference between the observed support of the rule and the support that would be expected if the antecedent and consequent were independent. It measures how much more often the two are purchased together than chance would predict. A positive leverage indicates a positive association between the antecedent and the consequent.
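
The same Bagel -> Bread row can be used to sketch the leverage calculation (variable names are illustrative):

```python
# Values from the Bagel -> Bread row of the rule table
antecedent_support = 0.425397
consequent_support = 0.504762
rule_support = 0.279365

# leverage(A -> C) = support(A and C) - support(A) * support(C)
expected_if_independent = antecedent_support * consequent_support
leverage = rule_support - expected_if_independent
print(round(leverage, 4))
```

Unlike lift, which is a ratio, leverage is a difference in support, so it stays small for rules that involve rare items.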

#### Conviction: 
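How strongly the consequent depends on the antecedent. It compares how often the rule would be wrong if the antecedent and consequent were independent with how often it is actually wrong.
Example: a high conviction for the rule "Bagel -> Bread" means that transactions containing Bagel rarely lack Bread, so the purchase of Bread depends strongly on the purchase of Bagel.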
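
A minimal sketch of the conviction calculation is shown below, again checked against the Bagel -> Bread row of the rule table. The helper name `conviction` is hypothetical, and returning infinity for a rule with confidence 1 is a convention assumed here.

```python
import math


def conviction(consequent_support, confidence):
    """conviction(A -> C) = (1 - support(C)) / (1 - confidence(A -> C))."""
    if confidence == 1:
        # The rule is never wrong, so the ratio is unbounded
        return math.inf
    return (1 - consequent_support) / (1 - confidence)


# Values from the Bagel -> Bread row of the rule table
bagel_to_bread = conviction(consequent_support=0.504762, confidence=0.656716)
print(round(bagel_to_bread, 3))
```

A value of 1 would mean the antecedent and consequent are independent; values above 1 mean the rule is wrong less often than independence would predict.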