# Association Rule Mining

Association rule mining is a rule-based machine learning technique for finding patterns (relationships, structures) in data.

Association analysis is among the most common applications in data science, and it is closely related to recommender systems.

You have probably encountered it in forms such as "people who bought that product also bought this one", "those who viewed that ad also looked at these ads", or "the next video recommended for you".

These are among the most frequently encountered scenarios in e-commerce data science and data mining work.


# Apriori Algorithm

Apriori is one of the most widely used methods in this field.

Association rule analysis relies on a few metrics. In the definitions below:

* X: a product
* Y: another product
* N: total number of transactions

* Support: the fraction of all transactions that contain both X and Y. It tells us which products, or combinations of products, are bought frequently.
Support(X, Y) = Freq(X, Y) / N

* Confidence: how often X and Y occur together relative to the number of times X occurs. It estimates the probability of Y being bought given that X is bought.
Confidence(X, Y) = Freq(X, Y) / Freq(X)

* Lift: how much more often X and Y occur together than would be expected if they were bought independently. A lift above 1 indicates a positive association, a lift of 1 indicates independence, and a lift below 1 indicates that the products tend not to be bought together.
Lift(X, Y) = Support(X, Y) / (Support(X) * Support(Y))
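As a quick worked example, the three formulas can be checked by hand. The sketch below assumes the counts implied by the supports reported later in this notebook for BISCUIT (0.35), BREAD (0.65) and the pair (0.20) over N = 20 transactions; the function names are illustrative and are not part of mlxtend.

```python
# Counts implied by the supports reported later in the notebook (assumption:
# support * N, with N = 20 transactions).
N = 20
FREQ_BISCUIT = 7   # 0.35 * 20
FREQ_BREAD = 13    # 0.65 * 20
FREQ_BOTH = 4      # 0.20 * 20


def support(freq_xy: int, n: int) -> float:
    """Fraction of all transactions containing both X and Y."""
    return freq_xy / n


def confidence(freq_xy: int, freq_x: int) -> float:
    """Share of the transactions containing X that also contain Y."""
    return freq_xy / freq_x


def lift(freq_xy: int, freq_x: int, freq_y: int, n: int) -> float:
    """Observed joint support divided by the support expected under independence."""
    return (freq_xy / n) / ((freq_x / n) * (freq_y / n))


sup = support(FREQ_BOTH, N)
conf = confidence(FREQ_BOTH, FREQ_BISCUIT)
lft = lift(FREQ_BOTH, FREQ_BISCUIT, FREQ_BREAD, N)

print(round(sup, 3))   # 0.2
print(round(conf, 3))  # 0.571
print(round(lft, 3))   # 0.879
```

Here the lift is slightly below 1: BREAD is so common overall that buying BISCUIT does not make BREAD more likely than it already is.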



## 1. Understanding the Data

In [None]:
#!pip install mlxtend

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('GroceryStoreDataSet.csv', names=['products'], header=None)
df

Unnamed: 0,products
0,"MILK,BREAD,BISCUIT"
1,"BREAD,MILK,BISCUIT,CORNFLAKES"
2,"BREAD,TEA,BOURNVITA"
3,"JAM,MAGGI,BREAD,MILK"
4,"MAGGI,TEA,BISCUIT"
5,"BREAD,TEA,BOURNVITA"
6,"MAGGI,TEA,CORNFLAKES"
7,"MAGGI,BREAD,TEA,BISCUIT"
8,"JAM,MAGGI,BREAD,TEA"
9,"BREAD,MILK"


In [3]:
df.values

array([['MILK,BREAD,BISCUIT'],
       ['BREAD,MILK,BISCUIT,CORNFLAKES'],
       ['BREAD,TEA,BOURNVITA'],
       ['JAM,MAGGI,BREAD,MILK'],
       ['MAGGI,TEA,BISCUIT'],
       ['BREAD,TEA,BOURNVITA'],
       ['MAGGI,TEA,CORNFLAKES'],
       ['MAGGI,BREAD,TEA,BISCUIT'],
       ['JAM,MAGGI,BREAD,TEA'],
       ['BREAD,MILK'],
       ['COFFEE,COCK,BISCUIT,CORNFLAKES'],
       ['COFFEE,COCK,BISCUIT,CORNFLAKES'],
       ['COFFEE,SUGER,BOURNVITA'],
       ['BREAD,COFFEE,COCK'],
       ['BREAD,SUGER,BISCUIT'],
       ['COFFEE,SUGER,CORNFLAKES'],
       ['BREAD,SUGER,BOURNVITA'],
       ['BREAD,COFFEE,SUGER'],
       ['BREAD,COFFEE,SUGER'],
       ['TEA,MILK,COFFEE,CORNFLAKES']], dtype=object)

In [4]:
df.shape

(20, 1)

## 2. Data Preparation

I will use the Apriori algorithm to perform an association analysis.

The apriori function of the mlxtend library expects the dataset as a True/False (one-hot encoded) DataFrame. I will use mlxtend's own preprocessing utilities for the conversion, so the raw data first has to be put into the format those utilities expect: a list of transactions, each of which is a list of items.

In [5]:
# Step 1: Convert each row into a list of items by splitting on ','.

data = list(df["products"].apply(lambda x : x.split(',')))
data

[['MILK', 'BREAD', 'BISCUIT'],
 ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]

In [6]:
# Step 2: Use mlxtend's TransactionEncoder to convert the list of transactions
# into a True/False DataFrame with one column per product.
# mlxtend must be installed first (see the commented-out pip install cell in section 1).

from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_data = te.fit(data).transform(data)
df = pd.DataFrame(te_data,columns=te.columns_)
df

Unnamed: 0,BISCUIT,BOURNVITA,BREAD,COCK,COFFEE,CORNFLAKES,JAM,MAGGI,MILK,SUGER,TEA
0,True,False,True,False,False,False,False,False,True,False,False
1,True,False,True,False,False,True,False,False,True,False,False
2,False,True,True,False,False,False,False,False,False,False,True
3,False,False,True,False,False,False,True,True,True,False,False
4,True,False,False,False,False,False,False,True,False,False,True
5,False,True,True,False,False,False,False,False,False,False,True
6,False,False,False,False,False,True,False,True,False,False,True
7,True,False,True,False,False,False,False,True,False,False,True
8,False,False,True,False,False,False,True,True,False,False,True
9,False,False,True,False,False,False,False,False,True,False,False
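To make the True/False layout concrete, the same kind of encoding can be sketched in plain Python. This is only an illustration of the idea on the first three transactions, not mlxtend's implementation; `one_hot` is a hypothetical helper name.

```python
# First three transactions from the dataset above.
transactions = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA"],
]


def one_hot(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in set(t) for item in columns] for t in transactions]
    return columns, rows


columns, rows = one_hot(transactions)
print(columns)
for row in rows:
    print(row)
```

Each row answers, column by column, the question "does this transaction contain this product?", which is exactly the shape the apriori function consumes.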


## 3. Implementing Apriori Algorithm

The output of the Apriori algorithm gives the support (relative frequency) of each item combination in the whole dataset. For example, in the output below, the support of "BISCUIT" alone is 0.35, while the support of "BISCUIT and BREAD" together is 0.20.

The apriori function was given a min_support value of 0.2, so combinations with a support below 0.2 are eliminated. With verbose = 1 the function logs its progress; the line shown below is the last one it printed, for candidate itemsets of size 3, so the 42 reported there is not the total number of candidate combinations. In the end, 16 itemsets reach the threshold (the listing below shows only the first 10 rows). Every other combination stayed below 0.2 support and is treated as too infrequent to include in the interpretation.

In [7]:
from mlxtend.frequent_patterns import apriori
freq_items = apriori(df,min_support=0.20,use_colnames = True, verbose = 1)
freq_items

Processing 42 combinations | Sampling itemset size 3


Unnamed: 0,support,itemsets
0,0.35,(BISCUIT)
1,0.2,(BOURNVITA)
2,0.65,(BREAD)
3,0.4,(COFFEE)
4,0.3,(CORNFLAKES)
5,0.25,(MAGGI)
6,0.25,(MILK)
7,0.3,(SUGER)
8,0.35,(TEA)
9,0.2,"(BREAD, BISCUIT)"


In [8]:
freq_items.sort_values("support", ascending =False)

Unnamed: 0,support,itemsets
2,0.65,(BREAD)
3,0.4,(COFFEE)
0,0.35,(BISCUIT)
8,0.35,(TEA)
4,0.3,(CORNFLAKES)
7,0.3,(SUGER)
5,0.25,(MAGGI)
6,0.25,(MILK)
1,0.2,(BOURNVITA)
9,0.2,"(BREAD, BISCUIT)"
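To see where these supports come from, the level-wise idea behind Apriori can be sketched in plain Python on the same 20 transactions. This is a simplified illustration under the same min_support of 0.2, not mlxtend's implementation; `apriori_sketch` is a hypothetical name.

```python
from itertools import combinations

# The 20 transactions shown in section 1.
RAW = [
    "MILK,BREAD,BISCUIT", "BREAD,MILK,BISCUIT,CORNFLAKES", "BREAD,TEA,BOURNVITA",
    "JAM,MAGGI,BREAD,MILK", "MAGGI,TEA,BISCUIT", "BREAD,TEA,BOURNVITA",
    "MAGGI,TEA,CORNFLAKES", "MAGGI,BREAD,TEA,BISCUIT", "JAM,MAGGI,BREAD,TEA",
    "BREAD,MILK", "COFFEE,COCK,BISCUIT,CORNFLAKES", "COFFEE,COCK,BISCUIT,CORNFLAKES",
    "COFFEE,SUGER,BOURNVITA", "BREAD,COFFEE,COCK", "BREAD,SUGER,BISCUIT",
    "COFFEE,SUGER,CORNFLAKES", "BREAD,SUGER,BOURNVITA", "BREAD,COFFEE,SUGER",
    "BREAD,COFFEE,SUGER", "TEA,MILK,COFFEE,CORNFLAKES",
]
transactions = [frozenset(row.split(",")) for row in RAW]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        scored = {c: support(c, transactions) for c in level}
        kept = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(sub) in kept for sub in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori_sketch(transactions, min_support=0.2)
print(len(frequent))  # 16
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(round(s, 2), sorted(itemset))
```

The prune step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are frequent, so most larger combinations never need to be counted.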


In [9]:
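# Restrict the analysis to three products. The tiny min_support keeps every combination.
# Note: this overwrites freq_items, so the rules in the next section cover only these products.
df_c = df[['BISCUIT', 'CORNFLAKES', 'COFFEE']]
freq_items = apriori(df_c, min_support=0.00001, use_colnames=True, verbose=1)
freq_items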

Processing 3 combinations | Sampling itemset size 3


Unnamed: 0,support,itemsets
0,0.35,(BISCUIT)
1,0.3,(CORNFLAKES)
2,0.4,(COFFEE)
3,0.15,"(CORNFLAKES, BISCUIT)"
4,0.1,"(COFFEE, BISCUIT)"
5,0.2,"(CORNFLAKES, COFFEE)"
6,0.1,"(CORNFLAKES, COFFEE, BISCUIT)"


## 4. Frequent Patterns

I will now pass the frequent itemsets and their support values to mlxtend's association_rules function. I will interpret the output in terms of "support" and "confidence" and suggest sample actions.

How to read association analysis output, using BISCUIT and BREAD from the first apriori run as an example:

The probability of BISCUIT and BREAD being seen together is 20%, since support = 0.20.
When BISCUIT is bought, the probability of BREAD also being bought is about 57%, since confidence = 0.20 / 0.35 = 0.571429.

Note that the rules below are generated from the three-product subset (BISCUIT, CORNFLAKES, COFFEE) computed in the previous cell, so the BISCUIT and BREAD rule itself does not appear in the output.

Setting min_threshold = 0.3 with metric = "confidence" drops every rule whose confidence is below 0.3.

In [10]:
from mlxtend.frequent_patterns import association_rules
df_res = association_rules(freq_items, metric = "confidence", min_threshold = 0.3)
df_res

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(CORNFLAKES),(BISCUIT),0.3,0.35,0.15,0.5,1.428571,0.045,1.3
1,(BISCUIT),(CORNFLAKES),0.35,0.3,0.15,0.428571,1.428571,0.045,1.225
2,(CORNFLAKES),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,0.08,1.8
3,(COFFEE),(CORNFLAKES),0.4,0.3,0.2,0.5,1.666667,0.08,1.4
4,"(CORNFLAKES, COFFEE)",(BISCUIT),0.2,0.35,0.1,0.5,1.428571,0.03,1.3
5,"(CORNFLAKES, BISCUIT)",(COFFEE),0.15,0.4,0.1,0.666667,1.666667,0.04,1.8
6,"(COFFEE, BISCUIT)",(CORNFLAKES),0.1,0.3,0.1,1.0,3.333333,0.07,inf
7,(CORNFLAKES),"(COFFEE, BISCUIT)",0.3,0.1,0.1,0.333333,3.333333,0.07,1.35


In [11]:
df_res.sort_values("confidence", ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
6,"(COFFEE, BISCUIT)",(CORNFLAKES),0.1,0.3,0.1,1.0,3.333333,0.07,inf
2,(CORNFLAKES),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,0.08,1.8
5,"(CORNFLAKES, BISCUIT)",(COFFEE),0.15,0.4,0.1,0.666667,1.666667,0.04,1.8
0,(CORNFLAKES),(BISCUIT),0.3,0.35,0.15,0.5,1.428571,0.045,1.3
3,(COFFEE),(CORNFLAKES),0.4,0.3,0.2,0.5,1.666667,0.08,1.4
4,"(CORNFLAKES, COFFEE)",(BISCUIT),0.2,0.35,0.1,0.5,1.428571,0.03,1.3
1,(BISCUIT),(CORNFLAKES),0.35,0.3,0.15,0.428571,1.428571,0.045,1.225
7,(CORNFLAKES),"(COFFEE, BISCUIT)",0.3,0.1,0.1,0.333333,3.333333,0.07,1.35
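The table also contains lift, leverage and conviction columns. As a rough check, each row can be reproduced from its three support values. The sketch below uses the standard definitions of these metrics and the supports printed in the table; `rule_metrics` is a hypothetical helper, not an mlxtend function.

```python
import math


def rule_metrics(sup_x: float, sup_y: float, sup_xy: float) -> dict:
    """Metrics for the rule X -> Y from antecedent, consequent and joint support."""
    conf = sup_xy / sup_x
    return {
        "confidence": conf,
        "lift": conf / sup_y,
        # Leverage: observed joint support minus the support expected under independence.
        "leverage": sup_xy - sup_x * sup_y,
        # Conviction is infinite when the rule never fails (confidence = 1).
        "conviction": math.inf if conf >= 1.0 else (1 - sup_y) / (1 - conf),
    }


# Rule 2 in the table: (CORNFLAKES) -> (COFFEE)
r2 = rule_metrics(sup_x=0.3, sup_y=0.4, sup_xy=0.2)
# Rule 6 in the table: (COFFEE, BISCUIT) -> (CORNFLAKES)
r6 = rule_metrics(sup_x=0.1, sup_y=0.3, sup_xy=0.1)

for name, rule in (("rule 2", r2), ("rule 6", r6)):
    print(name, {key: round(value, 6) for key, value in rule.items()})
```

Rule 6 has infinite conviction because every transaction containing COFFEE and BISCUIT also contains CORNFLAKES, so the rule is never violated in this dataset.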


## 5. Preparation for Data Filtering

In this section, I take the lowest and highest confidence values, which will then be used to filter the rules and propose actions.

Let's find the highest confidence value. The output shows that the highest confidence value is 1.0.

In [12]:
conf_max = df_res['confidence'].max()
conf_max

1.0

Let's find the lowest confidence value. The output shows that the lowest confidence value is about 0.333.

In [13]:
conf_min = df_res["confidence"].min()
conf_min

0.33333333333333337

## 6. Data Filtering

Rules with the lowest, the highest, and a 0.5 confidence value are selected and sorted in ascending order of confidence.

In [14]:
df_filt = df_res[ (df_res["confidence"] == conf_min) | (df_res["confidence"] == conf_max) | (df_res["confidence"] == 0.5 )]
df_filt.sort_values("confidence", ascending = True)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
7,(CORNFLAKES),"(COFFEE, BISCUIT)",0.3,0.1,0.1,0.333333,3.333333,0.07,1.35
0,(CORNFLAKES),(BISCUIT),0.3,0.35,0.15,0.5,1.428571,0.045,1.3
3,(COFFEE),(CORNFLAKES),0.4,0.3,0.2,0.5,1.666667,0.08,1.4
4,"(CORNFLAKES, COFFEE)",(BISCUIT),0.2,0.35,0.1,0.5,1.428571,0.03,1.3
6,"(COFFEE, BISCUIT)",(CORNFLAKES),0.1,0.3,0.1,1.0,3.333333,0.07,inf
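The filter above compares floating-point confidences with `==`. That works here because the same computed values are compared with themselves and 0.5 is exactly representable, but it is fragile in general. A tolerance-based comparison is a safer habit; the sketch below uses `math.isclose` on confidences recomputed from the support columns of the table (with pandas, `numpy.isclose` plays the same role on a whole column).

```python
import math

# (support, antecedent support) for rules 0..7, taken from the rules table.
SUPPORTS = [
    (0.15, 0.3), (0.15, 0.35), (0.2, 0.3), (0.2, 0.4),
    (0.1, 0.2), (0.1, 0.15), (0.1, 0.1), (0.1, 0.3),
]
confidences = [sup / ante for sup, ante in SUPPORTS]

# Exact equality can fail for values that look identical when rounded.
print(confidences[7] == 1 / 3)              # False
print(math.isclose(confidences[7], 1 / 3))  # True

# Select rules whose confidence matches the minimum, the maximum, or 0.5.
targets = [min(confidences), max(confidences), 0.5]
selected = [
    index for index, conf in enumerate(confidences)
    if any(math.isclose(conf, target, rel_tol=1e-9) for target in targets)
]
print(selected)  # [0, 3, 4, 6, 7]
```

The selected indices match the rows returned by the filter above.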


## Insights

1) The probability of Coffee and Cornflakes being bought together is 20% (support = 0.2), and the probability of buying Cornflakes when buying Coffee is 50% (confidence = 0.5). The lift is about 1.67, which is greater than 1, so the two are bought together more often than chance would suggest. The store could act on this, for example by placing the Coffee and Cornflakes stands apart so that customers buying Coffee have to pass other products before reaching the Cornflakes aisle.

2) The probability of buying Coffee, Cornflakes, and Biscuit together is 10%, while the probability of buying Biscuit when buying both Cornflakes and Coffee is 50%. The lift is also greater than 1 (about 1.43), so one option is to display Cornflakes and Coffee very close to each other and place the Biscuit aisle farther away, so that customers are exposed to more store products and the chance of additional purchases increases.

3) In line with insight 2, the probability of buying Cornflakes when buying both Biscuit and Coffee is 100% (confidence = 1.0), although this rests on a support of only 0.1, i.e. 2 of the 20 transactions. Given this and the lift of 3.33, a better option may be the following store rule: Biscuit and Coffee should be in the same aisle, while Cornflakes should be displayed far from those two, preferably close to more expensive or less frequently purchased products.