<big><b>Association Rule Mining</b></big>

Association rule mining is a technique for identifying underlying relationships between items. Consider a supermarket where customers can buy a variety of items. Purchases usually follow patterns: for example, parents with babies tend to buy baby products such as milk and diapers together, and shoppers who buy one makeup item often buy others. In other words, transactions follow patterns, and a store can generate more profit if it identifies the relationships between items purchased across different transactions.

There are many methods for association rule mining. Here I use the Apriori algorithm.

<b>Apriori algorithm:</b>

There are three major components of the Apriori algorithm:

1. Support
2. Confidence
3. Lift

<b>Support:</b> The baseline popularity of an item, calculated as the number of transactions containing the item divided by the total number of transactions.

<b>Formula to find support:</b> Support(item) = (Transactions containing (item))/(Total Transactions)

For example, if 50 out of 200 transactions contain toys, the support for the item toy is:

Support(toy) = (Transactions containing toy)/(Total Transactions)

Support(toy) = 50/200 = 25%
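
The same arithmetic can be checked in a few lines of Python. This is only a minimal sketch: the helper name `support` and its parameter names are my own choices for illustration and do not come from any library.

```python
def support(item_count: int, total_transactions: int) -> float:
    """Fraction of all transactions that contain the item."""
    if total_transactions <= 0:
        raise ValueError("total_transactions must be positive")
    return item_count / total_transactions


# 50 of the 200 transactions contain a toy.
toy_support = support(item_count=50, total_transactions=200)
print(f"Support(toy) = {toy_support:.0%}")  # prints: Support(toy) = 25%
```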

<b>Confidence:</b> The likelihood that item B is also bought when item A is bought. It is calculated as the number of transactions in which A and B are bought together, divided by the total number of transactions in which A is bought.

Mathematically, it can be represented as:

<b>Confidence(A→B) =</b> (Transactions containing both (A and B))/(Transactions containing A)

Returning to our example, suppose toy and battery were bought together in 20 transactions, and a battery was bought in 80 transactions. The likelihood of buying a toy when a battery is bought is the confidence of battery → toy:

Confidence(battery→toy) = (Transactions containing both (battery and toy))/(Transactions containing battery)

Confidence(battery→toy) = 20/80 = 25%
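
As a quick check of the confidence calculation, here is a small Python sketch. The function name `confidence` and its parameters are hypothetical names chosen for this illustration.

```python
def confidence(both_count: int, antecedent_count: int) -> float:
    """Share of transactions containing A that also contain B."""
    if antecedent_count <= 0:
        raise ValueError("antecedent_count must be positive")
    return both_count / antecedent_count


# Battery appears in 80 transactions; battery and toy appear together in 20.
battery_to_toy = confidence(both_count=20, antecedent_count=80)
print(f"Confidence(battery -> toy) = {battery_to_toy:.0%}")
# prints: Confidence(battery -> toy) = 25%
```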


<b>Lift(A→B):</b> The increase in the ratio of the sale of B when A is sold. It is calculated by dividing Confidence(A→B) by Support(B).

<b>Mathematically</b> it can be represented as:


Lift(A→B) = (Confidence (A→B))/(Support (B))
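
To make the lift formula concrete, the sketch below reuses the toy and battery figures from the support and confidence examples earlier in this section (200 transactions, 50 with a toy, 80 with a battery, 20 with both). The variable names are my own.

```python
total_transactions = 200
toy_count = 50        # transactions containing a toy
battery_count = 80    # transactions containing a battery
both_count = 20       # transactions containing both

support_toy = toy_count / total_transactions            # 0.25
confidence_battery_to_toy = both_count / battery_count  # 0.25

# Lift(battery -> toy) = Confidence(battery -> toy) / Support(toy)
lift_battery_to_toy = confidence_battery_to_toy / support_toy
print(f"Lift(battery -> toy) = {lift_battery_to_toy}")  # prints: Lift(battery -> toy) = 1.0
```

A lift of 1 means that buying a battery does not change how likely a toy purchase is. A lift above 1 indicates that the two items are bought together more often than chance would suggest, and a lift below 1 indicates the opposite.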
                          

In [70]:
# Define the dataset that the algorithm will work on.

dataset = [['shoe', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
 ['cosmetics', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
 ['shoe','kajal','earring','lipstick'],
 ['shoe', 'top','tshirt', 'earring', 'pant'],
 ['corn','dress','eyeliner','earring','ice cream','lipstick']]

<b>First, we import the necessary libraries.</b>
 

In [53]:
import pandas as pd

In [54]:
from mlxtend.preprocessing import TransactionEncoder

In [55]:
dataset

[['shoe', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
 ['cosmetics', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
 ['shoe', 'kajal', 'earring', 'lipstick'],
 ['shoe', 'top', 'tshirt', 'earring', 'pant'],
 ['corn', 'dress', 'eyeliner', 'earring', 'ice cream', 'lipstick']]

In [56]:
# A TransactionEncoder object transforms the dataset into a one-hot encoded boolean array.
t = TransactionEncoder()

In [57]:
# fit() learns the unique item labels in the dataset; transform() encodes each transaction as a boolean row.
t_ary = t.fit(dataset).transform(dataset)

In [58]:
# Convert the encoded array to a DataFrame with one column per item.
df = pd.DataFrame(t_ary, columns=t.columns_)

In [69]:
df

     bag   corn  cosmetics  dress  earring  eyeliner  ice cream  kajal  lipstick   pant   shoe    top  tshirt
0   True  False      False   True     True     False      False  False      True   True   True  False   False
1   True  False       True   True     True     False      False  False      True   True  False  False   False
2  False  False      False  False     True     False      False   True      True  False   True  False   False
3  False  False      False  False     True     False      False  False     False   True   True   True    True
4  False   True      False   True     True      True       True  False      True  False  False  False   False
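
To see what this encoding step does, here is a rough pure-Python equivalent. It is a sketch for intuition only and is not how mlxtend implements TransactionEncoder. The names `columns` and `encoded` are my own.

```python
dataset = [
    ['shoe', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
    ['cosmetics', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
    ['shoe', 'kajal', 'earring', 'lipstick'],
    ['shoe', 'top', 'tshirt', 'earring', 'pant'],
    ['corn', 'dress', 'eyeliner', 'earring', 'ice cream', 'lipstick'],
]

# The sorted unique items become the column labels.
columns = sorted({item for transaction in dataset for item in transaction})

# Each transaction becomes one boolean row: True where the item was bought.
encoded = [
    [item in set(transaction) for item in columns]
    for transaction in dataset
]

print(columns)
for row in encoded:
    print(row)
```

Each row has one entry per unique item, so the five transactions become a 5 x 13 boolean table with the same column order as the DataFrame above.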


In [60]:
from mlxtend.frequent_patterns import apriori

In [61]:
# apriori() finds the itemsets that appear in at least 60% of the transactions.
frequent_item = apriori(df, min_support=0.6, use_colnames=True)

In [62]:
frequent_item

    support                    itemsets
0       0.6                     (dress)
1       1.0                   (earring)
2       0.8                  (lipstick)
3       0.6                      (pant)
4       0.6                      (shoe)
5       0.6            (earring, dress)
6       0.6           (dress, lipstick)
7       0.8         (earring, lipstick)
8       0.6             (earring, pant)
9       0.6             (earring, shoe)
10      0.6  (earring, dress, lipstick)
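
The support values in this table can be verified by hand. The sketch below counts, in plain Python, how many transactions contain every item of a given itemset. The helper `itemset_support` is a hypothetical name of my own and is not part of mlxtend.

```python
dataset = [
    ['shoe', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
    ['cosmetics', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
    ['shoe', 'kajal', 'earring', 'lipstick'],
    ['shoe', 'top', 'tshirt', 'earring', 'pant'],
    ['corn', 'dress', 'eyeliner', 'earring', 'ice cream', 'lipstick'],
]


def itemset_support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


print(itemset_support({'earring'}, dataset))            # 1.0
print(itemset_support({'dress', 'lipstick'}, dataset))  # 0.6
print(itemset_support({'shoe', 'pant'}, dataset))       # 0.4
```

The last itemset, (shoe, pant), has a support of 0.4, which is below the min_support of 0.6, so it does not appear among the frequent itemsets.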


In [63]:
from mlxtend.frequent_patterns import association_rules

In [64]:
# association_rules() generates rules from the frequent itemsets, keeping those with confidence of at least 0.7.
res = association_rules(frequent_item, metric="confidence", min_threshold=0.7)
res

            antecedents          consequents  antecedent support  consequent support  support  confidence  lift  leverage  conviction
0               (dress)            (earring)                 0.6                 1.0      0.6        1.00  1.00      0.00         inf
1               (dress)           (lipstick)                 0.6                 0.8      0.6        1.00  1.25      0.12         inf
2            (lipstick)              (dress)                 0.8                 0.6      0.6        0.75  1.25      0.12         1.6
3             (earring)           (lipstick)                 1.0                 0.8      0.8        0.80  1.00      0.00         1.0
4            (lipstick)            (earring)                 0.8                 1.0      0.8        1.00  1.00      0.00         inf
5                (pant)            (earring)                 0.6                 1.0      0.6        1.00  1.00      0.00         inf
6                (shoe)            (earring)                 0.6                 1.0      0.6        1.00  1.00      0.00         inf
7      (earring, dress)           (lipstick)                 0.6                 0.8      0.6        1.00  1.25      0.12         inf
8   (earring, lipstick)              (dress)                 0.8                 0.6      0.6        0.75  1.25      0.12         1.6
9     (lipstick, dress)            (earring)                 0.6                 1.0      0.6        1.00  1.00      0.00         inf
10              (dress)  (earring, lipstick)                 0.6                 0.8      0.6        1.00  1.25      0.12         inf
11           (lipstick)     (earring, dress)                 0.8                 0.6      0.6        0.75  1.25      0.12         1.6


In [65]:
# Keep only the columns needed to read the rules.
res = res[['antecedents','consequents','support','confidence','lift']]

In [66]:
print(res)

            antecedents          consequents  support  confidence  lift
0               (dress)            (earring)      0.6        1.00  1.00
1               (dress)           (lipstick)      0.6        1.00  1.25
2            (lipstick)              (dress)      0.6        0.75  1.25
3             (earring)           (lipstick)      0.8        0.80  1.00
4            (lipstick)            (earring)      0.8        1.00  1.00
5                (pant)            (earring)      0.6        1.00  1.00
6                (shoe)            (earring)      0.6        1.00  1.00
7      (earring, dress)           (lipstick)      0.6        1.00  1.25
8   (earring, lipstick)              (dress)      0.6        0.75  1.25
9     (lipstick, dress)            (earring)      0.6        1.00  1.00
10              (dress)  (earring, lipstick)      0.6        1.00  1.25
11           (lipstick)     (earring, dress)      0.6        0.75  1.25
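
Any row of this table can be reproduced from the raw transactions. As an illustration, the sketch below recomputes rule 2, (lipstick) → (dress), in plain Python using the formulas from the beginning of this section. The helper `count` is my own and not an mlxtend function.

```python
dataset = [
    ['shoe', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
    ['cosmetics', 'dress', 'bag', 'earring', 'lipstick', 'pant'],
    ['shoe', 'kajal', 'earring', 'lipstick'],
    ['shoe', 'top', 'tshirt', 'earring', 'pant'],
    ['corn', 'dress', 'eyeliner', 'earring', 'ice cream', 'lipstick'],
]


def count(items):
    """Number of transactions containing every item in `items`."""
    wanted = set(items)
    return sum(1 for transaction in dataset if wanted <= set(transaction))


n = len(dataset)
antecedent, consequent = {'lipstick'}, {'dress'}
both = count(antecedent | consequent)       # 3 transactions contain both items

support = both / n                          # 3 / 5
confidence = both / count(antecedent)       # 3 / 4
lift = confidence / (count(consequent) / n) # 0.75 / 0.6

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.60 confidence=0.75 lift=1.25
```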


In [67]:
# Keep only the rules with a confidence of at least 0.78.
filtr = res[res['confidence'] >= 0.78]

In [68]:
print(filtr)

          antecedents          consequents  support  confidence  lift
0             (dress)            (earring)      0.6         1.0  1.00
1             (dress)           (lipstick)      0.6         1.0  1.25
3           (earring)           (lipstick)      0.8         0.8  1.00
4          (lipstick)            (earring)      0.8         1.0  1.00
5              (pant)            (earring)      0.6         1.0  1.00
6              (shoe)            (earring)      0.6         1.0  1.00
7    (earring, dress)           (lipstick)      0.6         1.0  1.25
9   (lipstick, dress)            (earring)      0.6         1.0  1.00
10            (dress)  (earring, lipstick)      0.6         1.0  1.25
