# Association Rule Learning

## People who bought also bought ...
- People who bought bread also bought milk.
- People who bought a phone also bought headphones.
- People who bought a burger also bought french fries.

That is what Association Rule Learning will help us figure out!

We are going to look into two major Association Rule Learning models:
1. Apriori
2. Eclat

# Apriori

- The Apriori algorithm depends on three measures:
    1. Support
        
        <img src="../static/apriori_arl_support.png" alt="apriori_arl_support.png" width="400">
    2. Confidence
        
        <img src="../static/apriori_arl_confidence.png" alt="apriori_arl_confidence.png" width="400">
    3. Lift
        
        <img src="../static/apriori_arl_lift.png" alt="apriori_arl_lift.png" width="400">
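
The three measures can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the usual definitions (support is the share of transactions containing an itemset, confidence is the support of both sides divided by the support of the left side, and lift is confidence divided by the support of the right side). The function names and the `baskets` data are hypothetical and chosen only for this example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Support of lhs and rhs together, divided by the support of lhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence of lhs -> rhs divided by the support of rhs."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical baskets, used only to exercise the functions.
baskets = [
    {"burger", "fries"},
    {"burger", "fries", "cola"},
    {"burger"},
    {"cola"},
]

print(support(baskets, {"burger"}))                # 0.75
print(confidence(baskets, {"burger"}, {"fries"}))  # 0.5 / 0.75
print(lift(baskets, {"burger"}, {"fries"}))        # confidence / 0.5
```

A lift above 1 suggests the two sides appear together more often than they would if they were independent; a lift below 1 suggests the opposite.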

## Algorithm

- **Step 1** : Choose the minimum support value.
- **Step 2** : Choose the minimum confidence value.
- **Step 3** : Select all the **subsets of transactions** whose support is higher than the minimum support.
- **Step 4** : Select all the **rules of these subsets** whose confidence is higher than the minimum confidence.
- **Step 5** : Sort the rules by decreasing lift. 
- **END** : Your rules are prepared. 
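
The steps above can be sketched as a small function. This is only an illustration under simplifying assumptions: it considers item pairs only (not larger itemsets), it recounts support by scanning every transaction, and the name `apriori_pairs` and the sample baskets are hypothetical. It is not how a library such as `apyori` is implemented.

```python
from itertools import combinations, permutations


def apriori_pairs(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) tuples for item pairs."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    def support(items):
        return sum(set(items) <= b for b in baskets) / n

    all_items = sorted(set().union(*baskets))

    # Step 3: keep the pairs whose support exceeds the minimum support.
    frequent_pairs = [p for p in combinations(all_items, 2) if support(p) > min_support]

    # Step 4: keep the rules whose confidence exceeds the minimum confidence.
    rules = []
    for pair in frequent_pairs:
        for lhs, rhs in permutations(pair):
            conf = support(pair) / support([lhs])
            if conf > min_confidence:
                rules.append((lhs, rhs, support(pair), conf, conf / support([rhs])))

    # Step 5: sort the surviving rules by decreasing lift.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Steps 1 and 2: the thresholds are chosen by the caller (hypothetical values).
sample = [["phone", "headphones"], ["phone", "headphones", "case"], ["phone"], ["case"]]
for rule in apriori_pairs(sample, min_support=0.3, min_confidence=0.6):
    print(rule)
```

Restricting the search to pairs keeps the sketch short; a full Apriori implementation grows itemsets level by level and prunes candidates that contain an infrequent subset.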

## Manual Example

| Transaction ID (N=6) | Products |
|:--------------:|:--------:|
| T0001 | 'Milk', 'Bread', 'Jam', 'Butter' |
| T0002 | 'Milk', 'Bread' |
| T0003 | 'Milk', 'Bread', 'Butter' |
| T0004 | 'Milk', 'Bread' |
| T0005 | 'Bread', 'Butter' |
| T0006 | 'Milk', 'Jam' |

- **Step 1** : Choose the minimum support value.

    (Minimum Support = 0.3)

- **Step 2** : Choose the minimum confidence value.

    (Minimum Confidence = 0.7)

- **Step 3** : Select all the **subsets of transactions** whose support is higher than the minimum support (0.3).

    | Products | Support(X) = freq(X)/N | Picked |
    |:--------:|:----------------------:|:------:|
    | Bread, Butter | 3/6 = 0.5 | ✔️ |
    | Jam, Milk | 2/6 = 0.333 | ✔️ |
    | Butter, Jam | 1/6 = 0.167 | ❌ |
    | Bread, Milk | 4/6 = 0.667 | ✔️ |
    | Butter, Milk | 2/6 = 0.333 | ✔️ |
    | Bread, Jam | 1/6 = 0.167 | ❌ |

- **Step 4** : Select all the **rules of these subsets** whose confidence is higher than the minimum confidence (0.7).

    | Products | Confidence(X → Y) = freq(X, Y)/freq(X) | Picked |
    |:--------:|:--------------------------------------:|:------:|
    | Bread → Butter | 3/5 = 0.6 | ❌ |
    | Butter → Bread | 3/3 = 1.0 | ✔️ |
    | Jam → Milk | 2/2 = 1.0 | ✔️ |
    | Milk → Jam | 2/5 = 0.4 | ❌ |
    | Bread → Milk | 4/5 = 0.8 | ✔️ |
    | Milk → Bread | 4/5 = 0.8 | ✔️ |
    | Butter → Milk | 2/3 = 0.667 | ❌ |
    | Milk → Butter | 2/5 = 0.4 | ❌ |

- **Step 5** : Sort the rules by decreasing lift. 

    | Products | Lift(X → Y) = Confidence(X → Y)/Support(Y) | Order |
    |:--------:|:------------------------------------------:|:------:|
    | Butter → Bread | 1.0 / (5/6) = 1.2 | 1 |
    | Jam → Milk | 1.0 / (5/6) = 1.2 | 2 |
    | Bread → Milk | 0.8 / (5/6) = 0.96 | 3 |
    | Milk → Bread | 0.8 / (5/6) = 0.96 | 4 |
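
The lift of each rule that passed Step 4 can be double-checked with exact fractions. The sketch below is only a verification aid for the six transactions in the table above; the helper names `freq` and `lift` are hypothetical.

```python
from fractions import Fraction

transactions = [
    {"Milk", "Bread", "Jam", "Butter"},  # T0001
    {"Milk", "Bread"},                   # T0002
    {"Milk", "Bread", "Butter"},         # T0003
    {"Milk", "Bread"},                   # T0004
    {"Bread", "Butter"},                 # T0005
    {"Milk", "Jam"},                     # T0006
]
n = len(transactions)


def freq(*items):
    """Number of transactions containing all of the given items."""
    return sum(set(items) <= t for t in transactions)


def lift(lhs, rhs):
    """Confidence(lhs -> rhs) / Support(rhs), as an exact fraction."""
    confidence = Fraction(freq(lhs, rhs), freq(lhs))
    return confidence / Fraction(freq(rhs), n)


for lhs, rhs in [("Butter", "Bread"), ("Jam", "Milk"), ("Bread", "Milk"), ("Milk", "Bread")]:
    value = lift(lhs, rhs)
    print(f"{lhs} -> {rhs}: {value} = {float(value):.2f}")
```

Under these definitions the first two rules both evaluate to 6/5 = 1.20 and the last two to 24/25 = 0.96, so the order within each tied pair is arbitrary.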
    

## Importing the libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [2]:
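# The CSV has no header row; each row is one transaction.
dataset = pd.read_csv(r'../dataset/Market_Basket_Optimisation.csv', header=None)
# Empty cells are read as NaN, so str(value) turns them into the item 'nan'.
transactions = [[str(value).lower() for value in row] for row in dataset.values]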
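
Because baskets have different sizes, shorter rows are presumably padded with NaN when the CSV is loaded, and `str(value)` then yields the literal item `'nan'`. One possible way to avoid that is to skip missing cells while building the transactions. The helper below is a hypothetical sketch, not part of the original notebook, and the `rows` data only mimics the shape of `dataset.values`.

```python
import math


def clean_transactions(rows):
    """Lower-case every item and skip missing cells (None or float NaN)."""
    cleaned = []
    for row in rows:
        items = [
            str(value).lower()
            for value in row
            if value is not None and not (isinstance(value, float) and math.isnan(value))
        ]
        cleaned.append(items)
    return cleaned


# Hypothetical rows shaped like `dataset.values`: short baskets are padded with NaN.
rows = [
    ["Milk", "Bread", float("nan")],
    ["Eggs", float("nan"), float("nan")],
]
print(clean_transactions(rows))  # [['milk', 'bread'], ['eggs']]
```

With a loaded DataFrame, the same helper could be called as `clean_transactions(dataset.values)`; the resulting rules would then no longer contain a `nan` item.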

In [3]:
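from apyori import apriori
# max_length=2 limits the search to itemsets of at most two items.
# No thresholds are passed, so apyori's default minimums apply.
rules = apriori(transactions=transactions, max_length=2)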

In [4]:
results = list(rules)

In [5]:
min_length = 2
data_values = []
for record in results:
    # Skip single-item itemsets; a rule needs at least two items.
    if len(record.items) >= min_length:
        for ostats in record.ordered_statistics:
            data_values.append([ostats.items_base, ostats.items_add, record.support, ostats.confidence, ostats.lift])
            # To drop rules with an empty side or a 'nan' item, guard the append above with:
            # if ostats.items_base and ostats.items_add and 'nan' not in ostats.items_base and 'nan' not in ostats.items_add:
resultsinDataFrame = pd.DataFrame(data_values, columns=["Left Side", "Right Side", "Support", "Confidence", "Lift"])
# Show the strongest associations first.
resultsinDataFrame = resultsinDataFrame.sort_values('Lift', ascending=False)
resultsinDataFrame.to_csv('../dataset/output.csv', index=False)
resultsinDataFrame

| | Left Side | Right Side | Support | Confidence | Lift |
|:--:|:---------:|:----------:|:-------:|:----------:|:----:|
| 20 | (spaghetti) | (nan) | 0.17411 | 1.0 | 1.000133 |
| 13 | (milk) | (nan) | 0.129583 | 1.0 | 1.000133 |
| 2 | (nan) | (chocolate) | 0.163845 | 0.163867 | 1.000133 |
| 19 | (nan) | (spaghetti) | 0.17411 | 0.174133 | 1.000133 |
| 4 | (eggs) | (nan) | 0.179709 | 1.0 | 1.000133 |
| 5 | (nan) | (eggs) | 0.179709 | 0.179733 | 1.000133 |
| 7 | (french fries) | (nan) | 0.170911 | 1.0 | 1.000133 |
| 8 | (nan) | (french fries) | 0.170911 | 0.170933 | 1.000133 |
| 1 | (chocolate) | (nan) | 0.163845 | 1.0 | 1.000133 |
| 14 | (nan) | (milk) | 0.129583 | 0.1296 | 1.000133 |
