# Association Rule Learning

## People who bought also bought ...
- People who bought bread also bought milk.
- People who bought a phone also bought headphones.
- People who bought a burger also bought french fries.

That is what Association Rule Learning will help us figure out!

We are going to look into 2 major Association Rule Learning models.
1. Apriori
2. Eclat

# Apriori

- The Apriori algorithm depends on 3 values.
    1. Support
        
        <img src="../static/apriori_arl_support.png" alt="apriori_arl_support.png" width="400">
    2. Confidence
        
        <img src="../static/apriori_arl_confidence.png" alt="apriori_arl_confidence.png" width="400">
    3. Lift
        
        <img src="../static/apriori_arl_lift.png" alt="apriori_arl_lift.png" width="400">
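
In case the images above do not render, the three measures can be written in terms of transaction counts. This is a restatement in the notation of the manual example below, where freq(X) is the number of transactions containing X and N is the total number of transactions:

$$
\text{Support}(X) = \frac{\text{freq}(X)}{N}, \qquad
\text{Confidence}(X \rightarrow Y) = \frac{\text{freq}(X, Y)}{\text{freq}(X)}, \qquad
\text{Lift}(X \rightarrow Y) = \frac{\text{Confidence}(X \rightarrow Y)}{\text{Support}(Y)}
$$

A lift above 1 means X and Y appear together more often than they would if they were bought independently; a lift below 1 means less often.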

## Algorithm

- **Step 1** : Choose the minimum support value.
- **Step 2** : Choose the minimum confidence value.
- **Step 3** : Select all the **subsets of transactions** whose support is higher than the minimum support.
- **Step 4** : Select all the **rules of these subsets** whose confidence is higher than the minimum confidence.
- **Step 5** : Sort the rules by decreasing lift. 
- **END** : Your rules are prepared. 
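
The five steps above can be sketched in plain Python for the simplest case, rules between two single items. This is a minimal illustration, not how `apyori` is implemented: the function name `pair_rules`, the toy baskets, and the use of `>=` for the thresholds are all assumptions made for the example.

```python
from itertools import combinations, permutations


def pair_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence, lift) tuples for single-item rules x -> y."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    def support(*items):
        # Fraction of transactions that contain every given item.
        return sum(1 for b in baskets if set(items) <= b) / n

    items = sorted(set().union(*baskets))

    # Step 3: keep the pairs whose support reaches the threshold (>= is an assumption).
    frequent_pairs = [p for p in combinations(items, 2) if support(*p) >= min_support]

    rules = []
    for pair in frequent_pairs:
        # Step 4: test both directions of each frequent pair.
        for x, y in permutations(pair):
            confidence = support(x, y) / support(x)
            if confidence >= min_confidence:
                lift = confidence / support(y)
                rules.append((x, y, support(x, y), confidence, lift))

    # Step 5: sort the rules by decreasing lift.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical toy baskets, not taken from any dataset in this document.
toy = [
    ["burger", "fries"],
    ["burger", "fries"],
    ["burger", "cola"],
    ["cola"],
    ["salad", "cola"],
]

# Steps 1 and 2: choose the thresholds.
toy_rules = pair_rules(toy, min_support=0.4, min_confidence=0.7)
for x, y, s, c, l in toy_rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

Recomputing support with a full scan for every candidate is fine for a handful of baskets, but it is exactly the cost that real Apriori implementations avoid by pruning candidates whose subsets are already infrequent.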

## Manual Example

| Transaction ID (N=6) | Products |
|:--------------:|:--------:|
| T0001 | 'Milk', 'Bread', 'Jam', 'Butter' |
| T0002 | 'Milk', 'Bread' |
| T0003 | 'Milk', 'Bread', 'Butter' |
| T0004 | 'Milk', 'Bread' |
| T0005 | 'Bread', 'Butter' |
| T0006 | 'Milk', 'Jam' |

- **Step 1** : Choose the minimum support value.

    (Minimum Support = 0.3)

- **Step 2** : Choose the minimum confidence value.

    (Minimum Confidence = 0.7)

- **Step 3** : Select all the **subsets of transactions** whose support is higher than the minimum support (0.3).

| Products | Support(X) = freq(X)/N | Picked |
|:--------:|:---------------------:|:------:|
| Bread, Butter | 3/6 = 0.5 | ✔️ |
| Jam, Milk | 2/6 = 0.333 | ✔️ |
| Butter, Jam | 1/6 = 0.167 | ❌ |
| Bread, Milk | 4/6 = 0.667 | ✔️ |
| Butter, Milk | 2/6 = 0.333 | ✔️ |
| Bread, Jam | 1/6 = 0.167 | ❌ |

- **Step 4** : Select all the **rules of these subsets** whose confidence is higher than the minimum confidence (0.7).

| Rule | Confidence(X → Y) = freq(X, Y)/freq(X) | Picked |
|:--------:|:-----------------------------------:|:------:|
| Bread → Butter | 3/5 = 0.6 | ❌ |
| Butter → Bread | 3/3 = 1.0 | ✔️ |
| Jam → Milk | 2/2 = 1.0 | ✔️ |
| Milk → Jam | 2/5 = 0.4 | ❌ |
| Bread → Milk | 4/5 = 0.8 | ✔️ |
| Milk → Bread | 4/5 = 0.8 | ✔️ |
| Butter → Milk | 2/3 = 0.667 | ❌ |
| Milk → Butter | 2/5 = 0.4 | ❌ |

- **Step 5** : Sort the rules by decreasing lift. Rules with equal lift keep the order in which they were selected.

| Rule | Lift(X → Y) = Confidence(X → Y)/Support(Y) | Order |
|:--------:|:-----------------------------------:|:------:|
| Butter → Bread | 1.0 / (5/6) = 1.2 | 1 |
| Jam → Milk | 1.0 / (5/6) = 1.2 | 2 |
| Bread → Milk | 0.8 / (5/6) = 0.96 | 3 |
| Milk → Bread | 0.8 / (5/6) = 0.96 | 4 |
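
The lift arithmetic for this example can be checked with exact fractions. The snippet below is a small sketch using only the standard library; the helper names `freq` and `lift` are introduced here for illustration and are not part of any library.

```python
from fractions import Fraction

# The six transactions from the manual example.
transactions = [
    {"Milk", "Bread", "Jam", "Butter"},  # T0001
    {"Milk", "Bread"},                   # T0002
    {"Milk", "Bread", "Butter"},         # T0003
    {"Milk", "Bread"},                   # T0004
    {"Bread", "Butter"},                 # T0005
    {"Milk", "Jam"},                     # T0006
]


def freq(*items):
    """Number of transactions that contain all the given items."""
    return sum(1 for t in transactions if set(items) <= t)


def lift(x, y):
    """Lift(X -> Y) = Confidence(X -> Y) / Support(Y), as an exact fraction."""
    confidence = Fraction(freq(x, y), freq(x))
    support_y = Fraction(freq(y), len(transactions))
    return confidence / support_y


# The four rules that passed the confidence threshold in Step 4.
selected_rules = [("Butter", "Bread"), ("Jam", "Milk"), ("Bread", "Milk"), ("Milk", "Bread")]
lifts = {rule: lift(*rule) for rule in selected_rules}

# sorted() is stable, so rules with equal lift stay in their original order.
for (x, y), value in sorted(lifts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{x} -> {y}: lift = {value} = {float(value):.2f}")
```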
    

## Importing the libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
%pip install apyori

Note: you may need to restart the kernel to use updated packages.


## Data Preprocessing

- Data preprocessing for the Apriori algorithm differs from the usual steps.

### Steps:
1. Load the dataset, with or without a header.
2. Convert the pandas data frame to a list of lists containing only string values. (Empty values are kept as the string `nan`.)


In [3]:
# Load the dataset. This file has no header row, so header = None.
dataset = pd.read_csv(r'../dataset/Market_Basket_Optimisation.csv', header = None)

# Convert the data frame to a list of lists of lowercase strings (missing cells become the string 'nan').
transactions = [[str(value).lower() for value in row] for row in dataset.values]
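
Because `str()` turns every missing cell into the string `'nan'`, shorter rows carry `'nan'` as if it were a product, which is why the results are filtered for `'nan'` later on. One possible alternative, sketched here with the standard `csv` module instead of pandas, is to drop empty cells while loading. The in-memory sample and the name `transactions_clean` are hypothetical stand-ins, not the notebook's data.

```python
import csv
import io

# Hypothetical stand-in for a headerless basket file with rows of different lengths.
sample_csv = io.StringIO(
    "Milk,Bread,Jam\n"
    "Burger,French Fries\n"
    "Phone\n"
)

# Keep only non-empty cells, so no 'nan' placeholder ever enters a transaction.
transactions_clean = [
    [cell.strip().lower() for cell in row if cell.strip()]
    for row in csv.reader(sample_csv)
]
print(transactions_clean)
```

To try this on a real file, replace the `io.StringIO` object with an opened file handle, for example `open(path, newline='')`.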

## Train Apriori ARL Model

In [4]:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, max_length = 2)
results = list(rules)
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

## Visualize result

If we inspect `results`, we can see that it is a list of `RelationRecord` objects like the one shown below.
```
RelationRecord(
    items=frozenset({'chicken', 'light cream'}), 
    support=0.004532728969470737, 
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({'light cream'}), 
            items_add=frozenset({'chicken'}), 
            confidence=0.29059829059829057, 
            lift=4.84395061728395
        )
    ]
),
```
We need to loop through all the `results` and build a new data frame with the columns "Left Side", "Right Side", "Support", "Confidence", and "Lift".

We can then sort by "Lift" and keep the top 10 rows.


In [5]:
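data_values = []
min_length = 2  # ignore records with fewer than two items
for result in results:
    # Skip records that are too short or that contain the 'nan' placeholder.
    if len(result.items) >= min_length and 'nan' not in result.items:
        for ostats in result.ordered_statistics:
            # Each side is assumed to hold a single item (max_length = 2), so take its first element.
            data_values.append([tuple(ostats.items_base)[0], tuple(ostats.items_add)[0], result.support, ostats.confidence, ostats.lift])
resultsinDataFrame = pd.DataFrame(data_values, columns=["Left Side", "Right Side", "Support", "Confidence", "Lift"])
# Keep the 10 rules with the highest lift, in descending order.
resultsinDataFrame = resultsinDataFrame.nlargest(n=10, columns="Lift")
resultsinDataFrame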

| | Left Side | Right Side | Support | Confidence | Lift |
|:-:|:---------:|:----------:|:-------:|:----------:|:----:|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
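
The loop above keeps only the first item of each side, which works here because `max_length = 2`. If longer rules were allowed, each side could hold several items. The sketch below shows one way to flatten such records by joining all items on each side. The two namedtuples are stand-ins that mirror the field names shown in the record above, so the example runs without `apyori`; the `flatten` helper and the sample record with its values are made up for illustration.

```python
from collections import namedtuple

# Stand-ins mirroring the field names of the record shown earlier.
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")


def flatten(records):
    """Turn rule records into (left, right, support, confidence, lift) rows."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            # Skip rules with an empty left side or with the 'nan' placeholder.
            if not stat.items_base or "nan" in record.items:
                continue
            left = ", ".join(sorted(stat.items_base))
            right = ", ".join(sorted(stat.items_add))
            rows.append((left, right, record.support, stat.confidence, stat.lift))
    # Highest lift first.
    return sorted(rows, key=lambda row: row[4], reverse=True)


# Made-up record with a two-item left side (values are illustrative only).
example = [
    RelationRecord(
        items=frozenset({"pasta", "shrimp", "mineral water"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(frozenset({"pasta", "mineral water"}), frozenset({"shrimp"}), 0.4, 3.5)
        ],
    )
]
flat_rows = flatten(example)
print(flat_rows)
```

The resulting rows can be passed to `pd.DataFrame` with the same five column names used above.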
