# Association Rule

Association rule learning is an unsupervised learning technique that looks for dependencies between data items: it finds items that tend to occur together, so that the relationship can be exploited, for example to increase sales.

When Walmart, a retail chain in the United States, studied the shopping behavior of its customers, the study showed that diapers and beer are often bought together. The explanation usually given is that fathers are often tasked with shopping while mothers stay with the baby.

## The Algorithm
The Apriori algorithm, used for the first phase of association rule mining, is the most popular and classic algorithm for frequent itemset mining. It works on Boolean data (an item is either present in a transaction or not) and produces Boolean association rules. The algorithm first finds the sets of products that occur together frequently, and then looks for strong relationships between these products and other products.
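
The level-wise search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `mlxtend` implementation: the function name `apriori_sketch` and the five sample baskets are made up for the example.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that reach the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent

# Hypothetical sample baskets, used only for this illustration.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "diapers"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
frequent_itemsets = apriori_sketch(baskets, min_support=0.6)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The prune step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.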

The importance of an association rule can be determined by three parameters that measure the strength of the rule, namely:

- Support
- Confidence
- Lift
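
For a rule A → C, support is the fraction of transactions that contain both A and C, confidence is support(A ∪ C) / support(A), and lift is confidence(A → C) / support(C). The sketch below computes the three values by hand for the rule beer → diapers. The helper names and the five baskets are assumptions made for the illustration.

```python
# Hypothetical baskets, used only to illustrate the three metrics.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "diapers"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """How often the consequent appears in baskets that contain the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

antecedent, consequent = {"beer"}, {"diapers"}
print(round(support(antecedent | consequent, baskets), 3))    # 0.4
print(round(confidence(antecedent, consequent, baskets), 3))  # 1.0
print(round(lift(antecedent, consequent, baskets), 3))        # 1.25
```

A lift above 1 means the two sides of the rule occur together more often than would be expected if they were independent; a lift close to 1 means the rule carries little information.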

We will use the `mlxtend` library for this demonstration.

In [None]:
# %pip install mlxtend

In [2]:
""" 
Demonstrate the association rules using Apriori algorithm
"""
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import pandas as pd

def create_sample_data():
    """Create sample transaction data"""
    data = {
        'TID': [1, 2, 3, 4, 5],
        'Items': [
            ['bread', 'milk', 'eggs'],
            ['bread', 'diapers'],
            ['milk', 'diapers', 'beer'],
            ['bread', 'milk', 'diapers', 'beer'],
            ['bread', 'milk', 'diapers']
        ]
    }
    return pd.DataFrame(data)

def prepare_transactions(df):
    """Convert transaction data to one-hot encoded format"""
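    # One row per (transaction, item), one-hot encode, then collapse back to one row per transaction.
    one_hot = pd.get_dummies(df['Items'].apply(pd.Series).stack())
    # Cast to bool: apriori works on Boolean columns and warns about other dtypes.
    transactions = one_hot.groupby(level=0).sum().astype(bool)
    return transactions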

def generate_rules(transactions, min_support=0.2, min_confidence=0.6):
    """Generate association rules using Apriori"""
    frequent_itemsets = apriori(transactions, min_support=min_support, use_colnames=True)
    rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)
    return rules

In [12]:
df = create_sample_data()
transactions = prepare_transactions(df)
rules = generate_rules(transactions)
print(rules.sort_values('confidence', ascending=False))

               antecedents      consequents  antecedent support  \
0                   (beer)        (diapers)                 0.4   
11           (beer, bread)           (milk)                 0.2   
23  (beer, bread, diapers)           (milk)                 0.2   
22     (beer, bread, milk)        (diapers)                 0.2   
21                  (eggs)    (bread, milk)                 0.2   
20            (eggs, milk)          (bread)                 0.2   
19           (bread, eggs)           (milk)                 0.2   
15                  (beer)  (milk, diapers)                 0.4   
13         (beer, diapers)           (milk)                 0.4   
1                   (beer)           (milk)                 0.4   
12            (beer, milk)        (diapers)                 0.4   
10           (beer, bread)        (diapers)                 0.2   
9                   (eggs)           (milk)                 0.2   
4                   (eggs)          (bread)                 0.



In [None]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

In [3]:
df = pd.read_csv('../xdata/GroceryStoreDataset.csv', names=['products'], sep=',')
df.head()

Unnamed: 0,products
0,"MILK,BREAD,BISCUIT"
1,"BREAD,MILK,BISCUIT,CORNFLAKES"
2,"BREAD,TEA,BOURNVITA"
3,"JAM,MAGGI,BREAD,MILK"
4,"MAGGI,TEA,BISCUIT"


In [4]:
df.shape

(20, 1)

Split each product string into a list of items.

In [5]:
data = list(df["products"].apply(lambda x:x.split(",") ))
data

[['MILK', 'BREAD', 'BISCUIT'],
 ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]

Transform the list with one-hot encoding.

In [6]:
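from mlxtend.preprocessing import TransactionEncoder

# Learn the unique items, then encode each transaction as a row of True/False values.
encoder = TransactionEncoder()
encoded = encoder.fit(data).transform(data)
# Keep the Boolean dtype: apriori expects True/False columns.
df = pd.DataFrame(encoded, columns=encoder.columns_)
df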
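
What one-hot encoding produces can be sketched without `mlxtend`. The helper `one_hot_encode` below is a hypothetical stand-in written for illustration, not the library's code: it builds one Boolean column per distinct item and one row per transaction.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one Boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)
        # True where the transaction contains the item, False otherwise.
        rows.append([item in present for item in columns])
    return columns, rows

# Two transactions taken from the grocery list above.
sample = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "TEA", "BOURNVITA"],
]
columns, rows = one_hot_encode(sample)
print(columns)
for row in rows:
    print(row)
```

Each row has the same length regardless of how many items the original transaction held, which is the tabular shape that `apriori` takes as input.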




Set a minimum support threshold and compute the support of every itemset that reaches it.

In [13]:
# Note: df is reassigned here and now holds the frequent itemsets, not the one-hot table.
df = apriori(df, min_support=0.2, use_colnames=True, verbose=1)
df

Processing 42 combinations | Sampling itemset size 3




Unnamed: 0,support,itemsets
0,0.35,(BISCUIT)
1,0.2,(BOURNVITA)
2,0.65,(BREAD)
3,0.4,(COFFEE)
4,0.3,(CORNFLAKES)
5,0.25,(MAGGI)
6,0.25,(MILK)
7,0.3,(SUGER)
8,0.35,(TEA)
9,0.2,"(BISCUIT, BREAD)"


Let's view the rule metrics using the `association_rules` function.

In [14]:
df_ar = association_rules(df, metric="confidence", min_threshold=0.6)
df_ar

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(MILK),(BREAD),0.25,0.65,0.2,0.8,1.230769,1.0,0.0375,1.75,0.25,0.285714,0.428571,0.553846
1,(SUGER),(BREAD),0.3,0.65,0.2,0.666667,1.025641,1.0,0.005,1.05,0.035714,0.266667,0.047619,0.487179
2,(CORNFLAKES),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,1.0,0.08,1.8,0.571429,0.4,0.444444,0.583333
3,(SUGER),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,1.0,0.08,1.8,0.571429,0.4,0.444444,0.583333
4,(MAGGI),(TEA),0.25,0.35,0.2,0.8,2.285714,1.0,0.1125,3.25,0.75,0.5,0.692308,0.685714
