<a href="https://colab.research.google.com/github/hbisgin/datamining/blob/main/AssociationRuleMiningExample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Sample dataset
data = {
    'Transaction': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'Items': [['milk', 'bread', 'eggs'],
              ['bread', 'butter', 'eggs'],
              ['milk', 'bread', 'butter', 'eggs'],
              ['milk', 'bread'],
              ['milk', 'eggs']]
}

df = pd.DataFrame(data)

# Convert the list of items into a one-hot encoded DataFrame
one_hot_encoded = df['Items'].str.join('|').str.get_dummies()

print(one_hot_encoded.head())

# Apply Apriori algorithm to find frequent itemsets
frequent_itemsets = apriori(one_hot_encoded, min_support=0.4, use_colnames=True)

print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate association rules
association_rules_df = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)

print("\nAssociation Rules:")
association_rules_df.head()


   bread  butter  eggs  milk
0      1       0     1     1
1      1       1     1     0
2      1       1     1     1
3      1       0     0     1
4      0       0     1     1
Frequent Itemsets:
    support               itemsets
0       0.8                (bread)
1       0.4               (butter)
2       0.8                 (eggs)
3       0.8                 (milk)
4       0.4        (butter, bread)
5       0.6          (eggs, bread)
6       0.6          (milk, bread)
7       0.4         (eggs, butter)
8       0.6           (milk, eggs)
9       0.4  (eggs, butter, bread)
10      0.4    (milk, eggs, bread)

Association Rules:



| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (butter) | (bread) | 0.4 | 0.8 | 0.4 | 1.0 | 1.25 | 0.08 | inf | 0.333333 |
| 1 | (eggs) | (bread) | 0.8 | 0.8 | 0.6 | 0.75 | 0.9375 | -0.04 | 0.8 | -0.25 |
| 2 | (bread) | (eggs) | 0.8 | 0.8 | 0.6 | 0.75 | 0.9375 | -0.04 | 0.8 | -0.25 |
| 3 | (milk) | (bread) | 0.8 | 0.8 | 0.6 | 0.75 | 0.9375 | -0.04 | 0.8 | -0.25 |
| 4 | (bread) | (milk) | 0.8 | 0.8 | 0.6 | 0.75 | 0.9375 | -0.04 | 0.8 | -0.25 |
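
The support, confidence, lift, and leverage columns in the rules table above can be checked by hand. The following is a minimal pure-Python sketch, not part of mlxtend: `support` and `rule_metrics` are hypothetical helper names introduced here, and the transactions are copied from the sample dataset.

```python
# Transactions copied from the sample dataset (T1 to T5).
transactions = [
    {'milk', 'bread', 'eggs'},
    {'bread', 'butter', 'eggs'},
    {'milk', 'bread', 'butter', 'eggs'},
    {'milk', 'bread'},
    {'milk', 'eggs'},
]

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent):
    """Hypothetical helper: recompute four columns for antecedent -> consequent."""
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(set(antecedent) | set(consequent))
    confidence = s_ac / s_a
    # Values are rounded only to hide floating-point noise when printing.
    return {
        'support': round(s_ac, 4),
        'confidence': round(confidence, 4),
        'lift': round(confidence / s_c, 4),
        'leverage': round(s_ac - s_a * s_c, 4),
    }

print(rule_metrics({'butter'}, {'bread'}))
print(rule_metrics({'eggs'}, {'bread'}))
```

A lift below 1, as for eggs and bread, means the two items appear together slightly less often than they would if they were independent.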


**Example 1 -- Generating Association Rules from Frequent Itemsets**

The `association_rules` function takes DataFrames of frequent itemsets as produced by the `apriori`, `fpgrowth`, or `fpmax` functions in `mlxtend.frequent_patterns`. To demonstrate its usage, we first create a pandas DataFrame of frequent itemsets as generated by the `fpgrowth` function:

In [None]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth


dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
print(te_ary)
df = pd.DataFrame(te_ary, columns=te.columns_)

frequent_itemsets = fpgrowth(df, min_support=0.6, use_colnames=True)
### alternatively:
#frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
#frequent_itemsets = fpmax(df, min_support=0.6, use_colnames=True)

frequent_itemsets
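
To make the encoding step in the cell above less opaque, here is a minimal pure-Python sketch of what a one-hot transaction encoding looks like. It is an illustration of the idea, not mlxtend's implementation; the names `columns` and `matrix` are introduced here for the example.

```python
# Same baskets as in the cell above.
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# One column per distinct item, in sorted order.
columns = sorted({item for basket in dataset for item in basket})

# One row per basket: True where the basket contains the column's item.
# A repeated item (the second 'Onion' in the last basket) still yields a single True.
matrix = [[item in basket for item in columns] for basket in dataset]

print(columns)
for row in matrix:
    print(row)
```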

The `association_rules()` function allows you to specify (1) your metric of interest and (2) the corresponding threshold. The `metric` argument accepts `confidence` and `lift`, among others; the output below also reports `leverage`, `conviction`, and `zhangs_metric`. Let's say you are interested in rules derived from the frequent itemsets only if the confidence is at least 70 percent (`min_threshold=0.7`):

In [7]:
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Kidney Beans) | (Eggs) | 1.0 | 0.8 | 0.8 | 0.8 | 1.0 | 0.0 | 1.0 | 0.0 |
| 1 | (Eggs) | (Kidney Beans) | 0.8 | 1.0 | 0.8 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 2 | (Yogurt) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 3 | (Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 1.0 |
| 4 | (Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 0.5 |
| 5 | (Onion) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 6 | (Kidney Beans, Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 1.0 |
| 7 | (Eggs, Onion) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 0.0 | inf | 0.0 |
| 8 | (Kidney Beans, Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 0.12 | inf | 0.5 |
| 9 | (Eggs) | (Kidney Beans, Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 0.12 | 1.6 | 1.0 |
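
The multi-item rules in the table above come from splitting a frequent itemset into an antecedent and a consequent and keeping the splits whose confidence meets the threshold. The sketch below illustrates that idea in plain Python for the single itemset (Kidney Beans, Eggs, Onion). It is an assumption-laden illustration rather than mlxtend's code: `count` and `rules_from_itemset` are hypothetical helpers, and confidence is computed from raw transaction counts.

```python
from itertools import combinations

# Baskets from the example above, stored as sets (duplicates collapse).
transactions = [
    {'Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
    {'Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
    {'Milk', 'Apple', 'Kidney Beans', 'Eggs'},
    {'Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'},
    {'Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def rules_from_itemset(itemset, min_confidence):
    """Hypothetical helper: enumerate antecedent -> consequent splits of one itemset."""
    itemset = frozenset(itemset)
    found = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            # confidence = support(itemset) / support(antecedent)
            confidence = count(itemset) / count(antecedent)
            if confidence >= min_confidence:
                found.append((antecedent, consequent, confidence))
    return found

rules = rules_from_itemset({'Kidney Beans', 'Eggs', 'Onion'}, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), '->', sorted(consequent), confidence)
```

The split with Kidney Beans alone as the antecedent is discarded because its confidence is 3/5 = 0.6, which is below the 0.7 threshold.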


**Example 2 -- Rule Generation and Selection Criteria**

If you are interested in rules according to a different metric, you can simply adjust the `metric` and `min_threshold` arguments. For example, if you are only interested in rules that have a lift score of >= 1.2, you would do the following:

In [None]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules
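
As a rough check on what a lift threshold of 1.2 keeps, the standard definition of lift can be applied to two rules from Example 1, using the support values reported in that table. The symbols A (antecedent) and C (consequent) are notation introduced here.

```latex
\[
\operatorname{lift}(A \rightarrow C)
  = \frac{\operatorname{confidence}(A \rightarrow C)}{\operatorname{support}(C)}
  = \frac{\operatorname{support}(A \cup C)}{\operatorname{support}(A)\,\operatorname{support}(C)}
\]
\[
\operatorname{lift}(\text{Eggs} \rightarrow \text{Onion})
  = \frac{0.6}{0.8 \times 0.6} = 1.25 \ge 1.2,
\qquad
\operatorname{lift}(\text{Eggs} \rightarrow \text{Kidney Beans})
  = \frac{0.8}{0.8 \times 1.0} = 1.0 < 1.2
\]
```

Under this threshold the first rule is kept and the second is dropped: Kidney Beans appears in every transaction, so knowing that a basket contains Eggs says nothing extra about it.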