# Association Rule Mining using FP-Growth

This notebook demonstrates how to extract frequent itemsets and association rules from transaction data using the FP-Growth algorithm, inspired by the Kaggle notebook by Mohammed Derouiche.

---

**Dataset**: Cleaned version of the [UCI Online Retail Dataset](https://archive.ics.uci.edu/ml/datasets/online+retail)  
**Goal**: Generate association rules in the form `antecedents → consequents` to be used in a recommendation system API.


In [2]:
# Import necessary libraries
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

In [3]:
# Load the dataset
df = pd.read_csv("../data/transactions_fpgrowth.csv")

In [4]:
# Convert the 'items' column to a list of transactions
transactions = df["items"].str.split(",")

In [5]:
# Transform the transactions into a one-hot encoded DataFrame
te = TransactionEncoder()
te_data = te.fit_transform(transactions)
df_trans = pd.DataFrame(te_data, columns=te.columns_)

In [6]:
# Apply the FP-Growth algorithm to find frequent itemsets
frequent_itemsets = fpgrowth(df_trans, min_support=0.01, use_colnames=True)

In [7]:
# Generate association rules from the frequent itemsets
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

In [8]:
# Resulting rules sorted by lift
rules.sort_values("lift", ascending=False).head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
590,"(set 3 retrospot tea, coffee)",(sugar),0.013005,0.013005,0.013005,1.0,76.896266,1.0,0.012835,inf,1.0,1.0,1.0,1.0
591,(sugar),"(set 3 retrospot tea, coffee)",0.013005,0.013005,0.013005,1.0,76.896266,1.0,0.012835,inf,1.0,1.0,1.0,1.0
592,(set 3 retrospot tea),"(sugar, coffee)",0.013005,0.013005,0.013005,1.0,76.896266,1.0,0.012835,inf,1.0,1.0,1.0,1.0
589,"(sugar, coffee)",(set 3 retrospot tea),0.013005,0.013005,0.013005,1.0,76.896266,1.0,0.012835,inf,1.0,1.0,1.0,1.0
585,(set 3 retrospot tea),(sugar),0.013005,0.013005,0.013005,1.0,76.896266,1.0,0.012835,inf,1.0,1.0,1.0,1.0
584,(sugar),(set 3 retrospot tea),0.013005,0.013005,0.013005,1.0,76.896266,1.0,0.012835,inf,1.0,1.0,1.0,1.0
907,(regency tea plate green),(regency tea plate pink),0.014569,0.012087,0.0109,0.748148,61.895899,1.0,0.010724,3.922595,0.99839,0.691781,0.745067,0.824967
906,(regency tea plate pink),(regency tea plate green),0.012087,0.014569,0.0109,0.901786,61.895899,1.0,0.010724,10.033475,0.995881,0.691781,0.900334,0.824967
583,(coffee),(sugar),0.017213,0.013005,0.013005,0.755486,58.094044,1.0,0.012781,4.036558,1.0,0.755486,0.752264,0.877743
593,(coffee),"(sugar, set 3 retrospot tea)",0.017213,0.013005,0.013005,0.755486,58.094044,1.0,0.012781,4.036558,1.0,0.755486,0.752264,0.877743


In [9]:
# keep three columns: antecedents, consequents, and confidence
rules_tidy = (
    rules[["antecedents", "consequents", "confidence"]]
    .rename(columns={
        "antecedents":  "antecedent",
        "consequents":  "consequent"
    })
)

In [10]:
# One rule can have multiple antecedents and consequents, so we need to tidy the data
rules_tidy["antecedent"] = rules_tidy["antecedent"].apply(lambda s: next(iter(s)))
rules_tidy["consequent"] = rules_tidy["consequent"].apply(lambda s: next(iter(s)))

In [11]:
# sort
rules_tidy = rules_tidy.sort_values("confidence", ascending=False)

In [12]:
# Save the tidy rules to a CSV file
output_path = "../data/rules.csv"
rules_tidy.to_csv(output_path, index=False)

print(f"Saved {len(rules_tidy):,} rules  →  {output_path}")
rules_tidy.head()

Saved 952 rules  →  ../data/rules.csv


Unnamed: 0,antecedent,consequent,confidence
584,sugar,set 3 retrospot tea,1.0
586,set 3 retrospot tea,coffee,1.0
585,set 3 retrospot tea,sugar,1.0
582,sugar,coffee,1.0
591,sugar,set 3 retrospot tea,1.0
