# Notebook 4 — Association Rule Mining (Apriori)

**Dataset:** Online Retail (transaction data)

**Purpose:** Prepare the transaction data, run the Apriori algorithm, and extract association rules.
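
For reference, here is a minimal pure-Python sketch of the metrics used later in the notebook, which sorts itemsets by `support` and rules by `lift`. It defines support, confidence, and lift for a rule A → C. The toy baskets and the helper names `support`, `confidence`, and `lift` are hypothetical. mlxtend's `association_rules` reports columns with these names, but this is not its code.

```python
# Hypothetical toy baskets (not taken from the Online Retail data)
transactions = [
    {"tea", "milk"},
    {"tea", "milk", "sugar"},
    {"tea", "biscuits"},
    {"coffee", "biscuits"},
    {"coffee"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(C | A) = support(A ∪ C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A -> C) / support(C); above 1 means A and C co-occur more than if independent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


print(support({"tea", "milk"}, transactions))          # 0.4
print(confidence({"milk"}, {"tea"}, transactions))     # 1.0
print(round(lift({"milk"}, {"tea"}, transactions), 3)) # 1.667
```

Lift is symmetric in A and C, so `lift({"tea"}, {"milk"}, ...)` gives the same value. Confidence is not symmetric: tea → milk has confidence 2/3 here.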

## Setup
Install `mlxtend` if needed (`pip install mlxtend`). This notebook uses the `apriori` and `association_rules` functions from `mlxtend.frequent_patterns`.
Make sure the dataset CSV (e.g., `OnlineRetail.csv`) is in the working directory.

In [None]:
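# Load the Online Retail transactions
import pandas as pd
from IPython.display import display  # built into Jupyter; imported so the cell also runs outside it

CSV = 'OnlineRetail.csv'
try:
    # Read InvoiceNo as text: most IDs are numeric, but cancellations carry a 'C' prefix
    df = pd.read_csv(CSV, encoding='ISO-8859-1', dtype={'InvoiceNo': str})
    print('Loaded Online Retail:', df.shape)
    display(df.head())
except FileNotFoundError as e:
    print('Could not load OnlineRetail.csv; please make sure the file is in the working directory.\n', e)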


In [None]:
from mlxtend.frequent_patterns import apriori, association_rules

# Build the invoice x product basket matrix, then mine frequent itemsets and rules
try:
    # Basic cleaning (assumes columns 'InvoiceNo', 'Description', 'Quantity')
    df_clean = df.dropna(subset=['InvoiceNo', 'Description'])
    df_clean = df_clean[~df_clean['InvoiceNo'].astype(str).str.startswith('C')]  # drop canceled invoices
    # One row per invoice, one column per product, summed quantity in each cell
    basket = (df_clean
              .groupby(['InvoiceNo', 'Description'])['Quantity']
              .sum().unstack().fillna(0))
    # Encode "bought a positive quantity" as True; mlxtend prefers a boolean DataFrame
    # (this also avoids DataFrame.applymap, which is deprecated in recent pandas)
    basket = basket > 0
    frequent_itemsets = apriori(basket, min_support=0.02, use_colnames=True)
    # Keep rules with lift >= 1. Some recent mlxtend releases also require num_itemsets
    # (the number of transactions) here, e.g. num_itemsets=len(basket)
    rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)
    display(frequent_itemsets.sort_values('support', ascending=False).head())
    display(rules.sort_values('lift', ascending=False).head())
except Exception as e:
    print('Apriori step failed; check the dataframe columns and types.\n', e)

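The `apriori` call above hides the search itself. The sketch below is a minimal, unoptimized pure-Python version of the classic level-wise Apriori search, included only to illustrate the idea. At each level, frequent (k-1)-itemsets are joined into k-item candidates. Candidates with any infrequent subset are pruned, because every subset of a frequent itemset must itself be frequent. The survivors are then counted against `min_support`. This is not mlxtend's implementation, and `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations

# Hypothetical toy baskets (not taken from the Online Retail data)
transactions = [
    {"tea", "milk"},
    {"tea", "milk", "sugar"},
    {"tea", "biscuits"},
    {"coffee", "biscuits"},
    {"coffee"},
]


def apriori_sketch(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def frequent(candidates):
        # Count the transactions containing each candidate; keep those meeting min_support
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: single items
    current = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result


freq = apriori_sketch(transactions, min_support=0.4)
for itemset, sup in sorted(freq.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
# 5 frequent itemsets; {'milk', 'tea'} is the only frequent pair
```

Like mlxtend with `use_colnames=True`, the sketch represents itemsets as `frozenset`s. mlxtend returns them in the `itemsets` column of a DataFrame rather than as dict keys.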

## Notes
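- Tune `min_support` and `min_threshold` to the dataset. `min_support` is a fraction of all invoices, so `0.02` keeps only itemsets that appear in at least 2% of baskets; lowering it surfaces rarer combinations but produces many more itemsets and uses more memory. With `metric='lift'`, `min_threshold=1` keeps only rules whose antecedent and consequent co-occur at least as often as independence would predict.
- For large datasets, consider sampling invoices, or restricting the basket matrix to the most frequently purchased products (still one row per `InvoiceNo`) so the one-hot matrix stays small.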

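The suggestion to restrict the basket matrix to top products can be sketched in pandas as below. It assumes the same `InvoiceNo`, `Description`, and `Quantity` columns used above. The helper `top_product_basket`, its `top_n` parameter, and the toy rows are hypothetical. Ranking products by the number of distinct invoices they appear on is one reasonable definition of "top", not the only one.

```python
import pandas as pd


def top_product_basket(df, top_n=50):
    """Invoice x product boolean matrix limited to the `top_n` products on the most invoices."""
    # Drop rows missing keys, and canceled invoices (InvoiceNo starting with 'C')
    clean = df.dropna(subset=["InvoiceNo", "Description"])
    clean = clean[~clean["InvoiceNo"].astype(str).str.startswith("C")]
    # Rank products by the number of distinct invoices they appear on
    top = clean.groupby("Description")["InvoiceNo"].nunique().nlargest(top_n).index
    clean = clean[clean["Description"].isin(top)]
    # Sum quantities per invoice/product, then encode "bought a positive quantity" as True
    basket = clean.pivot_table(index="InvoiceNo", columns="Description",
                               values="Quantity", aggfunc="sum", fill_value=0)
    return basket > 0


# Hypothetical toy rows in the Online Retail column layout
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "536366", "536366", "C536367", "536368"],
    "Description": ["MUG", "TEA SET", "MUG", "TEA SET", "CANDLE", "MUG", "MUG"],
    "Quantity": [6, 1, 2, 1, 3, -6, 4],
})
basket = top_product_basket(toy, top_n=2)
print(basket)
```

The returned boolean frame can be passed to `apriori(basket, min_support=..., use_colnames=True)` in place of the full matrix. Note that `min_support` is still measured against the invoices that remain after filtering.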