# 🛒 Market Basket Analysis with Association Rules

In this notebook, we generate product association rules using the Apriori algorithm. These rules help us understand which products are frequently bought together — supporting product bundling, cross-selling, and shelf placement strategies.


## 📦 Load Preprocessed Transactions

We'll start by loading the grouped transactional data generated in the previous notebook.


```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Load the grouped transactions
df = pd.read_csv("../data/market_basket_grouped.csv")

# Preview structure
df.head()
```


| | TransactionID | Item |
|---|---|---|
| 0 | T0001 | ['Bread'] |
| 1 | T0002 | ['Eggs', 'Tomatoes', 'Butter'] |
| 2 | T0003 | ['Beef'] |
| 3 | T0004 | ['Apples', 'Bread', 'Beef', 'Chicken', 'Milk'] |
| 4 | T0005 | ['Tomatoes', 'Bread', 'Eggs', 'Bananas', 'Appl... |


## 🧹 Prepare Data for Itemset Mining

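The CSV stores each basket as a string such as `"['Eggs', 'Tomatoes', 'Butter']"`, so we first parse it back into a Python list with `ast.literal_eval`. We then one-hot encode the baskets with `TransactionEncoder`: one boolean column per product and one row per transaction.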


```python
from ast import literal_eval

# Convert stringified lists into actual lists
df['Item'] = df['Item'].apply(literal_eval)

# One-hot encode the transactions
te = TransactionEncoder()
te_ary = te.fit(df['Item']).transform(df['Item'])
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

df_encoded.head()
```


| | Apples | Bananas | Beef | Bread | Butter | Cheese | Chicken | Eggs | Milk | Tomatoes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | True | False | False | False | False | False | False |
| 1 | False | False | False | False | True | False | False | True | False | True |
| 2 | False | False | True | False | False | False | False | False | False | False |
| 3 | True | False | True | True | False | False | True | False | True | False |
| 4 | True | True | False | True | False | False | False | True | False | True |
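
For intuition, here is a minimal pure-Python sketch of the layout `TransactionEncoder` produces: the distinct items, sorted alphabetically, become the columns, and each basket becomes one row of booleans. The helper `one_hot_encode` is hypothetical and only illustrates the idea; the notebook itself relies on `mlxtend`.

```python
def one_hot_encode(transactions):
    """Hypothetical helper: return (columns, rows) where rows[i][j] is True
    if basket i contains item columns[j]."""
    # Columns: every distinct item across all baskets, sorted alphabetically
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)
        # One boolean per column, in column order
        rows.append([item in present for item in columns])
    return columns, rows


# The first three baskets from the preview of the grouped transactions
baskets = [
    ["Bread"],
    ["Eggs", "Tomatoes", "Butter"],
    ["Beef"],
]
columns, rows = one_hot_encode(baskets)
print(columns)
for row in rows:
    print(row)
```

Sorting the columns keeps the encoding deterministic regardless of the order in which items appear in the baskets.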


## 📊 Generate Frequent Itemsets

Using the Apriori algorithm, we identify itemsets that appear in at least 5% of all transactions (`min_support=0.05`). The preview below lists the five itemsets with the highest support.


```python
from mlxtend.frequent_patterns import apriori

frequent_itemsets = apriori(df_encoded, min_support=0.05, use_colnames=True)
frequent_itemsets.sort_values(by="support", ascending=False).head()
```


| | support | itemsets |
|---|---|---|
| 9 | 0.344 | (Tomatoes) |
| 2 | 0.338 | (Beef) |
| 1 | 0.32 | (Bananas) |
| 8 | 0.32 | (Milk) |
| 6 | 0.318 | (Chicken) |
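
Apriori works level by level: it counts single items, joins frequent k-itemsets into (k+1)-item candidates, and discards any candidate that has an infrequent subset, because an itemset can never be more frequent than any of its subsets. The sketch below illustrates that idea in plain Python on a small hypothetical dataset. `apriori_sketch` is an illustrative helper, not `mlxtend`'s implementation, which operates on the one-hot DataFrame and is far more efficient.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_sketch(transactions, min_support):
    """Hypothetical level-wise Apriori: returns {frozenset: support}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    # Level 1: frequent single items
    current = {}
    for i in items:
        s = support(frozenset([i]), transactions)
        if s >= min_support:
            current[frozenset([i])] = s
    frequent = dict(current)

    k = 1
    while current:
        # Join step: unions of two frequent k-itemsets that have exactly k + 1 items
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        # Count step: keep candidates that meet the support threshold
        current = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets
baskets = [
    {"Beef", "Chicken"},
    {"Beef", "Chicken", "Milk"},
    {"Milk", "Eggs"},
    {"Beef"},
]
result = apriori_sketch(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

Here `Eggs` (support 0.25) is dropped at the first level, so no candidate containing it is ever counted.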


## 📈 Generate Association Rules

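We pass the frequent itemsets to `mlxtend`'s `association_rules` function, which splits each itemset into an antecedent and a consequent and scores every resulting rule. The settings are:

- `min_support`: 0.05, set in the `apriori` call above (an itemset must appear in at least 5% of all transactions)
- `metric`: lift
- `min_threshold`: 1.0 (keep only rules with a lift of at least 1.0, i.e. items bought together at least as often as they would be by chance)

Each resulting rule reports:
- **antecedents** and **consequents**: the "if" and "then" sides of the rule
- **support**: fraction of all transactions that contain both the antecedents and the consequents
- **confidence**: rule support divided by antecedent support, i.e. the estimated probability that the consequent is purchased given that the basket contains the antecedent
- **lift**: confidence divided by consequent support; values above 1 mean the items are bought together more often than if they were purchased independently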
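
The metrics above reduce to a few ratios of supports. As a sketch, the hypothetical helper `rule_metrics` below computes confidence and lift, plus two of the extra columns `mlxtend` reports (leverage and conviction), using their standard definitions. Plugging in the supports of the Chicken → Beef rule from this notebook's rules output gives approximately 0.415, 1.228, 0.0245 and 1.132, the values `mlxtend` reports for that rule.

```python
def rule_metrics(support_ac, support_a, support_c):
    """Hypothetical helper. Inputs are fractions of all transactions:
    support_ac = P(A and C), support_a = P(A), support_c = P(C)."""
    confidence = support_ac / support_a             # P(C | A)
    lift = confidence / support_c                   # P(C | A) / P(C)
    leverage = support_ac - support_a * support_c   # observed minus expected co-occurrence
    # Conviction is unbounded when the rule never fails (confidence == 1)
    conviction = float("inf") if confidence == 1 else (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Chicken -> Beef: support(A and C) = 0.132, support(A) = 0.318, support(C) = 0.338
chicken_beef = rule_metrics(support_ac=0.132, support_a=0.318, support_c=0.338)
for name, value in chicken_beef.items():
    print(f"{name}: {value:.6f}")
```

Because lift is symmetric in A and C, the reverse rule Beef → Chicken has the same lift but a different confidence (0.132 / 0.338).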


```python
from mlxtend.frequent_patterns import association_rules

# Generate rules from frequent itemsets
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

# Sort rules by lift
rules = rules.sort_values(by="lift", ascending=False)
rules.head()
```


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | (Chicken) | (Beef) | 0.318 | 0.338 | 0.132 | 0.415094 | 1.22809 | 1.0 | 0.024516 | 1.131806 | 0.272327 | 0.251908 | 0.116457 | 0.402813 |
| 19 | (Beef) | (Chicken) | 0.338 | 0.318 | 0.132 | 0.390533 | 1.22809 | 1.0 | 0.024516 | 1.11901 | 0.280555 | 0.251908 | 0.106353 | 0.402813 |
| 28 | (Milk) | (Eggs) | 0.32 | 0.272 | 0.106 | 0.33125 | 1.217831 | 1.0 | 0.01896 | 1.088598 | 0.263041 | 0.218107 | 0.081387 | 0.360478 |
| 29 | (Eggs) | (Milk) | 0.272 | 0.32 | 0.106 | 0.389706 | 1.217831 | 1.0 | 0.01896 | 1.114217 | 0.245698 | 0.218107 | 0.102509 | 0.360478 |
| 26 | (Chicken) | (Cheese) | 0.318 | 0.296 | 0.108 | 0.339623 | 1.147374 | 1.0 | 0.013872 | 1.066057 | 0.188335 | 0.213439 | 0.061964 | 0.352244 |
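
Conceptually, rule generation takes each frequent itemset with two or more items, tries every split into a non-empty antecedent and consequent, and scores each split. The hypothetical `generate_rules` below sketches that loop for the lift metric only, reusing the supports of Chicken, Beef and {Chicken, Beef} from the output above. It is an illustration, not `mlxtend`'s code, but it shows why rules appear in mirrored pairs: both directions of a split share the same support and lift.

```python
from itertools import combinations


def generate_rules(frequent, min_lift=1.0):
    """Hypothetical helper. `frequent` maps frozenset -> support and, as Apriori
    guarantees, contains every subset of each itemset it holds."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                antecedent = frozenset(ante)
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if lift >= min_lift:
                    rules.append((antecedent, consequent, sup, confidence, lift))
    # Highest lift first, as in the notebook
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Supports taken from the rules output above
frequent = {
    frozenset({"Chicken"}): 0.318,
    frozenset({"Beef"}): 0.338,
    frozenset({"Chicken", "Beef"}): 0.132,
}
for ante, cons, sup, conf, lift in generate_rules(frequent):
    print(sorted(ante), "->", sorted(cons), round(conf, 6), round(lift, 6))
```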


## 💾 Save Rules to CSV

Export the generated association rules to a CSV file for further use or dashboarding.


```python
rules.to_csv("../data/market_basket_rules.csv", index=False)
print("Rules exported successfully.")
```

Rules exported successfully.
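
The `antecedents` and `consequents` columns hold Python `frozenset` objects, so `to_csv` writes them as text such as `frozenset({'Chicken'})`, which a dashboard tool would then have to parse. One option, sketched below with the standard `csv` module and a hypothetical helper `itemset_to_str`, is to render each itemset as a sorted, comma-separated label before exporting. In the notebook, the same helper could be applied with `rules['antecedents'].apply(itemset_to_str)` (and likewise for `consequents`) just before calling `to_csv`.

```python
import csv
import io


def itemset_to_str(itemset):
    """Hypothetical helper: render a frozenset of items as a stable, readable label."""
    return ", ".join(sorted(itemset))


# Two rules shaped like rows of `rules`: (antecedents, consequents, lift)
rows = [
    (frozenset({"Chicken"}), frozenset({"Beef"}), 1.22809),
    (frozenset({"Milk"}), frozenset({"Eggs"}), 1.217831),
]

# Write to an in-memory buffer here; a real export would open a file instead
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["antecedents", "consequents", "lift"])
for ante, cons, lift in rows:
    writer.writerow([itemset_to_str(ante), itemset_to_str(cons), lift])
print(buffer.getvalue())
```

Sorting the items makes the label independent of set iteration order, so the same itemset always exports as the same string.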


## ⭐ Optional: Filter Top Rules

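We can narrow the rules to the most relevant ones. The filter below keeps:

- `lift > 1.0`: the items co-occur more often than independence would predict
- `confidence > 0.3`: the consequent appears in more than 30% of baskets that contain the antecedent

Because `association_rules` already kept only rules with a lift of at least 1.0, the lift condition here only removes rules at exactly 1.0; the confidence cutoff does most of the filtering. Stricter cutoffs such as `confidence > 0.6` would return no rules for this dataset, since the highest confidence is about 41.5% (Chicken → Beef). The rules that remain are the candidate patterns for bundling and cross-selling.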


```python
top_rules = rules[(rules['lift'] > 1.0) & (rules['confidence'] > 0.3)]
top_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]
```

| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 18 | (Chicken) | (Beef) | 0.132 | 0.415094 | 1.22809 |
| 19 | (Beef) | (Chicken) | 0.132 | 0.390533 | 1.22809 |
| 28 | (Milk) | (Eggs) | 0.106 | 0.33125 | 1.217831 |
| 29 | (Eggs) | (Milk) | 0.106 | 0.389706 | 1.217831 |
| 26 | (Chicken) | (Cheese) | 0.108 | 0.339623 | 1.147374 |
| 27 | (Cheese) | (Chicken) | 0.108 | 0.364865 | 1.147374 |
| 30 | (Tomatoes) | (Eggs) | 0.106 | 0.30814 | 1.132866 |
| 31 | (Eggs) | (Tomatoes) | 0.106 | 0.389706 | 1.132866 |
| 6 | (Tomatoes) | (Apples) | 0.12 | 0.348837 | 1.118068 |
| 7 | (Apples) | (Tomatoes) | 0.12 | 0.384615 | 1.118068 |
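
Every rule in this output appears twice, once in each direction, with identical support and lift. As a sketch, the hypothetical helpers below mirror the two-condition mask in plain Python and then keep only the higher-confidence direction of each pair, using the ten rows shown above as sample data. Collapsing mirrored pairs is one possible presentation choice for a report, not a step the original analysis performs.

```python
def filter_rules(rules, min_lift=1.0, min_confidence=0.3):
    """Hypothetical helper: keep rules whose lift and confidence both exceed
    the thresholds (strict >, matching the boolean mask above)."""
    return [r for r in rules if r["lift"] > min_lift and r["confidence"] > min_confidence]


def strongest_direction(rules):
    """Hypothetical helper: A -> B and B -> A share support and lift,
    so keep only the higher-confidence direction of each pair."""
    best = {}
    for r in rules:
        pair = r["antecedents"] | r["consequents"]
        if pair not in best or r["confidence"] > best[pair]["confidence"]:
            best[pair] = r
    # Highest lift first, like the notebook's sort
    return sorted(best.values(), key=lambda r: r["lift"], reverse=True)


# (antecedent, consequent, support, confidence, lift) copied from the output above
raw = [
    ("Chicken", "Beef", 0.132, 0.415094, 1.22809),
    ("Beef", "Chicken", 0.132, 0.390533, 1.22809),
    ("Milk", "Eggs", 0.106, 0.33125, 1.217831),
    ("Eggs", "Milk", 0.106, 0.389706, 1.217831),
    ("Chicken", "Cheese", 0.108, 0.339623, 1.147374),
    ("Cheese", "Chicken", 0.108, 0.364865, 1.147374),
    ("Tomatoes", "Eggs", 0.106, 0.30814, 1.132866),
    ("Eggs", "Tomatoes", 0.106, 0.389706, 1.132866),
    ("Tomatoes", "Apples", 0.12, 0.348837, 1.118068),
    ("Apples", "Tomatoes", 0.12, 0.384615, 1.118068),
]
sample_rules = [
    {"antecedents": frozenset({a}), "consequents": frozenset({c}),
     "support": s, "confidence": conf, "lift": lift}
    for a, c, s, conf, lift in raw
]

for r in strongest_direction(filter_rules(sample_rules)):
    print(sorted(r["antecedents"]), "->", sorted(r["consequents"]), r["confidence"], r["lift"])
```

Raising `min_confidence` to 0.35 keeps six of the ten rules, which is a quick way to see how sensitive the shortlist is to the cutoff.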


## ✅ Next Steps

- Visualize association rules using network graphs or bar charts
- Deliver a Power BI report to showcase key rules
- Apply filtering by time period, product categories, or regions
