# Modularized with utils.dm
This notebook now relies on `utils/dm.py` for synthetic transaction generation and Apriori mining (mlxtend if available, otherwise a fallback); the output artifacts are unchanged.

# Section 2: Data Mining
## Task 3 Part B: Association Rule Mining (10 Marks)
This notebook generates synthetic transactional data, runs Apriori (mlxtend if available, otherwise a minimal fallback), extracts rules at a minimum support of 0.2 and a minimum confidence of 0.5, and closes with an analysis of the strongest rules.
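
To make the mining step concrete, the following is a minimal level-wise Apriori sketch in pure Python. It is not the code in `utils/dm.py` or in mlxtend; the function names `frequent_itemsets` and `rules_from` and the toy baskets are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori sketch: return {frozenset: support} for frequent itemsets."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules_from(frequent, min_confidence):
    """Derive rules antecedent -> consequent with support, confidence and lift."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


# Hypothetical toy baskets, not the notebook's synthetic data
toy = [['beer', 'diapers'], ['beer', 'diapers', 'soda'], ['beer', 'soda'],
       ['bread', 'milk'], ['bread']]
itemsets = frequent_itemsets(toy, min_support=0.4)
found_rules = rules_from(itemsets, min_confidence=0.5)
for a, c, s, conf, lift in sorted(found_rules, key=lambda r: -r[4]):
    print(sorted(a), '->', sorted(c), round(s, 2), round(conf, 2), round(lift, 2))
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted against the data only if all of its smaller subsets were already found frequent.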

In [1]:
# 1. Imports & Configuration
import numpy as np
import pandas as pd
from pathlib import Path
import random, json

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)
ARTIFACT_DIR = Path('artifacts')
ARTIFACT_DIR.mkdir(exist_ok=True)
print('Configuration initialized.')

Configuration initialized.


In [None]:
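# Imports & setup via utils.dm
import random
import sys
from pathlib import Path

import numpy as np

# Walk up from the working directory to the first folder that contains
# utils/dm.py and put it on sys.path so `from utils import dm` resolves.
ROOT = Path.cwd()
for parent in [ROOT, *ROOT.parents]:
    if (parent / 'utils' / 'dm.py').exists():
        sys.path.insert(0, str(parent))
        break
from utils import dm

# Re-seed so this cell is reproducible when run on its own
np.random.seed(42)
random.seed(42)
ARTIFACT_DIR = Path('artifacts')
ARTIFACT_DIR.mkdir(exist_ok=True)
MIN_SUPPORT = 0.2  # minimum itemset support (fraction of transactions)
MIN_CONF = 0.5     # minimum rule confidence
print('Configuration initialized.')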

In [None]:
# 2. Synthetic Transaction Generation (via utils.dm)
transactions = dm.generate_synthetic_transactions(n_transactions=45, rng_seed=42)
print('First 5 transactions sample:')
for t in transactions[:5]:
    print(t)
print('Total transactions:', len(transactions))

First 5 transactions sample:
['bananas', 'beer', 'diapers', 'pasta', 'tomatoes']
['bananas', 'butter', 'cereal', 'chicken', 'coffee', 'onions', 'pasta', 'tea', 'yogurt']
['bread', 'cereal', 'chips']
['bananas', 'bread', 'coffee', 'milk', 'soda', 'tea']
['apples', 'beer', 'butter', 'chips', 'coffee', 'diapers', 'onions', 'rice', 'soda', 'tea']
Total transactions: 45


In [None]:
# 3. Apriori Mining (via utils.dm)
import pandas as pd

rules = dm.apriori_rules(transactions, min_support=MIN_SUPPORT, min_confidence=MIN_CONF)


def set_to_str(x):
    """Render a set or frozenset as a sorted, comma-separated string for CSV export."""
    if isinstance(x, (set, frozenset)):
        return ','.join(sorted(x))
    return x


if rules is None or (hasattr(rules, 'empty') and rules.empty):
    print('No rules found at given thresholds. Consider lowering support/confidence.')
    rules_sorted = pd.DataFrame(columns=['antecedents','consequents','support','confidence','lift'])
else:
    # Keep the 5 rules with the highest lift
    rules_sorted = rules.sort_values('lift', ascending=False).head(5).reset_index(drop=True)
    # Stringify the set-valued columns on a copy so rules_sorted keeps its sets
    out_df = rules_sorted.copy()
    for col in ('antecedents', 'consequents'):
        if col in out_df.columns:
            out_df[col] = out_df[col].apply(set_to_str)
    out_df.to_csv(ARTIFACT_DIR / 'top5_rules_partB.csv', index=False)
    print('Saved top 5 rules to artifacts/top5_rules_partB.csv')

display(rules_sorted)

Using mlxtend Apriori implementation.
Total candidate rules: 66


| | antecedents | consequents | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|
| 0 | (beer, soda) | (diapers) | 0.2 | 0.9 | 2.53125 | 0.120988 | 6.444444 |
| 1 | (diapers) | (beer, soda) | 0.2 | 0.5625 | 2.53125 | 0.120988 | 1.777778 |
| 2 | (beer) | (diapers, coffee) | 0.222222 | 0.588235 | 2.406417 | 0.129877 | 1.834921 |
| 3 | (diapers, coffee) | (beer) | 0.222222 | 0.909091 | 2.406417 | 0.129877 | 6.844444 |
| 4 | (diapers, tea) | (beer) | 0.2 | 0.9 | 2.382353 | 0.116049 | 6.222222 |


Saved top 5 rules to artifacts/top5_rules_partB.csv
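
Because the export turns each itemset into a comma-joined string, anyone reading `top5_rules_partB.csv` back has to split those strings again to recover sets. The sketch below shows one way to do that with the standard library. The helper `str_to_set` is hypothetical, the CSV text is an in-memory excerpt of two rows from the table above (the real file may carry additional columns such as leverage and conviction), and the approach assumes item names contain no commas.

```python
import csv
import io

# Stand-in for the saved file: two rows in the stringified form used for export
csv_text = '''antecedents,consequents,support,confidence,lift
"beer,soda",diapers,0.2,0.9,2.53125
diapers,"beer,soda",0.2,0.5625,2.53125
'''


def str_to_set(text):
    """Inverse of the export step: 'beer,soda' -> frozenset({'beer', 'soda'})."""
    return frozenset(part for part in text.split(',') if part)


rules_back = []
for row in csv.DictReader(io.StringIO(csv_text)):
    rules_back.append({
        'antecedents': str_to_set(row['antecedents']),
        'consequents': str_to_set(row['consequents']),
        'support': float(row['support']),
        'confidence': float(row['confidence']),
        'lift': float(row['lift']),
    })

for rule in rules_back:
    print(sorted(rule['antecedents']), '->', sorted(rule['consequents']), rule['lift'])
```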


# 4. Analysis
The highest-lift rule mined here, {beer, soda} → {diapers} (support 0.20, confidence 0.90, lift 2.53), says that a basket containing beer and soda is about 2.5 times as likely to also contain diapers as a basket chosen at random. Practically, a retailer could exploit a rule like this by: (1) cross-promoting diapers near the beer and soda aisles, (2) bundling discounts to increase average basket value, and (3) synchronizing replenishment so that a stockout of one item does not undercut the rule. Because lift divides the rule confidence by the baseline support of the consequent, it filters out associations that appear only because an item is popular. Still, rules must be validated over time: seasonality, promotions, and changing customer habits can erode rule strength. A/B testing a recommendation based on the rule (e.g., suggesting diapers at online checkout after beer and soda are added) quantifies the uplift in conversion. Combining rules with customer segmentation can further personalize which associations to prioritize for distinct shopper cohorts.
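
As a check on how the metric columns relate to each other, the sketch below recomputes them for the top rule in the results table, (beer, soda) → (diapers). The basket counts are not printed by the notebook; they are back-derived here from the reported support, confidence, and lift over 45 transactions, so treat them as an inference rather than logged values.

```python
n = 45                 # total transactions, from the notebook output
count_both = 9         # baskets with beer, soda and diapers (0.2 * 45)
count_antecedent = 10  # baskets with beer and soda (9 / 0.9), inferred
count_consequent = 16  # baskets with diapers, inferred from lift 2.53125

support = count_both / n                      # P(A and C)
antecedent_support = count_antecedent / n     # P(A)
consequent_support = count_consequent / n     # P(C)

confidence = count_both / count_antecedent    # P(C | A)
lift = confidence / consequent_support        # P(C | A) / P(C)
leverage = support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(f'support={support:.6f} confidence={confidence:.6f} lift={lift:.6f}')
print(f'leverage={leverage:.6f} conviction={conviction:.6f}')
```

The reverse rule (diapers) → (beer, soda) has the same support, lift, and leverage because those three metrics are symmetric in antecedent and consequent; only confidence and conviction depend on the direction.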

In [None]:
# 5. Metadata & Persistence
import pandas as pd

# Treat a missing or empty result as zero rules
no_rules = rules is None or (hasattr(rules, 'empty') and rules.empty)
metadata = {
    'n_transactions': len(transactions),
    'min_support': MIN_SUPPORT,
    'min_confidence': MIN_CONF,
    'rules_found': 0 if no_rules else int(len(rules)),
    'top5_rules_count': int(len(rules_sorted))
}
dm.save_json(metadata, ARTIFACT_DIR / 'task3b_metadata.json')
print('Saved metadata to artifacts/task3b_metadata.json')
display(pd.DataFrame([metadata]))

Saved metadata to artifacts/task3b_metadata.json


| | n_transactions | min_support | min_confidence | rules_found | top5_rules_count |
|---|---|---|---|---|---|
| 0 | 45 | 0.2 | 0.5 | 66 | 5 |


### Part B Complete
Generated transactions, mined rules with Apriori (or fallback), exported top 5 by lift, and provided actionable analysis.