<div style="font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; color: #2c3e50; line-height: 1.6; max-width: 900px; margin: auto; border: 1px solid #e1e4e8; border-radius: 15px; background-color: #ffffff; overflow: hidden; box-shadow: 0 10px 30px rgba(0,0,0,0.1);">

<div style="background: linear-gradient(135deg, #1e3c72 0%, #2a5298 100%); padding: 40px 30px; color: white; text-align: center;">
    <h1 style="margin: 0; font-size: 2.8em; font-weight: 800; letter-spacing: -1px;">🛒 Association Rule Mining</h1>
    <div style="width: 60px; height: 4px; background: #ffcc00; margin: 20px auto; border-radius: 2px;"></div>
    <p style="font-size: 1.1em; opacity: 0.9; max-width: 600px; margin: auto;">
        Strategic Basket Analysis to Drive Revenue and Optimize Customer Experience
    </p>
</div>

<div style="padding: 30px;">
    
<h3 style="color: #1e3c72; border-bottom: 2px solid #f0f2f5; padding-bottom: 10px; margin-top: 0;">🎯 Analysis Objectives</h3>
    <p>In this phase of the project, we leverage transaction data to decode consumer behavior with two primary goals:</p>
    
<div style="display: flex; gap: 20px; margin: 25px 0;">
        <div style="flex: 1; background: #fff9db; padding: 20px; border-radius: 10px; border-left: 5px solid #fab005;">
            <strong style="color: #862e1b; font-size: 1.1em;">💰 Profit Maximization</strong><br>
            <span style="font-size: 0.95em;">Identifying high-value product bundles and cross-selling opportunities to increase Average Order Value (AOV).</span>
        </div>
        <div style="flex: 1; background: #e7f5ff; padding: 20px; border-radius: 10px; border-left: 5px solid #228be6;">
            <strong style="color: #1864ab; font-size: 1.1em;">✨ User Experience</strong><br>
            <span style="font-size: 0.95em;">Streamlining the customer journey through intuitive recommendations and intelligent store layouts.</span>
        </div>
    </div>

<h3 style="color: #1e3c72; border-bottom: 2px solid #f0f2f5; padding-bottom: 10px;">🛠️ Methodological Framework</h3>
    <p>We will implement and compare four well-established algorithms for extracting frequent itemsets (high-utility itemsets, in the case of UP-Tree):</p>
    
    

<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px; margin-top: 15px;">
        <div style="padding: 10px 15px; background: #f8f9fa; border-radius: 6px; font-family: monospace; border: 1px solid #e9ecef;">• Apriori Algorithm</div>
        <div style="padding: 10px 15px; background: #f8f9fa; border-radius: 6px; font-family: monospace; border: 1px solid #e9ecef;">• Eclat (Equivalence Class Transformation)</div>
        <div style="padding: 10px 15px; background: #f8f9fa; border-radius: 6px; font-family: monospace; border: 1px solid #e9ecef;">• FP-Growth (Frequent Pattern)</div>
        <div style="padding: 10px 15px; background: #f8f9fa; border-radius: 6px; font-family: monospace; border: 1px solid #e9ecef;">• UP-Tree (Utility Pattern)</div>
    </div>

<div style="margin-top: 35px; padding: 15px; background: #f1f3f5; border-radius: 8px; text-align: center; font-style: italic; color: #495057;">
        "After each model, we will take a deep dive into the extracted rules to translate data patterns into actionable business insights."
    </div>

</div>
</div>

We start by preparing the data for the association rule mining algorithms.

<h2>Apriori Algorithm</h2>

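Apriori generates frequent itemsets level by level: frequent itemsets of size k are combined into candidate itemsets of size k+1, and any candidate containing a subset that fails the minimum support threshold is pruned before its support is counted. This pruning is safe because an itemset can never appear in more transactions than any of its subsets.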
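The cells below count single items and item pairs directly. For reference, the level-wise search and subset pruning described above can be sketched in plain Python as follows. This is a minimal in-memory illustration under stated assumptions, not an optimized implementation: the function name `apriori` and the toy baskets are hypothetical, and the join step simply unions pairs of frequent itemsets instead of using the textbook prefix-based join (the prune step then removes any extra candidates).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori (hypothetical helper): return {frozenset(itemset): support}
    for every itemset whose support is at least min_support (a fraction in [0, 1])."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    if n == 0:
        return {}

    def frequent_among(candidates):
        # One pass over the data to count each candidate, then keep
        # only the candidates that meet the minimum support threshold.
        counts = dict.fromkeys(candidates, 0)
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent single items.
    current = frequent_among({frozenset([item]) for basket in baskets for item in basket})
    frequent = dict(current)

    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets that yield exactly k items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: a candidate with any infrequent (k-1)-subset cannot be frequent,
        # so it is discarded without scanning the data.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, made up for illustration only.
toy_baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

for itemset, support in sorted(apriori(toy_baskets, 0.6).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

With a 60% support threshold on these toy baskets, the sketch keeps four single items and four pairs. The triple {bread, milk, diapers} survives pruning (all of its pairs are frequent) but appears in only 2 of the 5 baskets, so it is dropped after counting, and the search stops.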





In [None]:
import os
import pandas as pd
from collections import Counter
from itertools import combinations

PROJECT_ROOT = os.path.abspath("..")
PROCESSED_DIR = os.path.join(PROJECT_ROOT, "data", "processed")

transactions_df = pd.read_parquet(
    os.path.join(PROCESSED_DIR, "transactions.parquet")
)

print("Loaded transactions_df:", transactions_df.shape)
transactions_df.head()


In [None]:
# Optional: inspect the distribution of basket sizes (uncomment to run)
# basket_sizes = transactions_df["items"].apply(len)
# basket_sizes.describe(percentiles=[0.5, 0.75, 0.9, 0.95])

In [None]:
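# Count the number of transactions containing each item
# (set() ignores repeated items within the same basket)
item_counter = Counter()

for items in transactions_df["items"]:
    item_counter.update(set(items))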

item_freq_df = (
    pd.DataFrame(item_counter.items(), columns=["item", "count"])
    .sort_values("count", ascending=False)
    .reset_index(drop=True)
)

item_freq_df.head(15)


In [None]:
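# Count the number of transactions containing each unordered item pair;
# sorting makes (a, b) and (b, a) map to the same key
pair_counter = Counter()

for items in transactions_df["items"]:
    unique_items = sorted(set(items))
    for pair in combinations(unique_items, 2):
        pair_counter[pair] += 1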

pair_freq_df = (
    pd.DataFrame(pair_counter.items(), columns=["item_pair", "count"])
    .sort_values("count", ascending=False)
    .reset_index(drop=True)
)

pair_freq_df.head(10)


In [None]:
num_transactions = len(transactions_df)

# Map item -> number of transactions containing it
item_count_map = dict(item_counter)

rules = []

for (item_a, item_b), pair_count in pair_counter.items():
    # Support of {A, B}: fraction of transactions containing both items
    support = pair_count / num_transactions

    support_a = item_count_map[item_a] / num_transactions
    support_b = item_count_map[item_b] / num_transactions

    # Lift is symmetric: support(A, B) / (support(A) * support(B))
    lift = support / (support_a * support_b)

    # Confidence is directional, so emit both A -> B and B -> A
    for antecedent, consequent in ((item_a, item_b), (item_b, item_a)):
        rules.append({
            "antecedent": antecedent,
            "consequent": consequent,
            "pair_count": pair_count,
            "support": support,
            "confidence": pair_count / item_count_map[antecedent],
            "lift": lift,
        })

rules_df = pd.DataFrame(rules)
rules_df.sort_values("lift", ascending=False).head(10)
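Confidence depends on the direction of a rule while support and lift do not. The sketch below computes all three metrics for a single item pair on a tiny hypothetical dataset so the asymmetry is easy to check by hand; the helper name `pair_rule_metrics` and the toy baskets are illustrative assumptions, not part of the notebook.

```python
def pair_rule_metrics(transactions, a, b):
    """Hypothetical helper: support, directional confidence, and lift for {a, b}."""
    baskets = [set(t) for t in transactions]  # ignore duplicates within a basket
    n = len(baskets)
    count_a = sum(a in basket for basket in baskets)
    count_b = sum(b in basket for basket in baskets)
    count_ab = sum(a in basket and b in basket for basket in baskets)

    support_ab = count_ab / n
    return {
        "support": support_ab,
        "confidence_a_to_b": count_ab / count_a,  # estimate of P(b | a)
        "confidence_b_to_a": count_ab / count_b,  # estimate of P(a | b)
        # Lift compares observed co-occurrence with what independence predicts;
        # swapping a and b leaves it unchanged.
        "lift": support_ab / ((count_a / n) * (count_b / n)),
    }


# Hypothetical toy baskets, made up for illustration only.
toy_baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

print(pair_rule_metrics(toy_baskets, "diapers", "beer"))
```

Here diapers appear in 4 of 5 baskets, beer in 3, and every beer basket also contains diapers. So support is 0.6, confidence(diapers → beer) is 0.75, confidence(beer → diapers) is 1.0, and lift is about 1.25 in both directions.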


In [None]:
filtered_rules_df = rules_df[
    (rules_df["support"] >= 0.001) &      # pair appears in at least 0.1% of orders
    (rules_df["confidence"] >= 0.3) &     # consequent is in >= 30% of the antecedent's orders
    (rules_df["lift"] >= 1.5)             # pair co-occurs >= 1.5x more often than if independent
].sort_values("lift", ascending=False)

print("Filtered rules:", filtered_rules_df.shape)
filtered_rules_df.head(15)
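The 0.1% support cut-off is easier to sanity-check as an absolute number of orders. A minimal sketch, assuming the threshold is given as a decimal fraction and that it would be called with the `num_transactions` computed earlier; the helper name `min_count_for_support` is hypothetical. It uses `fractions.Fraction` on the decimal string so that products such as `0.07 * 100` cannot land marginally above a whole number and round up to the wrong count.

```python
import math
from fractions import Fraction


def min_count_for_support(min_support, num_transactions):
    """Hypothetical helper: smallest count c with c / num_transactions >= min_support."""
    # Fraction(str(x)) reads the decimal exactly as written, e.g. 0.001 -> 1/1000,
    # so the ceiling is computed without binary floating-point round-off.
    return math.ceil(Fraction(str(min_support)) * num_transactions)


# Illustrative basket counts, not taken from the project data.
print(min_count_for_support(0.001, 250_000))
print(min_count_for_support(0.001, 1_500))
```

For 250,000 orders the 0.1% threshold means a pair must appear in at least 250 of them; for 1,500 orders, 1.5 rounds up to 2.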


In [None]:
OUTPUT_DIR = os.path.join(PROJECT_ROOT, "outputs")
os.makedirs(OUTPUT_DIR, exist_ok=True)

filtered_rules_df.to_csv(
    os.path.join(OUTPUT_DIR, "association_rules_named.csv"),
    index=False
)

print("✅ Saved association_rules_named.csv")
