#Demonstration - Market Basket Analysis

##Scenario
The retail store manager is unsure how to arrange products on shelves to increase cross-selling.

- For example: If a customer buys bread, are they also likely to buy butter or jam?

- We need to analyze transactional data and find associations among products so that related items can be placed close together on shelves.



##Step 1: Import Libraries

- We need pandas for data handling, mlxtend for the association rule algorithms, itertools to enumerate item combinations in the Eclat step, and ast to parse the string-encoded lists in the dataset.

In [17]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules, fpgrowth
import itertools
import ast

##Step 2: Load Dataset

- This reads the transaction data from a CSV file. Each row represents a customer transaction containing multiple items.



In [18]:
df_raw = pd.read_csv("market_basket_transactions.csv")
df_raw.head()

Unnamed: 0,Transaction
0,"['nuts', 'crackers']"
1,"['tea', 'coffee', 'water', 'soda']"
2,"['tissue', 'sponges', 'bleach']"
3,"['eggs', 'milk', 'bread']"
4,"['coffee', 'soda', 'juice', 'water']"


##Step 3: Convert String to Transactions

- The CSV stores items as strings like "['bread', 'milk']". We use ast.literal_eval() to convert those into actual Python lists for processing.

In [19]:
transactions = df_raw['Transaction'].apply(ast.literal_eval).tolist()
transactions[:5]

[['nuts', 'crackers'],
 ['tea', 'coffee', 'water', 'soda'],
 ['tissue', 'sponges', 'bleach'],
 ['eggs', 'milk', 'bread'],
 ['coffee', 'soda', 'juice', 'water']]
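
To see what the conversion does to a single cell, here is a minimal standalone sketch. The `raw_cell` value is copied from row 3 of the output above; the second call is an illustrative assumption added to show that `ast.literal_eval` only accepts plain literals.

```python
import ast

# One cell value as it is stored in the CSV: a string, not a list.
raw_cell = "['eggs', 'milk', 'bread']"

# literal_eval parses Python literals only (lists, strings, numbers, ...);
# it does not execute arbitrary code the way eval() would.
parsed = ast.literal_eval(raw_cell)

print(type(raw_cell).__name__, "->", type(parsed).__name__)
print(parsed)

# Anything that is not a plain literal is rejected with ValueError.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print("non-literal input rejected:", rejected)
```

This is why `literal_eval` is a reasonable choice for a CSV column whose contents you did not write yourself.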

##Step 4: One-Hot Encode Transactions

- Association rule algorithms need a binary format: each row is a transaction and each column is an item, with True if the item is present in that transaction and False otherwise. TransactionEncoder handles that conversion.

In [20]:
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_array, columns=te.columns_)
df.head()

Unnamed: 0,bleach,bread,butter,cheese,chips,chocolate,coffee,cookies,crackers,detergent,...,pasta,salad,sauce,soap,soda,sponges,tea,tissue,water,wine
0,False,False,False,False,False,False,False,False,True,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,True,False,False,False,...,False,False,False,False,True,False,True,False,True,False
2,True,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,True,False,False
3,False,True,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,True,False,False,False,...,False,False,False,False,True,False,False,False,True,False
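
The encoding itself is simple enough to sketch by hand. The snippet below is an illustration in plain Python, not how TransactionEncoder is implemented; it uses the first three transactions from Step 3 and sorts the columns because the output above lists them alphabetically.

```python
# The first three transactions shown in Step 3.
sample = [
    ['nuts', 'crackers'],
    ['tea', 'coffee', 'water', 'soda'],
    ['tissue', 'sponges', 'bleach'],
]

# Columns are the sorted set of all distinct items.
columns = sorted({item for basket in sample for item in basket})

# Each row holds True where the transaction contains that column's item.
encoded = [[item in basket for item in columns] for basket in sample]

print(columns)
for row in encoded:
    print(row)
```

Each transaction becomes one fixed-width row of booleans, which is the shape `apriori` and `fpgrowth` expect.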


##Step 5: Apriori Algorithm

- We apply Apriori to find itemsets that appear in at least 5% of transactions (min_support=0.05), then generate rules with confidence ≥ 50% (min_threshold=0.5). Sorting by confidence surfaces the strongest associations for shelf placement.

In [21]:
frequent_ap = apriori(df, min_support=0.05, use_colnames=True)
rules_ap = association_rules(frequent_ap, metric="confidence", min_threshold=0.5)
rules_ap.sort_values(by="confidence", ascending=False).head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
29,"(coffee, water)",(soda),0.064,0.148,0.052,0.8125,5.489865,1.0,0.042528,4.544,0.873767,0.325,0.77993,0.581926
15,(soda),(coffee),0.148,0.164,0.112,0.756757,4.61437,1.0,0.087728,3.436889,0.919349,0.56,0.709039,0.719842
30,"(water, soda)",(coffee),0.072,0.164,0.052,0.722222,4.403794,1.0,0.040192,3.0096,0.832891,0.282609,0.66773,0.519648
14,(coffee),(soda),0.164,0.148,0.112,0.682927,4.61437,1.0,0.087728,2.687077,0.936945,0.56,0.627848,0.719842
19,(jam),(eggs),0.1,0.144,0.064,0.64,4.444444,1.0,0.0496,2.377778,0.861111,0.355556,0.579439,0.542222
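
The metric columns can be checked by hand from the three support values. The sketch below recomputes confidence, lift, leverage, and conviction for the (soda) → (coffee) row using the standard definitions; the variable names are illustrative, and the inputs are copied from the table above.

```python
# Values copied from the (soda) -> (coffee) row of the Apriori output.
support_soda = 0.148    # antecedent support
support_coffee = 0.164  # consequent support
support_both = 0.112    # support of {soda, coffee}

# confidence: share of soda baskets that also contain coffee
confidence = support_both / support_soda
# lift: confidence relative to coffee's baseline frequency
lift = confidence / support_coffee
# leverage: observed co-occurrence minus what independence would predict
leverage = support_both - support_soda * support_coffee
# conviction: how much more often the rule would fail under independence
conviction = (1 - support_coffee) / (1 - confidence)

print(f"confidence = {confidence:.6f}")
print(f"lift       = {lift:.6f}")
print(f"leverage   = {leverage:.6f}")
print(f"conviction = {conviction:.6f}")
```

The printed values should agree with the table row up to rounding. A lift well above 1 means the two items co-occur far more often than their individual frequencies would suggest.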


##Step 6: FP-Growth Algorithm

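- FP-Growth is usually a faster alternative to Apriori. It compresses the dataset into a prefix tree (FP-tree) and mines that tree directly instead of generating candidate itemsets. With the same thresholds it finds the same rules, as the output below shows; only the row index differs. The speed advantage matters most on large datasets.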

In [22]:
frequent_fp = fpgrowth(df, min_support=0.05, use_colnames=True)
rules_fp = association_rules(frequent_fp, metric="confidence", min_threshold=0.5)
rules_fp.sort_values(by="confidence", ascending=False).head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
4,"(coffee, water)",(soda),0.064,0.148,0.052,0.8125,5.489865,1.0,0.042528,4.544,0.873767,0.325,0.77993,0.581926
2,(soda),(coffee),0.148,0.164,0.112,0.756757,4.61437,1.0,0.087728,3.436889,0.919349,0.56,0.709039,0.719842
5,"(water, soda)",(coffee),0.072,0.164,0.052,0.722222,4.403794,1.0,0.040192,3.0096,0.832891,0.282609,0.66773,0.519648
1,(coffee),(soda),0.164,0.148,0.112,0.682927,4.61437,1.0,0.087728,2.687077,0.936945,0.56,0.627848,0.719842
29,(jam),(eggs),0.1,0.144,0.064,0.64,4.444444,1.0,0.0496,2.377778,0.861111,0.355556,0.579439,0.542222
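
To make the prefix-tree idea concrete, here is a minimal sketch of the tree-building step only, on a toy set of baskets. The baskets, the `min_count` threshold, and the helper names are hypothetical, and the mining phase (conditional pattern bases) is omitted. It is not a description of mlxtend's internals.

```python
from collections import Counter

# Toy baskets (hypothetical, not the CSV data).
baskets = [
    ['coffee', 'soda', 'water'],
    ['coffee', 'soda'],
    ['coffee', 'water'],
    ['bread', 'milk'],
    ['coffee', 'soda', 'juice'],
]
min_count = 2  # keep items that appear in at least 2 baskets

# Pass 1: count items and keep only the frequent ones.
counts = Counter(item for basket in baskets for item in set(basket))
frequent = {item for item, n in counts.items() if n >= min_count}

def new_node():
    return {"count": 0, "children": {}}

# Pass 2: insert each basket with items ordered by descending frequency
# (ties broken alphabetically), so common prefixes share nodes.
root = new_node()
for basket in baskets:
    ordered = sorted((i for i in set(basket) if i in frequent),
                     key=lambda i: (-counts[i], i))
    node = root
    for item in ordered:
        node = node["children"].setdefault(item, new_node())
        node["count"] += 1

def count_nodes(node):
    """Number of item nodes below this node."""
    return sum(1 + count_nodes(c) for c in node["children"].values())

def show(node, depth=0):
    """Print the tree with one level of indentation per depth."""
    for item, child in node["children"].items():
        print("  " * depth + f"{item}:{child['count']}")
        show(child, depth + 1)

show(root)
print("tree nodes:", count_nodes(root))
```

The frequent-item occurrences in the baskets collapse into a handful of nodes because every basket containing coffee shares the same first node. That sharing is the compression the bullet above refers to.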


##Step 7: Eclat Algorithm (Custom Implementation)

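- Since mlxtend doesn't support Eclat, we implement it manually. Eclat computes support by intersecting the sets of transactions that contain each item. The version below is a simplified stand-in: it enumerates every 2-item and 3-item combination and computes its support directly from the one-hot table. A combination that appears in at least 5% of transactions is considered frequent.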

In [23]:
def get_support(itemset, df):
    return df[list(itemset)].all(axis=1).mean()

items = df.columns.tolist()
eclat_results = []

for size in (2, 3):  # itemset sizes: pairs and triples
    for combo in itertools.combinations(items, size):
        support = get_support(combo, df)
        if support >= 0.05:
            eclat_results.append((combo, support))

eclat_df = pd.DataFrame(eclat_results, columns=["Itemset", "Support"])
eclat_df.sort_values(by="Support", ascending=False).head()


Unnamed: 0,Itemset,Support
19,"(coffee, soda)",0.112
15,"(chocolate, cookies)",0.092
16,"(chocolate, crackers)",0.092
12,"(chips, cookies)",0.088
17,"(chocolate, nuts)",0.088
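
For comparison, the tid-set intersection that gives Eclat its name can be sketched in plain Python. This is a minimal illustration on toy baskets, not the code used above; the data, the `min_support` value, and the function name `eclat` are assumptions made for the example.

```python
# Toy baskets (hypothetical, not the CSV data).
baskets = [
    ['coffee', 'soda', 'water'],
    ['coffee', 'soda'],
    ['coffee', 'water'],
    ['bread', 'milk'],
    ['coffee', 'soda', 'juice'],
]
min_support = 0.4  # hypothetical threshold for this toy example
n = len(baskets)

# Vertical layout: item -> set of transaction ids (tid-set) containing it.
tidsets = {}
for tid, basket in enumerate(baskets):
    for item in basket:
        tidsets.setdefault(item, set()).add(tid)

def eclat(prefix, candidates, results):
    """Depth-first search; candidates is a list of (item, tidset) pairs."""
    for idx, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        results[itemset] = len(tids) / n
        # Extend only with later items, intersecting tid-sets to get the
        # support of the larger itemset without rescanning the baskets.
        extensions = []
        for other, other_tids in candidates[idx + 1:]:
            shared = tids & other_tids
            if len(shared) / n >= min_support:
                extensions.append((other, shared))
        if extensions:
            eclat(itemset, extensions, results)

# Start from the frequent single items, in alphabetical order.
frequent_1 = sorted(
    ((item, tids) for item, tids in tidsets.items()
     if len(tids) / n >= min_support),
    key=lambda pair: pair[0],
)

results = {}
eclat((), frequent_1, results)
for itemset, support in sorted(results.items()):
    print(itemset, round(support, 2))
```

Unlike the brute-force enumeration, this search never examines a combination whose prefix is already infrequent, so it scales better as the number of items grows.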


##Step 8: Compare Outcomes

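- We now print the first five rows from each method. Note that .head() is applied here to the unsorted tables, so despite the "Top" labels these are simply the first rows in each algorithm's output order, not the strongest rules. Step 9 sorts by confidence and lift before turning rules into shelf layout advice.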

In [24]:
print("Top Apriori Rules:\n", rules_ap[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())
print("\nTop FP-Growth Rules:\n", rules_fp[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())
print("\nTop Eclat Itemsets:\n", eclat_df.head())

Top Apriori Rules:
    antecedents consequents  support  confidence      lift
0  (detergent)    (bleach)    0.068    0.531250  3.689236
1     (tissue)    (bleach)    0.068    0.566667  3.935185
2       (milk)     (bread)    0.060    0.555556  4.480287
3        (jam)    (butter)    0.056    0.560000  3.888889
4       (wine)    (cheese)    0.068    0.515152  4.154448

Top FP-Growth Rules:
        antecedents consequents  support  confidence      lift
0           (nuts)  (crackers)    0.088    0.511628  2.842377
1         (coffee)      (soda)    0.112    0.682927  4.614370
2           (soda)    (coffee)    0.112    0.756757  4.614370
3          (water)      (soda)    0.072    0.545455  3.685504
4  (coffee, water)      (soda)    0.052    0.812500  5.489865

Top Eclat Itemsets:
                Itemset  Support
0  (bleach, detergent)    0.068
1       (bleach, soap)    0.052
2     (bleach, tissue)    0.068
3      (bread, butter)    0.056
4        (bread, eggs)    0.060
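
The first rows differ between Apriori and FP-Growth only because the two algorithms emit rules in a different order. A small sketch, using the (antecedent, consequent) pairs and row indices copied from the confidence-sorted tables in Steps 5 and 6, shows that the top five rules are the same once order and index are ignored. The dictionary names are illustrative.

```python
# (antecedent, consequent) pairs keyed by the row index each algorithm
# assigned, copied from the sorted outputs in Steps 5 and 6.
apriori_top = {
    29: (frozenset({'coffee', 'water'}), frozenset({'soda'})),
    15: (frozenset({'soda'}), frozenset({'coffee'})),
    30: (frozenset({'water', 'soda'}), frozenset({'coffee'})),
    14: (frozenset({'coffee'}), frozenset({'soda'})),
    19: (frozenset({'jam'}), frozenset({'eggs'})),
}
fpgrowth_top = {
    4: (frozenset({'coffee', 'water'}), frozenset({'soda'})),
    2: (frozenset({'soda'}), frozenset({'coffee'})),
    5: (frozenset({'water', 'soda'}), frozenset({'coffee'})),
    1: (frozenset({'coffee'}), frozenset({'soda'})),
    29: (frozenset({'jam'}), frozenset({'eggs'})),
}

# Compare the rules themselves, ignoring row order and index.
same_rules = set(apriori_top.values()) == set(fpgrowth_top.values())
same_indices = set(apriori_top) == set(fpgrowth_top)

print("same rules:", same_rules)
print("same row indices:", same_indices)
```

Comparing rules as sets of frozenset pairs is a more reliable check than eyeballing two `.head()` outputs side by side.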


##Step 9: Shelf Placement Insights

- The shelf suggestions are not hardcoded.

- They are generated from the mined rules, so they stay relevant to the dataset.

- Each rule is translated into clear shelf-placement advice (e.g., "If bread is bought, butter should be nearby").

In [25]:
# Convert frozensets to readable strings
def readable(rule):
    return ', '.join(sorted(list(rule)))

# Select top Apriori rules with high confidence and lift
ap_top = rules_ap.sort_values(by=['confidence', 'lift'], ascending=False).head(5)
print(" Apriori-Based Recommendations:")
for _, row in ap_top.iterrows():
    print(f"- If customer buys [{readable(row['antecedents'])}] ➜ Suggest placing [{readable(row['consequents'])}] nearby")

# Select top FP-Growth rules
fp_top = rules_fp.sort_values(by=['confidence', 'lift'], ascending=False).head(5)
print("\n FP-Growth-Based Recommendations:")
for _, row in fp_top.iterrows():
    print(f"- If customer buys [{readable(row['antecedents'])}] ➜ Suggest placing [{readable(row['consequents'])}] nearby")

# Top Eclat combinations with highest support
print("\n Eclat-Based Frequent Combinations:")
for _, row in eclat_df.sort_values(by='Support', ascending=False).head(5).iterrows():
    # Use a separate name so the `items` column list from Step 7 is not overwritten
    combo_items = ', '.join(row['Itemset'])
    print(f"- [{combo_items}] often bought together — keep close on shelves (Support: {row['Support']:.2f})")

 Apriori-Based Recommendations:
- If customer buys [coffee, water] ➜ Suggest placing [soda] nearby
- If customer buys [soda] ➜ Suggest placing [coffee] nearby
- If customer buys [soda, water] ➜ Suggest placing [coffee] nearby
- If customer buys [coffee] ➜ Suggest placing [soda] nearby
- If customer buys [jam] ➜ Suggest placing [eggs] nearby

 FP-Growth-Based Recommendations:
- If customer buys [coffee, water] ➜ Suggest placing [soda] nearby
- If customer buys [soda] ➜ Suggest placing [coffee] nearby
- If customer buys [soda, water] ➜ Suggest placing [coffee] nearby
- If customer buys [coffee] ➜ Suggest placing [soda] nearby
- If customer buys [jam] ➜ Suggest placing [eggs] nearby

 Eclat-Based Frequent Combinations:
- [coffee, soda] often bought together — keep close on shelves (Support: 0.11)
- [chocolate, cookies] often bought together — keep close on shelves (Support: 0.09)
- [chocolate, crackers] often bought together — keep close on shelves (Support: 0.09)
- [chips, cookies] often