# 🛒 Association Rules Mining on Online Retail Transactions
## Background & Scenario
Imagine you are a Data Analyst at an e-commerce company. Management wants to understand customer buying patterns to improve product recommendations and cross-selling strategies. You have been provided with a dataset where each row represents a transaction (basket of items purchased together).
Your task is to apply Association Rule Mining to discover relationships between items.


## Objective
• Pre-process the dataset into a suitable format.

• Apply the Apriori Algorithm to extract frequent itemsets.

• Generate Association Rules with support, confidence, and lift.

• Interpret the rules to gain insights into customer behavior.


## 1. Import Required Libraries

In [1]:

# 1. Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

sns.set(style="whitegrid", palette="muted", font_scale=1.1)


## 2. Load & Explore Dataset

In [5]:

# 2. Load Dataset
# The file has a single column; each cell holds one transaction as comma-separated items
# Raw string (r"...") so the backslashes in the Windows path are not treated as escape sequences
file_path = r"D:\Data sciences\Assignments\Assignment files\Assignment files Extracs\Association Rules\Online retail.xlsx"
df = pd.read_excel(file_path, header=None)

# Convert each row into a list of items
transactions = df[0].apply(lambda x: x.split(",")).tolist()

print("Sample Transaction")
print(transactions[0])


Sample Transaction
['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']


✅ Each row is a basket of purchased items.

## 3. Convert to Basket Format

In [7]:

# 3. Convert to Basket Format using TransactionEncoder
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)

basket = pd.DataFrame(te_array, columns=te.columns_)

print(" Basket Format Created")
print(basket.head())


 Basket Format Created
    asparagus  almonds  antioxydant juice  asparagus  avocado  babies food  \
0       False     True               True      False     True        False   
1       False    False              False      False    False        False   
2       False    False              False      False    False        False   
3       False    False              False      False     True        False   
4       False    False              False      False    False        False   

   bacon  barbecue sauce  black tea  blueberries  ...  turkey  vegetables mix  \
0  False           False      False        False  ...   False            True   
1  False           False      False        False  ...   False           False   
2  False           False      False        False  ...   False           False   
3  False           False      False        False  ...    True           False   
4  False           False      False        False  ...   False           False   

   water spray  white

✅ Matrix created → Rows = Transactions, Columns = Items (True/False).
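
The column listing above shows `asparagus` twice, which suggests that one of the two labels carries stray whitespace in the raw file. Below is a minimal sketch of normalizing item names before encoding, assuming whitespace is the only difference. The helper `clean_transaction` and the sample rows are hypothetical and not part of the original notebook.

```python
def clean_transaction(raw_row):
    """Split a comma-separated basket, trim whitespace, drop empty or repeated items."""
    seen = []
    for item in raw_row.split(","):
        item = item.strip()
        if item and item not in seen:
            seen.append(item)
    return seen


# Hypothetical rows; the second contains a label with a leading space.
raw_rows = ["shrimp,almonds,avocado", "burgers, asparagus,eggs", "asparagus,eggs,"]
cleaned = [clean_transaction(row) for row in raw_rows]

# After cleaning, every distinct product maps to exactly one column name.
columns = sorted({item for basket in cleaned for item in basket})
print(columns)
```

In the notebook, a function like this could stand in for the `lambda` that builds `transactions`. The supports reported afterwards may then shift slightly, because the two `asparagus` columns would merge.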

## 4. Generate Frequent Itemsets (Apriori)


In [8]:

# 4. Generate Frequent Itemsets (Apriori)
frequent_itemsets = apriori(basket, min_support=0.02, use_colnames=True)

print(" Frequent Itemsets Extracted")
print(frequent_itemsets.sort_values(by="support", ascending=False).head(10))


 Frequent Itemsets Extracted
     support             itemsets
34  0.238368      (mineral water)
13  0.179709               (eggs)
44  0.174110          (spaghetti)
17  0.170911       (french fries)
9   0.163845          (chocolate)
24  0.132116          (green tea)
33  0.129583               (milk)
25  0.098254        (ground beef)
22  0.095321  (frozen vegetables)
38  0.095054           (pancakes)


✅ Mineral water, eggs, spaghetti, and french fries are the most common products.
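
To make the `support` column concrete, here is a minimal sketch of how support is computed and thresholded on a hypothetical five-basket dataset. It brute-forces all itemsets up to size 2 instead of using Apriori's candidate pruning, so it illustrates the definition rather than mlxtend's implementation. `itemset_support` is a made-up helper name.

```python
from itertools import combinations


def itemset_support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


# Hypothetical toy data, not taken from the retail file.
baskets = [
    frozenset({"mineral water", "eggs", "spaghetti"}),
    frozenset({"mineral water", "spaghetti"}),
    frozenset({"eggs", "chocolate"}),
    frozenset({"mineral water", "eggs"}),
    frozenset({"spaghetti", "ground beef"}),
]
items = sorted(set().union(*baskets))
min_support = 0.4  # the notebook uses 0.02 on the real data

# Keep every 1- and 2-itemset whose support reaches the threshold.
frequent = {}
for size in (1, 2):
    for candidate in combinations(items, size):
        support = itemset_support(candidate, baskets)
        if support >= min_support:
            frequent[candidate] = support

for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(f"{support:.2f}  {itemset}")
```

Raising `min_support` shrinks the result quickly: here the single items survive at 0.6, while only two pairs reach 0.4.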

## 5. Generate Association Rules


In [9]:

# 5. Generate Association Rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

rules = rules.sort_values(by="confidence", ascending=False)

print(" Association Rules Generated")
print(rules.head(10))


 Association Rules Generated
            antecedents      consequents  antecedent support  \
79               (soup)  (mineral water)            0.050527   
72          (olive oil)  (mineral water)            0.065858   
62        (ground beef)  (mineral water)            0.098254   
65        (ground beef)      (spaghetti)            0.098254   
28        (cooking oil)  (mineral water)            0.051060   
11            (chicken)  (mineral water)            0.059992   
54  (frozen vegetables)  (mineral water)            0.095321   
68               (milk)  (mineral water)            0.129583   
83           (tomatoes)  (mineral water)            0.068391   
75           (pancakes)  (mineral water)            0.095054   

    consequent support   support  confidence      lift  representativity  \
79            0.238368  0.023064    0.456464  1.914955               1.0   
72            0.238368  0.027596    0.419028  1.757904               1.0   
62            0.238368  0.040928    0.

## 6. Rule Interpretation

In [10]:

#  6. Example Interpretation
for idx, row in rules.head(5).iterrows():
    print(f"Rule: {set(row['antecedents'])} → {set(row['consequents'])}")
    print(f" - Support: {row['support']:.3f}")
    print(f" - Confidence: {row['confidence']:.3f}")
    print(f" - Lift: {row['lift']:.3f}")
    print("")


Rule: {'soup'} → {'mineral water'}
 - Support: 0.023
 - Confidence: 0.456
 - Lift: 1.915

Rule: {'olive oil'} → {'mineral water'}
 - Support: 0.028
 - Confidence: 0.419
 - Lift: 1.758

Rule: {'ground beef'} → {'mineral water'}
 - Support: 0.041
 - Confidence: 0.417
 - Lift: 1.748

Rule: {'ground beef'} → {'spaghetti'}
 - Support: 0.039
 - Confidence: 0.399
 - Lift: 2.291

Rule: {'cooking oil'} → {'mineral water'}
 - Support: 0.020
 - Confidence: 0.394
 - Lift: 1.654



### Example rules from our output:

### • Rule 1: {Soup} → {Mineral Water}

◦ Support = 0.023 → 2.3% of all transactions contain both soup and mineral water.

◦ Confidence = 0.456 → 45.6% of transactions that contain soup also contain mineral water.

◦ Lift = 1.915 → Soup buyers are about 1.9x as likely to buy mineral water as the average customer.


### • Rule 2: {Ground Beef} → {Spaghetti}

◦ Support = 0.039 → 3.9% of all transactions contain both ground beef and spaghetti.

◦ Confidence = 0.399 → 39.9% of ground beef buyers also buy spaghetti.

◦ Lift = 2.291 → Ground beef buyers are about 2.3x as likely to buy spaghetti as the average customer, the strongest association among the five rules printed above.
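
The notebook sorts rules by confidence, which favors rules whose consequent is already popular (mineral water here). As an illustration, the five rules printed above can be re-ranked by lift. This sketch uses the rounded values from the output and plain Python tuples rather than the `rules` DataFrame.

```python
# (antecedent, consequent, support, confidence, lift), rounded values from the output above
top_rules = [
    ("soup", "mineral water", 0.023, 0.456, 1.915),
    ("olive oil", "mineral water", 0.028, 0.419, 1.758),
    ("ground beef", "mineral water", 0.041, 0.417, 1.748),
    ("ground beef", "spaghetti", 0.039, 0.399, 2.291),
    ("cooking oil", "mineral water", 0.020, 0.394, 1.654),
]

# Sort by lift (last field), strongest association first.
by_lift = sorted(top_rules, key=lambda rule: rule[4], reverse=True)

for antecedent, consequent, support, confidence, lift in by_lift:
    print(f"{antecedent} -> {consequent}: lift={lift:.3f}, confidence={confidence:.3f}")
```

On the actual DataFrame, the equivalent would be `rules.sort_values(by="lift", ascending=False)`.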


## 7. Interview Questions
### Q1. What is Lift and why is it important?

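• Lift measures how much more often A and B occur together than would be expected if they were bought independently.

• Lift > 1 = positive association (the items are bought together more often than chance).

• Lift = 1 = no association (the items are independent).

• Lift < 1 = negative association (the items are bought together less often than chance).

• It matters because confidence alone can be high simply because the consequent is popular; lift corrects for the consequent's baseline support.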
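
In symbols, with supp(·) denoting the fraction of transactions that contain an itemset, lift can be written as below. The numeric check on the right uses the confidence and consequent support of the soup → mineral water rule from the rules table in Section 5.

```latex
\[
\operatorname{lift}(A \rightarrow B)
  = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)\,\operatorname{supp}(B)}
  = \frac{\operatorname{conf}(A \rightarrow B)}{\operatorname{supp}(B)},
\qquad
\frac{0.456464}{0.238368} \approx 1.915
\]
```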


### Q2. What is Support and Confidence?

• Support(A→B): Fraction of all transactions that contain both A and B.

• Confidence(A→B): Probability of B given A, i.e. Support(A and B) / Support(A).

• Example: If there are 100 baskets and 20 contain both milk and bread → Support = 20/100 = 0.2.
If 25 baskets contain milk and 20 of them also contain bread → Confidence = 20/25 = 0.8.
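
The milk and bread example can be checked with a few lines of arithmetic. The sketch below reuses the counts from the example; the bread count used for lift is an added assumption, since the example does not state it.

```python
total_baskets = 100
milk_and_bread = 20  # baskets containing both milk and bread
milk = 25            # baskets containing milk
bread = 40           # ASSUMPTION: not given in the example, chosen for illustration

support = milk_and_bread / total_baskets  # P(milk and bread)
confidence = milk_and_bread / milk        # P(bread | milk)
lift = confidence / (bread / total_baskets)  # confidence relative to bread's baseline

print(f"support(milk -> bread)    = {support:.2f}")
print(f"confidence(milk -> bread) = {confidence:.2f}")
print(f"lift(milk -> bread)       = {lift:.2f}  (under the assumed bread count)")
```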


### Q3. What are some limitations of Association Rules?

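• Too many rules → low thresholds can produce so many rules that they are hard to interpret.

• Rare item problem → items below `min_support` are pruned, so rules involving rare products are never found.

• Computationally expensive on big datasets, because the number of candidate itemsets grows combinatorially with the number of items.

• Not all rules are useful in practice: an association does not imply causation, and many rules are obvious or not actionable.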
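
The cost and rule-explosion points follow from simple counting. Here is a minimal sketch, assuming a hypothetical catalogue of 100 items (the real item count is not stated here). Apriori avoids enumerating all of these by pruning any candidate that has an infrequent subset, but a low `min_support` can still leave many candidates.

```python
from math import comb

n_items = 100  # hypothetical catalogue size, not taken from the dataset

# Number of distinct candidate itemsets of each size before any pruning.
for k in range(1, 5):
    print(f"{k}-itemsets: {comb(n_items, k):,}")

# Every non-empty subset of the catalogue is a potential itemset.
total_itemsets = 2 ** n_items - 1
print(f"all non-empty itemsets: {total_itemsets:.3e}")
```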


## 📌 Conclusion
• Mineral water is the most common item (support ≈ 0.238), so it appears as the consequent in most of the top rules.

• Ground beef & spaghetti form the strongest pair among the top rules (lift ≈ 2.29) → useful for cross-selling (recipe bundles).

• Rules provide valuable insights for recommendation systems and marketing campaigns.
