# Association Rule Mining – Apriori Algorithm

---

## 1. Definition
Association Rule Mining is an **unsupervised learning technique** for finding **relationships or patterns** between items in large datasets. Its best-known use is **market basket analysis**.  

- Example: "Customers who buy **bread and butter** also often buy **jam**."

The **Apriori Algorithm** is one of the most popular algorithms for mining **frequent itemsets** and generating **association rules**.

---

## 2. Key Concepts
1. **Itemset**: A collection of one or more items (e.g., {milk, bread}).  
2. **Support**: The fraction of transactions containing the itemset.  
   \[
   \text{Support}(A) = \frac{\text{Number of transactions containing A}}{\text{Total transactions}}
   \]  
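3. **Confidence**: The fraction of transactions containing A that also contain B, i.e. an estimate of P(B | A). Here Support(A ∪ B) is the support of the combined itemset, so it counts transactions that contain every item of both A and B.  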
   \[
   \text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}
   \]  
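4. **Lift**: Measures how much more likely B is bought when A is bought, compared to how often B is bought overall. A lift above 1 suggests a positive association, a lift of 1 suggests A and B are independent, and a lift below 1 suggests a negative association.  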
   \[
   \text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}
   \]
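
As a worked illustration of the three formulas above, here is a minimal plain-Python sketch. The helper names `support`, `confidence`, and `lift` are chosen for this example only (they are not library functions), and the transactions are the same toy baskets used in the implementation section.

```python
# Toy baskets; each transaction is a set of items.
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"milk", "bread", "butter"},
    {"bread", "jam"},
    {"milk", "bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A)."""
    combined = set(antecedent) | set(consequent)
    return support(combined, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence(A -> B) / Support(B)."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


a, b = {"milk"}, {"butter"}
print(f"Support(milk)             = {support(a, transactions):.3f}")
print(f"Support(milk, butter)     = {support(a | b, transactions):.3f}")
print(f"Confidence(milk -> butter) = {confidence(a, b, transactions):.3f}")
print(f"Lift(milk -> butter)       = {lift(a, b, transactions):.3f}")
```

For the rule milk → butter this gives a support of 0.5, a confidence of 0.8, and a lift of about 1.067, so buying milk makes butter only slightly more likely than its baseline frequency of 0.75.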

---

## 3. Apriori Algorithm Steps
1. **Generate candidate itemsets** of length 1.  
2. **Prune itemsets** that do not satisfy minimum support.  
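3. Generate candidate itemsets of length k from frequent itemsets of length k-1. By the **Apriori property**, every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent (k-1)-subset can be discarded without counting its support.  
4. Repeat steps 2 and 3 with increasing k until no more frequent itemsets are found.  
5. Generate **association rules** from frequent itemsets using a minimum confidence threshold.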
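
The steps above can be sketched in plain Python as follows. This is an illustrative, unoptimized version rather than a reference implementation: the function names `apriori_sketch` and `rules_sketch` are hypothetical, and support is recounted by scanning all transactions for every candidate.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Steps 1-2: candidate 1-itemsets, keeping those that meet min_support.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:  # Step 4: stop when a level yields nothing frequent.
        prev = set(current)
        # Step 3: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori property: drop candidates with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Step 2 again: count support and prune.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules_sketch(frequent, min_confidence):
    """Step 5: return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # The antecedent is frequent because it is a subset of
                # a frequent itemset, so its support is already stored.
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "bread", "butter", "jam"],
    ["bread", "butter", "jam"],
    ["milk", "bread", "butter"],
    ["bread", "jam"],
    ["milk", "bread", "butter"],
]

frequent = apriori_sketch(transactions, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)

for antecedent, consequent, conf in rules_sketch(frequent, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 3))
```

The pruning step is what distinguishes Apriori from brute-force enumeration: `jam` falls below the support threshold at level 1, so no candidate containing `jam` is ever generated or counted.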

---

## 4. Applications
- Market basket analysis (retail, e-commerce)  
- Recommendation systems  
- Cross-selling and upselling strategies  
- Web usage mining (pages frequently visited together)

---

## 5. Advantages
- Simple and interpretable rules  
- Finds hidden patterns in transactional data  

---

## 6. Limitations
- Can generate a **large number of candidate itemsets and rules** for big datasets, and needs a pass over the data for each itemset length  
- Sensitive to **support and confidence thresholds**  
- Only finds **frequent patterns**; rare but important patterns may be missed  
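
To make the threshold sensitivity concrete, the sketch below counts how many itemsets qualify as frequent at a few support thresholds. It uses brute-force enumeration, which is only feasible here because the toy data has four items, and the helper name `count_frequent` is an assumption of this example.

```python
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"milk", "bread", "butter"},
    {"bread", "jam"},
    {"milk", "bread", "butter"},
]
items = sorted(set().union(*transactions))


def count_frequent(min_support):
    """Count all non-empty itemsets whose support is >= min_support."""
    n = len(transactions)
    count = 0
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            sup = sum(1 for t in transactions if set(combo) <= t) / n
            if sup >= min_support:
                count += 1
    return count


for threshold in (0.25, 0.5, 0.75):
    print(f"min_support={threshold}: {count_frequent(threshold)} frequent itemsets")
```

On this data the count drops from 11 to 7 to 3 as the threshold rises from 0.25 to 0.5 to 0.75. Every itemset containing `jam` disappears at 0.5, which illustrates how rarer patterns are lost as the threshold increases.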

---

## 7. Implementation in Python
- Libraries: `mlxtend` provides `apriori` and `association_rules` functions.  
- Steps:
  1. Convert transactional data to **one-hot encoded format**  
  2. Use `apriori()` to find frequent itemsets  
  3. Use `association_rules()` to extract rules with **confidence, support, lift**


In [1]:
# ==============================
# Association Rule Mining – Apriori Algorithm
# ==============================

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# 1. Create a small transactional dataset
transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'butter', 'jam'],
    ['milk', 'bread', 'butter'],
    ['bread', 'jam'],
    ['milk', 'bread', 'butter']
]

# 2. Convert transactions to one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

print("One-hot encoded transaction data:\n")
print(df)

# 3. Apply Apriori algorithm to find frequent itemsets (min_support=0.5)
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print("\nFrequent Itemsets:\n")
print(frequent_itemsets)

# 4. Generate association rules (min_threshold=0.7 for confidence)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print("\nAssociation Rules:\n")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

# 5. Optional: Sort rules by lift
rules_sorted = rules.sort_values(by='lift', ascending=False)
print("\nRules sorted by lift:\n")
print(rules_sorted[['antecedents', 'consequents', 'support', 'confidence', 'lift']])


One-hot encoded transaction data:

   bread  butter    jam   milk
0   True    True  False   True
1   True    True  False  False
2   True   False  False   True
3   True    True   True   True
4   True    True   True  False
5   True    True  False   True
6   True   False   True  False
7   True    True  False   True

Frequent Itemsets:

   support               itemsets
0    1.000                (bread)
1    0.750               (butter)
2    0.625                 (milk)
3    0.750        (bread, butter)
4    0.625          (milk, bread)
5    0.500         (milk, butter)
6    0.500  (milk, bread, butter)

Association Rules:

      antecedents      consequents  support  confidence      lift
0         (bread)         (butter)    0.750        0.75  1.000000
1        (butter)          (bread)    0.750        1.00  1.000000
2          (milk)          (bread)    0.625        1.00  1.000000
3          (milk)         (butter)    0.500        0.80  1.066667
4   (milk, bread)         (butter)    0.50

