# Frequent Itemset Mining & Association Rules

## **Objective**
 In this activity, students will:
 1. Implement the Apriori Algorithm to mine frequent itemsets.
 2. Generate association rules from the frequent itemsets.
 3. Calculate support, confidence, and lift metrics.
 4. Apply the algorithm on a sample dataset and analyze the results.

## 1. Implement the Apriori Algorithm to mine frequent itemsets

In [1]:
from itertools import combinations

def generate_candidates(frequent_itemsets, k):
    """Generate candidate itemsets of size k from frequent itemsets of size k-1"""
    candidates = set()
    frequent_items = list(frequent_itemsets.keys())
    for i in range(len(frequent_items)):
        for j in range(i+1, len(frequent_items)):
            union_set = frequent_items[i] | frequent_items[j]
            if len(union_set) == k:
                candidates.add(frozenset(union_set))
    return candidates
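
The join step above keeps every size-k union of two frequent itemsets. Classic Apriori usually follows it with a prune step that discards any candidate with an infrequent (k-1)-subset, since such a candidate cannot be frequent. A minimal sketch of that step follows; `prune_candidates` and the example itemsets are hypothetical names and values chosen for illustration, not part of the notebook.

```python
from itertools import combinations

def prune_candidates(candidates, frequent_prev, k):
    """Keep only size-k candidates whose every (k-1)-subset is in frequent_prev."""
    return {
        c for c in candidates
        if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1))
    }

# Illustrative frequent 2-itemsets (assumed values, not notebook output)
frequent_2 = {
    frozenset({'Milk', 'Diapers'}),
    frozenset({'Milk', 'Beer'}),
    frozenset({'Diapers', 'Beer'}),
    frozenset({'Milk', 'Bread'}),
}
candidates_3 = {
    frozenset({'Milk', 'Diapers', 'Beer'}),   # all three 2-subsets are frequent
    frozenset({'Milk', 'Diapers', 'Bread'}),  # {'Diapers', 'Bread'} is not frequent
}
pruned = prune_candidates(candidates_3, frequent_2, 3)
print(pruned)
```

Pruning does not change the final result, because an infrequent candidate would be filtered out by the support count anyway. It only reduces the number of candidates that have to be counted against the transactions.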


In [2]:
def get_frequent_itemsets(transactions, candidates, min_support):
    """Filter candidates to get frequent itemsets based on support threshold"""
    itemset_counts = {}
    for transaction in transactions:
        for candidate in candidates:
            if candidate.issubset(transaction):
                itemset_counts[candidate] = itemset_counts.get(candidate, 0) + 1
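    total_transactions = len(transactions)
    # Convert raw counts to relative support, then keep itemsets at or above the threshold
    supports = {itemset: count / total_transactions for itemset, count in itemset_counts.items()}
    return {itemset: support for itemset, support in supports.items() if support >= min_support}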

In [3]:
def apriori(transactions, min_support):
    """Apriori Algorithm to mine frequent itemsets"""
    single_items = {frozenset([item]) for transaction in transactions for item in transaction}
    frequent_itemsets = get_frequent_itemsets(transactions, single_items, min_support)
    all_frequent_itemsets = frequent_itemsets.copy()
    k = 2
    while frequent_itemsets:
        candidates = generate_candidates(frequent_itemsets, k)
        frequent_itemsets = get_frequent_itemsets(transactions, candidates, min_support)
        all_frequent_itemsets.update(frequent_itemsets)
        k += 1
    return all_frequent_itemsets

In [4]:
def generate_association_rules(frequent_itemsets, transactions, min_confidence):
    """Generate association rules from frequent itemsets"""
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) > 1:
            for i in range(1, len(itemset)):
                for antecedent in combinations(itemset, i):
                    antecedent = frozenset(antecedent)
                    consequent = itemset - antecedent
                    support_itemset = frequent_itemsets[itemset]
                    support_antecedent = frequent_itemsets[antecedent]
                    confidence = support_itemset / support_antecedent
                    if confidence >= min_confidence:
                        rules.append((antecedent, consequent, confidence))
    return rules


In [30]:

# Sample dataset: Market Basket Transactions
transactions = [
    {'Milk', 'Diapers', 'Beer', 'Bread'},
    {'Milk', 'Diapers', 'Beer'},
    {'Milk', 'Diapers'},
    {'Milk', 'Bread'},
    {'Diapers', 'Beer'},
    {'Diapers', 'Bread'},
    {'Beer', 'Bread'},
    {'Beer'},
    {'Bread'},
    {'Milk', 'Bread'},
    {'Milk', 'Diapers', 'Beer', 'Bread'},
    {'Milk', 'Diapers', 'Beer'},
    {'Milk', 'Diapers'},
    {'Milk', 'Bread'},
    {'Diapers', 'Beer'},
]

# Set support and confidence thresholds
min_support = 0.4
min_confidence = 0.6


In [31]:
# Step 1: Mine Frequent Itemsets
frequent_itemsets = apriori(transactions, min_support)
print("Frequent Itemsets:")
for itemset, support in frequent_itemsets.items():
    print(f"{set(itemset)}: {support:.2f}")

Frequent Itemsets:
{'Milk'}: 0.60
{'Bread'}: 0.53
{'Diapers'}: 0.60
{'Beer'}: 0.53
{'Beer', 'Diapers'}: 0.40
{'Milk', 'Diapers'}: 0.40


In [32]:
# Step 2: Generate Association Rules
association_rules = generate_association_rules(frequent_itemsets, transactions, min_confidence)
print("\nAssociation Rules:")
for antecedent, consequent, confidence in association_rules:
    print(f"{set(antecedent)} -> {set(consequent)} (Confidence: {confidence:.2f})")


Association Rules:
{'Beer'} -> {'Diapers'} (Confidence: 0.75)
{'Diapers'} -> {'Beer'} (Confidence: 0.67)
{'Milk'} -> {'Diapers'} (Confidence: 0.67)
{'Diapers'} -> {'Milk'} (Confidence: 0.67)
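
The objective also lists lift, which the rule generator above does not report. Lift is the confidence of a rule divided by the support of its consequent; values above 1 suggest the antecedent and consequent occur together more often than they would if independent. The sketch below computes all three metrics directly from the transactions. The helper names `support` and `rule_metrics` are assumptions made for this example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    supp = support(a | c, transactions)
    conf = supp / support(a, transactions)
    lift = conf / support(c, transactions)
    return supp, conf, lift

# Same 15 transactions as the sample dataset above
transactions = [
    {'Milk', 'Diapers', 'Beer', 'Bread'}, {'Milk', 'Diapers', 'Beer'},
    {'Milk', 'Diapers'}, {'Milk', 'Bread'}, {'Diapers', 'Beer'},
    {'Diapers', 'Bread'}, {'Beer', 'Bread'}, {'Beer'}, {'Bread'},
    {'Milk', 'Bread'}, {'Milk', 'Diapers', 'Beer', 'Bread'},
    {'Milk', 'Diapers', 'Beer'}, {'Milk', 'Diapers'}, {'Milk', 'Bread'},
    {'Diapers', 'Beer'},
]

supp, conf, lift = rule_metrics({'Beer'}, {'Diapers'}, transactions)
print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

For {'Beer'} -> {'Diapers'} the lift works out to (6/15) / ((8/15) × (9/15)) = 1.25, so Beer buyers in this dataset are about 25% more likely than average to also buy Diapers.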



# **Class Activity Questions:**
 1. Run the provided script and analyze the output. Which itemsets are frequent?
 2. Modify the support threshold and observe how the frequent itemsets change.
 3. Adjust the confidence level and check how the association rules change.
 4. Add new transactions to the dataset and rerun the script. What impact does this have?
 5. What real-world applications of association rule mining can you think of beyond market basket analysis?


1. The frequent itemsets are those that meet or exceed the minimum support threshold of 0.4. On the original dataset (before the transactions from question 4 were added), the frequent single items are Milk, Bread, Beer, and Diapers. The frequent 2-itemsets are {Beer, Milk}, {Beer, Diapers}, {Milk, Bread}, and {Milk, Diapers}. The only frequent 3-itemset is {Beer, Milk, Diapers}.

2. Raising the support threshold to 0.6 significantly reduces the number of frequent itemsets. The ones that remain are:

- {'Milk'}: 0.80
- {'Diapers'}: 0.80
- {'Beer'}: 0.60
- {'Milk', 'Diapers'}: 0.60
- {'Beer', 'Diapers'}: 0.60
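
To see the effect of the support threshold more systematically, the threshold can be swept over several values. The sketch below is a brute-force enumerator rather than the Apriori implementation from the notebook, which is fine for a dataset this small. It assumes the first five transactions of the sample dataset, since those reproduce the supports listed in this answer; `frequent_itemsets_bruteforce` is a hypothetical helper name.

```python
from itertools import combinations

def frequent_itemsets_bruteforce(transactions, min_support):
    """Enumerate every itemset and keep those meeting min_support (tiny data only)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            supp = sum(itemset <= t for t in transactions) / n
            if supp >= min_support:
                result[itemset] = supp
    return result

# Assumption: the five-transaction dataset this answer appears to be based on
transactions = [
    {'Milk', 'Diapers', 'Beer', 'Bread'},
    {'Milk', 'Diapers', 'Beer'},
    {'Milk', 'Diapers'},
    {'Milk', 'Bread'},
    {'Diapers', 'Beer'},
]

for threshold in (0.2, 0.4, 0.6, 0.8):
    found = frequent_itemsets_bruteforce(transactions, threshold)
    print(f"min_support={threshold}: {len(found)} frequent itemsets")
```

The count can only stay the same or shrink as the threshold rises, because every itemset that meets a higher threshold also meets any lower one.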

3. Leaving the support threshold at 0.4 and raising the confidence threshold to 0.9 does not change the association rules. Every rule currently returned has a confidence of 1, so it passes whether the confidence threshold is 0.75 or 0.9.



Changing support to 0.4 and confidence to 0.9 gives us only one association rule:

{'Beer'} -> {'Diapers'} (Confidence: 1.00)
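
One thing worth checking when experimenting with confidence thresholds: a rule whose confidence sits exactly on the threshold can be dropped by floating-point rounding, because confidence is computed as a ratio of two floating-point supports. The sketch below uses illustrative counts (an itemset in 3 of 5 transactions, its antecedent in 4 of 5) and shows one possible workaround, comparing exact fractions built from raw counts. The variable names are assumptions for this example.

```python
from fractions import Fraction

# Supports stored as floats: 3 of 5 and 4 of 5 transactions
confidence_float = (3 / 5) / (4 / 5)
print(confidence_float)          # slightly below 0.75 because of rounding
print(confidence_float >= 0.75)  # False

# The same rule computed from raw counts stays exact
count_itemset, count_antecedent = 3, 4
confidence_exact = Fraction(count_itemset, count_antecedent)
print(confidence_exact >= Fraction(3, 4))  # True
```

A simpler alternative is to compare with a small tolerance, for example `confidence >= min_confidence - 1e-9`.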

4. Adding the following transactions, with support 0.4 and confidence 0.75:
- {'Diapers', 'Bread'}
- {'Beer', 'Bread'}
- {'Beer'}
- {'Bread'}
- {'Milk', 'Bread'}
- {'Milk', 'Diapers', 'Beer', 'Bread'}
- {'Milk', 'Diapers', 'Beer'}
- {'Milk', 'Diapers'}
- {'Milk', 'Bread'}
- {'Diapers', 'Beer'}

This gives us the following frequent itemsets:

* {'Milk'}: 0.60
* {'Bread'}: 0.53
* {'Diapers'}: 0.60
* {'Beer'}: 0.53
* {'Beer', 'Diapers'}: 0.40
* {'Milk', 'Diapers'}: 0.40

and the following association rule:

{'Beer'} -> {'Diapers'} (Confidence: 0.75)



Lowering the confidence threshold to 0.6 gives us the following association rules:

{'Beer'} -> {'Diapers'} (Confidence: 0.75)

{'Diapers'} -> {'Beer'} (Confidence: 0.67)

{'Milk'} -> {'Diapers'} (Confidence: 0.67)

{'Diapers'} -> {'Milk'} (Confidence: 0.67)



Adding more transactions caused some previously frequent itemsets to drop below the threshold while increasing the frequency of others (like Bread and Beer).


Lowering the confidence threshold resulted in more association rules, meaning that some strong but slightly less confident rules became visible.


Some patterns remain consistent, such as the strong association between Beer and Diapers, which is a well-known market basket pattern.

5. Here are some other real-world applications of association rule mining:

## Healthcare & Medical Diagnosis
- Disease Prediction: Identifying common symptom-disease relationships (e.g., if a patient has symptoms A and B, they may have disease X).
- Patient Risk Analysis: Detecting factors that contribute to high-risk patients (e.g., diabetes and high blood pressure often leading to heart disease).


## Fraud Detection
- Credit Card Fraud: Identifying unusual spending patterns that may indicate fraudulent activity (e.g., a sudden purchase of high-value items in multiple locations).

## Cybersecurity & Intrusion Detection
- Malware Detection: Identifying combinations of activities or behaviors that indicate potential cyber threats.
- User Authentication: Detecting abnormal login behaviors that deviate from a user’s normal activity.

## Recommendation Systems
- E-commerce & Retail: Suggesting products frequently bought together (e.g., Amazon’s “Customers who bought this also bought…”).

## Social Media & User Behavior Analysis
- Friend Suggestions: Identifying people with similar connections and interactions (e.g., Facebook’s “People You May Know”).
- Content Personalization: Finding common patterns in likes, shares, and comments to suggest relevant posts.