# Association Rule - Simplified Version

### What is an Association Rule?
Imagine you work in a grocery store. You notice that whenever someone buys bread, they often buy butter too. An association rule is a way for computers to find these patterns in lots of shopping data. The rule is written like this:

***If a customer buys bread, then they are likely to buy butter.***

In shorthand, we write it as:

Bread → Butter
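
To make the idea concrete, here is a minimal sketch that counts how often butter shows up in baskets that contain bread. The five baskets and the helper name `share_with` are made up for illustration and are not part of any library.

```python
# Hypothetical toy data: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def share_with(baskets, if_item, then_item):
    """Fraction of baskets containing if_item that also contain then_item."""
    with_if = [b for b in baskets if if_item in b]
    if not with_if:
        return 0.0
    with_both = [b for b in with_if if then_item in b]
    return len(with_both) / len(with_if)

share = share_with(baskets, "bread", "butter")
print(f"Bread → Butter holds in {share:.0%} of bread baskets")
```

Here three of the four bread baskets also contain butter, so the rule holds 75% of the time in this toy data.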

**Why is this useful?**

Stores use these rules to put bread and butter close together, or to suggest butter when you buy bread online. This helps them sell more and makes shopping easier for you.

# Apriori Algorithm - Simplified Version

### What is the Apriori Algorithm?

The Apriori Algorithm is a clever way for computers to find these association rules quickly, even in huge lists of shopping data. It works step by step:

* Step 1: Look for items that are bought often (like bread, milk, or butter).

* Step 2: Combine these to see which pairs (or triples) are often bought together.

* Step 3: Use a shortcut. If a group of items (like bread and butter) is bought often, every smaller group inside it (like just bread) must also be bought often. Turned around: if a smaller group is rarely bought, any bigger group containing it must be rare too, so the computer can skip checking it.
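
The three steps can be sketched in a few lines of plain Python. This is a simplified illustration and not a tuned implementation; the function name `frequent_itemsets`, the `min_count` parameter, and the four baskets are assumptions made up for this example.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_count):
    """Level-wise search: frequent single items first, then pairs, triples, ..."""
    baskets = [frozenset(b) for b in baskets]

    def count(candidate):
        return sum(1 for b in baskets if candidate <= b)

    # Step 1: single items that appear in at least min_count baskets.
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {c for c in singles if count(c) >= min_count}
    frequent = {c: count(c) for c in current}

    size = 2
    while current:
        # Step 2: join frequent groups into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Step 3: skip any candidate that contains a non-frequent smaller group.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = {c for c in candidates if count(c) >= min_count}
        frequent.update({c: count(c) for c in current})
        size += 1
    return frequent

# Hypothetical baskets for illustration.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
result = frequent_itemsets(baskets, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

In this data the triple bread, butter, milk is never counted: the pair butter, milk appears only once, so Step 3 discards the triple before any baskets are scanned for it.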


**Real-life Example:**

Let’s say in a week, many people buy popcorn, milk, and cereal together. The Apriori algorithm finds that not only is "popcorn, milk, cereal" a popular combo, but also "popcorn, milk" and "milk, cereal" are popular pairs. So, if a customer buys popcorn and milk, the store can recommend cereal.

## Why is it called "Apriori"?
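The name comes from the Latin phrase "a priori", meaning "from what comes before". The algorithm uses what it already knows about smaller groups to decide which bigger groups are worth checking: if a big group is popular, all the smaller groups inside it must be popular too, so a big group that contains an unpopular smaller group can be skipped. This saves the computer a lot of time.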
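
The property is easy to check on a small example. The sketch below uses five invented baskets that echo the popcorn example and confirms that every smaller group inside "popcorn, milk, cereal" appears at least as often as the full group. The data and the helper name `count` are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical week of baskets, echoing the popcorn example.
baskets = [
    {"popcorn", "milk", "cereal"},
    {"popcorn", "milk", "cereal", "soda"},
    {"popcorn", "milk"},
    {"milk", "cereal"},
    {"soda"},
]

def count(itemset):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if set(itemset) <= b)

big_group = ("popcorn", "milk", "cereal")
big_count = count(big_group)

# Every smaller group inside the big group appears at least as often.
smaller_counts = {}
for size in (1, 2):
    for group in combinations(big_group, size):
        smaller_counts[group] = count(group)
        print(group, smaller_counts[group], ">=", big_count)
```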

# What is Market Basket Analysis?

Market Basket Analysis is like being a detective for shopping carts. It helps stores figure out what products people like to buy together. The goal is to find patterns, like:

* "People who buy chips also buy soda."

* "If someone buys shampoo, they often buy conditioner too."
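
A simple first pass at this kind of detective work is to count how often each pair of products lands in the same cart. The sketch below does that with the standard library; the five carts are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping carts for illustration.
carts = [
    ["chips", "soda"],
    ["chips", "soda", "salsa"],
    ["shampoo", "conditioner"],
    ["shampoo", "conditioner", "soap"],
    ["chips", "shampoo"],
]

pair_counts = Counter()
for cart in carts:
    # sorted() makes ("chips", "soda") and ("soda", "chips") the same key.
    for pair in combinations(sorted(set(cart)), 2):
        pair_counts[pair] += 1

# Show the pairs that were bought together most often.
for pair, n in pair_counts.most_common(2):
    print(pair, n)
```

Counting every pair like this gets slow as the number of products grows, which is the problem the Apriori shortcut is meant to ease.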

# Sample Coding Trial

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Example dataset: Each list is a shopping basket (transaction)
dataset = [
    ['milk', 'bread', 'butter'],
    ['bread', 'diapers', 'beer', 'eggs'],
    ['milk', 'diapers', 'beer', 'cola'],
    ['bread', 'milk', 'diapers', 'beer'],
    ['bread', 'milk', 'diapers', 'cola']
]

# Step 1: Convert dataset into a one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Step 2: Apply Apriori to find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

# Step 3: Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

# Step 4: Display the rules
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
```

Output:

```
  antecedents consequents  support  confidence    lift
0      (beer)   (diapers)      0.6        1.00  1.2500
1   (diapers)      (beer)      0.6        0.75  1.2500
2   (diapers)     (bread)      0.6        0.75  0.9375
3     (bread)   (diapers)      0.6        0.75  0.9375
4      (milk)     (bread)      0.6        0.75  0.9375
5     (bread)      (milk)      0.6        0.75  0.9375
6   (diapers)      (milk)      0.6        0.75  0.9375
7      (milk)   (diapers)      0.6        0.75  0.9375
```


| Term           | Meaning                                                                                   | How to Interpret the Value                                                                                                               |
|----------------|------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| **Antecedents**| The "if" part of the rule (the item(s) you start with).                                  | If a rule says "bread → butter", then "bread" is the antecedent.                                                                        |
| **Consequents**| The "then" part of the rule (the item(s) that may follow).                               | In "bread → butter", "butter" is the consequent.                                                                                        |
| **Support**    | The fraction of all transactions that contain the whole combination (antecedent + consequent). | Higher support means the combo is more common (e.g., 0.10 means it appears in 10% of transactions). Low support means it's rare. |
| **Confidence** | How often the consequent appears when the antecedent is present (conditional probability).| High confidence (e.g., 0.80 or 80%) means that when you see the antecedent, the consequent is very likely to also be present.           |
| **Lift**       | How much more likely the consequent is to appear with the antecedent than it would be if the two were unrelated. | Lift > 1: positive association (the larger, the stronger); Lift ≈ 1: no association; Lift < 1: negative association. |
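
To see where these numbers come from, the sketch below recomputes support, confidence, and lift by hand for the rule beer → diapers, using the same five baskets as the mlxtend example. The helper name `support` is an assumption for illustration.

```python
# Same five baskets as the mlxtend example.
dataset = [
    ['milk', 'bread', 'butter'],
    ['bread', 'diapers', 'beer', 'eggs'],
    ['milk', 'diapers', 'beer', 'cola'],
    ['bread', 'milk', 'diapers', 'beer'],
    ['bread', 'milk', 'diapers', 'cola'],
]

def support(items):
    """Fraction of baskets that contain every item in items."""
    items = set(items)
    return sum(1 for basket in dataset if items <= set(basket)) / len(dataset)

antecedent, consequent = {'beer'}, {'diapers'}

rule_support = support(antecedent | consequent)   # share of baskets with both
confidence = rule_support / support(antecedent)   # share of beer baskets with diapers
lift = confidence / support(consequent)           # confidence vs. overall diapers share

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Beer appears in 3 of 5 baskets and always with diapers, so support is 0.60 and confidence is 1.00. Diapers appear in 4 of 5 baskets overall, so lift is 1.00 / 0.80 = 1.25, matching the first row of the printed rules.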


# Without Built-In Functions

## Generate Synthetic Market Basket Data

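```python
import random

# Product catalogue to draw from
items = ['bread', 'butter', 'milk', 'eggs', 'cheese', 'tea', 'coffee', 'sugar', 'flour', 'rice', 'chocolate', 'cookies', 'juice', 'soda', 'diapers']
num_records = 2000  # number of baskets to generate
min_items = 2       # smallest basket size
max_items = 5       # largest basket size

# Each basket is a uniformly random sample of items, so this data
# contains no planted associations between products.
basket_data = []
for _ in range(num_records):
    basket_len = random.randint(min_items, max_items)
    basket = random.sample(items, basket_len)
    basket_data.append(basket)
```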
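
Because the baskets are random, every run produces different data. If repeatable results are wanted, one option is to draw from a seeded generator. The sketch below is one way to do that; the function name `make_baskets`, the shortened item list, and the seed value are assumptions for illustration.

```python
import random

items = ['bread', 'butter', 'milk', 'eggs', 'cheese']

def make_baskets(num_records, min_items, max_items, seed):
    """Generate random baskets; the same seed always gives the same baskets."""
    rng = random.Random(seed)  # private generator, leaves the global one alone
    baskets = []
    for _ in range(num_records):
        basket_len = rng.randint(min_items, max_items)
        baskets.append(rng.sample(items, basket_len))
    return baskets

first = make_baskets(num_records=5, min_items=2, max_items=3, seed=42)
second = make_baskets(num_records=5, min_items=2, max_items=3, seed=42)
print(first == second)  # True: identical seed, identical baskets
```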


## Apriori Algorithm Implementation

```python
from collections import defaultdict
from itertools import combinations

min_support = 0.05    # an itemset must appear in at least 5% of baskets
min_confidence = 0.6  # a rule must hold for at least 60% of antecedent baskets
num_transactions = len(basket_data)
min_support_count = min_support * num_transactions

# Helper: Count support for candidate itemsets
def get_support_count(candidates, transactions):
    support_count = defaultdict(int)
    for transaction in transactions:
        transaction_set = set(transaction)
        for candidate in candidates:
            if candidate.issubset(transaction_set):
                support_count[candidate] += 1
    return support_count

# Step 1: Find frequent 1-itemsets
candidate_1_itemsets = [frozenset([item]) for item in items]
support_count_1 = get_support_count(candidate_1_itemsets, basket_data)
frequent_1_itemsets = set()
for itemset, count in support_count_1.items():
    if count >= min_support_count:
        frequent_1_itemsets.add(itemset)

# Step 2: Generate candidates for larger itemsets
def generate_candidates(frequent_itemsets_k_minus_1):
    # Join pairs of frequent (k-1)-itemsets and keep unions with exactly k items
    candidates = set()
    f_list = list(frequent_itemsets_k_minus_1)
    k = len(next(iter(f_list))) + 1
    for i in range(len(f_list)):
        for j in range(i+1, len(f_list)):
            union_set = f_list[i] | f_list[j]
            if len(union_set) == k:
                candidates.add(union_set)
    return candidates

# Step 3: Iteratively find all frequent itemsets
# all_frequent_itemsets maps each frequent itemset to its support count
all_frequent_itemsets = {itemset: support_count_1[itemset] for itemset in frequent_1_itemsets}
current_frequent_itemsets = frequent_1_itemsets
while current_frequent_itemsets:
    candidates_k = generate_candidates(current_frequent_itemsets)
    if not candidates_k:
        break
    support_count_k = get_support_count(candidates_k, basket_data)
    current_frequent_itemsets = set()
    for itemset, count in support_count_k.items():
        if count >= min_support_count:
            current_frequent_itemsets.add(itemset)
            all_frequent_itemsets[itemset] = count
```

## Generate Association Rules

```python
rules = []
for itemset in all_frequent_itemsets:
    if len(itemset) > 1:
        # Try every way of splitting the itemset into antecedent -> consequent
        for i in range(1, len(itemset)):
            for antecedent in combinations(itemset, i):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                if antecedent in all_frequent_itemsets:
                    support_itemset = all_frequent_itemsets[itemset] / num_transactions
                    support_antecedent = all_frequent_itemsets[antecedent] / num_transactions
                    confidence = support_itemset / support_antecedent
                    if confidence >= min_confidence:
                        # Lift compares confidence with the consequent's overall support
                        support_consequent = all_frequent_itemsets.get(consequent, 0) / num_transactions
                        lift = confidence / support_consequent if support_consequent > 0 else 0
                        rules.append({
                            'antecedents': antecedent,
                            'consequents': consequent,
                            'support': support_itemset,
                            'confidence': confidence,
                            'lift': lift
                        })
```
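
The nested loops above try every way of splitting a frequent itemset into an "if" side and a "then" side. The standalone sketch below shows just that splitting step for one three-item set; the generator name `rule_splits` is an assumption for illustration.

```python
from itertools import combinations

def rule_splits(itemset):
    """Yield every (antecedent, consequent) pair with both sides non-empty."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            yield antecedent, itemset - antecedent

splits = list(rule_splits(['bread', 'butter', 'milk']))
for antecedent, consequent in splits:
    print(sorted(antecedent), '->', sorted(consequent))
```

A three-item set gives six candidate rules (three with one item on the left, three with two), and each one is then kept or dropped based on its confidence.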

## Display Prominent Rules (Clear Groups)

```python
# Sort and filter rules
# "Prominent" rules need confidence of at least 0.75, which is stricter than min_confidence
prominent_rules = [rule for rule in rules if rule['support'] >= min_support and rule['confidence'] >= 0.75]
prominent_rules = sorted(prominent_rules, key=lambda x: (x['confidence'], x['support']), reverse=True)

# print(f"Number of prominent rules: {len(prominent_rules)}")
for rule in prominent_rules[:10]:  # Show top 10
    ant = '-'.join(sorted(rule['antecedents']))
    cons = '-'.join(sorted(rule['consequents']))
    print(f"Rule: {ant} -> {cons}, Support: {rule['support']:.2f}, Confidence: {rule['confidence']:.2f}, Lift: {rule['lift']:.2f}")
```

The next cell wraps the same steps in a single function, apriori_runner, and plants two associations (bread with butter, tea with sugar) in the synthetic baskets so that there are real patterns to find.

```python
import random
from collections import defaultdict
from itertools import combinations

def generate_market_baskets(num_records=2000, min_items=2, max_items=5):
    items = ['bread', 'butter', 'milk', 'eggs', 'cheese', 'tea', 'coffee', 'sugar', 'flour', 'rice', 'chocolate', 'cookies', 'juice', 'soda', 'diapers']
    basket_data = []
    for _ in range(num_records):
        basket_len = random.randint(min_items, max_items)
        basket = set(random.sample(items, basket_len))
        # Add some intentional associations
        if random.random() < 0.4:
            basket.update(['bread', 'butter'])  # force this pair into about 40% of baskets
        if random.random() < 0.3:
            basket.update(['tea', 'sugar'])     # force this pair into about 30% of baskets
        basket_data.append(list(basket))
    return basket_data, items

def apriori_runner(min_support=0.05, min_confidence=0.6):
    basket_data, items = generate_market_baskets()
    num_transactions = len(basket_data)
    min_support_count = min_support * num_transactions

    # Count how many transactions contain each candidate itemset
    def get_support_count(candidates, transactions):
        support_count = defaultdict(int)
        for transaction in transactions:
            transaction_set = set(transaction)
            for candidate in candidates:
                if candidate.issubset(transaction_set):
                    support_count[candidate] += 1
        return support_count

    # Frequent 1-itemsets
    candidate_1_itemsets = [frozenset([item]) for item in items]
    support_count_1 = get_support_count(candidate_1_itemsets, basket_data)
    frequent_1_itemsets = set()
    for itemset, count in support_count_1.items():
        if count >= min_support_count:
            frequent_1_itemsets.add(itemset)

    # Join pairs of frequent (k-1)-itemsets and keep unions with exactly k items
    def generate_candidates(frequent_itemsets_k_minus_1):
        candidates = set()
        f_list = list(frequent_itemsets_k_minus_1)
        if not f_list:
            return candidates
        k = len(next(iter(f_list))) + 1
        for i in range(len(f_list)):
            for j in range(i+1, len(f_list)):
                union_set = f_list[i] | f_list[j]
                if len(union_set) == k:
                    candidates.add(union_set)
        return candidates

    # all_frequent_itemsets maps each frequent itemset to its support count
    all_frequent_itemsets = {itemset: support_count_1[itemset] for itemset in frequent_1_itemsets}
    current_frequent_itemsets = frequent_1_itemsets
    while current_frequent_itemsets:
        candidates_k = generate_candidates(current_frequent_itemsets)
        if not candidates_k:
            break
        support_count_k = get_support_count(candidates_k, basket_data)
        current_frequent_itemsets = set()
        for itemset, count in support_count_k.items():
            if count >= min_support_count:
                current_frequent_itemsets.add(itemset)
                all_frequent_itemsets[itemset] = count

    # Generate rules
    rules = []
    for itemset in all_frequent_itemsets:
        if len(itemset) > 1:
            for i in range(1, len(itemset)):
                for antecedent in combinations(itemset, i):
                    antecedent = frozenset(antecedent)
                    consequent = itemset - antecedent
                    if antecedent in all_frequent_itemsets:
                        support_itemset = all_frequent_itemsets[itemset] / num_transactions
                        support_antecedent = all_frequent_itemsets[antecedent] / num_transactions
                        confidence = support_itemset / support_antecedent
                        if confidence >= min_confidence:
                            support_consequent = all_frequent_itemsets.get(consequent, 0) / num_transactions
                            lift = confidence / support_consequent if support_consequent > 0 else 0
                            rules.append({
                                'antecedents': antecedent,
                                'consequents': consequent,
                                'support': support_itemset,
                                'confidence': confidence,
                                'lift': lift
                            })

    # Sort and filter rules
    # "Prominent" rules need confidence of at least 0.75, which is stricter than min_confidence
    prominent_rules = [rule for rule in rules if rule['support'] >= min_support and rule['confidence'] >= 0.75]
    prominent_rules = sorted(prominent_rules, key=lambda x: (x['confidence'], x['support']), reverse=True)

    print(f"Number of prominent rules: {len(prominent_rules)}")
    for rule in prominent_rules[:10]:  # Show top 10
        ant = '-'.join(sorted(rule['antecedents']))
        cons = '-'.join(sorted(rule['consequents']))
        print(f"Rule: {ant} -> {cons}, Support: {rule['support']:.2f}, Confidence: {rule['confidence']:.2f}, Lift: {rule['lift']:.2f}")

# To run:
apriori_runner(min_support=0.05, min_confidence=0.6)
```

Output (exact numbers vary from run to run because the baskets are random):

```
Number of prominent rules: 34
Rule: bread-milk-sugar -> butter, Support: 0.05, Confidence: 0.85, Lift: 1.56
Rule: bread-coffee -> butter, Support: 0.10, Confidence: 0.84, Lift: 1.56
Rule: butter-flour -> bread, Support: 0.10, Confidence: 0.84, Lift: 1.57
Rule: bread-milk -> butter, Support: 0.11, Confidence: 0.84, Lift: 1.56
Rule: butter-chocolate -> bread, Support: 0.10, Confidence: 0.84, Lift: 1.56
Rule: bread-juice -> butter, Support: 0.12, Confidence: 0.84, Lift: 1.55
Rule: bread-diapers -> butter, Support: 0.10, Confidence: 0.83, Lift: 1.54
Rule: butter-milk-sugar -> bread, Support: 0.05, Confidence: 0.83, Lift: 1.55
Rule: butter-juice -> bread, Support: 0.12, Confidence: 0.83, Lift: 1.54
Rule: bread-rice -> butter, Support: 0.10, Confidence: 0.83, Lift: 1.53
```
