## MLSD: Apriori Algorithm to find frequent 1-itemsets and 2-itemsets
#### Maria Rafaela Alves Abrunhosa 107658
**3rd April 2025**

Write a Python program that implements the Apriori algorithm, limited to finding frequent 1-itemsets and 2-itemsets from the list of transactions.

*Note:* Use only basic Python data types (such as lists, tuples, and sets). You can use *itertools.combinations()* to help with 2-itemset generation.

Follow the next steps:
- **1st:** count element occurrence
- **2nd:** calculate element support
    - frequent 1-itemset
- **3rd:** generate pairs of items out of frequent 1-itemset
- **4th:** calculate 2-element (pair) support
    - frequent 2-itemsets
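
The third step can be sketched on its own as follows. This is a minimal illustration, not the notebook's implementation: the `frequent_items` dictionary holds illustrative inputs, and `candidate_pairs` is a hypothetical name. Only items that are already frequent are combined, since a pair cannot be frequent if one of its items is not.

```python
from itertools import combinations

# Illustrative input: items that passed the minimum-support check (assumed values)
frequent_items = {'milk': 0.8, 'bread': 0.8, 'apple': 0.6}

# Sorting the items first gives every pair a canonical order,
# so ('bread', 'milk') and ('milk', 'bread') are never counted separately
candidate_pairs = list(combinations(sorted(frequent_items), 2))

print(candidate_pairs)
# [('apple', 'bread'), ('apple', 'milk'), ('bread', 'milk')]
```

Each candidate pair still has to be counted against the transactions in the fourth step before it can be called frequent.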

Finally, calculate the confidence of each frequent pair and derive the association rules that meet the minimum confidence.
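
For reference, the two measures used in this exercise can be written as below. The symbols are assumptions introduced here for illustration: $T$ is the list of transactions, $t$ a single transaction, $X$ an itemset, and $A$, $B$ individual items.

```latex
\mathrm{support}(X) = \frac{\left|\{\, t \in T : X \subseteq t \,\}\right|}{|T|},
\qquad
\mathrm{confidence}(A \rightarrow B) = \frac{\mathrm{support}(\{A, B\})}{\mathrm{support}(\{A\})}
```

An itemset is frequent when its support is at least the minimum support, and a rule is kept when its confidence is at least the minimum confidence.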

In [9]:
from itertools import combinations

transactions = [
    ['milk', 'bread', 'nuts', 'apple'],
    ['milk', 'bread', 'nuts'],
    ['milk', 'bread'],
    ['milk', 'apple'],
    ['bread', 'apple']
]

minSupport = 0.6
minConfidence = 0.7

In [12]:
elementCount = {}
elementSupport = {}
frequent1Itemset = {}
pairCount = {}
pairSupport = {}
frequent2Itemset = {}

# apriori algorithm - 1st step
for t in transactions:
    for element in t:
        if element not in elementCount:
            elementCount[element] = 0
        elementCount[element] += 1

# calculate the support - fraction of transactions that contain an itemset
for element, count in elementCount.items():
    elementSupport[element] = count / len(transactions)

for element, support in elementSupport.items():
    if support >= minSupport:
        frequent1Itemset[element] = support

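# 3rd step: count pairs built only from frequent 1-itemsets (Apriori pruning)
for t in transactions:
    # keep only the frequent items; sorting gives each pair a canonical order
    frequentItems = sorted(item for item in t if item in frequent1Itemset)
    for pair in combinations(frequentItems, 2): # all pairs of frequent items in this transaction
        if pair not in pairCount:
            pairCount[pair] = 0
        pairCount[pair] += 1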
    
# calculate the support for 2-itemsets
for pair, count in pairCount.items():
    pairSupport[pair] = count / len(transactions)

for pair, support in pairSupport.items():
    if support >= minSupport:
        frequent2Itemset[pair] = support

print("Frequent 1-itemsets:", frequent1Itemset)
print("Frequent 2-itemsets:", frequent2Itemset)

Frequent 1-itemsets: {'milk': 0.8, 'bread': 0.8, 'apple': 0.6}
Frequent 2-itemsets: {('bread', 'milk'): 0.6}


We calculate the confidence for the frequent 2-itemsets because high support alone (both items appearing together often) does not guarantee that one item leads to the other.
Confidence quantifies that directional relationship: confidence(A -> B) = support(A, B) / support(A).
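
Because confidence is directional, the two rules built from one pair can differ. The sketch below illustrates this on the same five transactions with the pair (apple, milk), which is not frequent at the chosen minimum support and is used here only to show the asymmetry. The helper functions `support` and `confidence` are hypothetical names introduced for this example.

```python
# Same transactions as in the exercise, stored as sets for subset tests
transactions = [
    {'milk', 'bread', 'nuts', 'apple'},
    {'milk', 'bread', 'nuts'},
    {'milk', 'bread'},
    {'milk', 'apple'},
    {'bread', 'apple'},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """confidence(antecedent -> consequent) = support(both) / support(antecedent)."""
    return support({antecedent, consequent}) / support({antecedent})

conf_apple_milk = confidence('apple', 'milk')  # 0.4 / 0.6
conf_milk_apple = confidence('milk', 'apple')  # 0.4 / 0.8

print(round(conf_apple_milk, 2), round(conf_milk_apple, 2))
# 0.67 0.5
```

Both rules share the same pair support (0.4), yet the denominators differ, so apple -> milk is stronger than milk -> apple.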

In [19]:
# calculate confidence for both rule directions of each frequent 2-itemset
confidenceValues = {}
for pair, support in frequent2Itemset.items():
    a, b = pair
    confidenceValues[pair] = {
        # confidence(a -> b) = support(a, b) / support(a)
        f"{a} -> {b}": round(support / elementSupport[a], 2),
        # confidence(b -> a) = support(a, b) / support(b)
        f"{b} -> {a}": round(support / elementSupport[b], 2)
    }

# filter rules based on minConfidence
filteredRules = {}
for pair, confidences in confidenceValues.items():
    filteredRules[pair] = {rule: conf for rule, conf in confidences.items() if conf >= minConfidence}

print("Filtered Rules (Confidence >= min_confidence):", filteredRules)


Filtered Rules (Confidence >= min_confidence): {('bread', 'milk'): {'bread -> milk': 0.75, 'milk -> bread': 0.75}}
