# Discovery of Frequent Itemsets and Association Rules

The problem of discovering association rules between itemsets in a sales transaction database (a set of baskets) includes the following two sub-problems:

1. Finding frequent itemsets with support at least s;
2. Generating association rules with confidence at least c from the itemsets found in the first step.

Recall that an association rule is an implication X → Y, where X and Y are itemsets such that X ∩ Y = ∅. The support of the rule X → Y is the number of transactions that contain X ∪ Y. The confidence of the rule X → Y is the fraction of the transactions containing X that also contain X ∪ Y.
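As a small illustration of these definitions, the sketch below computes the support and confidence of a single rule on a handful of toy baskets. The baskets and the helper names `support_count` and `confidence` are invented for this example and are not part of the assignment dataset.

```python
# Toy baskets, invented for illustration only
toy_baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

def support_count(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def confidence(x, y, baskets):
    """Fraction of the baskets containing X that also contain Y."""
    return support_count(x | y, baskets) / support_count(x, baskets)

x, y = {"bread"}, {"milk"}
print(support_count(x | y, toy_baskets))  # support of the rule bread -> milk
print(confidence(x, y, toy_baskets))      # confidence of the rule bread -> milk
```

Here two baskets contain both bread and milk, and three baskets contain bread, so the rule has support 2 and confidence 2/3.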

You are to solve the first sub-problem: implement the A-Priori algorithm for finding frequent itemsets with support at least s in a dataset of sales transactions. Recall that the support of an itemset is the number of transactions containing the itemset. To test and evaluate your implementation, write a program that uses your A-Priori implementation to discover frequent itemsets with support at least s in a given dataset of sales transactions.

The sales transaction dataset includes generated transactions (baskets) of hashed items (see Canvas).

In [1]:
with open("T10I4D100K.dat") as f: # Close the file once it has been read
    baskets = [line.strip().split() for line in f] # One list of item IDs per basket

In [2]:
# Dictionary with transaction ID (starting at 1) as key, and basket as value
transactions = dict(enumerate(baskets, start=1))

In [3]:
items = set() # Set of items from all baskets
for basket in transactions.values():
    items.update(basket)

In [31]:
# Count the number of baskets in which each item appears
def freq(items, transactions):
    items_counts = dict() # Dictionary of item and its basket count
    for basket in transactions.values(): # Single pass over the baskets
        for i in set(basket): # set() so an item counts once per basket
            if i in items: # Only count the requested items
                items_counts[i] = items_counts.get(i, 0) + 1
    return items_counts

In [5]:
items_counts = freq(items, transactions)

In [6]:
def support(items_counts, transactions):
    n_transactions = len(transactions)
    # Relative support = #transactions in which item appears / #total transactions
    return {i: count / n_transactions for i, count in items_counts.items()}

In [7]:
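min_support = 0.05 # Threshold s, given as a fraction of all transactions (5%)
# One-element list holding the {item: support} dictionary of the frequent single items
items_atleast_min_support = [{item: sup for item, sup in support(items_counts, transactions).items() if sup >= min_support}]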

In [8]:
items_atleast_min_support

[{'529': 0.07057,
  '684': 0.05408,
  '419': 0.05057,
  '722': 0.05845,
  '829': 0.0681,
  '766': 0.06265,
  '354': 0.05835,
  '494': 0.05102,
  '217': 0.05375,
  '368': 0.07828}]
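The cells above cover only the first pass of A-Priori, that is, the frequent single items. A minimal sketch of how the later passes could proceed is given below: candidate k-itemsets are built from the frequent (k-1)-itemsets, pruned using the monotonicity of support, and then counted in one pass over the baskets. The function name `apriori`, the absolute threshold `min_count`, and the toy baskets are assumptions made for this illustration. With the relative threshold used in the notebook, `min_count` would correspond to `min_support * len(transactions)`.

```python
from itertools import combinations

def apriori(baskets, min_count):
    """Return {frozenset(itemset): count} for every itemset with count >= min_count."""
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count single items
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: unions of two frequent (k-1)-itemsets that have
        # size k and whose (k-1)-subsets are all frequent (monotonicity pruning)
        prev = set(current)
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Counting pass: one scan over the baskets for all candidates of size k
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Toy baskets, invented for illustration only
toy_baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = apriori(toy_baskets, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The counting pass here checks every candidate against every basket, which keeps the sketch short but is slow on a large dataset. Enumerating the k-subsets of each basket restricted to frequent items is a common alternative.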

Optional task for extra bonus: Solve the second sub-problem, i.e., develop and implement an algorithm for generating association rules between frequent itemsets discovered by using the A-Priori algorithm in a dataset of sales transactions. The rules must have support at least s and confidence at least c, where s and c are given as input parameters.
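One possible approach to the optional task, sketched under stated assumptions: for every frequent itemset of size at least two, split it into a left-hand side X and a right-hand side Y in all possible ways, and keep the rules whose confidence reaches c. The sketch assumes the frequent itemsets are available as a dictionary from `frozenset` to support count, and that every subset of a frequent itemset is also in that dictionary (which A-Priori guarantees). The names `generate_rules`, `min_count`, `min_conf`, and the toy counts are hypothetical.

```python
from itertools import combinations

def generate_rules(frequent, min_count, min_conf):
    """Return a list of (X, Y, support_count, confidence) for rules X -> Y.

    `frequent` maps frozenset itemsets to support counts and is assumed to
    contain every non-empty subset of each itemset it holds.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2 or count < min_count:
            continue
        # Every non-empty proper subset of the itemset can be a left-hand side
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                x = frozenset(lhs)
                y = itemset - x
                conf = count / frequent[x]  # support(X u Y) / support(X)
                if conf >= min_conf:
                    rules.append((x, y, count, conf))
    return rules

# Toy support counts, invented for illustration only
toy_frequent = {
    frozenset(["a"]): 4,
    frozenset(["b"]): 4,
    frozenset(["a", "b"]): 3,
}
for x, y, count, conf in generate_rules(toy_frequent, min_count=3, min_conf=0.7):
    print(sorted(x), "->", sorted(y), count, conf)
```

In the toy data both a → b and b → a have confidence 3/4, so both pass a threshold of 0.7 and neither passes a threshold of 0.8.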