# Q4 - Association Analysis
---

Consider the market basket transactions shown in the table below.

Transaction ID | Items Ordered
--- | ---
1 | {Flour, Eggs, Bread}
2 | {Soda, Coffee}
3 | {Flour, Butter, Milk, Eggs}
4 | {Bread, Eggs, Juice, Detergent}
5 | {Bread, Milk, Eggs}
6 | {Eggs, Bread}
7 | {Detergent, Milk}
8 | {Coffee, Soda, Juice}
9 | {Butter, Juice, Bread}
10 | {Milk, Bread, Detergent}

## Import Libraries

In [1]:
from pprint import pprint
import itertools
import math

## (a) How many items are in this data set? What is the maximum size of itemsets that can be extracted from this data set?

### Transaction History

In [2]:
transactions = [(1, frozenset({'Flour', 'Eggs', 'Bread'})),
                (2, frozenset({'Soda', 'Coffee'})),
                (3, frozenset({'Flour', 'Butter', 'Milk', 'Eggs'})),
                (4, frozenset({'Bread', 'Eggs', 'Juice', 'Detergent'})),
                (5, frozenset({'Bread', 'Milk', 'Eggs'})),
                (6, frozenset({'Eggs', 'Bread'})),
                (7, frozenset({'Detergent', 'Milk'})),
                (8, frozenset({'Coffee', 'Soda', 'Juice'})),
                (9, frozenset({'Butter', 'Juice', 'Bread'})),
                (10, frozenset({'Milk', 'Bread', 'Detergent'})),
               ]

### Count of Items

In [3]:
items = set()
for i, x in transactions:
    items.update(x)
print('Total items in the data set - {}'.format(len(items)))
print('The items are - {}'.format(items))

Total items in the data set - 9
The items are - {'Juice', 'Flour', 'Detergent', 'Eggs', 'Bread', 'Milk', 'Soda', 'Butter', 'Coffee'}


### Itemsets with Support ≥ 1

In [4]:
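# Map each itemset (as a frozenset) to its support count, i.e. the number of
# transactions that contain it. Only itemsets seen at least once are stored.
itemsets = {}
for i in range(1, len(items)+1):
    for x in itertools.combinations(items, i):
        candidate = frozenset(x)
        for _, y in transactions:
            if candidate.issubset(y):
                itemsets[candidate] = itemsets.get(candidate, 0) + 1
print('Total itemsets with support ≥ 1 are - {}'.format(len(itemsets)))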

Total itemsets with support ≥ 1 are - 44


In [5]:
print('Maximum Size of Itemset - {}'.format(max(len(x) for x in itemsets)))

Maximum Size of Itemset - 4


**There are 9 unique items and 44 itemsets with support ≥ 1; the largest of these itemsets contains 4 items.**
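These counts can be cross-checked without scanning all 2^9 − 1 candidate combinations: every itemset with non-zero support must be a subset of at least one transaction, so it is enough to enumerate the subsets of each basket. The sketch below assumes the transaction table above; the names `baskets` and `support` are illustrative and not part of the original cells.

```python
from collections import Counter
from itertools import combinations

# Baskets copied from the transaction table (transaction IDs omitted).
baskets = [
    {'Flour', 'Eggs', 'Bread'},
    {'Soda', 'Coffee'},
    {'Flour', 'Butter', 'Milk', 'Eggs'},
    {'Bread', 'Eggs', 'Juice', 'Detergent'},
    {'Bread', 'Milk', 'Eggs'},
    {'Eggs', 'Bread'},
    {'Detergent', 'Milk'},
    {'Coffee', 'Soda', 'Juice'},
    {'Butter', 'Juice', 'Bread'},
    {'Milk', 'Bread', 'Detergent'},
]

# Support count of every non-empty subset of every basket.
support = Counter()
for basket in baskets:
    for size in range(1, len(basket) + 1):
        for subset in combinations(sorted(basket), size):
            support[frozenset(subset)] += 1

n_itemsets = len(support)                       # itemsets with support >= 1
max_size = max(len(s) for s in support)         # largest supported itemset
widest_basket = max(len(b) for b in baskets)    # largest transaction
print(n_itemsets, max_size, widest_basket)
```

The largest supported itemset can never be bigger than the widest basket, which is why the last two numbers agree.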

## (b) What is the maximum number of association rules that can be extracted from this data (including rules that have zero support)?

In [6]:
# Closed form for d items: R = 3^d - 2^(d+1) + 1 (integer arithmetic avoids float rounding)
cnt = 3**len(items) - 2**(len(items) + 1) + 1
print('The maximum number of association rules that can be extracted are {}'.format(cnt))

The maximum number of association rules that can be extracted are 18660
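The closed form $R = 3^d - 2^{d+1} + 1$ for $d$ items counts rules whose antecedent and consequent are non-empty and disjoint. As a sanity check, the same number can be reached by choosing a $k$-itemset and splitting it into two non-empty parts. This is only a sketch; the function names `rules_closed_form` and `rules_by_counting` are illustrative.

```python
from math import comb  # available in Python 3.8+

def rules_closed_form(d):
    """Number of rules X -> Y over d items, with X and Y non-empty and disjoint."""
    return 3**d - 2**(d + 1) + 1

def rules_by_counting(d):
    """Same count, built up term by term.

    Pick the k items that appear in the rule (C(d, k) ways), then pick a
    non-empty proper subset of them as the antecedent (2^k - 2 ways).
    """
    return sum(comb(d, k) * (2**k - 2) for k in range(2, d + 1))

d = 9  # number of distinct items in the transaction table
print(rules_closed_form(d), rules_by_counting(d))
```

Both functions give 18660 for the nine items here.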


## (c) What is the maximum number of 2-itemsets that can be derived from this data set (including those that have zero support)?

In [7]:
# C(9, 2): choose any 2 of the 9 items (math.comb requires Python 3.8+)
print('The maximum number of 2-itemsets that can be derived from this data set are {}'.format(
    math.comb(len(items), 2)))

The maximum number of 2-itemsets that can be derived from this data set are 36


## (d) Find an itemset (of size 2 or larger) that has the largest support.

In [8]:
max_cnt=0
max_set=None
for x,cnt in itemsets.items():
    if len(x)>=2 and cnt>max_cnt:
        max_cnt=cnt
        max_set=x
print('The itemset of size 2 or larger with maximum support is -> {' + ', '.join(max_set) + '}')

The itemset of size 2 or larger with maximum support is -> {Bread, Eggs}
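The support behind this answer can be read off directly. The sketch below recounts supports from the transaction table, using illustrative variable names. It looks at pairs only, because an itemset can never have higher support than the pairs it contains, so the maximum over all itemsets of size 2 or larger is always reached by some pair.

```python
from collections import Counter
from itertools import combinations

# Baskets copied from the transaction table (transaction IDs omitted).
baskets = [
    {'Flour', 'Eggs', 'Bread'},
    {'Soda', 'Coffee'},
    {'Flour', 'Butter', 'Milk', 'Eggs'},
    {'Bread', 'Eggs', 'Juice', 'Detergent'},
    {'Bread', 'Milk', 'Eggs'},
    {'Eggs', 'Bread'},
    {'Detergent', 'Milk'},
    {'Coffee', 'Soda', 'Juice'},
    {'Butter', 'Juice', 'Bread'},
    {'Milk', 'Bread', 'Detergent'},
]

# Count how many baskets contain each pair of items.
pair_support = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_support[pair] += 1

best_pair, best_count = pair_support.most_common(1)[0]
print(best_pair, best_count, best_count / len(baskets))
```

{Bread, Eggs} appears in 4 of the 10 transactions (support 0.4); no other pair appears more than twice.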


## (e) Given minconf = 0.5, find two pairs of items, a and b, such that the rules {a} -> {b} and {b} -> {a} have the same confidence, and their confidence is greater than or equal to the minconf threshold.
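Because the question asks about pairs of single items, the search can be restricted to 1-itemsets. The rules $\{a\} \to \{b\}$ and $\{b\} \to \{a\}$ share the numerator $s(\{a, b\})$, so their confidences are equal exactly when $a$ and $b$ have the same support count (given that they co-occur at all). A minimal sketch under that reading, with illustrative names and the baskets copied from the table:

```python
from itertools import combinations

# Baskets copied from the transaction table (transaction IDs omitted).
baskets = [
    {'Flour', 'Eggs', 'Bread'},
    {'Soda', 'Coffee'},
    {'Flour', 'Butter', 'Milk', 'Eggs'},
    {'Bread', 'Eggs', 'Juice', 'Detergent'},
    {'Bread', 'Milk', 'Eggs'},
    {'Eggs', 'Bread'},
    {'Detergent', 'Milk'},
    {'Coffee', 'Soda', 'Juice'},
    {'Butter', 'Juice', 'Bread'},
    {'Milk', 'Bread', 'Detergent'},
]

def count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

minconf = 0.5
items = sorted(set().union(*baskets))

symmetric_pairs = []
for a, b in combinations(items, 2):
    both = count({a, b})
    # Equal supports make conf(a -> b) and conf(b -> a) identical.
    if both > 0 and count({a}) == count({b}):
        confidence = both / count({a})
        if confidence >= minconf:
            symmetric_pairs.append((a, b, confidence))

print(symmetric_pairs)
```

On this data the sketch reports two pairs: Butter and Flour (confidence 0.5 in both directions) and Coffee and Soda (confidence 1.0 in both directions).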

In [9]:
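# Compare every pair of itemsets x, y that have the same support count:
# conf(x -> y) = s(x ∪ y) / s(x) and conf(y -> x) = s(x ∪ y) / s(y) are then equal.
# Note that x and y range over all supported itemsets, not only single items.
s = set(itemsets.keys())
for x, y in itertools.combinations(s, 2):
    union = x.union(y)
    if union in itemsets and itemsets[x]==itemsets[y] and itemsets[union]/itemsets[x]>=0.5:
        print('Confidence -> {:.2f}\t'.format(itemsets[union]/itemsets[x]),
              '[{}]'.format(', '.join(x)),
              '[{}]'.format(', '.join(y)))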

Confidence -> 1.00	 [Butter, Eggs] [Milk, Butter, Eggs]
Confidence -> 1.00	 [Butter, Eggs] [Flour, Milk, Eggs]
Confidence -> 1.00	 [Butter, Eggs] [Flour, Milk, Butter, Eggs]
Confidence -> 1.00	 [Butter, Eggs] [Flour, Milk]
Confidence -> 1.00	 [Butter, Eggs] [Flour, Butter, Eggs]
Confidence -> 1.00	 [Butter, Eggs] [Flour, Butter]
Confidence -> 1.00	 [Butter, Eggs] [Flour, Milk, Butter]
Confidence -> 1.00	 [Butter, Eggs] [Milk, Butter]
Confidence -> 1.00	 [Milk, Butter, Eggs] [Flour, Milk, Eggs]
Confidence -> 1.00	 [Milk, Butter, Eggs] [Flour, Milk, Butter, Eggs]
Confidence -> 1.00	 [Milk, Butter, Eggs] [Flour, Milk]
Confidence -> 1.00	 [Milk, Butter, Eggs] [Flour, Butter, Eggs]
Confidence -> 1.00	 [Milk, Butter, Eggs] [Flour, Butter]
Confidence -> 1.00	 [Milk, Butter, Eggs] [Flour, Milk, Butter]
Confidence -> 1.00	 [Milk, Butter, Eggs] [Milk, Butter]
Confidence -> 1.00	 [Eggs, Detergent] [Bread, Juice, Eggs]
Confidence -> 1.00	 [Eggs, Detergent] [Bread, Juice, Detergent]
Confidence -> 1