# Data Mining

## EXERCISE: Association Analysis from scratch



### Generate frequent itemsets

Let's find all itemsets whose support (the fraction of transactions that contain the itemset) is at least some threshold.

We define 4 functions for generating frequent itemsets:
* createC1 - Create first candidate itemsets for k=1
* scanD - Identify itemsets that meet the support threshold
* aprioriGen - Generate the next list of candidates
* apriori - Generate all frequent itemsets

See slides for explanation of these functions.
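
Before reading the functions, it may help to see the support computation on its own. The sketch below is not part of the exercise code: `support` is a hypothetical helper, and the four transactions are the sample data used later in this notebook.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


# Sample transactions (same as the DATASET used later in the notebook)
transactions = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]

print(support({'B', 'E'}, transactions))  # 0.75 (3 of 4 transactions)
print(support({'D'}, transactions))       # 0.25 (1 of 4 transactions)
```

With a threshold of 0.5, `{'B', 'E'}` would count as frequent and `{'D'}` would not.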

In [1]:
def createC1(dataset):
    "Create a list of candidate item sets of size one."
    c1 = []
    for transaction in dataset:
        for item in transaction:
            if not [item] in c1:
                c1.append([item])
    c1.sort()
    #frozenset because it will be a key of a dictionary.                         
    return list(map(frozenset, c1))



def scanD(dataset, candidates, min_support):
    "Return the candidates that meet the minimum support level, with their support values"
    # example: which candidates meet the minimum support?
    # dataset = [(beer, nut, diaper), (beer, coffee, diaper), (coffee)]
    # candidates = [(beer), (nut), (coffee), (diaper)]
    # min_support = 0.5
    # nut appears in only 1 of 3 transactions, so it is dropped
    # return ([beer, diaper, coffee]) plus a dict of their supports
    sscnt = {}
    for tid in dataset:
        for can in candidates:
            if can.issubset(tid):
                sscnt.setdefault(can, 0)
                sscnt[can] += 1

    num_items = float(len(dataset))
    retlist = []
    support_data = {}
    for key in sscnt:
        support = sscnt[key] / num_items
        if support >= min_support:
            retlist.insert(0, key)
            support_data[key] = support
    return retlist, support_data


def aprioriGen(freq_sets, k):
    "Generate candidate k-itemsets by joining frequent (k-1)-itemsets"
    # Two (k-1)-itemsets are joined when their first k-2 items (in sorted order) match
    retList = []
    lenLk = len(freq_sets)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # Sort before slicing: set iteration order is arbitrary, so slicing
            # an unsorted list could compare the wrong prefixes
            L1 = sorted(freq_sets[i])[:k - 2]
            L2 = sorted(freq_sets[j])[:k - 2]
            if L1 == L2:
                retList.append(freq_sets[i] | freq_sets[j]) # | is set union
    return retList


def apriori(dataset, min_support=0.5):
    "Generate all frequent itemsets (grouped by size) and their support values"
    # (A, C, D) 
    # (B, C, E) 
    # (A, B, C, E) 
    # (B, E) 
    C1 = createC1(dataset) # unique items (A, B, C, D, E) 
    D = list(map(set, dataset))
    L1, support_data = scanD(D, C1, min_support) # item in C1 with min_support (E, B, C, A)
    L = [L1]
    k = 2
    while (len(L[k - 2]) > 0):
        Ck = aprioriGen(L[k - 2], k)
        # 1st iter: k = 2
        # Ck -> all candidate sets of size 2 
        # (E, B), (E, C), (E, A), (B, C), (B, A), (C, A)
        Lk, supK = scanD(D, Ck, min_support) # check which of the candidate have support >= min_support
        support_data.update(supK)
        L.append(Lk) # append it to the final list
        k += 1 # next iter: k = 3...

    return L, support_data
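
The `aprioriGen` function above only joins itemsets. The textbook Apriori algorithm usually adds a prune step: a candidate can be discarded without scanning the data if any of its (k-1)-subsets is not frequent. The sketch below illustrates that check; `has_infrequent_subset` is a hypothetical helper and is not used by the exercise code.

```python
from itertools import combinations


def has_infrequent_subset(candidate, prev_frequent):
    """True if any (k-1)-subset of `candidate` is missing from `prev_frequent`."""
    k = len(candidate)
    return any(frozenset(subset) not in prev_frequent
               for subset in combinations(candidate, k - 1))


# Frequent 2-itemsets of the sample data at min_support = 0.5
L2 = {frozenset('BC'), frozenset('CE'), frozenset('BE'), frozenset('AC')}

print(has_infrequent_subset(frozenset('BCE'), L2))  # False: BC, BE and CE are all frequent
print(has_infrequent_subset(frozenset('ABC'), L2))  # True: AB is not frequent
```

Skipping the prune step does not change the final result, because `scanD` removes infrequent candidates anyway; pruning only saves counting work.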

### Itemset generation on sample data

In [2]:
MIN_SUPPORT = 0.5

# Sample data
# Alternative sample data (unused here; uncomment and remove the next assignment to try it)
# DATASET = [['Mango', 'Onion', 'Apple'], ['Corn', 'Onion', 'Eggs'], ['Mango', 'Corn', 'Onion', 'Eggs'], ['Mango', 'Eggs']]
DATASET = [['A', 'C', 'D'], ['B', 'C', 'E'], ['A', 'B', 'C','E'],['B', 'E']]
print('Dataset in list-of-lists format:\n', DATASET, '\n')

# Generate a first candidate itemsets for k=1
C1 = createC1(DATASET)
print('Initial 1-itemset candidates:\n', C1, '\n')

# Convert data to a list of sets
D = list(map(set, DATASET))
print('Dataset in list-of-sets format:\n', D, '\n')

# Identify items that meet support threshold (0.5)
# Note that {'D'} isn't here as it only occurs in one transaction.
# Dropping it means we never generate larger candidate itemsets containing 'D'.
L1, support_data = scanD(D, C1, MIN_SUPPORT)
print('1-itemsets that appear in at least 50% of transactions:\n', L1, '\n')

# Generate the next list of candidates
print('Next set of candidates:\n', aprioriGen(L1,2), '\n')

# Generate all candidate itemsets
L, support_data = apriori(DATASET, min_support=MIN_SUPPORT)
print('Full list of candidate itemsets:\n', L, '\n')
print('Support values for candidate itemsets:\n', support_data, '\n')

Dataset in list-of-lists format:
 [['A', 'C', 'D'], ['B', 'C', 'E'], ['A', 'B', 'C', 'E'], ['B', 'E']] 

Initial 1-itemset candidates:
 [frozenset({'A'}), frozenset({'B'}), frozenset({'C'}), frozenset({'D'}), frozenset({'E'})] 

Dataset in list-of-sets format:
 [{'A', 'D', 'C'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}] 

1-itemsets that appear in at least 50% of transactions:
 [frozenset({'E'}), frozenset({'B'}), frozenset({'C'}), frozenset({'A'})] 

Next set of candidates:
 [frozenset({'B', 'E'}), frozenset({'C', 'E'}), frozenset({'A', 'E'}), frozenset({'B', 'C'}), frozenset({'A', 'B'}), frozenset({'A', 'C'})] 

Full list of candidate itemsets:
 [[frozenset({'E'}), frozenset({'B'}), frozenset({'C'}), frozenset({'A'})], [frozenset({'B', 'C'}), frozenset({'C', 'E'}), frozenset({'B', 'E'}), frozenset({'A', 'C'})], [frozenset({'B', 'C', 'E'})], []] 

Support values for candidate itemsets:
 {frozenset({'A'}): 0.5, frozenset({'C'}): 0.75, frozenset({'B'}): 0.75, frozenset({'E'}): 

### TODO Exploring support thresholds

* Generate frequent itemsets with a support threshold of 0.7
* How many frequent itemsets do we get at 0.7?
* How many do we get at 0.3?
* Do you have datasets that resemble transactions?
* What about the apps/websites you use?
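
One way to sanity-check the counts for this exercise is a brute-force enumeration that tests every possible itemset. This is only a sketch for tiny datasets (it is exponential in the number of items), and `frequent_itemsets_bruteforce` is a hypothetical helper, not part of the notebook's functions.

```python
from itertools import combinations


def frequent_itemsets_bruteforce(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = sum(1 for t in transactions if set(combo) <= set(t))
            if count / n >= min_support:
                result[frozenset(combo)] = count / n
    return result


sample = [['A', 'C', 'D'], ['B', 'C', 'E'], ['A', 'B', 'C', 'E'], ['B', 'E']]
for threshold in (0.3, 0.5, 0.7):
    found = frequent_itemsets_bruteforce(sample, threshold)
    print(threshold, len(found))
```

With only four transactions, every support is a multiple of 0.25, so thresholds of 0.3 and 0.5 select the same itemsets.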

In [3]:
MIN_SUPPORT = 0.3

# Identify items that meet support threshold (0.3)
# Note that {'D'} isn't here as it occurs in only one of four transactions (support 0.25).
# Dropping it means we never generate larger candidate itemsets containing 'D'.
L1, support_data = scanD(D, C1, MIN_SUPPORT) 
print('1-itemsets that appear in at least 30% of transactions:\n', L1, '\n')

# Generate the next list of candidates
print('Next set of candidates:\n', aprioriGen(L1, 2), '\n')

# Generate all candidate itemsets
L, support_data = apriori(DATASET, min_support = MIN_SUPPORT) 
print('Full list of candidate itemsets:\n', L, '\n')
print('Support values for candidate itemsets:\n', support_data, '\n')

MIN_SUPPORT = 0.7

# Identify items that meet support threshold (0.7)
# Note that {'A'} (support 0.5) and {'D'} (support 0.25) aren't here.
# Dropping them means we never generate larger candidate itemsets containing 'A' or 'D'.
L1, support_data = scanD(D, C1, MIN_SUPPORT) 
print('1-itemsets that appear in at least 70% of transactions:\n', L1, '\n')

# Generate the next list of candidates
print('Next set of candidates:\n', aprioriGen(L1, 2), '\n')

# Generate all candidate itemsets
L, support_data = apriori(DATASET, min_support = MIN_SUPPORT) 
print('Full list of candidate itemsets:\n', L, '\n')
print('Support values for candidate itemsets:\n', support_data, '\n')

1-itemsets that appear in at least 30% of transactions:
 [frozenset({'E'}), frozenset({'B'}), frozenset({'C'}), frozenset({'A'})] 

Next set of candidates:
 [frozenset({'B', 'E'}), frozenset({'C', 'E'}), frozenset({'A', 'E'}), frozenset({'B', 'C'}), frozenset({'A', 'B'}), frozenset({'A', 'C'})] 

Full list of candidate itemsets:
 [[frozenset({'E'}), frozenset({'B'}), frozenset({'C'}), frozenset({'A'})], [frozenset({'B', 'C'}), frozenset({'C', 'E'}), frozenset({'B', 'E'}), frozenset({'A', 'C'})], [frozenset({'B', 'C', 'E'})], []] 

Support values for candidate itemsets:
 {frozenset({'A'}): 0.5, frozenset({'C'}): 0.75, frozenset({'B'}): 0.75, frozenset({'E'}): 0.75, frozenset({'A', 'C'}): 0.5, frozenset({'B', 'E'}): 0.75, frozenset({'C', 'E'}): 0.5, frozenset({'B', 'C'}): 0.5, frozenset({'B', 'C', 'E'}): 0.5} 

1-itemsets that appear in at least 70% of transactions:
 [frozenset({'E'}), frozenset({'B'}), frozenset({'C'})] 

Next set of candidates:
 [frozenset({'B', 'E'}), frozenset({'C', 

## *STOP PLEASE. THE FOLLOWING IS FOR THE NEXT EXERCISE. THANKS.*

## Mine association rules

Given frequent itemsets, we can create association rules.

We add three more functions:
* calc_confidence - Identify rules that meet the confidence threshold
* rules_from_conseq - Recursively generate and evaluate candidate rules
* generateRules - Mine all confident association rules

See slides for explanation of functions.
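
The confidence of a rule X ==> Y is support(X ∪ Y) / support(X), which is what `calc_confidence` below computes from `support_data`. A minimal standalone sketch, where `confidence` is a hypothetical helper and the support values are taken from the sample output earlier in this notebook:

```python
def confidence(antecedent, consequent, support_data):
    """Confidence of antecedent ==> consequent from a {frozenset: support} dict."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    return support_data[both] / support_data[antecedent]


# Support values from the sample data (min_support = 0.5)
support_data = {
    frozenset({'A'}): 0.5,
    frozenset({'C'}): 0.75,
    frozenset({'A', 'C'}): 0.5,
}

print(confidence({'A'}, {'C'}, support_data))  # 1.0
print(confidence({'C'}, {'A'}, support_data))  # 0.5 / 0.75, about 0.67
```

Confidence is not symmetric: every transaction with A also contains C, but only two of the three transactions with C contain A.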

In [4]:
def calc_confidence(freqSet, H, support_data, rules, min_confidence=0.7):
    "Evaluate the rule generated"
    # freqSet: frequent itemset (rule components) 
    # H: possible consequences (RHS of rule) 
    pruned_H = []
    for conseq in H:
        conf = support_data[freqSet] / support_data[freqSet - conseq] # calculate confidence
        if conf >= min_confidence:
            #print(freqSet - conseq, '--->', conseq, 'conf:', conf)
            rules.append((freqSet - conseq, conseq, conf))
            pruned_H.append(conseq)
    # return the consequents that pass the confidence threshold
    return pruned_H


def rules_from_conseq(freqSet, H, support_data, rules, min_confidence=0.7):
    "Recursively generate and evaluate candidate rules for one frequent itemset"
    # H: candidate consequents, all of the same size m
    m = len(H[0])
    if m == 1:
        # Evaluate the 1-item consequents once, on the first call only,
        # so the recursion does not append the same rules again
        calc_confidence(freqSet, H, support_data, rules, min_confidence)
    if (len(freqSet) > (m + 1)):
        # Merge m-item consequents into (m+1)-item consequents
        Hmp1 = aprioriGen(H, m + 1)
        Hmp1 = calc_confidence(freqSet, Hmp1,  support_data, rules, min_confidence)
        if len(Hmp1) > 1:
            rules_from_conseq(freqSet, Hmp1, support_data, rules, min_confidence)

def generateRules(L, support_data, min_confidence=0.7):
    """Create the association rules
    L: list of frequent item sets
    support_data: support data for those itemsets
    min_confidence: minimum confidence threshold
    """
    # for example: 
    # L = E, B, C, BE (obtained with min_support = 0.7) 
    # support data = C: 0.75, B: 0.75, E: 0.75, BE: 0.75
    
    rules = []
    for i in range(1, len(L)): # for each frequent itemset (with length greater than 1)
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet] # initial consequent candidates (B, E)
            if (i > 1): # recursively evaluate
                rules_from_conseq(freqSet, H1, support_data, rules, min_confidence)
            else: # evaluate H1 only
                calc_confidence(freqSet, H1, support_data, rules, min_confidence)
                # append itemset if it meets the min_confidence
    return rules

def print_rules(rules):
    for r in rules:
        print('{} ==> {} (c={})'.format(*r))

### Rule mining on sample data

In [5]:

MIN_CONFIDENCE = 0.7
# Mine association rules
association_rules = generateRules(L, support_data, min_confidence=MIN_CONFIDENCE)
print_rules(association_rules)

frozenset({'E'}) ==> frozenset({'B'}) (c=1.0)
frozenset({'B'}) ==> frozenset({'E'}) (c=1.0)


### TODO Exploring confidence thresholds

* Mine rules with a confidence threshold of 0.7
* How many rules do we get at 0.7?
* How many do we get at 0.3?
* Can we use this for recommendation (e.g., Amazon, Netflix)?
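
On the recommendation question, one simple idea is to suggest the consequents of every rule whose antecedent is already in the user's basket. The sketch below assumes rules are `(antecedent, consequent, confidence)` tuples as returned by `generateRules`; `recommend` is a hypothetical helper, and real recommender systems are considerably more involved.

```python
def recommend(basket, rules):
    """Suggest items implied by rules whose antecedent is contained in the basket."""
    basket = frozenset(basket)
    scores = {}
    for antecedent, consequent, conf in rules:
        if antecedent <= basket:
            # Only suggest items the user does not already have
            for item in consequent - basket:
                scores[item] = max(conf, scores.get(item, 0.0))
    # Highest confidence first, ties broken alphabetically
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


rules = [
    (frozenset({'E'}), frozenset({'B'}), 1.0),
    (frozenset({'B'}), frozenset({'E'}), 1.0),
]

print(recommend({'E', 'C'}, rules))  # [('B', 1.0)]
print(recommend({'B', 'E'}, rules))  # [] (nothing new to suggest)
```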

In [6]:
MIN_CONFIDENCE = 0.3
# Mine association rules
association_rules = generateRules(L, support_data, min_confidence = MIN_CONFIDENCE)
print_rules(association_rules)

frozenset({'E'}) ==> frozenset({'B'}) (c=1.0)
frozenset({'B'}) ==> frozenset({'E'}) (c=1.0)


## EXERCISE: mlxtend library and apriori_python

## Association analysis using mlxtend library

In [8]:
#Install the library if it is not available
#!pip install mlxtend
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import pandas as pd
dataset =   [['A', 'C', 'D'], ['B', 'C', 'E'], ['A', 'B', 'C','E'],['B', 'E']]
            
# Convert item lists into transaction data for frequent itemset mining
# http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/
oht = TransactionEncoder()
oht_ary = oht.fit(dataset).transform(dataset)
df = pd.DataFrame(oht_ary, columns=oht.columns_)
print (df)           

# http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print('Support values for candidate itemsets:\n', frequent_itemsets, '\n')

 
rules= association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
#rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
#print(rules[['antecedents', 'consequents', 'confidence']].to_numpy())
# Select columns with a list: indexing a DataFrame with a set is rejected by recent pandas versions
print(rules[['antecedents', 'consequents', 'confidence']])

       A      B      C      D      E
0   True  False   True   True  False
1  False   True   True  False   True
2   True   True   True  False   True
3  False   True  False  False   True
Support values for candidate itemsets:
    support   itemsets
0     0.50        (A)
1     0.75        (B)
2     0.75        (C)
3     0.75        (E)
4     0.50     (A, C)
5     0.50     (B, C)
6     0.75     (B, E)
7     0.50     (C, E)
8     0.50  (B, C, E) 

  antecedents consequents  confidence
0         (A)         (C)         1.0
1         (B)         (E)         1.0
2         (E)         (B)         1.0
3      (B, C)         (E)         1.0
4      (C, E)         (B)         1.0
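
The boolean table printed above is a one-hot encoding of the transactions: one column per distinct item, one row per transaction. The pure-Python sketch below reproduces that encoding for illustration only; `one_hot` is a hypothetical helper and is not how mlxtend implements `TransactionEncoder`.

```python
def one_hot(transactions):
    """Return (columns, rows) where rows[i][j] says whether transaction i contains columns[j]."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[col in t for col in columns] for t in transactions]
    return columns, rows


dataset = [['A', 'C', 'D'], ['B', 'C', 'E'], ['A', 'B', 'C', 'E'], ['B', 'E']]
columns, rows = one_hot(dataset)

print(columns)
for row in rows:
    print(row)
```

The support of an item is then just the mean of its column, which is why this layout is convenient for frequent itemset mining.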


## Association analysis using apriori_python library


In [None]:
#!pip install apriori_python
from apriori_python import apriori
dataset =   [['A', 'C', 'D'], ['B', 'C', 'E'], ['A', 'B', 'C','E'],['B', 'E']]
freqItemSet, rules = apriori(dataset, minSup=0.7, minConf=0.0)

for r in rules:
    print('{} ==> {} (c={})'.format(*r)) 

## Load the supermarket transaction dataset
### Now let's work on a real grocery dataset

In [10]:
import csv
import pprint
file_name = 'Groceries.csv'
data_list = []
with open(file_name, 'r', newline='') as f:  # each CSV row is one transaction
    reader = csv.reader(f)
    for row in reader:
        # Keep the non-empty values of the row, stripped of surrounding whitespace
        row_list = [value.strip() for value in row if value.strip()]
        data_list.append(row_list)
pprint.pprint(data_list)

[['pork',
  'sandwich bags',
  'lunch meat',
  'all- purpose',
  'flour',
  'soda',
  'butter',
  'vegetables',
  'beef',
  'aluminum foil',
  'all- purpose',
  'dinner rolls',
  'shampoo',
  'all- purpose'],
 ['shampoo',
  'hand soap',
  'waffles',
  'vegetables',
  'cheeses',
  'mixes',
  'milk',
  'sandwich bags',
  'laundry detergent',
  'dishwashing liquid/detergent',
  'waffles',
  'individual meals',
  'hand soap',
  'vegetables'],
 ['pork',
  'soap',
  'ice cream',
  'toilet paper',
  'dinner rolls',
  'hand soap',
  'spaghetti sauce',
  'milk',
  'ketchup',
  'sandwich loaves',
  'poultry',
  'toilet paper',
  'ice cream',
  'ketchup'],
 ['juice', 'lunch meat', 'soda', 'toilet paper', 'all- purpose'],
 ['pasta',
  'tortillas',
  'mixes',
  'hand soap',
  'toilet paper',
  'vegetables',
  'vegetables',
  'paper towels',
  'vegetables',
  'flour',
  'vegetables',
  'pork',
  'poultry',
  'eggs'],
 ['toilet paper',
  'eggs',
  'toilet paper',
  'vegetables',
  'bagels',
  'dishwa

 ['sugar',
  'mixes',
  'vegetables',
  'poultry',
  'waffles',
  'pork',
  'sandwich bags',
  'toilet paper',
  'all- purpose',
  'tortillas',
  'dinner rolls',
  'hand soap'],
 ['eggs',
  'mixes',
  'lunch meat',
  'fruits',
  'coffee/tea',
  'yogurt',
  'aluminum foil',
  'flour'],
 ['poultry',
  'soap',
  'butter',
  'sandwich bags',
  'beef',
  'milk',
  'all- purpose',
  'pasta',
  'dinner rolls'],
 ['bagels',
  'tortillas',
  'dinner rolls',
  'flour',
  'vegetables',
  'ketchup',
  'juice',
  'toilet paper',
  'milk',
  'all- purpose',
  'spaghetti sauce',
  'shampoo',
  'spaghetti sauce',
  'hand soap'],
 ['waffles',
  'laundry detergent',
  'pasta',
  'toilet paper',
  'juice',
  'sugar',
  'milk',
  'cereals',
  'flour',
  'soda'],
 ['paper towels',
  'pork',
  'cereals',
  'fruits',
  'aluminum foil',
  'bagels',
  'butter',
  'dishwashing liquid/detergent',
  'aluminum foil',
  'eggs',
  'lunch meat',
  'flour',
  'laundry detergent',
  'vegetables'],
 ['butter',
  'indivi

  'individual meals',
  'mixes',
  'coffee/tea',
  'tortillas'],
 ['fruits',
  'juice',
  'poultry',
  'beef',
  'ketchup',
  'laundry detergent',
  'spaghetti sauce',
  'beef',
  'waffles',
  'soda',
  'soap',
  'sugar',
  'coffee/tea',
  'all- purpose'],
 ['flour',
  'hand soap',
  'dishwashing liquid/detergent',
  'sandwich bags',
  'mixes',
  'tortillas',
  'hand soap',
  'yogurt',
  'pork',
  'flour',
  'coffee/tea',
  'pasta',
  'beef',
  'poultry'],
 ['spaghetti sauce',
  'aluminum foil',
  'eggs',
  'ice cream',
  'juice',
  'cereals',
  'sugar',
  'mixes',
  'paper towels'],
 ['cheeses',
  'sandwich bags',
  'cereals',
  'juice',
  'lunch meat',
  'poultry',
  'fruits',
  'yogurt',
  'shampoo',
  'beef',
  'toilet paper',
  'sandwich loaves',
  'juice',
  'soap'],
 ['hand soap',
  'pasta',
  'bagels',
  'cheeses',
  'all- purpose',
  'mixes',
  'ketchup',
  'sandwich bags',
  'soda',
  'lunch meat',
  'spaghetti sauce',
  'beef',
  'fruits',
  'mixes'],
 ['soda',
  'bagels',
 

  'poultry'],
 ['hand soap',
  'bagels',
  'dishwashing liquid/detergent',
  'waffles',
  'beef',
  'poultry',
  'mixes',
  'laundry detergent',
  'individual meals',
  'bagels'],
 ['poultry',
  'cheeses',
  'waffles',
  'sugar',
  'vegetables',
  'aluminum foil',
  'dishwashing liquid/detergent',
  'soda',
  'milk',
  'beef',
  'all- purpose',
  'vegetables',
  'all- purpose',
  'soda'],
 ['lunch meat',
  'cereals',
  'flour',
  'yogurt',
  'fruits',
  'sugar',
  'dinner rolls',
  'soap',
  'all- purpose',
  'soap',
  'fruits',
  'sandwich bags',
  'dishwashing liquid/detergent',
  'vegetables'],
 ['cereals',
  'pasta',
  'mixes',
  'juice',
  'vegetables',
  'all- purpose',
  'dinner rolls',
  'all- purpose',
  'flour',
  'eggs',
  'ketchup',
  'yogurt',
  'flour',
  'laundry detergent'],
 ['vegetables',
  'pasta',
  'beef',
  'lunch meat',
  'dishwashing liquid/detergent',
  'ice cream',
  'lunch meat',
  'mixes',
  'laundry detergent',
  'vegetables',
  'lunch meat',
  'mixes',
  '

  'fruits',
  'butter',
  'poultry',
  'flour'],
 ['pork',
  'paper towels',
  'bagels',
  'lunch meat',
  'ice cream',
  'cheeses',
  'dishwashing liquid/detergent',
  'bagels',
  'coffee/tea',
  'waffles',
  'hand soap',
  'milk',
  'spaghetti sauce',
  'flour'],
 ['milk',
  'sandwich loaves',
  'yogurt',
  'dinner rolls',
  'eggs',
  'cereals',
  'hand soap',
  'hand soap',
  'individual meals',
  'hand soap',
  'juice',
  'pasta',
  'tortillas',
  'dishwashing liquid/detergent'],
 ['cereals',
  'butter',
  'aluminum foil',
  'bagels',
  'vegetables',
  'poultry',
  'toilet paper',
  'ketchup',
  'cheeses',
  'sandwich bags',
  'cheeses',
  'laundry detergent',
  'laundry detergent',
  'vegetables'],
 ['spaghetti sauce',
  'ketchup',
  'coffee/tea',
  'eggs',
  'dinner rolls',
  'vegetables',
  'soda',
  'ketchup',
  'waffles',
  'flour',
  'dinner rolls',
  'eggs',
  'cheeses',
  'laundry detergent'],
 ['spaghetti sauce',
  'mixes',
  'butter',
  'bagels',
  'mixes',
  'waffles',
 

  'butter',
  'bagels',
  'paper towels',
  'paper towels',
  'pasta',
  'aluminum foil',
  'lunch meat',
  'dishwashing liquid/detergent',
  'beef',
  'soda'],
 ['sandwich loaves', 'soda', 'soap', 'cheeses', 'paper towels'],
 ['toilet paper',
  'poultry',
  'sandwich bags',
  'eggs',
  'soda',
  'ice cream',
  'cheeses',
  'beef',
  'pork',
  'sandwich bags',
  'sandwich bags',
  'eggs',
  'eggs',
  'sandwich bags'],
 ['dinner rolls',
  'aluminum foil',
  'cereals',
  'pasta',
  'flour',
  'cereals',
  'yogurt',
  'vegetables',
  'cheeses',
  'spaghetti sauce',
  'dishwashing liquid/detergent',
  'spaghetti sauce',
  'soda',
  'spaghetti sauce'],
 ['mixes', 'fruits', 'pork', 'laundry detergent'],
 ['yogurt', 'waffles', 'vegetables', 'laundry detergent', 'aluminum foil'],
 ['tortillas',
  'lunch meat',
  'milk',
  'bagels',
  'soap',
  'poultry',
  'sugar',
  'ice cream',
  'flour',
  'eggs',
  'vegetables',
  'mixes',
  'ketchup',
  'eggs'],
 ['vegetables',
  'yogurt',
  'soda',
  'to

## TODO Mining association rules on Groceries datasets
* Apply the apriori and association_rules functions from the mlxtend library
* Apply the apriori function from the apriori_python library (it returns both the frequent itemsets and the rules)
* What would be a reasonable value of min-support for this supermarket transaction data?
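
To get a feel for a reasonable min-support, it can help to look at the support of individual items first: no itemset can be more frequent than its rarest item. The sketch below assumes transactions in the same list-of-lists format as `data_list`; `item_support` is a hypothetical helper, and the four rows are a small excerpt of the printed grocery data.

```python
from collections import Counter


def item_support(transactions):
    """Support of each single item, most frequent first."""
    n = len(transactions)
    # set(t) so an item repeated within one transaction is counted once
    counts = Counter(item for t in transactions for item in set(t))
    return {item: count / n for item, count in counts.most_common()}


sample_rows = [
    ['juice', 'lunch meat', 'soda', 'toilet paper', 'all- purpose'],
    ['sandwich loaves', 'soda', 'soap', 'cheeses', 'paper towels'],
    ['mixes', 'fruits', 'pork', 'laundry detergent'],
    ['yogurt', 'waffles', 'vegetables', 'laundry detergent', 'aluminum foil'],
]

for item, sup in item_support(sample_rows).items():
    print(f'{item}: {sup:.2f}')
```

Running `item_support(data_list)` on the full dataset would show how quickly support falls off, which gives an upper bound for useful thresholds.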

In [13]:
# TODO: replace the content of this cell with your Python solution

## Apply apriori and association_rules functions from mlxtend library
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import pandas as pd


oht = TransactionEncoder()
oht_ary = oht.fit(data_list).transform(data_list)
df = pd.DataFrame(oht_ary, columns=oht.columns_)
print (df)           

frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)
print('Support values for candidate itemsets:\n', frequent_itemsets, '\n')

rules= association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

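# Select columns with a list: indexing a DataFrame with a set is rejected by recent pandas versions
print(rules[['antecedents', 'consequents', 'confidence']])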


## Apply apriori and association_rules functions from apriori_python library

from apriori_python import apriori
freqItemSet, rules = apriori(data_list, minSup=0.3, minConf=0.0)

for r in rules:
    print('{} ==> {} (c={})'.format(*r)) 

      all- purpose  aluminum foil  bagels   beef  butter  cereals  cheeses  \
0             True           True   False   True    True    False    False   
1            False          False   False  False   False    False     True   
2            False          False   False  False   False    False    False   
3             True          False   False  False   False    False    False   
4            False          False   False  False   False    False    False   
...            ...            ...     ...    ...     ...      ...      ...   
1494          True          False   False   True   False    False     True   
1495         False          False   False  False   False     True     True   
1496         False          False   False   True   False    False    False   
1497          True          False   False   True   False    False     True   
1498         False          False   False  False   False    False    False   

      coffee/tea  dinner rolls  dishwashing liquid/detergent  .

## *STOP PLEASE. THE FOLLOWING IS FOR THE NEXT EXERCISE. THANKS.*

## EXERCISE: FP-Growth

## Rule generation using the pyfpgrowth library


In [17]:
pip install pyfpgrowth



In [18]:
#Install the library if it is not available
#!pip install pyfpgrowth
import pprint
import pyfpgrowth
MIN_SUPPORT = 2  # absolute count: an itemset must appear in at least 2 transactions
MIN_CONFIDENCE = 0.7
DATASET = [['A', 'C', 'D'], ['B', 'C', 'E'], ['A', 'B', 'C','E'],['B', 'E']]

frequent_itemsets = pyfpgrowth.find_frequent_patterns(DATASET, MIN_SUPPORT)
print('Support values for candidate itemsets:\n', frequent_itemsets, '\n')
rules = pyfpgrowth.generate_association_rules(frequent_itemsets, MIN_CONFIDENCE)
print('Resultant association rules:\n')
pprint.pprint(rules) 

Support values for candidate itemsets:
 {('A',): 2, ('A', 'C'): 2, ('C',): 3, ('B', 'C'): 2, ('B',): 3, ('E',): 3, ('C', 'E'): 2, ('B', 'E'): 3, ('B', 'C', 'E'): 2} 

Resultant association rules:

{('A',): (('C',), 1.0),
 ('B',): (('E',), 1.0),
 ('B', 'C'): (('E',), 1.0),
 ('C', 'E'): (('B',), 1.0),
 ('E',): (('B',), 1.0)}
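
Note that pyfpgrowth reports support as a transaction count, whereas the earlier sections use a ratio. A small sketch of converting between the two, using some of the counts printed above; `count_threshold` and `counts_to_ratios` are hypothetical helpers, not part of pyfpgrowth.

```python
import math


def count_threshold(min_support_ratio, n_transactions):
    """Smallest transaction count that satisfies a ratio-based minimum support."""
    return math.ceil(min_support_ratio * n_transactions)


def counts_to_ratios(pattern_counts, n_transactions):
    """Convert {pattern: count} into {pattern: support ratio}."""
    return {pattern: count / n_transactions
            for pattern, count in pattern_counts.items()}


# A few of the patterns reported for the 4-transaction sample data
patterns = {('A',): 2, ('A', 'C'): 2, ('C',): 3, ('B', 'E'): 3}

print(count_threshold(0.5, 4))          # 2
print(counts_to_ratios(patterns, 4))
```

So `MIN_SUPPORT = 2` on four transactions corresponds to the 0.5 ratio used with the other libraries.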


## Rule generation using the fpgrowth_py library


In [19]:
pip install fpgrowth_py



In [20]:
#!pip install fpgrowth_py
# Background on FP-Growth (mlxtend user guide):
# http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/
from fpgrowth_py import fpgrowth
freqItemSet, rules = fpgrowth(DATASET, minSupRatio=0.5, minConf=0.7)
for r in rules:
    print('{} ==> {} (c={})'.format(*r)) 

{'A'} ==> {'C'} (c=1.0)
{'B', 'C'} ==> {'E'} (c=1.0)
{'C', 'E'} ==> {'B'} (c=1.0)
{'B'} ==> {'E'} (c=1.0)
{'E'} ==> {'B'} (c=1.0)


### TODO Mining association rules using FP-growth and fpgrowth_py on Groceries datasets
* Try different confidence thresholds
* What’s a reasonable value for real data?



In [21]:
from fpgrowth_py import fpgrowth 
freqItemSet, rules = fpgrowth(data_list, minSupRatio = 0.05, minConf = 0.7) 
for r in rules: 
    print('{} ==> {} (c = {})'.format(*r))

{'sandwich loaves', 'eggs'} ==> {'vegetables'} (c = 0.7142857142857143)
{'paper towels', 'poultry'} ==> {'vegetables'} (c = 0.7047619047619048)
{'laundry detergent', 'yogurt'} ==> {'vegetables'} (c = 0.7368421052631579)
{'cheeses', 'yogurt'} ==> {'vegetables'} (c = 0.7169811320754716)
{'laundry detergent', 'flour'} ==> {'vegetables'} (c = 0.7009345794392523)
{'ice cream', 'laundry detergent'} ==> {'vegetables'} (c = 0.7211538461538461)
{'laundry detergent', 'cereals'} ==> {'vegetables'} (c = 0.7387387387387387)
{'lunch meat', 'sugar'} ==> {'vegetables'} (c = 0.7155172413793104)
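
To explore confidence thresholds without re-running the miner each time, one option is to mine once at a low threshold and filter afterwards. The sketch below applies that idea to the eight rules printed above; `filter_rules` is a hypothetical helper, and the rules are assumed to be `(antecedent, consequent, confidence)` triples as in the printed output.

```python
def filter_rules(rules, min_confidence):
    """Keep rules meeting min_confidence, strongest first."""
    kept = [rule for rule in rules if rule[2] >= min_confidence]
    return sorted(kept, key=lambda rule: rule[2], reverse=True)


# Rules copied from the output above (minSupRatio=0.05, minConf=0.7)
rules = [
    ({'sandwich loaves', 'eggs'}, {'vegetables'}, 0.7142857142857143),
    ({'paper towels', 'poultry'}, {'vegetables'}, 0.7047619047619048),
    ({'laundry detergent', 'yogurt'}, {'vegetables'}, 0.7368421052631579),
    ({'cheeses', 'yogurt'}, {'vegetables'}, 0.7169811320754716),
    ({'laundry detergent', 'flour'}, {'vegetables'}, 0.7009345794392523),
    ({'ice cream', 'laundry detergent'}, {'vegetables'}, 0.7211538461538461),
    ({'laundry detergent', 'cereals'}, {'vegetables'}, 0.7387387387387387),
    ({'lunch meat', 'sugar'}, {'vegetables'}, 0.7155172413793104),
]

for threshold in (0.70, 0.72, 0.73):
    print(threshold, len(filter_rules(rules, threshold)))
```

All eight rules have 'vegetables' as the consequent, so it may be worth checking how frequent 'vegetables' is on its own before reading much into these confidences.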


# End of Exercise. Many Thanks.