General approach to the FP-growth algorithm

1. Collect: The first step is to scan the database and count the occurrences of each item.

2. The count of a 1-itemset in the database is called its support count, or the frequency of the 1-itemset. Items whose support count is below the minimum support are discarded.

3. The second step is to construct the FP tree. First, create the root of the tree, which is represented by null.

4. Next, scan the database again and examine each transaction. The frequent items of a transaction are sorted in descending order of support count and inserted as a branch of the tree. Transactions that share a prefix share the corresponding nodes, whose counts are incremented.

5. The next step is to mine the FP tree. The lowest nodes are examined first, together with the node links that connect nodes holding the same item. Each lowest node represents a frequent pattern of length 1.

6. For each such node, build its conditional pattern base: a sub-database consisting of the prefix paths in the FP tree that end at that node (the suffix).

7. Construct the frequent patterns from the conditional pattern bases collected in the previous step.
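
Steps 1 and 2 can be sketched on their own, before any tree is built. The snippet below is a minimal illustration, not the notebook's implementation: the function name count_item_support and the toy database are assumptions made for this example.

```python
from collections import Counter

def count_item_support(transactions, min_support):
    """Return {item: support count} for the items that meet min_support."""
    counts = Counter()
    for transaction in transactions:
        # Count each item once per transaction, even if it is listed twice.
        counts.update(set(transaction))
    return {item: n for item, n in counts.items() if n >= min_support}

# Hypothetical toy database, not the notebook's data set.
toy_db = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
support = count_item_support(toy_db, min_support=2)
print(support)
```

Here "b" appears in all four transactions while "a" and "c" appear in two each, so all three items pass a minimum support of 2 and only "b" would pass a minimum support of 3.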


Tree node class and display

In [None]:
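class treeNode:
    """A node of the FP tree."""
    def __init__(self, itemname, item_count, parentNode):
        self.itemname = itemname
        self.item_count = item_count
        self.nodeLink = None          # next node in the tree holding the same item
        self.parent = parentNode      # parent node; None for the root
        self.children = {}            # child nodes keyed by item name

    def inc(self, item_count):
        """Add item_count to this node's count."""
        self.item_count += item_count

    def disp(self, ind=1):
        """Print the subtree rooted at this node, indented by depth."""
        print('  '*ind, self.itemname, ' ', self.item_count)
        for child in self.children.values():
            child.disp(ind+1)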

Creating FP tree

In [None]:
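def createTree(dataSet, minSup=1):
    """Build the FP tree and header table from {frozenset(transaction): count}."""
    # First pass: support count of every item.
    itemTable = {}
    for transaction in dataSet:
        for item in transaction:
            itemTable[item] = itemTable.get(item, 0) + dataSet[transaction]
    # Drop items below the minimum support.
    for k in list(itemTable):
        if itemTable[k] < minSup:
            del itemTable[k]
    freqItemSet = set(itemTable.keys())

    if len(freqItemSet) == 0:
        return None, None
    # Header table entry: [support count, first node in the item's node-link chain].
    for k in itemTable:
        itemTable[k] = [itemTable[k], None]

    retTree = treeNode('Null Set', 1, None)  # root of the tree
    # Second pass: insert each transaction, most frequent items first.
    for tranSet, count in dataSet.items():
        localD = {}
        for item in tranSet:
            if item in freqItemSet:
                localD[item] = itemTable[item][0]
        if len(localD) > 0:
            orderedItems = [v[0] for v in sorted(localD.items(), key=lambda p: p[1], reverse=True)]
            # updateTree (a helper defined outside this cell) adds the ordered items as a path.
            updateTree(orderedItems, retTree, itemTable, count)
    return retTree, itemTable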

Loading data

In [None]:
def load_data():

    simpDat = [["MILK","BREAD","BISCUIT"],
               ["BREAD","TEA","BOURNVITA"],
               ["JAM","MAGGI","BREAD","MILK"],
               ["MAGGI","TEA","BISCUIT"],
               ["BREAD","TEA","BOURNVITA"],
               ["MAGGI","BREAD","TEA","BISCUIT"],
               ["JAM","MAGGI","BREAD","TEA"],
               ["BREAD","MILK"],
               ["COFFEE","COCK","BISCUIT","CORNFLAKES"],
               ["COFFEE","COCK","BISCUIT","CORNFLAKES"],
               ["COFFEE","SUGER","BOURNVITA"],
               ["BREAD","COFFEE","COCK"],
               ["BREAD","SUGER","BISCUIT"],
               ["COFFEE","SUGER","CORNFLAKES"],
               ["BREAD","SUGER","BOURNVITA"],
               ["BREAD","COFFEE","SUGER"],
               ["BREAD","COFFEE","SUGER"],
               ["TEA","MILK","COFFEE","CORNFLAKES"]]
    return simpDat

Creating initial item set

In [None]:
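def create_itemset(dataSet):
    """Map each transaction (as a frozenset) to a count of 1."""
    itemDict = {}
    for trans in dataSet:
        # A repeated transaction maps to the same key, so it is stored once with count 1.
        itemDict[frozenset(trans)] = 1
    return itemDict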
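
If repeated transactions should keep their multiplicity, the dictionary can accumulate counts instead. The variant below is a sketch only: the name create_itemset_with_counts and the toy data are hypothetical and are not used elsewhere in this notebook.

```python
def create_itemset_with_counts(dataSet):
    """Map each distinct transaction (as a frozenset) to how often it occurs."""
    itemDict = {}
    for trans in dataSet:
        key = frozenset(trans)
        # Accumulate instead of overwriting, so duplicates are counted.
        itemDict[key] = itemDict.get(key, 0) + 1
    return itemDict

# Hypothetical toy data in which one transaction occurs twice.
toy = [["BREAD", "TEA"], ["MILK"], ["TEA", "BREAD"]]
counts = create_itemset_with_counts(toy)
print(counts)
```

With the data from load_data this variant would give the three repeated transactions a count of 2, so the item supports and the tree counts would differ from the outputs shown in this notebook.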

In [None]:
simpDat = load_data()

Initial item sets

In [None]:
initSet = create_itemset(simpDat)

In [None]:
initSet

{frozenset({'BISCUIT', 'BREAD', 'MILK'}): 1,
 frozenset({'BOURNVITA', 'BREAD', 'TEA'}): 1,
 frozenset({'BREAD', 'JAM', 'MAGGI', 'MILK'}): 1,
 frozenset({'BISCUIT', 'MAGGI', 'TEA'}): 1,
 frozenset({'BISCUIT', 'BREAD', 'MAGGI', 'TEA'}): 1,
 frozenset({'BREAD', 'JAM', 'MAGGI', 'TEA'}): 1,
 frozenset({'BREAD', 'MILK'}): 1,
 frozenset({'BISCUIT', 'COCK', 'COFFEE', 'CORNFLAKES'}): 1,
 frozenset({'BOURNVITA', 'COFFEE', 'SUGER'}): 1,
 frozenset({'BREAD', 'COCK', 'COFFEE'}): 1,
 frozenset({'BISCUIT', 'BREAD', 'SUGER'}): 1,
 frozenset({'COFFEE', 'CORNFLAKES', 'SUGER'}): 1,
 frozenset({'BOURNVITA', 'BREAD', 'SUGER'}): 1,
 frozenset({'BREAD', 'COFFEE', 'SUGER'}): 1,
 frozenset({'COFFEE', 'CORNFLAKES', 'MILK', 'TEA'}): 1}

Creating the FP tree and displaying it

In [None]:
myFPtree, myHeaderTab = createTree(initSet, 3)

In [None]:
myFPtree.disp()

   Null Set   1
     BREAD   10
       BISCUIT   3
         MILK   1
         TEA   1
           MAGGI   1
         SUGER   1
       TEA   2
         BOURNVITA   1
         MAGGI   1
       MILK   2
         MAGGI   1
       COFFEE   2
         SUGER   1
       SUGER   1
         BOURNVITA   1
     BISCUIT   1
       TEA   1
         MAGGI   1
     COFFEE   4
       BISCUIT   1
         CORNFLAKES   1
       SUGER   2
         BOURNVITA   1
         CORNFLAKES   1
       TEA   1
         MILK   1
           CORNFLAKES   1


Ascend the tree to collect the prefix path

In [None]:
def tree_construct(leafNode, prefixPath):
    # Walk from leafNode up to the root, appending each item name to prefixPath.
    if leafNode.parent is not None:
        prefixPath.append(leafNode.itemname)
        tree_construct(leafNode.parent, prefixPath)

Get the conditional pattern base from the constructed tree

In [None]:
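def findPrefixPath(basePat, treeNode):  # treeNode: first node for basePat, taken from the header table
    condPats = {}
    while treeNode is not None:
        prefixPath = []
        tree_construct(treeNode, prefixPath)
        # prefixPath[0] is the item itself; the rest is its prefix path.
        if len(prefixPath) > 1:
            condPats[frozenset(prefixPath[1:])] = treeNode.item_count
        treeNode = treeNode.nodeLink  # follow the node link to the next occurrence
    return condPats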

Get the conditional patterns for individual items

In [None]:
findPrefixPath('TEA', myHeaderTab['TEA'][1])

{frozenset({'BREAD'}): 2,
 frozenset({'BISCUIT'}): 1,
 frozenset({'BISCUIT', 'BREAD'}): 1,
 frozenset({'COFFEE'}): 1}

In [None]:
findPrefixPath('MILK', myHeaderTab['MILK'][1])

{frozenset({'BREAD'}): 2,
 frozenset({'BISCUIT', 'BREAD'}): 1,
 frozenset({'COFFEE', 'TEA'}): 1}
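
Step 7 turns a conditional pattern base into frequent patterns. The sketch below shows only the first level of that step, under stated assumptions: the function name suffix_patterns is hypothetical, and it pairs the suffix with single items rather than recursively building conditional FP trees as full FP-growth does.

```python
def suffix_patterns(suffix, cond_pats, min_support):
    """Combine a conditional pattern base into frequent 2-item patterns."""
    support = {}
    for prefix, count in cond_pats.items():
        for item in prefix:
            # Each prefix path contributes its count to every item on it.
            support[item] = support.get(item, 0) + count
    return {frozenset({suffix, item}): n
            for item, n in support.items() if n >= min_support}

# Conditional pattern base for 'TEA', copied from the output above.
tea_base = {frozenset({'BREAD'}): 2,
            frozenset({'BISCUIT'}): 1,
            frozenset({'BISCUIT', 'BREAD'}): 1,
            frozenset({'COFFEE'}): 1}
tea_patterns = suffix_patterns('TEA', tea_base, min_support=3)
print(tea_patterns)
```

Within the 'TEA' base, BREAD accumulates a support of 3 (2 + 1), BISCUIT 2 and COFFEE 1, so with a minimum support of 3 only the pattern {BREAD, TEA} survives.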