**Frequent Patterns:**
Here we mine four common types of frequent patterns:
1. Basic frequent patterns
2. Closed frequent patterns
3. Maximal frequent patterns
4. Top-k frequent patterns

In [1]:
# Load the dataset: one transaction per line, items separated by tabs (N counts the transactions)
N = 0
db = []
with open('transactional_retail.csv', encoding='utf8') as FI:
    for line in FI:
        N += 1
        db.append(line.strip().split('\t'))

**Install PAMI library**

In [4]:
#! pip install pami  # Works as-is on Linux and macOS; on Windows, run "pip install pami" from the command prompt

**Import required libraries**

### What is the organizational structure of PAMI?

The algorithms in PAMI are organized hierarchically. The format of this hierarchy is:

    PAMI.patternMiningModel.typeOfPattern.Algorithm

1. patternMiningModel — denotes the type of pattern that needs to be discovered, such as frequent pattern, correlated pattern, fuzzy frequent pattern, etc.

2. typeOfPattern — denotes the classification of the pattern. Currently, PAMI implements four types of patterns. (i) basic — find all patterns in the data, (ii) closed — find only closed patterns in the data, (iii) maximal — find only maximal patterns in the data and (iv) topK — find top-k patterns in the data.

3. Algorithm — denotes the technique used for discovering the patterns.


An example is

    PAMI.frequentPattern.basic.FPGrowth
    

where frequentPattern is the model, basic is the pattern type, and FPGrowth is the mining algorithm.
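
The naming scheme can be sketched in a few lines of plain Python. The helper names `pami_path` and `split_pami_path` are hypothetical and are not part of PAMI; they only illustrate how the three levels combine into a dotted path and how a path breaks back into its parts.

```python
def pami_path(model: str, pattern_type: str, algorithm: str) -> str:
    """Join the three hierarchy levels into a dotted path (hypothetical helper)."""
    return ".".join(["PAMI", model, pattern_type, algorithm])


def split_pami_path(path: str) -> dict:
    """Split a dotted path back into its three levels (hypothetical helper)."""
    root, model, pattern_type, algorithm = path.split(".")
    if root != "PAMI":
        raise ValueError("expected a path that starts with 'PAMI'")
    return {
        "patternMiningModel": model,
        "typeOfPattern": pattern_type,
        "Algorithm": algorithm,
    }


print(pami_path("frequentPattern", "basic", "FPGrowth"))
print(split_pami_path("PAMI.frequentPattern.closed.CHARM"))
```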

In [9]:
from PAMI.frequentPattern.basic import FPGrowth as freq_alg
from PAMI.frequentPattern.basic import Apriori as freq_apriori
from PAMI.frequentPattern.closed import CHARM as closed_alg
from PAMI.frequentPattern.maximal import MaxFPGrowth as maximal_alg
from PAMI.frequentPattern.topk import FAE as topK_alg

# from PAMI.sequentialPatternMining import prefixSpan as seq_alg
# from PAMI.AssociationRules import RuleMiner as rule_alg

**Find frequent patterns**

An example of finding two-item frequent patterns when the minimum support count is 2:

<img src='https://miro.medium.com/max/836/1*OpcjjDrGMe650au1AlXJdA.png'> </br>

<img src='https://miro.medium.com/max/840/1*gjetx2BB8ejqkRtWuzWltg.png'> </br>

<img src='https://miro.medium.com/max/832/1*KBqNzc4mp-DlBeJc36XubA.png'> </br>
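
To make the idea concrete without PAMI, here is a minimal brute-force sketch that counts the support of every candidate itemset in a tiny database and keeps those that reach the minimum support count. The toy transactions are made up for illustration and are not taken from the figures or from the retail dataset, and the function name `frequent_itemsets` is an assumption. Real algorithms such as FP-growth avoid enumerating all candidates.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Return {itemset: support count} for all itemsets with support >= min_sup."""
    tsets = [set(t) for t in transactions]
    items = sorted({item for t in tsets for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of transactions that contain every item of the candidate
            support = sum(1 for t in tsets if t.issuperset(candidate))
            if support >= min_sup:
                result[frozenset(candidate)] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size can exist
            break
    return result


# Made-up toy database (an assumption for illustration only)
toy_db = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

patterns = frequent_itemsets(toy_db, min_sup=2)
for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The early exit relies on the fact that every subset of a frequent itemset is itself frequent, which is the same property that Apriori uses to prune candidates.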

In [10]:
# Initialize the FP-growth algorithm by providing the file, minimum support (minSup), and separator as the input parameters.

obj = freq_alg.FPGrowth('transactional_retail.csv',100,'\t')

# 'transactional_retail.csv' is the input file downloaded from the URL https://www.u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_retail.csv

# 100 is the minimum support count. 

# '\t' is the separator between the items in a transaction

In [11]:
# Start mining the pattern

obj.startMine()

Frequent patterns were generated successfully using frequentPatternGrowth algorithm


In [12]:
# Show the discovered patterns as pandas DataFrame

df = obj.getPatternsAsDataFrame()
df

Unnamed: 0,Patterns,Support
0,14248,100
1,7540,100
2,6998,100
3,6173,100
4,6024,100
...,...,...
6446,38 48 39,6102
6447,38 39,10345
6448,48,42135
6449,48 39,29142


In [14]:
# Save the patterns in a file

obj.save('frequentPatters_100.txt')

# In the output file, say frequentPatters_100.txt, the first column is the pattern and the second column is the support.

df = obj.getPatternsAsDataFrame()
df

In [15]:
# Runtime and memory requirements of the mining algorithm 

print('Runtime: ' + str(obj.getRuntime()))
print('Memory: ' + str(obj.getMemoryRSS()))

Runtime: 10.099799871444702
Memory: 442720256


**Find Closed Pattern:**
Closed patterns are used to reduce the number of frequent patterns that are reported.

A closed pattern is a frequent pattern, so it meets the minimum support threshold. In addition, every super-pattern of a closed pattern is strictly less frequent than the pattern itself.

Let's see some examples.

Suppose the minimum support count is 2. For the first example, suppose there are three items: a, b, c. Suppose pattern ab has a support count of 2 and pattern abc also has a support count of 2. Is ab a closed pattern? It is a frequent pattern, but it has a super-pattern, abc, that is NOT less frequent than ab. Therefore, ab is NOT a closed pattern.

For the second example, suppose there are three items: x, y, z. Suppose pattern xy has a support count of 3 and pattern xyz has a support count of 2. Is xy a closed pattern? It is a frequent pattern, and its only super-pattern, xyz, is less frequent than xy.

Therefore, xy is a closed pattern.
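
The two examples above can be checked with a short sketch. It assumes the frequent patterns and their support counts are already available in a dictionary; the function name `closed_patterns` is hypothetical, and this is not how CHARM works internally, only a direct translation of the definition.

```python
def closed_patterns(frequent):
    """Keep the patterns that have no proper super-pattern with the same support.

    `frequent` maps frozenset -> support count and is assumed to contain
    every frequent pattern, so any super-pattern missing from it is
    infrequent and therefore less frequent than the pattern being tested.
    """
    closed = {}
    for pattern, support in frequent.items():
        has_equal_superset = any(
            pattern < other and other_support == support
            for other, other_support in frequent.items()
        )
        if not has_equal_superset:
            closed[pattern] = support
    return closed


# Support counts from the two examples in the text (minimum support count 2)
frequent = {
    frozenset("ab"): 2,
    frozenset("abc"): 2,
    frozenset("xy"): 3,
    frozenset("xyz"): 2,
}

closed = closed_patterns(frequent)
print(frozenset("ab") in closed)  # ab has the super-pattern abc with equal support
print(frozenset("xy") in closed)  # the only super-pattern of xy is less frequent
```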

In [16]:
obj = closed_alg.CHARM('transactional_retail.csv',100,'\t')
obj.startMine()
obj.save('closedPatters_100.txt')
df = obj.getPatternsAsDataFrame()
df


Closed Frequent patterns were generated successfully using CHARM algorithm


Unnamed: 0,Patterns,Support
0,476,102
1,3411,116
2,348,138
3,1765,148
4,349,106
...,...,...
4753,38 48,10345
4754,38,15596
4755,39 48,29142
4756,48,42135


**Find Maximal Pattern:**
Maximal patterns reduce the number of reported frequent patterns even further.

A max pattern is also a frequent pattern, so, like a closed pattern, it meets the minimum support threshold. Unlike a closed pattern, however, none of the super-patterns of a max pattern is frequent.

Let's see some examples as well.

Suppose the minimum support count is 2. As before, for the first example, suppose there are three items: a, b, c. Suppose pattern ab has a support count of 3 and pattern abc has a support count of 2. Is ab a max pattern? It is a frequent pattern, but its super-pattern abc is a frequent pattern as well. So ab is NOT a max pattern. (It is still a closed pattern, because abc is less frequent than ab.)

For the second example, suppose there are three items: x, y, z. Suppose pattern xy has a support count of 3 and pattern xyz has a support count of 1. Is xy a max pattern? It is a frequent pattern, and its only super-pattern, xyz, is NOT a frequent pattern. Therefore, xy is a max pattern.
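
The same two examples can be verified with a small sketch that applies the definition directly: first drop the infrequent patterns, then keep only the patterns that have no frequent proper super-pattern. The function name `maximal_patterns` is hypothetical, and this is a naive check rather than the MaxFPGrowth procedure.

```python
def maximal_patterns(supports, min_sup):
    """Return the frequent patterns that have no frequent proper super-pattern."""
    frequent = {p: s for p, s in supports.items() if s >= min_sup}
    return {
        pattern: support
        for pattern, support in frequent.items()
        if not any(pattern < other for other in frequent)
    }


# Support counts from the two examples in the text
supports = {
    frozenset("ab"): 3,
    frozenset("abc"): 2,
    frozenset("xy"): 3,
    frozenset("xyz"): 1,
}

maximal = maximal_patterns(supports, min_sup=2)
print(frozenset("ab") in maximal)  # abc is frequent, so ab is not maximal
print(frozenset("xy") in maximal)  # xyz is infrequent, so xy is maximal
```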

In [17]:
obj = maximal_alg.MaxFPGrowth('transactional_retail.csv',100,'\t')
obj.startMine()
obj.save('maximalPatters_100.txt')
df = obj.getPatternsAsDataFrame()
df

Maximal Frequent patterns were generated successfully using MaxFp-Growth algorithm 


Unnamed: 0,Patterns,Support
0,14248,100
1,7540,100
2,6998,100
3,6173,100
4,6024,100
...,...,...
2957,39 48 38 41 89,148
2958,38 32 65,128
2959,39 48 38 41 65,130
2960,39 48 32 41 65,122


**Find top-k Patterns:**
Top-k mining is another way to reduce the number of reported frequent patterns.

Instead of a minimum support threshold, the user specifies k, and the algorithm returns the k most frequent patterns in the given dataset.
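
As a rough illustration of the selection step only, the sketch below picks the k entries with the highest support from a dictionary of already-counted patterns. The support values are a handful of counts reported elsewhere in this notebook; the function name `top_k` is hypothetical, and a dedicated top-k miner does not need to enumerate all frequent patterns first.

```python
import heapq


def top_k(supports, k):
    """Return the k (pattern, support) pairs with the highest support."""
    return heapq.nlargest(k, supports.items(), key=lambda kv: kv[1])


# A few support counts as reported in this notebook's outputs
supports = {
    "39": 50675,
    "48": 42135,
    "39 48": 29142,
    "38": 15596,
    "32": 15167,
    "41": 14945,
}

best = top_k(supports, 3)
print(best)
```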

In [18]:
# Here the second argument is k (the number of patterns to return), not a minimum support
obj = topK_alg.FAE('transactional_retail.csv',10,'\t')
obj.startMine()
obj.save('topKPatters_100.txt')
df = obj.getPatternsAsDataFrame()
df

 TopK frequent patterns were successfully generated using FAE algorithm.


Unnamed: 0,Patterns,Support
0,39,50675
1,48,42135
2,39 48,29142
3,38,15596
4,32,15167
5,41,14945
6,39 41,11414
7,39 38,10345
8,48 41,9018
9,39 32,8455


**Association Rule Mining** 

We will use **mlxtend**, as PAMI does not support rule mining right now.

Association rule generation is a common task in frequent pattern mining. An association rule is an implication of the form X→Y, where X and Y are disjoint itemsets [1]. A more concrete example based on consumer behaviour is {Diapers}→{Beer}, which suggests that people who buy diapers are also likely to buy beer. Different metrics have been developed to evaluate how interesting such a rule is. The mlxtend implementation reports several of them, including confidence and lift. We will use only confidence here.
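
The two metrics named above can be written out directly from support counts. This is a minimal sketch using the standard definitions, not mlxtend's code: confidence(X→Y) = support(X∪Y) / support(X), and lift(X→Y) = confidence(X→Y) divided by the relative support of Y. The function names and the toy counts are assumptions; the last call uses the counts reported in this notebook for the rule {38}→{39}.

```python
def confidence(sup_xy, sup_x):
    """Fraction of the transactions containing X that also contain Y."""
    return sup_xy / sup_x


def lift(sup_xy, sup_x, sup_y, n_transactions):
    """Confidence of X -> Y divided by the relative support of Y.

    Written as sup_xy * n / (sup_x * sup_y), which is the same quantity.
    """
    return sup_xy * n_transactions / (sup_x * sup_y)


# Made-up toy counts: 10 transactions, X in 4, Y in 5, X and Y together in 3
toy_conf = confidence(3, 4)
toy_lift = lift(3, 4, 5, 10)
print(toy_conf, toy_lift)

# Counts from this notebook: {38} occurs 15596 times, {38, 39} occurs 10345 times
conf_38_39 = confidence(10345, 15596)
print(round(conf_38_39, 6))
```

A lift above 1 means X and Y occur together more often than they would if they were independent. Confidence needs only two support counts, whereas lift also needs the total number of transactions.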

In [19]:
# First we generate frequent patterns
obj = freq_alg.FPGrowth('transactional_retail.csv',100,'\t')
obj.startMine()
df = obj.getPatternsAsDataFrame()
df

Frequent patterns were generated successfully using frequentPatternGrowth algorithm


Unnamed: 0,Patterns,Support
0,14248,100
1,7540,100
2,6998,100
3,6173,100
4,6024,100
...,...,...
6446,38 48 39,6102
6447,38 39,10345
6448,48,42135
6449,48 39,29142


In [20]:
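# Change the format for compatibility between PAMI and mlxtend
# Note: 'support' stays an absolute count here, while mlxtend expects a fraction of transactions.
# Confidence is a ratio of supports and is unaffected; the other metrics are not on their usual scale.
df.columns = ['itemsets', 'support']
df['itemsets'] = df['itemsets'].apply(lambda x: frozenset(x.split()))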

In [21]:
df

Unnamed: 0,itemsets,support
0,(14248),100
1,(7540),100
2,(6998),100
3,(6173),100
4,(6024),100
...,...,...
6446,"(48, 38, 39)",6102
6447,"(38, 39)",10345
6448,(48),42135
6449,"(48, 39)",29142


In [23]:
#!pip install mlxtend --upgrade
from mlxtend.frequent_patterns import association_rules
df_ar = association_rules(df, metric = "confidence", min_threshold = 0.60)
df_ar

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(15879),(48),128.0,42135.0,104.0,0.812500,0.000019,-5.393176e+06,-224714.666667,-1.002455
1,(4164),(48),135.0,42135.0,105.0,0.777778,0.000018,-5.688120e+06,-189603.000000,-1.002480
2,(3857),(39),135.0,50675.0,106.0,0.785185,0.000015,-6.841019e+06,-235896.206897,-1.002081
3,(2049),(48),135.0,42135.0,101.0,0.748148,0.000018,-5.688124e+06,-167296.764706,-1.002385
4,(1473),(39),135.0,50675.0,108.0,0.800000,0.000016,-6.841017e+06,-253370.000000,-1.002120
...,...,...,...,...,...,...,...,...,...,...
3957,"(48, 32)",(39),8034.0,50675.0,5402.0,0.672392,0.000013,-4.071175e+08,-154678.919453,-1.119306
3958,"(32, 39)",(48),8455.0,42135.0,5402.0,0.638912,0.000015,-3.562460e+08,-116686.200459,-1.147044
3959,"(48, 38)",(39),7944.0,50675.0,6102.0,0.768127,0.000015,-4.025561e+08,-218541.941368,-1.136882
3960,(38),(39),15596.0,50675.0,10345.0,0.663311,0.000013,-7.903170e+08,-150506.894687,-1.256492
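
The very small lift values and the negative leverage and conviction in the table above follow from passing absolute support counts where fractions are expected. The sketch below recomputes the first row, (15879)→(48), from its counts using the usual formulas, then repeats the calculation with relative supports. The function name `rule_metrics` is hypothetical, and `n` is a placeholder assumption, not the real number of transactions in the retail dataset (the variable `N` from the first cell would be the value to use).

```python
def rule_metrics(sup_a, sup_c, sup_ac):
    """Confidence, lift, leverage, and conviction from the given supports."""
    conf = sup_ac / sup_a
    return {
        "confidence": conf,
        "lift": sup_ac / (sup_a * sup_c),
        "leverage": sup_ac - sup_a * sup_c,
        "conviction": float("inf") if conf == 1 else (1 - sup_c) / (1 - conf),
    }


# Row 0 of the table: antecedent support 128, consequent support 42135, joint support 104
from_counts = rule_metrics(128, 42135, 104)
print(from_counts)

# Same rule with relative supports; n is a placeholder value (assumption)
n = 100_000
from_fractions = rule_metrics(128 / n, 42135 / n, 104 / n)
print(from_fractions)
```

The count-based result reproduces the values in the table, which suggests that only the confidence column should be interpreted there. Dividing the support column by the number of transactions before calling `association_rules` would put the other metrics on their usual scale.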


**Sequential Pattern Mining**

In [None]:
# !pip install -U prefixspan

In [24]:
from prefixspan import PrefixSpan

ps = PrefixSpan(db[:30])

In [25]:
ps.frequent(2)

[(2, ['3']),
 (4, ['32']),
 (2, ['32', '41']),
 (3, ['36']),
 (3, ['36', '38']),
 (3, ['36', '38', '39']),
 (2, ['36', '38', '39', '41']),
 (2, ['36', '38', '39', '48']),
 (2, ['36', '38', '41']),
 (2, ['36', '38', '48']),
 (3, ['36', '39']),
 (2, ['36', '39', '41']),
 (2, ['36', '39', '48']),
 (2, ['36', '41']),
 (2, ['36', '48']),
 (8, ['38']),
 (8, ['38', '39']),
 (3, ['38', '39', '41']),
 (5, ['38', '39', '48']),
 (2, ['38', '39', '56']),
 (3, ['38', '41']),
 (5, ['38', '48']),
 (2, ['38', '56']),
 (17, ['39']),
 (6, ['39', '41']),
 (3, ['39', '41', '48']),
 (10, ['39', '48']),
 (2, ['39', '48', '89']),
 (2, ['39', '56']),
 (2, ['39', '79']),
 (2, ['39', '89']),
 (8, ['41']),
 (3, ['41', '48']),
 (13, ['48']),
 (2, ['48', '89']),
 (2, ['56']),
 (2, ['79']),
 (2, ['89'])]

In [26]:
ps.topk(10)

[(17, ['39']),
 (13, ['48']),
 (10, ['39', '48']),
 (8, ['38']),
 (8, ['38', '39']),
 (8, ['41']),
 (6, ['39', '41']),
 (5, ['38', '39', '48']),
 (5, ['38', '48']),
 (4, ['32'])]

In [27]:
ps.frequent(2, closed=True)

[(2, ['3']),
 (4, ['32']),
 (2, ['32', '41']),
 (3, ['36', '38', '39']),
 (2, ['36', '38', '39', '41']),
 (2, ['36', '38', '39', '48']),
 (8, ['38', '39']),
 (3, ['38', '39', '41']),
 (5, ['38', '39', '48']),
 (2, ['38', '39', '56']),
 (17, ['39']),
 (6, ['39', '41']),
 (3, ['39', '41', '48']),
 (10, ['39', '48']),
 (2, ['39', '48', '89']),
 (2, ['39', '79']),
 (8, ['41']),
 (13, ['48'])]

In [28]:
ps.topk(5, closed=True)

[(17, ['39']),
 (13, ['48']),
 (10, ['39', '48']),
 (8, ['38', '39']),
 (8, ['41'])]

**Sequential Rule Mining**

In [29]:
from prefixspan import PrefixSpan

ps = PrefixSpan(db[:30])

In [30]:
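# Mine once and reuse the result; note that frozenset discards the item order of each sequence
frequent = ps.frequent(2)
supports = [support for support, _ in frequent]
patterns = [frozenset(pattern) for _, pattern in frequent]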

In [32]:
import pandas as pd
df = pd.DataFrame(zip(patterns, supports))
df.columns = ['itemsets', 'support']
df.head(5)

Unnamed: 0,itemsets,support
0,(3),2
1,(32),4
2,"(41, 32)",2
3,(36),3
4,"(36, 38)",3


In [33]:
#!pip install mlxtend --upgrade
from mlxtend.frequent_patterns import association_rules
df_ar = association_rules(df, metric = "confidence", min_threshold = 0.60)
df_ar

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(36),(38),3.0,8.0,3.0,1.0,0.125,-21.0,inf,-1.4
1,"(36, 38)",(39),3.0,17.0,3.0,1.0,0.058824,-48.0,inf,-1.142857
2,"(36, 39)",(38),3.0,8.0,3.0,1.0,0.125,-21.0,inf,-1.4
3,(36),"(38, 39)",3.0,8.0,3.0,1.0,0.125,-21.0,inf,-1.4
4,"(36, 41, 38)",(39),2.0,17.0,2.0,1.0,0.058824,-32.0,inf,-1.066667
5,"(36, 41, 39)",(38),2.0,8.0,2.0,1.0,0.125,-14.0,inf,-1.166667
6,"(38, 41, 39)",(36),3.0,3.0,2.0,0.666667,0.222222,-7.0,-6.0,-2.333333
7,"(36, 38, 39)",(41),3.0,8.0,2.0,0.666667,0.083333,-22.0,-21.0,-1.222222
8,"(36, 41)","(38, 39)",2.0,8.0,2.0,1.0,0.125,-14.0,inf,-1.166667
9,"(38, 41)","(36, 39)",3.0,3.0,2.0,0.666667,0.222222,-7.0,-6.0,-2.333333
