**Frequent Patterns:**
Here we will mine four common types of frequent patterns:
1. Basic frequent patterns
2. Closed frequent patterns
3. Maximal frequent patterns
4. Top-k frequent patterns

In [None]:
#@title
# These few lines of code allow importing custom modules in Colab (not needed on a local/dedicated server)

### Write the location of the current folder
BASE = '/content/gdrive/My Drive/QutX/B_5_Association_Mining'

from google.colab import drive
drive.mount('/content/gdrive')

import sys
sys.path.insert(0, BASE)

In [None]:
#@title
# These few lines of code allow loading files from Google Drive (not needed on a local/dedicated server)
!pip install -U -q PyDrive
from gd_file_handler import GoogleDriveFileHandler

### Write the link of the file on the GoogleDrive
link = 'https://drive.google.com/file/d/1P7C3UrLf4B1hamEVYl0Y74QlnWC6PtS0/view?usp=sharing'
### Write the file name
file_name = 'transactional_retail.csv'

gd_file_h = GoogleDriveFileHandler()
gd_file_h.download_file(link, file_name)

In [None]:
#@title
# Load the dataset: N counts the transactions, db holds each transaction as a list of tab-separated items
N = 0
db = []
with open('transactional_retail.csv', encoding='utf8') as FI:
    for line in FI:
        N += 1
        db.append(line.strip().split('\t'))

**Install PAMI library**

In [None]:
! pip install pami # This command works only on Linux and Mac OS (On Windows, need to install using command prompt)

**Import required libraries**

### What is the organizational structure of PAMI?

The algorithms in PAMI are organized hierarchically. The hierarchy has the format:

    PAMI.patternMiningModel.typeOfPattern.Algorithm

1. patternMiningModel — denotes the type of pattern that needs to be discovered, such as frequent pattern, correlated pattern, fuzzy frequent pattern, etc.

2. typeOfPattern — denotes the classification of the pattern. Currently, PAMI implements four types of patterns. (i) basic — find all patterns in the data, (ii) closed — find only closed patterns in the data, (iii) maximal — find only maximal patterns in the data and (iv) topK — find top-k patterns in the data.

3. Algorithm — denotes the technique used for discovering the patterns.


An example is

    PAMI.frequentPattern.basic.FPGrowth
    

where frequentPattern is the model, basic is the pattern type, and FPGrowth is the mining algorithm.

In [None]:
from PAMI.frequentPattern.basic import FPGrowth as freq_alg
from PAMI.frequentPattern.basic import Apriori as freq_apriori
from PAMI.frequentPattern.closed import CHARM as closed_alg
from PAMI.frequentPattern.maximal import MaxFPGrowth as maximal_alg
from PAMI.frequentPattern.topk import FAE as topK_alg

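# Not needed here: sequential patterns are mined with the prefixspan package and rules with mlxtend.
# from PAMI.sequentialPatternMining import prefixSpan as seq_alg
# from PAMI.AssociationRules import RuleMiner as rule_alg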

**Find frequent patterns**

An example of finding two-item frequent patterns when the minimum support count is 2.

<img src='https://miro.medium.com/max/836/1*OpcjjDrGMe650au1AlXJdA.png'> <br>

<img src='https://miro.medium.com/max/840/1*gjetx2BB8ejqkRtWuzWltg.png'> <br>

<img src='https://miro.medium.com/max/832/1*KBqNzc4mp-DlBeJc36XubA.png'> <br>
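
The counting idea behind the figures can be sketched in plain Python. This is a brute-force illustration, not the FP-growth algorithm used below. The toy transactions and the helper name `frequent_itemsets` are made up for this example.

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {'bread', 'milk'},
    {'bread', 'butter', 'milk'},
    {'butter', 'milk'},
    {'bread', 'jam'},
]
min_sup = 2  # minimum support count


def frequent_itemsets(transactions, size, min_sup):
    """Count every itemset of the given size and keep those with count >= min_sup."""
    counts = Counter()
    for t in transactions:
        # sorted() gives each itemset one canonical tuple representation
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_sup}


pairs = frequent_itemsets(transactions, 2, min_sup)
print(pairs)  # {('bread', 'milk'): 2, ('butter', 'milk'): 2}
```

Enumerating all combinations is only practical for tiny inputs, which is why dedicated algorithms such as FP-growth are used on real datasets.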

In [None]:
# Initialize the FP-growth algorithm with the input file, the minimum support (minSup), and the separator.

obj = freq_alg.FPGrowth('transactional_retail.csv', 100, '\t')

# 'transactional_retail.csv' is the input file downloaded from the URL https://www.u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_retail.csv

# 100 is the minimum support count.

# '\t' is the separator between the items in a transaction.

In [None]:
# Start mining the pattern

obj.startMine()

In [None]:
# Show the discovered patterns as a pandas DataFrame

df = obj.getPatternsAsDataFrame()
df

In [None]:
# Save the patterns in a file

obj.savePatterns('frequentPatterns_100.txt')

# In the output file, frequentPatterns_100.txt, the first column is the pattern and the second column is the support.

In [None]:
# Runtime and memory requirements of the mining algorithm 

print('Runtime: ' + str(obj.getRuntime()))
print('Memory: ' + str(obj.getMemoryRSS()))

**Find Closed Pattern:**
It is used to reduce the number of frequent patterns. 

A closed pattern is a frequent pattern, so it meets the minimum support criterion. In addition, every super-pattern of a closed pattern is strictly less frequent than the closed pattern itself.

Let’s see some examples.

Suppose the minimum support count is 2. For the first example, suppose there are 3 items in total: a, b, c. Suppose pattern ab has a support count of 2 and pattern abc also has a support count of 2. Is ab a closed pattern? Pattern ab is frequent, but its super-pattern abc is NOT less frequent than ab. Therefore, ab is NOT a closed pattern.

For the second example,

suppose there are 3 items in total: x, y, z. Suppose pattern xy has a support count of 3 and pattern xyz has a support count of 2. Is xy a closed pattern? Pattern xy is frequent, and its only super-pattern, xyz, is less frequent than xy.

Therefore, xy is a closed pattern.
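
The two examples can be checked with a small sketch. The helper `is_closed` and the support tables are hypothetical and written for this illustration only. The sketch assumes the table lists every super-pattern that occurs in the data. It is not how CHARM works internally.

```python
def is_closed(pattern, supports, min_sup):
    """Return True if `pattern` is frequent and every proper super-pattern
    in `supports` has a strictly lower support count."""
    pattern = frozenset(pattern)
    sup = supports.get(pattern, 0)
    if sup < min_sup:
        return False  # not even frequent
    # `pattern < other` is True only for proper supersets of `pattern`
    return all(other_sup < sup
               for other, other_sup in supports.items()
               if pattern < other)


# Support counts taken from the two examples above
supports_abc = {frozenset('ab'): 2, frozenset('abc'): 2}
supports_xyz = {frozenset('xy'): 3, frozenset('xyz'): 2}

print(is_closed('ab', supports_abc, min_sup=2))  # False: abc is just as frequent
print(is_closed('xy', supports_xyz, min_sup=2))  # True: xyz is less frequent
```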

In [None]:
obj = closed_alg.CHARM('transactional_retail.csv',100,'\t')
obj.startMine()
obj.savePatterns('closedPatterns_100.txt')
df = obj.getPatternsAsDataFrame()
df


**Find Maximal Pattern:**
It is used to reduce the number of frequent patterns.

A maximal (max) pattern is a frequent pattern, so, like a closed pattern, it meets the minimum support criterion. Unlike a closed pattern, however, none of its super-patterns is frequent.

Let’s see some examples as well.

Suppose the minimum support count is 2. As before, for the first example, suppose there are 3 items in total: a, b, c. Suppose pattern ab has a support count of 3 and pattern abc has a support count of 2. Is ab a max pattern? Pattern ab is frequent, but its super-pattern abc is frequent as well. So, pattern ab is NOT a max pattern.

For the second example,

suppose there are 3 items in total: x, y, z. Suppose pattern xy has a support count of 3 and pattern xyz has a support count of 1. Is xy a max pattern? Pattern xy is frequent, and its only super-pattern, xyz, is NOT frequent. Therefore, xy is a max pattern.
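
The same two examples can be checked in a few lines. The helper `is_maximal` and the support tables are hypothetical and serve only as an illustration. The sketch assumes the table lists every super-pattern that occurs in the data. It does not reflect how MaxFPGrowth is implemented.

```python
def is_maximal(pattern, supports, min_sup):
    """Return True if `pattern` is frequent and no proper super-pattern
    in `supports` reaches the minimum support count."""
    pattern = frozenset(pattern)
    if supports.get(pattern, 0) < min_sup:
        return False  # not even frequent
    # `pattern < other` is True only for proper supersets of `pattern`
    return not any(other_sup >= min_sup
                   for other, other_sup in supports.items()
                   if pattern < other)


# Support counts taken from the two examples above
supports_abc = {frozenset('ab'): 3, frozenset('abc'): 2}
supports_xyz = {frozenset('xy'): 3, frozenset('xyz'): 1}

print(is_maximal('ab', supports_abc, min_sup=2))  # False: abc is also frequent
print(is_maximal('xy', supports_xyz, min_sup=2))  # True: xyz is not frequent
```

Note that ab in the first example is closed (abc is less frequent) but not maximal, so the maximal patterns form a subset of the closed patterns.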

In [None]:
obj = maximal_alg.MaxFPGrowth('transactional_retail.csv',100,'\t')
obj.startMine()
obj.savePatterns('maximalPatterns_100.txt')
df = obj.getPatternsAsDataFrame()
df

**Find top-k Patterns:**
It is used to reduce the number of frequent patterns.

Top-k pattern mining returns the K most frequent patterns in a given dataset.
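
As a minimal sketch of the idea, given a table of support counts, the top-k patterns are simply the k entries with the highest counts. The support table and the helper `top_k_patterns` below are hypothetical. A real top-k miner avoids enumerating all patterns first.

```python
import heapq

# Hypothetical support counts for a handful of patterns
supports = {'a': 5, 'b': 4, 'a b': 3, 'c': 2, 'a c': 1}


def top_k_patterns(supports, k):
    """Return the k (pattern, support) pairs with the highest support."""
    return heapq.nlargest(k, supports.items(), key=lambda kv: kv[1])


top3 = top_k_patterns(supports, 3)
print(top3)  # [('a', 5), ('b', 4), ('a b', 3)]
```

The user chooses K instead of a minimum support threshold, which is convenient when a suitable threshold is hard to guess in advance.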

In [None]:
obj = topK_alg.FAE('transactional_retail.csv',10,'\t')
obj.startMine()
obj.savePatterns('topKPatterns_10.txt')
df = obj.getPatternsAsDataFrame()
df

**Association Rule Mining** 

We will use **mlxtend** as PAMI does not support rule mining right now.

Association rule generation is a common task in the mining of frequent patterns. An association rule is an implication expression of the form X→Y, where X and Y are disjoint itemsets [1]. A more concrete example based on consumer behaviour would be {Diapers}→{Beer}, suggesting that people who buy diapers are also likely to buy beer. To evaluate the "interest" of such an association rule, different metrics have been developed. The mlxtend implementation supports metrics such as confidence and lift. We will use only confidence here.
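
To make the two metrics concrete, here is a small sketch that computes them by hand for the {Diapers}→{Beer} rule. The support counts, the transaction total `n_transactions`, and the helper names are invented for illustration and do not come from the retail dataset.

```python
def confidence(supports, antecedent, consequent):
    """confidence(X -> Y) = support(X and Y together) / support(X)."""
    x = frozenset(antecedent)
    both = x | frozenset(consequent)
    return supports[both] / supports[x]


def lift(supports, antecedent, consequent, n_transactions):
    """lift(X -> Y) = confidence(X -> Y) / relative support of Y."""
    rel_sup_y = supports[frozenset(consequent)] / n_transactions
    return confidence(supports, antecedent, consequent) / rel_sup_y


# Hypothetical support counts out of 10 transactions
n_transactions = 10
supports = {
    frozenset({'Diapers'}): 4,
    frozenset({'Beer'}): 5,
    frozenset({'Diapers', 'Beer'}): 3,
}

conf = confidence(supports, {'Diapers'}, {'Beer'})
rule_lift = lift(supports, {'Diapers'}, {'Beer'}, n_transactions)
print(conf)       # 0.75
print(rule_lift)  # 1.5
```

Confidence is a ratio of two supports, so it is the same whether supports are given as counts or as fractions. Lift needs the relative support of the consequent, which requires the total number of transactions.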

In [None]:
# First we generate frequent patterns
obj = freq_alg.FPGrowth('transactional_retail.csv',100,'\t')
obj.startMine()
df = obj.getPatternsAsDataFrame()
df

In [None]:
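# Change the format for compatibility between PAMI and mlxtend:
# mlxtend expects the columns 'itemsets' (frozensets) and 'support'.
# Note: assuming PAMI reports support as a raw count, confidence (a ratio of supports)
# is unaffected, but metrics such as lift would need the support divided by N.
df.columns = ['itemsets', 'support']
df['itemsets'] = df['itemsets'].apply(lambda x: frozenset(x.split()))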

In [None]:
df

In [None]:
#!pip install mlxtend --upgrade
from mlxtend.frequent_patterns import association_rules
df_ar = association_rules(df, metric = "confidence", min_threshold = 0.60)
df_ar

**Sequential Pattern Mining**

In [None]:
!pip install -U prefixspan

In [None]:
from prefixspan import PrefixSpan

ps = PrefixSpan(db[:30])

In [None]:
ps.frequent(2)

In [None]:
ps.topk(10)

In [None]:
ps.frequent(2, closed=True)

In [None]:
ps.topk(5, closed=True)

**Sequential Rule Mining**

In [None]:
from prefixspan import PrefixSpan

ps = PrefixSpan(db[:30])

In [None]:
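# frozenset() discards item order, so the rules below treat each sequence as a plain itemset
freq_seqs = ps.frequent(2)  # list of (support, sequence) pairs; mined once and reused
supports = [sup for sup, _ in freq_seqs]
patterns = [frozenset(seq) for _, seq in freq_seqs]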

In [None]:
import pandas as pd
df = pd.DataFrame(zip(patterns, supports))
df.columns = ['itemsets', 'support']
df.head(5)

In [None]:
#!pip install mlxtend --upgrade
from mlxtend.frequent_patterns import association_rules
df_ar = association_rules(df, metric = "confidence", min_threshold = 0.60)
df_ar