# PAMI: A Python library for Pattern Mining

### Source

Towards Data Science: https://towardsdatascience.com/hello-i-am-pami-937439c7984d

Github: https://github.com/udayRage/PAMI

User Manual: https://udayrage.github.io/PAMI/index.html

In [1]:
# Install the library
! pip install pami

### What is the organizational structure of PAMI?

The algorithms in PAMI are organized hierarchically. The hierarchy has the following format:

    PAMI.patternMiningModel.typeOfPattern.Algorithm

1. patternMiningModel — denotes the type of pattern that needs to be discovered, such as frequent pattern, correlated pattern, fuzzy frequent pattern, etc.

2. typeOfPattern — denotes the classification of the pattern. Currently, PAMI implements four types of patterns. (i) basic — find all patterns in the data, (ii) closed — find only closed patterns in the data, (iii) maximal — find only maximal patterns in the data and (iv) topK — find top-k patterns in the data.

3. Algorithm — denotes the technique used for discovering the patterns.


An example is

    PAMI.frequentPattern.basic.FPGrowth
    

where frequentPattern is the model, basic is the pattern type, and FPGrowth is the mining algorithm.
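
To make the naming scheme concrete, the sketch below splits a dotted PAMI path into its three components. The helper `split_pami_path` is a hypothetical name used only for illustration; it is not part of the PAMI library.

```python
def split_pami_path(path):
    """Split 'PAMI.model.type.Algorithm' into its three components."""
    parts = path.split('.')
    if len(parts) != 4 or parts[0] != 'PAMI':
        raise ValueError(f'expected PAMI.model.type.Algorithm, got {path!r}')
    _, model, pattern_type, algorithm = parts
    return {'model': model, 'type': pattern_type, 'algorithm': algorithm}


# Hypothetical helper applied to the example path from the text
info = split_pami_path('PAMI.frequentPattern.basic.FPGrowth')
print(info)
```

The same decomposition explains the import statements used later in this notebook: the first three components form the module path, and the last one names the algorithm.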

### How can PAMI be used to discover frequent patterns?

Frequent pattern mining is an important knowledge discovery technique in Big Data Analytics. It involves identifying all itemsets (or patterns) that occur frequently in the data. A classic application is market-basket analysis, which identifies the itemsets that customers frequently purchase together. An example of a frequent pattern is:

    {Beer, Cheese}     [support=10%]
    
The above pattern says that 10% of the customers have purchased the items ‘Beer’ and ‘Cheese’ together. Such information can be very useful for product placement and inventory management.
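
As a small illustration of how support is computed, the sketch below counts how many transactions contain a given itemset. The five transactions are made-up toy data, not taken from any real dataset, and the `support` function is a hypothetical helper rather than a PAMI call.

```python
# Toy transactions, invented purely for illustration
transactions = [
    {'Beer', 'Cheese', 'Bread'},
    {'Beer', 'Cheese'},
    {'Milk', 'Bread'},
    {'Beer', 'Milk'},
    {'Cheese', 'Bread'},
]


def support(itemset, transactions):
    """Return (count, fraction) of transactions that contain every item in itemset."""
    itemset = set(itemset)
    count = sum(1 for t in transactions if itemset <= t)
    return count, count / len(transactions)


count, fraction = support({'Beer', 'Cheese'}, transactions)
print(f'{count} of {len(transactions)} transactions ({fraction:.0%})')
```

Support can therefore be expressed either as a count of transactions or as a fraction of the database size.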

More information on frequent pattern mining can be found at

https://link.springer.com/article/10.1007/s10618-018-0556-z


### Step-by-step frequent pattern mining using the PAMI library

1. Download the transactional Retail dataset from https://www.u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_retail.csv

In [13]:
# Preview the first six transactions of the dataset
from itertools import islice

with open('transactional_retail.csv', encoding='utf8') as f:
    for line in islice(f, 6):
        print(line)
        print('----------------')

0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29

----------------
30	31	32

----------------
33	34	35

----------------
36	37	38	39	40	41	42	43	44	45	46

----------------
38	39	47	48

----------------
38	39	48	49	50	51	52	53	54	55	56	57	58

----------------
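
The preview suggests that each line is one transaction and that the items in it are separated by tabs. A minimal sketch of turning such lines into Python lists follows; `parse_transaction` is a hypothetical helper, and the sample lines are copied from the preview above.

```python
# Three lines copied from the dataset preview (tab-separated items)
sample_lines = [
    '30\t31\t32\n',
    '33\t34\t35\n',
    '38\t39\t47\t48\n',
]


def parse_transaction(line, sep='\t'):
    """Split one line of the file into a list of item identifiers."""
    return [item for item in line.strip().split(sep) if item]


transactions = [parse_transaction(line) for line in sample_lines]
print(transactions)
```

The items are kept as strings here, since they act as identifiers rather than quantities.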


2. FP-growth is a well-known algorithm for frequent pattern mining. First, import this algorithm by executing the following command:

In [2]:
from PAMI.frequentPattern.basic import FPGrowth as alg

3. Initialize the FP-growth algorithm by providing the file, minimum support (minSup), and separator as the input parameters.

In [3]:
obj = alg.FPGrowth('transactional_retail.csv', 100, '\t')

# 'transactional_retail.csv' is the input file downloaded from the URL above
# 100 is the minimum support count (minSup)
# '\t' is the separator between the items in a transaction
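
To show what a minimum support count does, here is a minimal sketch that enumerates every itemset in a tiny toy database and keeps those occurring in at least `min_sup` transactions. This is plain brute-force enumeration, not FP-growth and not PAMI code; it is only practical for very small inputs. The toy transactions and the function name `brute_force_frequent` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy database, invented for illustration
transactions = [
    ['38', '39', '48'],
    ['38', '39'],
    ['39', '48'],
    ['48'],
]


def brute_force_frequent(transactions, min_sup):
    """Count every itemset of every transaction; keep those with count >= min_sup."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_sup}


frequent = brute_force_frequent(transactions, min_sup=2)
for pattern, sup in sorted(frequent.items()):
    print(' '.join(pattern), sup)
```

Raising `min_sup` shrinks the result, which is why the choice of threshold strongly affects both the number of patterns and the mining cost.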

4. Start the mining process

In [4]:
obj.startMine()

Frequent patterns were generated successfully using frequentPatternGrowth algorithm


5. Show the discovered patterns:

In [14]:
df = obj.getPatternsAsDataFrame()
df

,Patterns,Support
0,14248,100
1,7540,100
2,6998,100
3,6173,100
4,6024,100
...,...,...
6446,38 48 39,6102
6447,38 39,10345
6448,48,42135
6449,48 39,29142
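
Once the patterns are available as (pattern, support) pairs, they can be filtered and ranked. The sketch below uses only the four multi-digit rows visible at the end of the table above and keeps the patterns with two or more items, sorted by support. It assumes that the items inside the Patterns column are separated by whitespace, as the displayed rows suggest.

```python
# Rows copied from the tail of the displayed table
rows = [
    ('38 48 39', 6102),
    ('38 39', 10345),
    ('48', 42135),
    ('48 39', 29142),
]

# Keep patterns with at least two items, most frequent first
multi_item = [(p.split(), s) for p, s in rows if len(p.split()) >= 2]
multi_item.sort(key=lambda pair: pair[1], reverse=True)

for items, sup in multi_item:
    print(items, sup)
```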


6. Save the patterns in a file by executing the following code:

In [6]:
obj.savePatterns('frequentPatterns_100.txt')
# 100 is the minSup count
# In the output file, frequentPatterns_100.txt, the first column is the pattern and the second column is the support.
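
A saved pattern file can be read back into a dictionary for later analysis. The sketch below is hedged: the source only states that the first column is the pattern and the second is the support, so the column delimiter is left as a required parameter. The `':'` used in the demo, the sample file contents, and the helper `load_patterns` are all assumptions for illustration; inspect the real file to find its actual delimiter.

```python
import os
import tempfile


def load_patterns(path, delimiter):
    """Read 'pattern<delimiter>support' lines into {tuple_of_items: support}."""
    patterns = {}
    with open(path, encoding='utf8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the last delimiter so the support is always the final field
            pattern, sup = line.rsplit(delimiter, 1)
            patterns[tuple(pattern.split())] = int(sup)
    return patterns


# Demo on a made-up file; the ':' delimiter is an assumption, not a documented format
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf8') as tmp:
    tmp.write('38 39:10345\n48:42135\n')
    demo_path = tmp.name

loaded = load_patterns(demo_path, delimiter=':')
os.remove(demo_path)
print(loaded)
```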

7. The runtime and memory consumption of the mining algorithm can be obtained by executing the following code:

In [15]:
print('Runtime: ' + str(obj.getRuntime()))
print('Memory: ' + str(obj.getMemoryRSS()))

Runtime: 19.962828397750854
Memory: 335929344
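
The raw numbers are easier to read after formatting. The sketch below assumes that the runtime is reported in seconds and the memory in bytes; the source does not state the units, so treat both as assumptions. The values are copied from the output above, and `bytes_to_mib` is a hypothetical helper.

```python
def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 ** 2)


runtime = 19.962828397750854  # copied from the output above; assumed to be seconds
memory_rss = 335929344        # copied from the output above; assumed to be bytes

print(f'Runtime: {runtime:.2f} s')
print(f'Memory: {bytes_to_mib(memory_rss):.2f} MiB')
```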
