Skip to content

The shortest yet efficient implementation of the famous sequential pattern mining algorithm PrefixSpan in Python.

License

Notifications You must be signed in to change notification settings

nkmrtty/PrefixSpan-py

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PyPI version

The shortest yet efficient implementation of PrefixSpan in Python 3, in less than 15 lines in core part. You can find the Scala version here.

Installation

This package is available on PyPi. Just use pip3 install -U prefixspan to install it.

CLI Usage

You can simply use the algorithm on terminal.

prefixspan-cli (frequent | top-k) <threshold> [--minlen=1] [--maxlen=maxint] [<file>]
  • Sequences are read from standard input. Each sequence is integers separated by space, like this example:
0 1 2 3 4
1 1 1 3 4
2 1 2 2 0
1 1 1 2 2
  • The patterns and their respective frequencies are printed to standard output.

API Usage

Alternatively, you can use the algorithm via API.

from prefixspan.api import PrefixSpan

db = [
    [0, 1, 2, 3, 4],
    [1, 1, 1, 3, 4],
    [2, 1, 2, 2, 0],
    [1, 1, 1, 2, 2],
]

ps = PrefixSpan(db)

print(ps.frequent(2))
print(ps.topk(10))

Features

Outputs traditional single-item sequential patterns, where gaps are allowed between items.

  • Mining top-k patterns is also supported, with respective optimizations.
  • You can also limit the length of mined patterns. Note that setting maximum pattern length properly can significantly speedup the algorithm.

Tip

I strongly encourage using PyPy instead of CPython to run the script for best performance. In my own experience, it is nearly 10 times faster in average.

About

The shortest yet efficient implementation of the famous sequential pattern mining algorithm PrefixSpan in Python.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%