pLCM++

A fast C++ implementation of the LCM (Linear Closed itemsets Miner) algorithm, as proposed by T.Uno & H.Arimura. It is multi-threaded, as proposed by Négrevergne et al.

The pLCM++ implementation started as a port of jLCM to C++. jLCM is a parallel implementation of the LCM algorithm, in java, and was mostly written by Martin Kirchgessner and Vincent Leroy.

Reference papers:

"An efficient algorithm for enumerating closed patterns in transaction databases" by T. Uno, T. Asai, Y. Uchida and H. Arimura, in Discovery Science, 2004
"Discovering closed frequent itemsets on multicore: Parallelizing computations and optimizing memory accesses" by B. Negrevergne, A. Termier, J-F. Mehaut, and T. Uno in International Conference on High Performance Computing & Simulation, 2010

Performance

pLCM++ is usually around 3 times faster than jLCM on common datasets (main improvements over jLCM are described below)
when limited to 1 thread (i.e. using the command line option -t 1), the performance of pLCM++ is usually similar to the one of LCM 2.5, the fastest mono-threaded implementation proposed by T. Uno.

Note: performance must be measured using option -b (benchmark mode), which disables online printing of patterns.

Compiling and running pLCM++

Use make to build:

cd src
make

Run ./pLCM++ --help for usage details.

This tool uses ASCII files as input: each line represents a transaction. You may find example input files in the FIMI repository.

License and copyright owners

This work is released under the Apache License 2.0 (see LICENSE).

Improvements over jLCM

After the initial porting phase, pLCM++ received a set of optimizations. Main ones are:

Use of minimal integer types in the processing part, not only for storage (as it was in jLCM). This allows more optimization opportunities for the compiler.
Improved dataset reduction algorithm: pLCM++ uses hashes in order to find which transactions can be merged together.
Prefer range-based over java-style iterators (i.e. provide start and end of a range instead of an abstract Iterator object with methods hasNext() and next()). This is obviously faster, but it required algorithmic changes in order to provide contiguous ranges when possible.

Name		Name	Last commit message	Last commit date
Latest commit History 101 Commits
.settings		.settings
src		src
.cproject		.cproject
.gitignore		.gitignore
.project		.project
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pLCM++

Performance

Compiling and running pLCM++

License and copyright owners

Improvements over jLCM

About

Releases

Packages

Contributors 2

Languages

License

slide-lig/plcmpp

Folders and files

Latest commit

History

Repository files navigation

pLCM++

Performance

Compiling and running pLCM++

License and copyright owners

Improvements over jLCM

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages