Skip to content

A fast and multi-threaded implementation of the LCM algorithm.

License

Notifications You must be signed in to change notification settings

slide-lig/plcmpp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pLCM++

A fast C++ implementation of the LCM (Linear Closed itemsets Miner) algorithm, as proposed by T.Uno & H.Arimura. It is multi-threaded, as proposed by Négrevergne et al.

The pLCM++ implementation started as a port of jLCM to C++. jLCM is a parallel implementation of the LCM algorithm, in java, and was mostly written by Martin Kirchgessner and Vincent Leroy.

Reference papers:

  • "An efficient algorithm for enumerating closed patterns in transaction databases" by T. Uno, T. Asai, Y. Uchida and H. Arimura, in Discovery Science, 2004
  • "Discovering closed frequent itemsets on multicore: Parallelizing computations and optimizing memory accesses" by B. Negrevergne, A. Termier, J-F. Mehaut, and T. Uno in International Conference on High Performance Computing & Simulation, 2010

Performance

  • pLCM++ is usually around 3 times faster than jLCM on common datasets (main improvements over jLCM are described below)
  • when limited to 1 thread (i.e. using the command line option -t 1), the performance of pLCM++ is usually similar to the one of LCM 2.5, the fastest mono-threaded implementation proposed by T. Uno.

Note: performance must be measured using option -b (benchmark mode), which disables online printing of patterns.

Compiling and running pLCM++

Use make to build:

cd src
make

Run ./pLCM++ --help for usage details.

This tool uses ASCII files as input: each line represents a transaction. You may find example input files in the FIMI repository.

License and copyright owners

This work is released under the Apache License 2.0 (see LICENSE).

Copyright 2013 Etienne Dublé, Martin Kirchgessner, Vincent Leroy, Alexandre Termier, CNRS and Université Joseph Fourier.

Improvements over jLCM

After the initial porting phase, pLCM++ received a set of optimizations. Main ones are:

  • Use of minimal integer types in the processing part, not only for storage (as it was in jLCM). This allows more optimization opportunities for the compiler.
  • Improved dataset reduction algorithm: pLCM++ uses hashes in order to find which transactions can be merged together.
  • Prefer range-based over java-style iterators (i.e. provide start and end of a range instead of an abstract Iterator object with methods hasNext() and next()). This is obviously faster, but it required algorithmic changes in order to provide contiguous ranges when possible.

About

A fast and multi-threaded implementation of the LCM algorithm.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages