A Java implementation of the LCM (Linear Closed itemsets Miner) algorithm, as proposed by T.Uno & H.Arimura. It is multi-threaded, as proposed by Négrevergne et al., hence the name of its main class: PLCM.
Key features:
- Mines closed frequent itemsets
- Excellent speed-up on multi-core machines
- Provides iterator interfaces to adapt your own items/transactions/itemset collections
- Able to load datasets greater than 4GB (since v1.7)
Reference papers:
- "An efficient algorithm for enumerating closed patterns in transaction databases" by T. Uno, T. Asai, Y. Uchida and H. Arimura, in Discovery Science, 2004
- "Discovering closed frequent itemsets on multicore: Parallelizing computations and optimizing memory accesses" by B. Negrevergne, A. Termier, J-F. Mehaut, and T. Uno in International Conference on High Performance Computing & Simulation, 2010
Please use Maven to build the program.
Download jLCM-cli's JAR and invoke java -jar jLCM-cli-1.7.0-wdeps.jar
to show the complete manual. Note that this program's main
function lives in a separated project.
This tool uses ASCII files as input: each line represents a transaction (using UNIX line terminators). You may find example input files in the FIMI repository, or start with a small one embedded in src/test/resources
like 50retail.dat.
Add the following dependency to your pom.xml
<dependency>
<groupId>fr.liglab.jlcm</groupId>
<artifactId>jLCM</artifactId>
<version>1.7.0</version>
</dependency>
To perform the mining you will have to instanciate an ExplorationStep
, a PatternsCollector
and the main class PLCM
. Depending on how you want to do the I/O you may have to implement your own Iterable<TransactionReader>
(for input) and/or PatternsWriter
(for output).
The main class of jLCM-cli provides an example use of the library.
This work is released under the Apache License 2.0 (see LICENSE).
Copyright 2013,2014,2015,2016 Martin Kirchgessner, Vincent Leroy, Alexandre Termier, Sihem Amer-Yahia, Marie-Christine Rousset, Université Joseph Fourier and CNRS.