Probabilistic Generalized Frequent Itemset Mining (PGFIM) in Uncertain Databases
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
R_scripts
datasets
Commandline.cpp
Commandline.h
Database.cpp
Database.h
DynamicCalc.cpp
DynamicCalc.h
EnumTree.cpp
EnumTree.h
Global.h
IProbCalc.h
LICENSE
Makefile
Mine.cpp
Mine.h
README
TaxVertex.cpp
TaxVertex.h
Taxonomy.cpp
Taxonomy.h
TreeVertex.cpp
TreeVertex.h
Util.cpp
Util.h
main.cpp
run_test_series_eta.pl
run_test_series_tau.pl
tinystr.cpp
tinystr.h
tinyxml.cpp
tinyxml.h
tinyxmlerror.cpp
tinyxmlparser.cpp

README

IMPORTANT: 
Two pieces of software must be installed to compile PGFIM:
1. The high-precision static library ARPREC, which one can download and install 
from the following website: http://crd-legacy.lbl.gov/~dhbailey/mpdist/
2. The mathematics library Boost, which one can download and install from the
following website: http://www.boost.org/

COMPILING:
To compile on a Linux-based machine, modify the Makefile file, to your environment and
locations of the two software libraries installed above. One can find where the locations 
are set in the Makefile, next to the CXX_CFLAGS and CXX_LDFLAGS variables.

Once your environment is set correctly, simply type 'make' at the command-line
in the directory of the source code, and the executable PGFIM should be created.

RUNNING:
./PGFIM -tau <TAU> -inputDB <INPUT_DB_FILENAME> -inputXML <INPUT_XML_FILENAME> [optional_args]

Sample Execution:
./PGFIM -tau 0.9 -minsup 3000 -inputDB input.data -inputXML taxonomy.xml


Command Line Options:
	-minsup minimum support threshold: [0.0-Size of DB] default: 1
	-tau Tau frequent probability threshold: (0.0-1.0] (Required)
	-inputDB Input DB file name (Required)
    -inputXML Input XML taxonomy (Required)
    -calc Frequentness calculation method: {dynamic} default: dynamic
	-output Output file name
	-exec Execution stats file name
	-print Print found itemsets: {0 | 1} (0 = false, 1 = true) default 0
	
DATABASE FILE & TAXONOMY FORMAT:
Attributes in the database file are to be space-delimited, and transactions/instances 
should be new-line delimited. Currently, only the natural numbers can be used as 
attribute names / items. Following each attribute name is a colon and then the 
probability of that attribute occurring. 

IMPORTANT: The last item and probability of each transaction must be followed by a space.

Example database with 3 transactions:
1:0.9 2:0.23 3:0.6 
2:0.2 4:0.8 
1:0.4 2:0.54 

An XML file is used to represent the taxonomy to be used. See a sample taxonomy in the
datasets directory, as well as, example databases.

CITATION:
Erich A. Peterson and Peiyi Tang. Mining Probabilistic Generalized Frequent Itemsets in Uncertain Databases. In Proceedings of the 51tst ACM Southeast Conference (ACMSE), Savannah, GA, USA, April 2013.