HEMLOCK is a software tool for constructing, evaluating, and applying heterogeneous ensemble data models for solving supervised machine learning problems. Specifically, HEMLOCK targets multiple-class classification (also called labeling or categorization) of data with continuous or discrete features. HEMLOCK consists of various data readers, machine learning algorithms, model combination and comparison routines, evaluation methods for model performance testing, and interfaces to external, state-of-the-art machine learning software libraries.
To compile Hemlock, an Ant build file, build.xml, is provided.
Provided Apache Ant is installed on the machine, the following
command, executed from the HEMLOCK directory, should build the
entire project: ant.
In order to interface with Weka, weka.jar must be in the class path or in the
HEMLOCK/tpl directory at the time of compilation. You will get a warning
message if weka.jar is not in either of those locations when running the Ant
build file.
The project must be built before use, because the executables are not distributed. See the section titled "Compiling" for more information.
In order to interface with Weka, weka.jar must be in the class path or in the
HEMLOCK/tpl directory while running Hemlock. If it is not, then any
experiments that request the use of Weka will not be executed and an error
message will be displayed.
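The weka.jar lookup described above can be checked ahead of time with a small script. The following is a minimal sketch, not part of Hemlock: the function name find_weka_jar is hypothetical, and it assumes the class path is taken from the CLASSPATH environment variable and lists weka.jar explicitly (wildcard entries and class paths passed on the java command line are not handled).

```python
import os
import tempfile
from pathlib import Path


def find_weka_jar(hemlock_dir, classpath=None):
    """Return the path of the first weka.jar found, or None.

    Looks in HEMLOCK/tpl first, then in each class path entry.
    """
    candidate = Path(hemlock_dir) / "tpl" / "weka.jar"
    if candidate.is_file():
        return candidate
    if classpath is None:
        # Assumption: the class path comes from the CLASSPATH variable.
        classpath = os.environ.get("CLASSPATH", "")
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        entry_path = Path(entry)
        # Only explicit jar entries are recognized in this sketch.
        if entry_path.name == "weka.jar" and entry_path.is_file():
            return entry_path
    return None


# Demonstration against a temporary stand-in for the HEMLOCK directory.
hemlock_dir = Path(tempfile.mkdtemp())
(hemlock_dir / "tpl").mkdir()
missing = find_weka_jar(hemlock_dir, classpath="")
(hemlock_dir / "tpl" / "weka.jar").touch()
found = find_weka_jar(hemlock_dir, classpath="")
print(missing, found)
```

A result of None means that, under these assumptions, the build would warn and experiments requesting Weka would not run.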
Hemlock imports data sets in a single format, a modification of the C4.5
file format. Each data set must have a *.name file and a *.data file. The
first line of the *.name file is a space-separated list of class labels.
This is followed by an empty line and then one line per attribute. Each
attribute line contains either continuous, for a continuous attribute, or
discrete followed by a space-separated list of possible values, for a
discrete attribute. The *.name file and the *.data file must share the same
base name and be placed in a folder with that same name. All such data
directories should be put in a data repository directory. A data repository
directory is simply a directory that contains only data directories
formatted as just described. An entry in the HEMLOCK/.config file must be
added to point to the data repository directory you have created. By
default, the HEMLOCK/data directory is already set up as a data repository,
so data directories can be dropped in that location immediately for use by
Hemlock.
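As an illustration of the *.name layout described above, the following minimal sketch writes a small example file into a data directory and parses it. It is not Hemlock's own reader: the function parse_name_file, the data set name "weather", and its labels and values are all hypothetical, and the *.data file is left out because its layout is not described here.

```python
import tempfile
from pathlib import Path


def parse_name_file(path):
    """Parse a *.name file laid out as described above.

    Returns (class_labels, attributes). Each attribute is either
    ("continuous", []) or ("discrete", [value, ...]).
    """
    lines = Path(path).read_text().splitlines()
    if len(lines) < 2 or lines[1].strip():
        raise ValueError("expected class labels on line 1, then an empty line")
    class_labels = lines[0].split()
    if not class_labels:
        raise ValueError("line 1 must list at least one class label")

    attributes = []
    for number, line in enumerate(lines[2:], start=3):
        tokens = line.split()
        if not tokens:
            continue  # assumption: blank lines after the header are ignored
        kind, values = tokens[0], tokens[1:]
        if kind == "continuous" and not values:
            attributes.append(("continuous", []))
        elif kind == "discrete" and values:
            attributes.append(("discrete", values))
        else:
            raise ValueError(f"line {number}: unrecognized attribute {line!r}")
    return class_labels, attributes


# Hypothetical data set "weather" inside a temporary data repository.
repository = Path(tempfile.mkdtemp())
data_dir = repository / "weather"
data_dir.mkdir()
(data_dir / "weather.name").write_text(
    "play stay\n"
    "\n"
    "continuous\n"
    "discrete sunny overcast rainy\n"
)

labels, attributes = parse_name_file(data_dir / "weather.name")
print(labels)
print(attributes)
```

Here the folder weather holds weather.name, matching the rule that the files and their folder share one name; a real data directory would also need the matching weather.data file.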
Use runHemlock [inputPath] [outputPath] to run Hemlock. Both arguments
are required.
inputPath: path of the experiment file to be run
outputPath: directory for the result files to be written to