Budgeted Machine Learning with Bayesian Network Experiment Framework
The Budgeted Machine Learning with Bayesian Network Experiment Framework (BLBN-EF) consists of two main components, in the form of computer programs: (1) the learner and (2) the generator. Both of these computer programs are implemented in the C programming language and make use of the BLBN application programming interface (API), which is also implemented in the C programming language. The BLBN API depends on the Norsys Netica C API for its Bayesian network data structures and algorithms. The BLBN API does not include its own Bayesian network data structure implementation. The BLBN API provides a framework for implementing budgeted machine learning algorithms that relies on the data structures and algorithms provided by the Netica C API whenever possible.
The BLBN-EF and the BLBN API was built to be used in an academic research environment. It has been designed to be simple and extensible, so new algorithms can be easily integrated into the API in a strightforward manner. To effectively use the BLBN-EF and BLBN API, it is crucial to be familiar with the Netica C API, since the BLBN-EF depends on the BLBN API, which in turn depends on the Netica C API.
How to Compile BLBN-EF Programs
The BLBN-EF must be compiled on the Linux operating system. The BLBN-EF was designed to be compatible with the Holland Computing Center's (HCC) Prairiefire cluster, which is made up of nodes that run the Linux operating system.
To compile the learner and generator components of the BLBN-EF, run the
makefile using the
make command in a Linux terminal (such as the Prairiefire
This will invoke the make command with the makefile named
will produce two executables: (1)
blbn_learner and (2)
compile only the learner, run the command:
Likewise, to compile only the generator, run the command:
To inspect the commands that are executed when the makefile is invoked, open
the file named
Makefile in a text editor such as Vim or Emacs.
Alternatively, the following command can be run to manually compile the learner (however, this method should be considered a last resort, and should be used only when the Makefile cannot be used):
/util/comp/gcc/4.4.1/bin/gcc ./lib/NeticaEx.o ./src/blbn/blbn.c \ ./src/blbn_learner.c -o blbn_learner -L"./lib" -lm \ -lnetica -lpthread -lstdc++
Likewise, the following command can be run to manually compile the generator:
/util/comp/gcc/4.4.1/bin/gcc ./lib/NeticaEx.o ./src/blbn_generator.c \ -o blbn_generator -I"./src" -L"./lib" -lm -lnetica \ -lpthread -lstdc++
For instructions on running the compiled
executables on Prairiefire, go to the section named Running Experiments Using
For instructions on how to use the make command visit the make documentation.
For information about accessing the Holland Computing Center Prairiefire cluster or its successors, visit the HCC website.
Running Experiments Using the BLBN-EF
To run an experiment using BLBN-EF, use one of the included run scripts. Scripts are included for running the experiments on a single laptop or desktop (running Linux) with processes that run in a serial or parallel manner, and for submitting jobs to be run as independent processes on the HCC Prairiefire cluster. Because individual processes in experiments can take many hours or days to complete, experiments should be run on the HCC Prairiefire cluster whenever possible.
To run an experiment on the HCC Prairiefire cluster, run a Bash shell script associated with the desired experiment. For example, to run the experiment using the ALARM data set and Bayesian network structure, run the following command on the head node of the Prairiefire cluster:
This Bash shell script contains parameters for the ALARM experiment, including parameters specifying file locations of the data set and Bayesian network structure, learning policy/algorithm, budget value, the directory to which the results of the experiment should be saved, among others. For a complete list of the parameters, open the Bash script in a text editor. An experiments parameters are used by the script to submit jobs to the Prairiefire cluster scheduler, which will schedule each job to run as an independent process on one of the cluster's nodes.
A Bash shell script such as this must be created for each experiment.
Creating and Configuring an Experiment
To create an experiment, a Bayesian network structure file is required. This
file will have either the extension
*.neta. Of these two extensions,
*.dne should be preferred since it is a text file and can be
read and modified easily by humans in a text editor. The extension
a binary file format that is not intended to be viewed or modified in a text
Archiving Experiment Results
Because an experiment produced many results files (i.e., output files) in the
/results/ directory, it may be desirable to periodically compress and archive
all files produced by experiments. The BLBN-EF has a script file that
produces a compressed archive (in tar.gz format) of the
/results/ directory and
all subdirectories that preserves the directory hierarchy and all file
permissions. To execute this script, run the following command:
The produced archive will be named based on the current date
according to the format
YYYY is the current
MM is the current month, and
DD is the current date. For example, an
archive produced on June 30, 2011 would be named
blbn_generator with the following command:
blbn_generator with the following command:
The short answer is that I generated the data sets using the
blbn_generator program with input files that were the parameterized networks from the Netica Net Repository on the Norsys website. I wrote brief instructions about using the
blbn_generator program in the README I wrote for you (which I've also attached to this email). The general usage syntax for this program is as follows:
./blbn_generator -d <data_file_path> -m <model_file_path> -c <case_count> -t <target_node_name>
For example, to run the generator for the BreastCancerWisconsin data set:
./blbn_generator -d ../data/BreastCancerWisconsin/BreastCancerWisconsin.cas -m ../data/BreastCancerWisconsin/BreastCancerWisconsin.dne -c 1000 -t Class
*.neta files to
Here's the command that I used to generate a
*.dne file from a
./blbn_generator -m ./data/Animals/Animals.neta
I have a file located at
./data/Animals/Animals.neta and this command creates the file
This code was used by Yaling Zheng in her doctoral research for her dissertation. Her code is available on GitHub in her blbn_1st and blbn_2nd repositories. Her dissertation, Machine Learning with Incomplete Information is available on DigitalCommons@University of Nebraska - Lincoln.
The citation for Yaling's dissertation is below:
Zheng, Yaling, "Machine learning with incomplete information" (2011). ETD collection for University of Nebraska - Lincoln. AAI3487272. http://digitalcommons.unl.edu/dissertations/AAI3487272