blip

This is the "Bayesian network Learning Improved Project" (blip), an open-source Java package that offers a wide range of structure learning algorithms. It is developed by Mauro Scanagatta and distributed under the LGPL-3 by IDSIA.

It focuses on score-based learning, mainly with the BIC and BDeu score functions, and allows the user to learn BNs from datasets containing thousands of variables. It provides state-of-the-art algorithms for the following tasks: parent set identification (BIC*), general structure optimization (WINASOBS-ENT), bounded-treewidth structure optimization (KMAX) and structure learning on incomplete data sets (SEM-KMAX).

An R binding is also available at https://github.com/mauro-idsia/r.blip.

References

This package implements the algorithms detailed in the following papers:

* Learning Bayesian Networks with Thousands of Variables (NIPS 2015). Mauro Scanagatta, Giorgio Corani, Cassio P. de Campos, Marco Zaffalon.
* Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables (NIPS 2016). Mauro Scanagatta, Giorgio Corani, Cassio P. de Campos, Marco Zaffalon.
* Efficient learning of bounded-treewidth Bayesian networks from complete and incomplete data sets (IJAR 2018), with supplementary material.
* Improved Local Search in Bayesian Networks Structure Learning (AMBN 2017).
* Approximated Structural Learning for Large Bayesian Networks (ECML PKDD 2018), with supplementary material.

Usage

The steps below show how to learn a bounded-treewidth BN, using the "child" network as an example.

Dataset format

The format for the initial dataset has to be the same as the file "child-5000.dat", namely a space-separated file containing:

* First line: list of variable names, separated by spaces;
* Second line: list of variable cardinalities, separated by spaces;
* Following lines: list of values taken by the variables in each datapoint, separated by spaces.
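The layout above can be checked before running blip. The following is a minimal Python sketch, not part of blip itself: `parse_dat` is a hypothetical helper, and it assumes complete data whose values are integers coded from 0 to cardinality minus 1, which may not hold for every dataset.

```python
from typing import List, Tuple


def parse_dat(text: str) -> Tuple[List[str], List[int], List[List[int]]]:
    """Parse a space-separated dataset into (names, cardinalities, rows)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a names line and a cardinalities line")
    names = lines[0]
    cards = [int(c) for c in lines[1]]
    if len(cards) != len(names):
        raise ValueError("cardinalities line does not match names line")
    rows = []
    for i, fields in enumerate(lines[2:], start=1):
        if len(fields) != len(names):
            raise ValueError(f"datapoint {i}: expected {len(names)} values")
        values = [int(v) for v in fields]
        # Assumption: states are integer-coded as 0 .. cardinality-1.
        for name, value, card in zip(names, values, cards):
            if not 0 <= value < card:
                raise ValueError(f"datapoint {i}: {name}={value} outside 0..{card - 1}")
        rows.append(values)
    return names, cards, rows


# Tiny made-up dataset with three variables and two datapoints.
sample = "A B C\n2 3 2\n0 2 1\n1 0 0\n"
names, cards, rows = parse_dat(sample)
print(names, cards, len(rows))
```

Running such a check first gives a clearer error message than a failure in the middle of a long learning run.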

Parent set identification

The first step is to build the parent set score cache. The state-of-the-art approach is to use BIC* (for the BIC score):

java -jar blip.jar scorer.is -d data/child-5000.dat -j data/child-5000.jkl -t 10 -b 0

Main options:

-d VAL : Datafile input path (.dat format)
-j VAL : Parent set scores output file (.jkl format)
-t N : Maximum time limit, in seconds (default: 10)
-b N : Number of machine cores to use - if 0, all are used (default: 1)

General structure optimization

Given the parent set score cache, the next step is to learn the structure. The state-of-the-art approach is to use WINASOBS (Windows operator applied to ASOBS) with ENT (entropy-based) ordering:

java -jar blip.jar solver.winasobs.adv -smp ent -d data/child-5000.dat -j data/child-5000.jkl -r data/child.wa.res -t 10 -b 0

Main options:

-smp VAL : Advanced sampler (possible values: std, mi, ent, r_mi, r_ent)
-d VAL : Datafile input path (.dat format)
-j VAL : Parent set scores input file (.jkl format)
-r VAL : Structure output file (.res format)
-t N : Maximum time limit, in seconds (default: 10)
-b N : Number of machine cores to use - if 0, all are used (default: 1)

Bounded-treewidth structure optimization

Given the parent set score cache, it is possible to learn a structure under a bounded-treewidth constraint. The state-of-the-art approach is to use k-max:

java -jar blip.jar solver.kmax -w 4 -j data/child-5000.jkl -r data/child-5000.kmax.res -t 10 -b 0

Main options:

-w N : Maximum treewidth allowed
-j VAL : Parent set scores input file (.jkl format)
-r VAL : Structure output file (.res format)
-t N : Maximum time limit, in seconds (default: 10)
-b N : Number of machine cores to use - if 0, all are used (default: 1)

Structure learning from incomplete data sets

To learn a structure from data containing missing values, the state-of-the-art approach is to use SEM-kMAX:

java -jar blip.jar imputation.sem -d data/child-5000-missing.dat -o data/child-5000-imputed.dat -r data/child.res -t 1 -tmp data/tmp -w 6 -b 0

Main options:

-d VAL : Datafile (with missing values) input path (.dat format)
-o VAL : Datafile (with imputed values) output path (.dat format)
-r VAL : Structure output file (.res format)
-t N : Time regulation parameter (default: 1)
-tmp VAL : Temporary directory
-w N : Learning treewidth (default: 6)
-b N : Number of machine cores to use - if 0, all are used (default: 1)

Interpreting the result

The format of the ".res" file is as follows: each line gives the parent set assigned to one variable, together with the score of that parent set.

For example, the line "4: -2797.39 (10,17,18)" indicates that the variable with index 4 in the dataset is assigned the variables with indices 10, 17 and 18 as parents. This parent set has score -2797.39 (by default the score function is BIC).
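Lines of this shape are easy to read back into a program. The sketch below is illustrative Python, not part of blip: `parse_res` and `LINE_RE` are hypothetical names, and it assumes that a variable without parents appears with an empty or missing parenthesised list and that the file holds no other kinds of lines. Because BIC and BDeu are decomposable scores, the score of the whole network is the sum of the per-variable scores.

```python
import re
from typing import Dict, Tuple

# One entry per line: "<index>: <score> (<p1>,<p2>,...)".
# Assumption: the parent list may be empty or absent for a root variable.
LINE_RE = re.compile(
    r"^\s*(\d+)\s*:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*(?:\(([^)]*)\))?\s*$"
)


def parse_res(text: str) -> Dict[int, Tuple[float, Tuple[int, ...]]]:
    """Map each variable index to (score, tuple of parent indices)."""
    structure = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        m = LINE_RE.match(line)
        if m is None:
            # Lines in any other shape are rejected in this sketch.
            raise ValueError(f"unrecognised line: {line!r}")
        var, score, parents = int(m.group(1)), float(m.group(2)), m.group(3)
        parent_ids = tuple(int(p) for p in parents.split(",") if p.strip()) if parents else ()
        structure[var] = (score, parent_ids)
    return structure


# First line is the example from the text; the second is made up.
example = "4: -2797.39 (10,17,18)\n7: -1500.25\n"
structure = parse_res(example)
total = sum(score for score, _ in structure.values())
print(structure[4], total)
```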

Learn the parameters

Using the structure found, it is possible to learn the parameters with:

java -jar blip.jar parle -d data/child-5000.dat -r data/child-5000.kmax.res -n data/child-5000.kmax.uai

Main options:

-d VAL : Datafile input path (.dat format)
-r VAL : Structure input file (.res format)
-n VAL : BN output file (.uai format)

The final output will be a full Bayesian network in UAI format.
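The bounded-treewidth workflow (scoring, k-max, parameter learning) can be scripted. The following Python sketch only assembles the three command lines shown in this README; `blip_pipeline` and its parameters are hypothetical names, the file naming scheme is an assumption copied from the examples, and `blip.jar` is assumed to sit in the working directory.

```python
import shlex
from typing import List


def blip_pipeline(dat: str, stem: str, treewidth: int = 4,
                  seconds: int = 10, cores: int = 0) -> List[List[str]]:
    """Return the three blip invocations as argument lists, in run order."""
    base = ["java", "-jar", "blip.jar"]
    jkl, res, uai = f"{stem}.jkl", f"{stem}.kmax.res", f"{stem}.kmax.uai"
    return [
        # 1. Build the parent set score cache.
        base + ["scorer.is", "-d", dat, "-j", jkl,
                "-t", str(seconds), "-b", str(cores)],
        # 2. Learn a structure with bounded treewidth.
        base + ["solver.kmax", "-w", str(treewidth), "-j", jkl, "-r", res,
                "-t", str(seconds), "-b", str(cores)],
        # 3. Learn the parameters and write the network in UAI format.
        base + ["parle", "-d", dat, "-r", res, "-n", uai],
    ]


commands = blip_pipeline("data/child-5000.dat", "data/child-5000")
for cmd in commands:
    print(shlex.join(cmd))
```

To actually run the steps, each argument list can be passed to `subprocess.run(cmd, check=True)`, which stops the pipeline at the first failing step.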
