mauro-idsia/blip

blip

This is the "Bayesian network Learning Improved Project" (blip), an open-source Java package that offers a wide range of structure learning algorithms. It is developed by Mauro Scanagatta and distributed under the LGPL-3 by IDSIA.

It focuses on score-based learning, mainly with the BIC and BDeu score functions, and allows the user to learn BNs from datasets containing thousands of variables. It provides state-of-the-art algorithms for the following tasks: parent set identification (BIC*), general structure optimization (WINASOBS-ENT), bounded-treewidth structure optimization (KMAX) and structure learning on incomplete data sets (SEM-KMAX).

An R binding is also available: (https://github.com/mauro-idsia/r.blip).

References

This package implements the algorithms detailed in the following papers:

Usage

The process of learning a bounded-treewidth BN is explained using the "child" network as an example.

Dataset format

The initial dataset must follow the same format as the file "child-5000.dat", namely a space-separated file containing:

* First line: list of variable names, separated by spaces;
* Second line: list of variable cardinalities, separated by spaces;
* Following lines: list of values taken by the variables in each datapoint, separated by spaces.
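The layout above can be sketched in a few lines of Python. This is only an illustration, not part of blip: the helper names `format_dat` and `parse_dat` are hypothetical, and the sketch assumes that states are integer-coded from 0 to cardinality-1, which should be checked against "child-5000.dat" before relying on it.

```python
from typing import List, Sequence, Tuple


def format_dat(names: Sequence[str], rows: Sequence[Sequence[int]]) -> str:
    """Render integer-coded datapoints as: names line, cardinalities line, one line per datapoint."""
    if not rows:
        raise ValueError("at least one datapoint is required")
    if any(len(row) != len(names) for row in rows):
        raise ValueError("every datapoint needs one value per variable")
    # Assumption: states are coded 0..k-1, so the cardinality is the largest value + 1.
    cards = [max(row[i] for row in rows) + 1 for i in range(len(names))]
    lines = [" ".join(names), " ".join(map(str, cards))]
    lines += [" ".join(map(str, row)) for row in rows]
    return "\n".join(lines) + "\n"


def parse_dat(text: str) -> Tuple[List[str], List[int], List[List[int]]]:
    """Split a .dat-style string into names, cardinalities and datapoints, with basic checks."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    names, cards = lines[0], [int(c) for c in lines[1]]
    rows = [[int(v) for v in ln] for ln in lines[2:]]
    if len(cards) != len(names):
        raise ValueError("need one cardinality per variable")
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match the header")
        if any(not 0 <= v < c for v, c in zip(row, cards)):
            raise ValueError("value outside 0..cardinality-1")
    return names, cards, rows
```

Running `parse_dat` on an existing file before starting a long scoring run is a cheap way to catch a missing column or an out-of-range value.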

Parent set identification

The first step is to build the parent sets score cache. The state-of-the-art approach is to use BIC* (for the BIC score):

java -jar blip.jar scorer.is -d data/child-5000.dat -j data/child-5000.jkl -t 10 -b 0 

Main options:

  • -d VAL : Datafile input path (.dat format)
  • -j VAL : Parent set scores output file (.jkl format)
  • -t N : Maximum time limit, in seconds (default: 10)
  • -b N : Number of machine cores to use - if 0, all are used (default: 1)
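To make the notion of a "parent set score" concrete, here is a minimal Python sketch of the textbook BIC score of one variable given one candidate parent set: the maximum-likelihood log-likelihood minus a penalty of (log N)/2 per free parameter. The function name `bic_score` is hypothetical, natural logarithms are assumed, and this is not blip's implementation; in particular, it scores a single parent set and does not reproduce how BIC* searches the space of parent sets.

```python
import math
from collections import Counter
from typing import Sequence


def bic_score(rows: Sequence[Sequence[int]], child: int,
              parents: Sequence[int], cards: Sequence[int]) -> float:
    """Textbook BIC of `child` given `parents`, computed from complete integer-coded data."""
    n = len(rows)
    # Counts of (parent configuration, child value) and of parent configurations alone.
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    # Log-likelihood at the maximum-likelihood estimates theta = N(x, pa) / N(pa).
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (child states - 1) for every joint parent configuration.
    n_parent_configs = math.prod(cards[p] for p in parents)
    penalty = 0.5 * math.log(n) * (cards[child] - 1) * n_parent_configs
    return loglik - penalty
```

Scores are negative, and a higher (less negative) value indicates a better trade-off between fit and complexity. The penalty grows with the number of parent configurations, which is why large parent sets are rarely worth scoring.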

General structure optimization

Given the parent sets score cache, the next step is to learn the structure. The state-of-the-art approach is to use WINASOBS (Windows operator applied to ASOBS) with ENT (entropy-based) ordering:

java -jar blip.jar solver.winasobs.adv -smp ent -d data/child-5000.dat -j data/child-5000.jkl -r data/child.wa.res -t 10 -b 0

Main options:

  • -smp VAL : Advanced sampler (possible values: std, mi, ent, r_mi, r_ent)
  • -d VAL : Datafile input path (.dat format)
  • -j VAL : Parent set scores input file (.jkl format)
  • -r VAL : Structure output file (.res format)
  • -t N : Maximum time limit, in seconds (default: 10)
  • -b N : Number of machine cores to use - if 0, all are used (default: 1)

Bounded-treewidth structure optimization

Given the parent sets score cache, it is possible to learn a structure under a bounded-treewidth constraint. The state-of-the-art approach is to use k-max:

java -jar blip.jar solver.kmax -w 4 -j data/child-5000.jkl -r data/child-5000.kmax.res -t 10 -b 0

Main options:

  • -w N : Maximum treewidth allowed
  • -j VAL : Parent set scores input file (.jkl format)
  • -r VAL : Structure output file (.res format)
  • -t N : Maximum time limit, in seconds (default: 10)
  • -b N : Number of machine cores to use - if 0, all are used (default: 1)

Structure learning from incomplete data sets

To learn a structure from data containing missing values, the state-of-the-art approach is to use SEM-kMAX:

java -jar blip.jar imputation.sem -d data/child-5000-missing.dat -o data/child-5000-imputed.dat -r data/child.res -t 1 -tmp data/tmp -w 6 -b 0

Main options:

  • -d VAL : Datafile (with missing values) input path (.dat format)
  • -o VAL : Datafile (with imputed values) output path (.dat format)
  • -r VAL : Structure output file (.res format)
  • -t N : Time regulation parameter (default: 1)
  • -tmp VAL : Temporary directory
  • -w N : Learning treewidth (default: 6)
  • -b N : Number of machine cores to use - if 0, all are used (default: 1)

Interpreting the result

The format of the ".res" file is as follows: each line gives the parent set assigned to one variable, together with the score of that parent set.

For example, the line "4: -2797.39 (10,17,18)" indicates that the variable with index 4 in the dataset is assigned the variables with indices 10, 17 and 18 as parents. This parent set has score -2797.39 (by default the score function is the BIC).
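A line in this format can be read programmatically. The Python sketch below is illustrative only: `ParentSet` and `parse_res_line` are hypothetical names, and the pattern is derived from the single example line above. How a variable without parents is written is not documented here, so the sketch assumes that either empty parentheses or no parentheses may appear.

```python
import re
from typing import NamedTuple, Tuple


class ParentSet(NamedTuple):
    variable: int             # index of the variable in the dataset
    score: float              # score of the chosen parent set
    parents: Tuple[int, ...]  # indices of the parent variables


# "<index>: <score> (<p1>,<p2>,...)"; the parenthesised list is treated as optional (assumption).
_RES_LINE = re.compile(r"^\s*(\d+)\s*:\s*([^\s(]+)\s*(?:\(([^)]*)\))?\s*$")


def parse_res_line(line: str) -> ParentSet:
    """Parse one line of a .res-style file into (variable, score, parents)."""
    match = _RES_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    variable, score, parent_list = match.groups()
    parents = tuple(int(p) for p in parent_list.split(",") if p.strip()) if parent_list else ()
    return ParentSet(int(variable), float(score), parents)
```

For the example line, `parse_res_line("4: -2797.39 (10,17,18)")` yields variable 4, score -2797.39 and parents (10, 17, 18). Because the BIC decomposes over variables, summing the `score` field over all parsed lines gives the score of the whole structure.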

Learn the parameters

Using the structure found, it is possible to learn the parameters with:

java -jar blip.jar parle -d data/child-5000.dat -r data/child-5000.kmax.res -n data/child-5000.kmax.uai

Main options:

  • -d VAL : Datafile input path (.dat format)
  • -r VAL : Structure input file (.res format)
  • -n VAL : BN output file (.uai format)

The final output will be a full Bayesian network in UAI format.
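The three steps of the bounded-treewidth workflow (score the parent sets, learn the structure with k-max, learn the parameters) can be chained from a script. The Python sketch below only assembles the command lines shown in this README and, optionally, runs them with `subprocess`; the function names and default values are illustrative assumptions, and no options beyond those documented above are used.

```python
import subprocess
from typing import List


def pipeline_commands(dat: str, jkl: str, res: str, uai: str,
                      treewidth: int = 4, seconds: int = 10, cores: int = 0,
                      jar: str = "blip.jar") -> List[List[str]]:
    """Build the scorer.is, solver.kmax and parle invocations, in execution order."""
    base = ["java", "-jar", jar]
    return [
        # 1. Parent set identification: writes the .jkl score cache.
        base + ["scorer.is", "-d", dat, "-j", jkl, "-t", str(seconds), "-b", str(cores)],
        # 2. Bounded-treewidth structure optimization: reads .jkl, writes .res.
        base + ["solver.kmax", "-w", str(treewidth), "-j", jkl, "-r", res,
                "-t", str(seconds), "-b", str(cores)],
        # 3. Parameter learning: reads .dat and .res, writes the network in .uai format.
        base + ["parle", "-d", dat, "-r", res, "-n", uai],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in turn; check=True stops at the first step that fails."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_pipeline(pipeline_commands("data/child-5000.dat", "data/child-5000.jkl", "data/child-5000.kmax.res", "data/child-5000.kmax.uai"))` requires Java and `blip.jar` in the working directory. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.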
