
FactorBase: Learning Graphical Models from multi-relational data

The source code repository for the FactorBase system. The code in this repository implements the learn-and-join algorithm (see the algorithm paper "Learning Graphical Models for Relational Data via Lattice Search").
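
The learn-and-join algorithm searches a lattice whose points are sets of relationships. The Java sketch below is illustrative only and is not the FactorBase code. It shows just the level-wise enumeration of lattice points and omits the structure learning done at each point. It assumes that each relationship links two entity types and that a set of relationships counts as a chain when its members are connected through shared entity types. The class `LatticeSketch` and its `Rel` type are hypothetical names.

```java
import java.util.*;

class LatticeSketch {
    /** Hypothetical representation: a relationship table linking two entity types. */
    static final class Rel {
        final String name, left, right;
        Rel(String name, String left, String right) {
            this.name = name; this.left = left; this.right = right;
        }
        boolean sharesTypeWith(Rel o) {
            return left.equals(o.left) || left.equals(o.right)
                || right.equals(o.left) || right.equals(o.right);
        }
    }

    /** Level k (0-based) holds every connected set of k+1 relationships, as sorted name sets. */
    static List<Set<Set<String>>> levels(List<Rel> rels) {
        Map<String, Rel> byName = new LinkedHashMap<>();
        for (Rel r : rels) byName.put(r.name, r);

        List<Set<Set<String>>> result = new ArrayList<>();
        Set<Set<String>> current = new LinkedHashSet<>();
        for (Rel r : rels) current.add(new TreeSet<>(List.of(r.name)));   // level 1: single relationships

        while (!current.isEmpty()) {
            result.add(current);
            Set<Set<String>> next = new LinkedHashSet<>();                // duplicates collapse here
            for (Set<String> chain : current) {
                for (Rel cand : rels) {
                    if (chain.contains(cand.name)) continue;
                    boolean connected = false;
                    for (String n : chain) {
                        if (byName.get(n).sharesTypeWith(cand)) { connected = true; break; }
                    }
                    if (connected) {                                      // extend the chain by one relationship
                        Set<String> bigger = new TreeSet<>(chain);
                        bigger.add(cand.name);
                        next.add(bigger);
                    }
                }
            }
            current = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(new Rel("Registration", "Student", "Course"),
                                 new Rel("RA", "Professor", "Student"));
        List<Set<Set<String>>> lv = levels(rels);
        for (int k = 0; k < lv.size(); k++) System.out.println("Level " + (k + 1) + ": " + lv.get(k));
    }
}
```

In the actual algorithm, a Bayesian network would be learned at each lattice point and the results combined ("joined") as the search moves up the levels.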

  • Input: A relational schema hosted on a MySQL server.

  • Output: A Bayesian network that shows probabilistic dependencies between the relationships and attributes represented in the database. Both network structure and parameters are computed by the system.
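
The parameters of a Bayesian network are conditional probability (CP) tables. For reference, the standard maximum-likelihood estimate of a CP-table entry is a ratio of counts of the kind stored in contingency tables. In the formula below, $N(\cdot)$ counts the groundings that satisfy a condition and $\mathrm{Pa}(X)$ denotes the parents of node $X$. The notation is ours, and FactorBase's estimator may differ in its details.

```latex
\hat{P}\bigl(X = x \mid \mathrm{Pa}(X) = \mathbf{u}\bigr)
  = \frac{N\bigl(X = x,\ \mathrm{Pa}(X) = \mathbf{u}\bigr)}
         {\sum_{x'} N\bigl(X = x',\ \mathrm{Pa}(X) = \mathbf{u}\bigr)}
```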

Contingency Table Generator

One of the key computational problems in relational learning and inference is to compute how many times a conjunctive condition is instantiated in a relational structure. FactorBase computes relational contingency tables. For a given set of first-order terms/predicates, such a table stores how many times each combination of term values is instantiated in the input database. Because this problem arises in virtually every relational learning task, we provide stand-alone code for computing contingency tables that can be used independently of our Bayesian network learning system.
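
To make the definition concrete, the Java sketch below builds a contingency table in memory. It assumes the groundings have already been materialized as a list, with each grounding mapping a term name to its value. FactorBase itself computes these counts in SQL inside the database, so this is an illustration of the output, not of its method. The class name and the term names (`intelligence(S)`, `grade(S,C)`) are hypothetical.

```java
import java.util.*;

class ContingencySketch {
    /**
     * Counts how often each combination of values for the given terms occurs.
     * Each grounding maps a term name to its value; the key lists values in term order.
     */
    static Map<List<String>, Integer> contingencyTable(List<Map<String, String>> groundings,
                                                       List<String> terms) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> g : groundings) {
            List<String> key = new ArrayList<>();
            for (String t : terms) key.add(g.get(t));
            counts.merge(key, 1, Integer::sum);   // increment the count for this value combination
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> groundings = List.of(
            Map.of("intelligence(S)", "high", "grade(S,C)", "A"),
            Map.of("intelligence(S)", "high", "grade(S,C)", "A"),
            Map.of("intelligence(S)", "low", "grade(S,C)", "B"));
        System.out.println(contingencyTable(groundings, List.of("intelligence(S)", "grade(S,C)")));
    }
}
```

Here `main` prints `{[high, A]=2, [low, B]=1}`, one row per value combination together with its count.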

Further Information

  • Our project website contains various helpful information such as pointers to datasets, a gallery of learned models, and system tips.
  • The tutorial explains the concepts underlying the algorithm and our other tools.
  • Our system paper titled "FactorBase: Multi-Relational Model Learning with SQL All The Way" explains the system components.

How to Use - Overview

  1. Import data into a MySQL server

    We provide two example datasets in the testsql folder, as .sql files:

    • Mutagenesis
    • Uneilwin: a dataset based on the University Schema
  2. Install the program

    First clone the project by running a command similar to the following:

    git clone https://github.com/sfu-cl-lab/FactorBase.git

    FactorBase and other tools in the project can all be built using the following command (make sure to have Maven installed):

    cd FactorBase/code
    mvn install

    After the above commands are successfully run, an executable JAR file for FactorBase can be found at:

    factorbase/target/factorbase-<version>-SNAPSHOT.jar

    Here, <version> is the version of FactorBase that you built.

  3. Update config.cfg with your own configuration, following the documented configuration format.

  4. Point FactorBase to the required database on your MySQL server

    Modify travis-resources/config.cfg with your own configuration, following the format shown in the sample configuration image.

    [Image: Sample Configuration]

    See our project website for an explanation of the options.

    In the last row, you can set the global logger to one of three levels:

    • debug: show all log messages;
    • info: show only info, warning, and error messages (no debug messages); this is the default;
    • off: show no log messages.
  5. Learn a Bayesian Network Structure

    In the FactorBase/code folder, run:

    java -jar factorbase/target/factorbase-<version>-SNAPSHOT.jar

    Here, <version> is the version of FactorBase that you built.

    Note: for large databases, you may need to specify a larger Java heap size, for example:

    java -Xmx8G -jar factorbase/target/factorbase-<version>-SNAPSHOT.jar

    By default, the executable JAR file looks for the configuration file in the current directory (i.e. where you are running the command). To use a different configuration file, pass the parameter -Dconfig=<config-file>. For example:

    java -Dconfig=../travis-resources/config.cfg -jar factorbase/target/factorbase-<version>-SNAPSHOT.jar
  6. Inspect the Bayesian Network (BN)

    We follow the BayesStore design philosophy, in which statistical objects are stored and managed within the database.

    1. The network structure is stored in the table Final_Path_BayesNets of the <db>_BN database where <db> is the model database specified in your configuration file.
    2. The conditional probability tables are stored in tables named <nodename>_CP in the same <db>_BN database, where <nodename> is the name of the child node.
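
Because the learned model lives in ordinary MySQL tables, it can be inspected with plain SQL. The Java sketch below builds such queries from the naming scheme described in step 6. It assumes the tables are named exactly `Final_Path_BayesNets` and `<nodename>_CP` inside `<db>_BN`. It quotes identifiers with MySQL backticks because node names may contain characters such as parentheses. The database name `mydb` and the node name `intelligence(student0)` are hypothetical, and we make no claim about the column layout of these tables.

```java
class BnQueries {
    /** Quotes a MySQL identifier with backticks, doubling any embedded backtick. */
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    /** Query for the learned network structure of model database {@code db}. */
    static String structureQuery(String db) {
        return "SELECT * FROM " + quote(db + "_BN") + "." + quote("Final_Path_BayesNets");
    }

    /** Query for the conditional probability table of child node {@code node}. */
    static String cpQuery(String db, String node) {
        return "SELECT * FROM " + quote(db + "_BN") + "." + quote(node + "_CP");
    }

    public static void main(String[] args) {
        System.out.println(structureQuery("mydb"));
        System.out.println(cpQuery("mydb", "intelligence(student0)"));
    }
}
```

The printed statements can be pasted into the mysql command-line client or run through JDBC.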

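The configuration behaviour described in steps 4 and 5 can be summarized in code. The Java sketch below is an assumption about how such options are typically resolved and is not FactorBase's actual implementation. It reads the `config` system property set by `-Dconfig=<config-file>` and falls back to `config.cfg` in the working directory. It also maps the three documented logger settings onto `java.util.logging` levels, although FactorBase may use a different logging framework. The class `ConfigSketch` is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.logging.Level;

class ConfigSketch {
    /** Uses -Dconfig=<file> when given, otherwise config.cfg in the working directory. */
    static Path configPath() {
        return Paths.get(System.getProperty("config", "config.cfg"));
    }

    /** Maps the documented logger settings onto java.util.logging levels. */
    static Level logLevel(String setting) {
        switch (setting.trim().toLowerCase(Locale.ROOT)) {
            case "debug": return Level.ALL;   // show all log messages
            case "info":  return Level.INFO;  // info, warning and error (SEVERE) messages only
            case "off":   return Level.OFF;   // show no log messages
            default: throw new IllegalArgumentException("Unknown logger level: " + setting);
        }
    }

    public static void main(String[] args) {
        System.out.println("Config file: " + configPath().toAbsolutePath());
        System.out.println("Logger level for 'info': " + logLevel("info"));
    }
}
```

Running `java -Dconfig=../travis-resources/config.cfg ConfigSketch` would make `configPath()` return that relative path instead of the default.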
===============

Other Output Formats: BIF, MLN, ETL

The learned BN structure can be exported from the database to support a number of other applications.


Other Applications (May Be Under Construction)

After running the learn-and-join algorithm, the learned Bayesian network can be leveraged for various applications.
