Fast, scalable, parallel and distributed inference of very large networks by Bayesian Model Averaging
The core approach used by fastBMA is based on ScanBMA by William Chad Young, Adrian E. Raftery and Ka Yee Yeung. For the R implementation, Kaiyuan Shi wrote and tested the Rcpp glue code. In addition to the code in this repository, LHH wrote the configuration scripts for the R package. Daniel Kristiyanto wrote and tested the initial version of the Dockerfile. This work was supported by National Institutes of Health grants U54HL127624 to Ka Yee Yeung, R01HD054511 and R01HD070936 to Adrian Raftery, and by a Microsoft Azure for Research Award.
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a novel and computationally efficient method for eliminating redundant indirect edges in the network. fastBMA is orders of magnitude faster than existing fast methods such as ScanBMA and LASSO, has a much smaller memory footprint, and produces more accurate and compact networks. A 100-gene network is obtained in 0.1 seconds and a complete 10,000-gene regulatory network can be obtained in a matter of hours.
We evaluated the performance of fastBMA on synthetic data and real genome-wide yeast and human datasets. When using a single CPU core, fastBMA is 30 times faster than ScanBMA and up to 100 times faster than LASSO, with increased accuracy. The new transitive reduction algorithm is fast and increases the accuracy of the most confidently predicted edges. fastBMA is memory efficient and can be run on multiple instances for the increased speed necessary for genome-wide analyses.
fastBMA is available as a standalone function or as part of the networkBMA R package. The binaries are also distributed in portable software containers, for reproducible deployment on Linux/Mac/Windows machines, cloud instances and clusters. The source code is open source (M.I.T. license).
Compilation is relatively straightforward on Linux and MacOS, and should work with MinGW with some minor modifications to the Makefile. Note that for old processors (pre-2012) it may be necessary to remove the -mavx flag. For very old processors (pre-2005) it may be necessary to change the -msse3 flag to -msse2 or remove it altogether. If you have an old 32-bit processor you will need to remove the -m64 flag.
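As a quick way to decide which of these flags to keep, here is a hypothetical helper (not part of fastBMA) that maps a Linux `/proc/cpuinfo` feature string to the Makefile flags discussed above; the function is pure so the parsing can be tested without real hardware:

```python
def makefile_flags(cpu_flags):
    """Given a CPU feature-flag string (e.g. the 'flags' line from
    /proc/cpuinfo), return the vector/width compiler flags to keep."""
    feats = set(cpu_flags.split())
    keep = ["-m64"]  # remove this manually if the CPU is 32-bit
    if "avx" in feats:
        keep.append("-mavx")  # absent on pre-2012 CPUs
    if "sse3" in feats or "pni" in feats:  # Linux reports SSE3 as "pni"
        keep.append("-msse3")
    elif "sse2" in feats:
        keep.append("-msse2")  # fallback for very old CPUs
    return keep

# Example: a modern CPU reporting avx and pni keeps both vector flags.
print(makefile_flags("fpu vme pni ssse3 sse4_1 avx"))
```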
However, it is much easier to use the Docker container, especially if you want to set up a distributed cloud network to run fastBMA. Even simpler is the R package; however, the R version lacks some minor features and does not use OpenBLAS or MPI. The Docker version is compiled with the -mavx and -m64 flags, so it does require a newer CPU (Intel Ivy Bridge or newer (2012), AMD Bulldozer or newer (2011)). If you wish to use the Docker container with older CPUs, you will need to modify the Makefile and then build the container using the provided Dockerfile rather than obtaining it from DockerHub.
fastBMA uses OpenBLAS and mpich/boost-mpi (http://www.boost.org/doc/libs/1_60_0/doc/html/mpi.html). mpich2 can be installed as a package using apt-get/yum/dnf/brew. However, Boost must be compiled from source if MPI is to be used. Compilation instructions for boost/boost-mpi can be found here.
Once the necessary libraries are installed, clone the repository or download the zip file, extract the contents, and change into the src directory. There are two Makefiles, one for a typical Fedora installation and one for an Ubuntu installation. The non-MPI build has also been tested on MacOS Yosemite and will probably compile under MinGW on Windows with minor changes to the Makefile. Once in the src directory:
cp Makefile.Ubuntu Makefile
make clean; make <FLAGS>
If MPI is desired, <FLAGS> should include USEMPI=1. For MacOS, <FLAGS> should include MACOS=1.
After compilation you can run the provided test scripts runFastBMA.sh and runfastBMAMPI.sh, which infer a network from a 100-variable time series. This should take less than a second.
A Dockerfile is included, starting from an Ubuntu image with OpenBLAS. Unfortunately, for the version of MPI used (mpich2), the Boost libraries must be compiled from source in order for boost-mpi to work properly, so it may take a while to generate the initial image.
fastBMA has been incorporated into the networkBMA package. However, the package is in beta and can only be installed on Linux systems; this is due to the requirement for OpenBLAS, which is difficult to provide in R without recompiling R on Windows. A demo is available here. We plan to provide a version without OpenBLAS for Bioconductor in the future. There is also no MPI support for the R version, but multithreading is available through OpenMP.

## Sample usage

Sample usage is provided in the two shell scripts, one for MPI and one for OpenMP only. fastBMA is very customizable, with a myriad of flags. To get a list of flags and a summary of what they do, type
fastBMA --help
fastBMA has been extensively tested with time series data. Examples of the data files are provided in the package.
fastBMA takes as input simple tab-delimited text files such as those generated by spreadsheets like Excel. A header row is optional; the --noHeader flag is provided if no header row is present. The gene names are obtained from the header. If it is missing, numerical gene names are assigned.
fastBMA is very effective with time series. Each variable is fitted to the values at the previous time point, allowing for detection of self-regulation. The format is: sample name, replicate, time, followed by one column per gene. For example:
"Name" "replicate" "time" "YBL103C" "YKL112W"
"BY4716.0" "1" "0" 9.531639971 11.32400726
"BY4716.10" "1" "10" 9.587839320 11.11526454
"BY4716.20" "1" "20" 9.948749983 10.69329399
"RM11.1a.0" "2" "0" 9.62716879 10.94404074
"RM11.1a.10" "2" "10" 10.35317124 10.41511953
"RM11.1a.20" "2" "20" 10.41215922 10.42349600
For time series there is also the option of using residuals with the --useResiduals flag. In this case residuals are calculated as described here to reduce the influence of self-correlation.
fastBMA can also be used with non-time-series data. This uses the same format as the time series data, with the time field omitted. In this case each variable is fitted to the values of all the other variables.
Priors can be read as a matrix (flag -p or --priorsMatrix) or as a list (flag --priorsList). The matrix format has a mandatory header row with the N tab-delimited variable names, followed by N rows, each with a variable name and N columns giving the prior probability of that variable being regulated by variable 1 to variable N. For example:
Gene1 Gene2 Gene3
Gene1 1.0 0.5 0.4
Gene2 0.3 0.8 0.6
Gene3 0.1 0.2 0.7
The equivalent list format is regulatingGeneName regulatedGeneName PriorProbability. For example
Gene1 Gene1 1.0
Gene2 Gene1 0.5
Gene3 Gene1 0.4
Gene1 Gene2 0.3
Gene2 Gene2 0.8
Gene3 Gene2 0.6
Gene1 Gene3 0.1
Gene2 Gene3 0.2
Gene3 Gene3 0.7
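The two formats carry the same information. A minimal sketch (a hypothetical helper, not part of fastBMA) that converts the matrix format above into the equivalent list format:

```python
def matrix_to_list(text):
    """Convert a priors matrix (tab-delimited: a header row of variable
    names, then one row per regulated variable) into
    (regulator, regulated, prior) triples in list-format order."""
    lines = [ln.split("\t") for ln in text.strip().splitlines()]
    regulators = lines[0]
    triples = []
    for row in lines[1:]:           # one row per regulated variable
        regulated = row[0]
        for col, regulator in enumerate(regulators):
            triples.append((regulator, regulated, float(row[1 + col])))
    return triples

matrix = "\n".join([
    "Gene1\tGene2\tGene3",
    "Gene1\t1.0\t0.5\t0.4",
    "Gene2\t0.3\t0.8\t0.6",
    "Gene3\t0.1\t0.2\t0.7",
])
for regulator, regulated, prior in matrix_to_list(matrix):
    print(regulator, regulated, prior, sep="\t")
```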
fastBMA can be used to postprocess networks inferred by other methods to remove edges due to indirect interactions. This is done by using the -e or --edgeList flag to read in an edge list. The format of the edge list is the same as for a priors list, i.e. each tab-delimited line contains:
regulatingGeneName regulatedGeneName weight
Weights must be between 0 and 1.
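Before passing a file to --edgeList, it can be useful to sanity-check it. A minimal sketch (a hypothetical helper, not part of fastBMA) that parses an edge list and enforces the weight range:

```python
def validate_edge_list(text):
    """Return parsed (regulator, regulated, weight) edges, raising
    ValueError if a line is malformed or a weight is outside [0, 1]."""
    edges = []
    for lineno, line in enumerate(text.strip().splitlines(), start=1):
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 tab-delimited fields")
        regulator, regulated, weight = fields[0], fields[1], float(fields[2])
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"line {lineno}: weight {weight} outside [0, 1]")
        edges.append((regulator, regulated, weight))
    return edges

# Gene names here are placeholders, not from a real dataset.
print(validate_edge_list("GeneA\tGeneB\t0.9\nGeneB\tGeneC\t0.2"))
```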
MPI jobs are run using mpiexec or mpirun; documentation on running MPI apps can be found here. OpenMP can be used at the same time by using the -n flag to set the number of cores used. For some reason, even on single machines, MPI is considerably more efficient than OpenMP for managing separate fastBMA threads, despite several different approaches having been tried to improve OpenMP performance.
A demo is available here.

## Algorithm Documentation

There are 4 major algorithmic improvements that increase the speed, scalability and accuracy of fastBMA relative to its predecessor ScanBMA:
- Parallel and distributed implementation
- Faster regression by updating previous solutions
- Probabilistic hashing
- Post-processing with transitive reduction
These are described in detail in an upcoming paper.
Graphs of the overall accuracy of networks as a function of running time on the DREAM4 simulated and yeast time series data. The area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPR) of networks inferred from the DREAM4 dataset by fastBMA (no post-processing), ScanBMA and LASSO are plotted against the running times. The different points represent fastBMA and ScanBMA with increasingly wider searches as determined by the odds ratio (OR) parameter (OR=100, 1000, 10000). LASSO does not have an equivalent parameter and was run with the default settings. For the yeast tests, prior probabilities of regulatory relationships (informative priors) were obtained using external data sources as described in Lo et al. For all methods not using priors (including LASSO), variables were ordered by their absolute correlation to the response variable. For the ScanBMA yeast tests the search space was restricted to the 100 variables with the highest prior probabilities. fastBMA was run with a search space of 100 variables using 1 core and all 3556 variables using 8 cores, with and without the Lo et al. prior probabilities. All tests were conducted on an A10 Microsoft Azure cloud instance, which has an Intel Xeon CPU with 8 cores and 56 GB of RAM.
The precision-recall curves were plotted for the networks inferred from the yeast (3556 variables) time series expression data using LASSO, LASSO + fastBMA post-processing, fastBMA, and fastBMA with informative priors. Precision-recall curves for post-processed fastBMA are almost identical to the curves for unprocessed fastBMA and are not plotted. In the upper left, the precision-recall curves for fastBMA lie above those for LASSO, indicating the increased accuracy of fastBMA for predictions assigned higher confidences. Post-processing by transitive reduction also increases the accuracy of high-confidence predictions for LASSO.