Infrared is a generic C++/Python hybrid library for efficient (fixed-parameter tractable) Boltzmann sampling.
Disclaimer and license
Infrared is free software. Note that the system is still in an early stage of development and is likely to still undergo significant changes. It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details [http://www.gnu.org/licenses/].
Main features of Infrared
Infrared provides a fast and flexible C++ engine that evaluates a constraint network consisting of variables, multi-ary functions, and multi-ary constraints. Functions and constraints are C++ or Python objects, where new functions and constraints are easily added in C++ or in Python! The evaluation is performed efficiently using cluster tree elimination following a (hyper-)tree decomposition of the dependencies (due to functions and constraints). Interpreting the evaluations as partition functions, the system supports sampling of variable assignments from the corresponding Boltzmann distribution. Finally, Infrared implements a generic multi-dimensional Boltzmann sampling strategy to target specific feature values. Such functionality is made conveniently available via general Python classes.
While the core library is agnostic to the origin of the tree decomposition, it comes with interfaces to TDlib and libhtd.
On top of the Infrared library, we re-implemented RNARedPrint [https://github.com/yannponty/RNARedPrint], which is shipped together with the library and serves as non-trivial example for the use of the Infrared engine. RNARedprint samples RNA sequences from a Boltzmann distribution based on the energies of multiple target structures and GC-content. Our research paper on RNARedPrint
Stefan Hammer, Yann Ponty, Wei Wang, Sebastian Will. Fixed-Parameter Tractable Sampling for RNA Design with Multiple Target Structures. Proc. of the 22nd Annual International Conference on Research in Computational Molecular Biology, RECOMB'18. Apr 2018, Paris, France. 2018.
describes many of the ideas that lead to the development of Infrared and points to the origins in cluster tree elimination and constraint networks. If you find this software useful for your own work, please do not forget to cite us.
The software can be compiled and installed (after cloning the repository) on GNU/Linux or other Unix-like systems as follows. In preparation, install libhtd https://github.com/mabseher/htd/releases. Moreover, compilation requires boost.graph and boost.python. Compile and install Infrared itself by
cd Infrared ./autoreconf -i ./configure --with-htd=$path_to_htd_installation --prefix=$path_to_infrared_installation make && make install
To use tree decomposition based on the Java tool TDlib, please obtain a copy and set the Java CLASSPATH accordingly.
Generate html documentation of the Infrared API by
Documentation will be generated in Doc/html; find the main page at Doc/html/index.html
For running RNARedprint, one furthermore needs an installation of the ViennaRNA package; make sure that the Python module is found in the path described by the environment variable PYTHONPATH. PYTHONPATH must as well point to the libinfrared module from the Infrared installation, i.e. $path_to_infrared_installation/lib
Redprint reads the multiple RNA target structures from an 'inp'-file, e.g.
....((((((..(((((((....))))((((((......))..))))..((((((....((((...)))).....)))))).))).))))))........ ..............((((((.....(((.(((((..((((..(((((...((......))....)))))..))))..))...)))))).))))))..... ..((((.((.((.........((((((((.((.(((......))).)).))))))))..)))).)))).(((......................)))... ;
A typical call to produce 20 Boltzmann samples with given weights looks like
redprint.py test.inp -n 20 --model bp --gcweight=0.15 --weight 5 --method 0 --turner
Moreover, redprint supports targeting specific gc-content and energies, which is calculated by performing multi-dimensional Boltzmann sampling (mdbs). Here is a example call
time _inst/bin/redprint.py test.inp -n 20 --mdbs --gctarget=70 --gctol 3 \ --tar -15 --tar -20 --tar -20 --tol 1 --gc --turner -v
Further usage information is available by calling
usage: redprint.py [-h] [--method METHOD] [-n NUMBER] [--seed SEED] [-v] [--model MODEL] [--turner] [--gc] [--checkvalid] [--no_red_constrs] [--gcweight GCWEIGHT] [--weight WEIGHT] [--mdbs] [--gctarget GCTARGET] [--target TARGET] [--gctolerance GCTOLERANCE] [--tolerance TOLERANCE] [--plot_td] infile Boltzmann sampling for RNA design with multiple target structures positional arguments: infile Input file optional arguments: -h, --help show this help message and exit --method METHOD Method for tree decomposition (0: use htd; otherwise pass to TDlib as strategy) -n NUMBER, --number NUMBER Number of samples --seed SEED Seed infrared's random number generator (def=auto) -v, --verbose Verbose --model MODEL Energy model used for sampling [bp=base pair model,stack=stacking model] --turner Report Turner energies of the single structures for each sample --gc Report GC contents of the single structures for each sample --checkvalid Check base pair complementarity for each structure and for each sample --no_red_constrs Do not add redundant constraints --gcweight GCWEIGHT GC weight --weight WEIGHT Energy weight (def=1; in case, use last weight for remaining structures) --mdbs Perform multi-dim Boltzmann sampling to aim at targets --gctarget GCTARGET GC target --target TARGET Energy target (def=0; in case, use last target for remaining structures) --gctolerance GCTOLERANCE GC tolerance --tolerance TOLERANCE Energy tolerance (def=1; in case, use last tolerance for remaining structures) --plot_td Plot tree decomposition
A further tool
redprint_complexity.py is provided to report tree
widths for two energy models of different complexity (see our research
paper for full details) and plot the dependency graphs and tree
decompositions. This tool provides insight into the run time / space
consumption behavior of redprint on specific instances.
Infrared architecture and background
The system is build to separate the core "Infrared" from applications like Redprint.
ired --- Infrared core
The core preforms precomputation, i.e. calculation of partition functions, and Boltzmann sampling based on a given, populated cluster tree. A /cluster tree/ of a constraint network (of variables, functions and constraints) corresponds to a (hyper-)tree decomposition of the dependency graph (induced by the functions/constraints on the variables). It consists of bags (aka clusters) of variables, functions, and constraints. The bags are connected by edges to form a tree (or forest), such that the following properties hold
For each variable in the CN, there is one bag that contains the variable.
For each variable, the bags that contain this variable form a subtree.
For each function, there is exactly one bag that contains the function and its variables.
For each constraint, there is at least one bag that contains the function and its variables. Constraints are only assigned to bags that contain all of their variables.
The infrared core (namespace ired) defines the following classes
ired::Assignment an assignment of variables to values ired::AssignmentIterator provides iteration over sub-assignments; the work horse of ired ired::Function<ValueType> defines functions over variables; abstract ired::Constraint defines constraints over variables; abstract ired::ConstraintNetwork implements the constraint network (holds vars, funs, and constrs) ired::Cluster bag of variables, functions, constraints ired::ClusterTree tree of bags, provides evaluation and sampling
By design, the Infrared core is strictly low-level and domain-agnostic, it only knows finite domain variables, constraints, and functions; as well as to evaluate partition functions and sample based on cluster trees for constraint networks. This will allow using the same core for evaluation (even not necessarily of partition functions, e.g. instead optimizing energy/fitness) and sampling in very different domains just by writing domain-specific Python code.
ired::rnadesign --- RNA design specific extension of Infrared
In namespace ired::rnadesign, the system provides domain-specific constraints and functions for rna design, as such it specializes the classes ired::Function and ired::Constraint, e.g.
ired::rnadesign::ComplConstraint complementarity constraint ired::rnadesign::GCControl function for control of GC-content ired::rnadesign::BPEnergy evaluate base pair energy (base pair model) ired::rnadesign::StackEnergy evaluate stacking energy (stacking model)
infrared / libinfrared --- Python modules exposing infrared core and extensions
Using boost::python, we expose classes of ired and ired::rnadesign to
Python, such that cluster trees can be constructed and populated with
exposed C++ constraints/functions and/or derived Python
constraints/functions. Python programs like
redprint.py that want to
use the infrared library, typically import only module infrared, which
provides base classes that can be specialized to generate
application-specific constraint networks and cluster trees for
Infrared. Moreover the module provides classes to generate samples
from a specific Boltzmann distrubition as well as samples targeted at
specific feature values. The latter performed by multi-dimensional
Boltzmann sampling. The module provides access to the Infrared core and
exports the additional base classes
infrared.Feature derived classes represent features like GC-content, or Turner energy infrared.FeatureStatistics gathers statistsics of the feature values for a series of samples infrared.ConstraintNetwork derived to construct the constraint network (variables, constraints, functions) of a problem instance infrared.TreeDecomposition derived to construct and represent the tree decomposition infrared.BoltzmannSampler wraps the Boltzmann sampling functionality infrared.MultiDimensionalBoltzmannSampler additionally implements multi-dimensional Boltzmann sampling
treedecomp.py --- Generation of tree decompositions
This python module provided by infrared supports the computation of tree decompositions based on external libs (interfaces the Java lib TDlib and the C++-lib libhtd; for the latter infrared implements a wrapper (module libhtdwrap), which exposes a specific variant of libthd-tree decomposition to Python.
redprint.py --- Multi-dimensional Boltzmann sampling for multi-target RNA design
Redprint uses the Infrared library to perform multi-dimensional Boltzmann sampling of sequences for given multi-target RNA design instances. For either the base pair or the stacking model, it computes the dependencies, calls the tree decomposition construction, and constructs and populates the cluster tree for Infrared. For this purpose, it specialized base classes of the module infrared. Finally, it generates samples and optionally reports (statistics over) features of the sampled sequences, again with the support of the infrared module.