# AEON file formats

In AEON, we try to support as many relevant model formats as possible. At the moment, these are `.bnet`, `.aeon` and `.sbml`, but we'll be happy to add new formats in the future.

In this notebook, we briefly describe the specification of each format and how AEON works with the format.

In [1]:
from biodivine_aeon import *

## `.bnet`

This is a very basic format originally introduced by the [`BoolNet` R package](https://cran.r-project.org/web/packages/BoolNet/index.html) (see the `loadNetwork` function for details).
It only holds the update functions of the network and nothing else.

It mostly aligns with the default representation of a "matrix of named expressions" in R, so many tools that are not written in R implement some of the edge cases slightly differently.
We try to explain these edge cases below as best as possible.

 1. Any line starting with `#` is a comment.
 2. The `.bnet` file starts with a `targets,factors` header.
 > In theory, the whitespace on this line should not be relevant, but some tools can only read the header if there is no space after `,`. Also, some tools use `Targets,Factors`. AEON should be able to read any of these variants. The note about whitespace also applies to other parts of the file: it should ignore whitespace, but not all tools actually do this properly.

 3. Each line after the header (which is not a comment) describes the update function of one variable. This is again a pair `name,function`.
 4. Each such line can also have a rational *probability* following after the function (for probabilistic networks), but this is not supported in AEON.
 5. Tools often have different requirements for what a valid `name` should look like. Typically, these align with a valid variable name in R (i.e. it starts with a letter, and otherwise contains numbers, letters and `_`). AEON can actually parse a wider range of names than this, but will only output such "safe" names by default.
 6. Each function is a "normal" Boolean expression consisting of variable names, parentheses, and operators `!`/`|`/`&` (see the original documentation).
 7. There are also "special functions" like `all`/`any`/`maj`, but these are not recognized by most tools. As such, they are also not recognized by AEON.
 > Note that the format does not have any explicit `true`/`false` constant. You can circumvent this by replacing constants with tautologies/contradictions. In AEON, we do support `true`/`false` as well as `0`/`1` when parsing, but this is not generally available in all tools.

 8. Some tools allow you to omit the update functions for constant input variables (i.e. the identity function is assumed by default). AEON parses such variables as having no update function.
 > Note that this only works reasonably well with input variables, since these don't have any incoming regulators. For any other variable, it is not clear what its regulators are unless the update function is specified.

 9. AEON infers the regulatory graph of the network based on the syntax of each update function (i.e. all variables that appear in the function are considered regulators). Each detected regulation is marked as observable, and the monotonicity is set to unknown for all regulations.

 10. When outputting a `.bnet` model, AEON will:
     
     - Use `targets,factors` as the header.
     - Use whitespace around `,`/`&`/`|` and introduce parentheses around every binary operator.
     - Fail if the model contains any uninterpreted functions (explicit parameters) or missing update functions (implicit parameters), except for input nodes (see above).
     - Fail if the network contains variable names that are not valid identifiers in R. You can override this and automatically rename all such variables.
     - Each constant is converted to a tautology/contradiction using the target variable of the update function (i.e. through a self-regulation).
     - Unsupported Boolean operators are automatically converted to `!`/`|`/`&`.
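
The constant replacement mentioned in the list above can be illustrated with a small sketch. The function name `replace_constants` and the exact shape of the generated expressions are assumptions made for illustration; AEON may emit a different (but equivalent) tautology or contradiction.

```python
import re


def replace_constants(target: str, expression: str) -> str:
    """Rewrite Boolean constants using the target variable itself.

    Hypothetical helper, not part of `biodivine_aeon`.
    """
    tautology = f"({target} | !{target})"      # always true
    contradiction = f"({target} & !{target})"  # always false
    replacements = {
        "true": tautology, "1": tautology,
        "false": contradiction, "0": contradiction,
    }
    # \b keeps identifiers such as `x1` or `true_x` intact.
    return re.sub(
        r"\b(true|false|0|1)\b",
        lambda match: replacements[match.group(1)],
        expression,
    )


print(replace_constants("a", "true"))
print(replace_constants("b", "a & false"))
```

Because the rewritten function now mentions its own target, the variable gains a self-regulation in the syntactically inferred regulatory graph.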



In [2]:
bn = BooleanNetwork.from_bnet("""
targets,factors
a, a | b
b, a & !(c | d)
c, b | !a
# d is an unspecified constant input
""")
print(bn.to_bnet())

targets,factors
a, (a | b)
b, (a & !(c | d))
c, (b | !a)
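
To make the parsing rules more concrete, here is a minimal pure-Python sketch of a `.bnet` reader. It is not how AEON is implemented: `read_bnet` and `regulators` are hypothetical helpers, probabilities are not handled, and the identifier pattern assumes "safe" R-style names.

```python
import re


def read_bnet(text: str) -> dict:
    """Return a {target: expression} map from `.bnet` text (no probabilities)."""
    functions = {}
    header_seen = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, separator, expression = line.partition(",")
        if not separator:
            raise ValueError(f"expected `name,function`, got: {line!r}")
        name, expression = name.strip(), expression.strip()
        if not header_seen:
            # Accept `targets,factors`, `targets, factors` and `Targets,Factors`.
            if (name.lower(), expression.lower()) != ("targets", "factors"):
                raise ValueError("missing `targets,factors` header")
            header_seen = True
            continue
        functions[name] = expression
    return functions


def regulators(expression: str) -> set:
    """Identifiers appearing in the expression, i.e. the syntactic regulators."""
    names = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expression))
    return names - {"true", "false"}  # constants are not variables


MODEL = """
targets,factors
a, a | b
b, a & !(c | d)
c, b | !a
# d is an unspecified constant input
"""

functions = read_bnet(MODEL)
graph = {target: sorted(regulators(fn)) for target, fn in functions.items()}
print(graph)
```

Note that `d` appears only as a regulator of `b` and has no entry of its own, which mirrors how AEON treats inputs with an omitted update function.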



## `.aeon`

The AEON format is, in a sense, an extension of `.bnet` which allows you to (a) specify the regulatory graph explicitly outside of the update functions, and (b) use explicit and implicit parameters (uninterpreted functions) in the model.

 1. Everything following a `#` is a comment. Some comments can be used to store metadata in the form of annotations (see `ModelAnnotation` class). However, annotations are not a "semantic" part of the model and we thus skip them in this tutorial.
 2. Every other line either specifies a regulation, or an update function. The order can be arbitrary.
 3. A regulation is given as `variable edge variable`, such that `edge` is one of `->`,`-|`,`-?`,`->?`,`-|?`,`-??`. Here, `>` signifies activation (positive monotonicity), `|` signifies inhibition (negative monotonicity), and `?` signifies unknown monotonicity. An additional `?` denotes that the edge may not be observable (i.e. the regulator does not need to have any impact on the target).
  - For example, `A -> B` is an observable positive regulation.
  - `A -|? B` is a negative regulation that does not have to be observable.
  - `A -?? B` is the most "generic" regulation with unknown monotonicity and no observability requirement.
    
 4. Only regulations that are explicitly included in the model are considered. It is an error to use a variable in an update function where said variable is not a regulator.
 5. A variable name in `.aeon` can be any string of alphanumeric characters plus `_`.
 6. An update function for variable `name` is specified on a single line as `$name: function`.
 7. Here, `function` is again a (normal) Boolean expression, just as in the case of `.bnet`. However, it supports `true`/`false` and `1`/`0` as constants, as well as extra Boolean operators `<=>` (equivalence), `^` (xor), and `=>` (implication). The operator priority, from weakest to strongest binding, is `<=>`, `=>`, `|`, `&`, and `^`.
 8. The function can also contain a call to an uninterpreted (i.e. unknown but fixed) function. These are the "explicit parameters" of the model.
 9. The names of the uninterpreted functions have the same structural constraints as variable names and must not clash with the variable names. Otherwise, an uninterpreted function does not have to be "declared" in any specific way.
 10. At the moment, only variables can be used as arguments of uninterpreted functions. In the future, AEON should support arbitrary expressions as function arguments.
 11. If a variable does not have an update function set in the file, this update function is considered to be an "implicit parameter" and its signature is reconstructed from the list of regulations.
 12. During parsing, AEON verifies that the inputs of the update functions align with the regulators in the regulatory graph. I.e. every variable used in an update function must have a regulation in the regulatory graph. However, it does not check the observability/monotonicity requirements (this can be later checked by `SymbolicAsyncGraph`).

 > Note that in the current structure, a variable with no incoming or outgoing regulations and no update function cannot be represented in any way. While having such a variable is essentially always an error in a real-world model, it is possible to produce such models during testing, so beware!
 

In [3]:
bn = BooleanNetwork.from_aeon("""
a -> b
b -| c
c ->? d
d -?? d
# a does not depend on anything, hence it must be a constant.
$a: true
# b depends on a positively
$b: a
# c depends on b negatively
$c: !b
# d has unknown behaviour described by function f, but once activated, stays active
$d: d | f(c, d)
""")
print(bn.to_aeon())

a -> b
b -| c
c ->? d
d -?? d
$a: true
$b: a
$c: !b
$d: (d | f(c, d))
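
The edge syntax of `.aeon` regulations can be decoded with a single regular expression. The following is a minimal sketch under the rules listed above; `Regulation` and `parse_regulation` are hypothetical names and not part of the `biodivine_aeon` API.

```python
import re
from typing import NamedTuple, Optional


class Regulation(NamedTuple):
    regulator: str
    target: str
    monotonicity: Optional[str]  # "activation", "inhibition" or None (unknown)
    observable: bool


# `variable edge variable`: the first edge symbol is the monotonicity,
# an optional trailing `?` drops the observability requirement.
REGULATION = re.compile(
    r"^\s*([A-Za-z0-9_]+)\s*-([>|?])(\??)\s*([A-Za-z0-9_]+)\s*$"
)


def parse_regulation(line: str) -> Regulation:
    line = line.split("#", 1)[0]  # everything after `#` is a comment
    match = REGULATION.match(line)
    if match is None:
        raise ValueError(f"not a regulation: {line!r}")
    regulator, sign, optional, target = match.groups()
    monotonicity = {">": "activation", "|": "inhibition", "?": None}[sign]
    return Regulation(regulator, target, monotonicity, observable=(optional == ""))


for text in ["A -> B", "A -|? B", "A -?? B"]:
    print(parse_regulation(text))
```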



## `.sbml`

This is a format well known in systems biology, but it is also rather complicated as it encodes the whole model (including expressions) into XML. 

We will not go into the details of SBML, or SBML-qual (the extension used to represent Boolean networks). Extensive documentation is available on the [official website](https://sbml.org/documents/specifications/).

However, we should note that AEON actually extends SBML in a few minor ways:
 - A transition `<input>` can have an attribute `essential`, which corresponds to the observability constraint in the `.aeon` format. Monotonicity is already supported by SBML. This should be fully transparent to other tools working with SBML.
 - For variables with "implicit" (i.e. missing) update function, we do not output any `<transition>`. This is technically valid SBML, but some tools may not be able to read it if they expect all variables to have exactly one update function.
 - The update functions can contain uninterpreted functions, just as in the `.aeon` format. We use the `<csymbol>` tag from MathML to denote such uninterpreted functions within the `<apply>` tag (search for `<csymbol>` in the example below). This is not supported by other SBML tools at the moment, but should still be compatible with any tool that does not parse the update functions (e.g. if it only reads the metadata or the network structure). Furthermore, any model with no explicit parameters is thus "standard" SBML because it contains no uninterpreted functions.

In [4]:
print(bn.to_sbml())

<?xml version='1.0' encoding='UTF-8' standalone='no'?><sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" layout:required="false" level="3" qual:required="true" xmlns:layout="http://www.sbml.org/sbml/level3/version1/layout/version1" version="1" xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1"><model><qual:listOfQualitativeSpecies xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1"><qual:qualitativeSpecies qual:maxLevel="1" qual:constant="false" qual:name="a" qual:id="a"/><qual:qualitativeSpecies qual:maxLevel="1" qual:constant="false" qual:name="b" qual:id="b"/><qual:qualitativeSpecies qual:maxLevel="1" qual:constant="false" qual:name="c" qual:id="c"/><qual:qualitativeSpecies qual:maxLevel="1" qual:constant="false" qual:name="d" qual:id="d"/></qual:listOfQualitativeSpecies><qual:listOfTransitions xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1"><qual:transition qual:id="tr_a"><qual:listOfInputs></qual:listOfInputs><qual:list
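
Since the output is plain XML, it can be inspected with any XML library. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `FRAGMENT` is a hand-shortened excerpt of the species list from the output above (not a complete model), and both helper functions are hypothetical, not part of AEON.

```python
import xml.etree.ElementTree as ET

QUAL_NS = "http://www.sbml.org/sbml/level3/version1/qual/version1"

# Shortened excerpt: only the first two qualitative species are kept.
FRAGMENT = f"""<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:qual="{QUAL_NS}" level="3" version="1" qual:required="true">
  <model>
    <qual:listOfQualitativeSpecies>
      <qual:qualitativeSpecies qual:maxLevel="1" qual:constant="false" qual:name="a" qual:id="a"/>
      <qual:qualitativeSpecies qual:maxLevel="1" qual:constant="false" qual:name="b" qual:id="b"/>
    </qual:listOfQualitativeSpecies>
  </model>
</sbml>"""


def species_ids(root):
    """List the `qual:id` of every qualitative species (network variable)."""
    tag = f"{{{QUAL_NS}}}qualitativeSpecies"
    attribute = f"{{{QUAL_NS}}}id"
    return [element.get(attribute) for element in root.iter(tag)]


def has_uninterpreted_functions(root):
    """True if any element is a `csymbol`, regardless of its namespace."""
    return any(
        element.tag.rsplit("}", 1)[-1] == "csymbol" for element in root.iter()
    )


root = ET.fromstring(FRAGMENT)
print(species_ids(root))
print(has_uninterpreted_functions(root))
```

A check like `has_uninterpreted_functions` is one way to decide whether a file is "standard" SBML in the sense described above, assuming `<csymbol>` is not used for any other purpose in the document.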

## Inference of the regulatory graph

Note that you can use the following method to infer the "best" regulatory graph of any Boolean network. This is particularly useful for `.bnet` models, since the observability and monotonicity of each regulation are not specified in the model. As such, this method will essentially "add" these values to the model. This can also be useful for other model formats if the model has been modified or if it contains inconsistent information.

In [5]:
bn = BooleanNetwork.from_bnet("""
targets,factors
a, a | b
b, a & !(c | d)
c, b | !a
# d is an unspecified constant input
""")
# Notice that the regulations have no monotonicity and are all observable.
print(bn.to_aeon())

a -? a
b -? a
a -? b
c -? b
d -? b
a -? c
b -? c
$a: (a | b)
$b: (a & !(c | d))
$c: (b | !a)



In [6]:
bn = bn.infer_regulatory_graph()
# Now, the strictest possible monotonicity is automatically inferred, and regulations
# that are not observable are no longer present (no such regulations were present in 
# the model in our example).
print(bn.to_aeon())

a -> a
b -> a
a -> b
c -| b
d -| b
a -| c
b -> c
$a: (a | b)
$b: (a & !(c | d))
$c: (b | !a)
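
To see where these inferred edges come from, the following sketch classifies a regulation by brute-force enumeration of the truth table. This only illustrates the definitions of monotonicity and observability; it is not how `infer_regulatory_graph` is implemented, and `classify_regulation` is a hypothetical helper that is only practical for functions with a handful of inputs.

```python
from itertools import product


def classify_regulation(function, inputs, regulator):
    """Return the `.aeon` edge for `regulator`, or None if it has no effect."""
    others = [name for name in inputs if name != regulator]
    can_increase = can_decrease = False
    for values in product([False, True], repeat=len(others)):
        valuation = dict(zip(others, values))
        low = bool(function(**valuation, **{regulator: False}))
        high = bool(function(**valuation, **{regulator: True}))
        # Does switching the regulator on ever raise / lower the output?
        can_increase = can_increase or (high and not low)
        can_decrease = can_decrease or (low and not high)
    if can_increase and can_decrease:
        return "-?"  # observable, but neither activation nor inhibition
    if can_increase:
        return "->"  # activation
    if can_decrease:
        return "-|"  # inhibition
    return None      # not observable: the regulation can be dropped


def update_b(a, c, d):
    # Update function of `b` from the example: a & !(c | d)
    return a and not (c or d)


for regulator in ["a", "c", "d"]:
    print(regulator, classify_regulation(update_b, ["a", "c", "d"], regulator), "b")
```

For the update function of `b`, this yields the same three edges as the output above.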

