# pySBML

`pySBML` is a library to parse SBML models into native, type-annotated Python types and transform ODE models into a simpler representation.  

In [None]:
from pathlib import Path

import pysbml

## Main routine

The main feature of pySBML is reading SBML models and transforming them into a simpler representation that can be interpreted directly as a system of ordinary differential equations.  

For a one-line solution, you can use the `load_and_transform_model` function.

It accepts both `Path` and `str` arguments, although `pathlib.Path` is preferred because it keeps scripts cross-platform.

Note that models define a `_repr_markdown_` method, so they render as Markdown in Jupyter notebooks.

In [None]:
model = pysbml.load_and_transform_model(Path("assets") / "00462.xml")
model

We also supply a `codegen` function to directly transform your model into a Python module that you can execute.  

In [None]:
from pysbml.codegen import codegen

print(codegen(model))

## Step by step

If you want to inspect every step of the process, you can.  
In this case, we start by loading the entire SBML document, which contains plugin information and the actual model.



### Step 1: loading the model

Using the `load_document` function, we parse the model into native Python types without further modifications.  

All SBML constructs as well as the MathML data are represented as type-annotated dataclasses.  
You can find these in `pysbml.parse.data` and `pysbml.parse.mathml` respectively.  

This representation makes it much easier to keep all variants in mind.

For example, the `Reaction` class can contain locally defined parameters as well as stoichiometries, which map a variable either directly to a factor **or** to a list of tuples of factor and species reference.
This is encoded as follows:

```python
@dataclass(kw_only=True, slots=True)
class Reaction:
    body: Base
    stoichiometry: Mapping[str, float | list[tuple[float, str]]]
    args: list[Symbol]
    local_pars: dict[str, Parameter] = field(default_factory=dict)
```

No untyped `model.getListOfReactions()` methods, just data. Simple and efficient.

In [None]:
from pysbml import load_document

doc = load_document(Path("assets") / "00462.xml")
doc.model

### Step 2: transforming the model

As you can see above, the SBML standard contains a lot of different flags and options for what e.g. a Variable is supposed to **mean**.  

This includes whether the variable is an amount or a concentration, is constant, is to be interpreted as an amount (`only_substrate_units`), has a boundary condition, lives in a constant or dynamic compartment, and so on.

To us that representation is too complex.  
We want something simpler.  
Using the `transform` function, we can represent the model using just the data below.  

```python
type Expr = sympy.Symbol | sympy.Float | sympy.Expr
type Stoichiometry = dict[str, Expr]

class Parameter:
    value: sympy.Float
    unit: Quantity | None

class Variable:
    value: sympy.Float
    unit: Quantity | None

class Reaction:
    expr: sympy.Expr
    stoichiometry: Stoichiometry

class Model:
    name: str
    units: dict[str, Quantity] = field(default_factory=dict)
    functions: dict[str, Expr] = field(default_factory=dict)
    parameters: dict[str, Parameter] = field(default_factory=dict)
    variables: dict[str, Variable] = field(default_factory=dict)
    derived: dict[str, Expr] = field(default_factory=dict)
    reactions: dict[str, Reaction] = field(default_factory=dict)
    initial_assignments: dict[str, Expr] = field(default_factory=dict)
```

Parameters are always constant, variables always change.  
No special handling of compartments, no locally defined parameters.  

Note that we also transformed the MathML classes into sympy expressions for easier manipulation.  
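
To illustrate why this representation reads directly as an ODE system: the derivative of each variable is the stoichiometry-weighted sum of the reaction rates. The sketch below shows that standard reading with plain floats standing in for the sympy expressions. The `right_hand_side` helper and the example values are hypothetical, and this is not necessarily the code that pySBML generates.

```python
def right_hand_side(
    rates: dict[str, float],
    stoichiometries: dict[str, dict[str, float]],
    variables: list[str],
) -> dict[str, float]:
    """Return dx/dt per variable as the stoichiometry-weighted sum of rates."""
    # Start every derivative at zero so variables untouched by any
    # reaction still appear in the result.
    dxdt = {name: 0.0 for name in variables}
    for reaction, stoichiometry in stoichiometries.items():
        for variable, factor in stoichiometry.items():
            dxdt[variable] += factor * rates[reaction]
    return dxdt


# One made-up reaction v1 that converts S1 into S2 at rate 2.0.
dxdt = right_hand_side(
    rates={"v1": 2.0},
    stoichiometries={"v1": {"S1": -1.0, "S2": 1.0}},
    variables=["S1", "S2"],
)
```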

In [None]:
from pysbml.transform import transform

model = transform(doc)
model

In [None]:
print(model._repr_markdown_())

### Step 3: codegen

As above, you can use our `codegen` function to directly generate a model.  

In [None]:
print(codegen(model))

If you have a library yourself and want to just use our transformed model to create your own code, great!  
We do the same at [MxlPy](https://github.com/Computational-Biology-Aachen/MxlPy).  

A few pointers for that to work seamlessly:

1. Derived values are stored as dictionaries internally. Depending on how you set up your models, you will need to **sort** these so that they are evaluated in the right order, as they might depend on each other. Since this is essentially a dependency resolution problem, we implemented a topological sort for it. Take a look at `pysbml.codegen._sort_dependencies` for inspiration on how to do this.
2. Initial assignments have the same issue. Since they can depend on derived values, we recommend sorting twice: once with the initial assignments and once without.
3. It is legal SBML to have an ODE model without variables or ODEs. Be aware that your inputs and outputs might be empty.
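
The dependency ordering from the first pointer can be sketched with the standard library's `graphlib`. This is not pySBML's `_sort_dependencies`, only a minimal illustration under the assumption that you have already collected, for each derived value, the set of names it depends on (with sympy expressions, `expr.free_symbols` is one way to obtain them). The `sort_derived` helper and the names `d1`, `d2`, `d3` and `k1` are hypothetical.

```python
from graphlib import TopologicalSorter


def sort_derived(dependencies: dict[str, set[str]]) -> list[str]:
    """Order derived values so each one comes after the ones it depends on."""
    # Names that are not keys (e.g. parameters or variables) are already
    # known, so only dependencies on other derived values constrain the order.
    graph = {
        name: {dep for dep in deps if dep in dependencies}
        for name, deps in dependencies.items()
    }
    # static_order yields dependencies first and raises graphlib.CycleError
    # if the derived values depend on each other in a cycle.
    return list(TopologicalSorter(graph).static_order())


# d3 needs d2, which needs d1; k1 stands for a parameter.
order = sort_derived({"d2": {"d1", "k1"}, "d1": {"k1"}, "d3": {"d2"}})

# An empty input is valid and simply yields an empty order.
empty = sort_derived({})
```

For the second pointer, the same helper could be called twice: once on the derived values merged with the initial assignments, and once on the derived values alone.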