# BioSB Computational Metagenomics Course

### Practicals for metabolic modeling at the community level

For the following practicals we will use python to explore, simulate, and play with metabolic models in a community context.




[cobrapy](https://cobrapy.readthedocs.io/en/latest/index.html#) is one of the best ways to gain "programmatic" access to genome-scale metabolic models (GSMMs).

Before going into advanced applications, we give a very brief introduction to some of its common operations.

Metabolic reconstructions are commonly stored as .xml files that follow the standard from the Systems Biology Markup Language ([sbml](https://en.wikipedia.org/wiki/SBML#:~:text=The%20Systems%20Biology%20Markup%20Language,community%20of%20users%20and%20developers.)). The models that we are going to use in the tutorial are available in the folder 'sbmlModels'.


In [None]:
import cobra

We demonstrate some of the basic functionalities of cobrapy. After this introduction, you should be able to:

1) Load a model object;
2) Manipulate metabolite, reaction, and gene objects;
3) Solve the model with linear programming, using flux balance analysis (FBA);
4) Change the composition of the external environment.

**Note**: in this tutorial we often refer to 'objects' such as 'model object', 'reaction object', etc. These are instances of python classes, each with a number of specific attributes.

In [None]:
#import cobra and set the solver to the free glpk
import cobra
cobra_config = cobra.Configuration()
cobra_config.solver = 'glpk'


#import other packages 
import numpy as np
import os

#folder with models
modelFolder = '/data/precomputed/sbmlModels'

#get a list with the models that are available in the folder
print(os.listdir(modelFolder))

In [None]:
#create a model object by loading an SBML file
model = cobra.io.read_sbml_model(os.path.join(modelFolder,'acetogen_toy_model.xml' ))

#### Model

The model object is a python class with all the relevant attributes to manipulate a GSMM.

To see all the attributes...

```python

for i in dir(model):
    if i[0]!='_':
        print(i)
```


Beyond metabolites, reactions, and genes that we'll explain with more detail below, some of the relevant attributes to keep in mind are:

- id 
contains the model's id
 `print(model.id)`

- compartments
contains the model's compartments and the 'key' by which they are referred to. For instance, 'c' represents the cytoplasm and 'e' the external environment (or periplasm)
`print(model.compartments)`


Notice that the dictionary returned by `model.compartments` is only intended for viewing the model compartments; assigning to it has no effect on the model:

`model.compartments['c'] = 'Cytoplasm'
print(model.compartments)`

To add or change the name of a compartment, use:

`model._compartments['c'] = 'Cytoplasm'
print(model.compartments)`



- medium
contains the maximum uptake flux of each metabolite that is available to the model from the external compartment (media metabolites). These values are flux bounds on the exchange reactions, not concentrations. More information about this in the section about changing the environment.
`print(model.medium)`


- exchanges

A list with reaction objects for the metabolites that could potentially be exchanged (in or out) by the model. 
`print(model.exchanges)`

Notice that the list contains the actual reaction objects, not their ids.

- sinks

Similar to the exchanges: a list with the sink reaction objects.
`print(model.sinks)`

#### Metabolites
A list that contains an object for each metabolite in the model.

There are different functions to retrieve the desired metabolite object from the model, according to some query criteria:

`model.metabolites.get_by_id` , `model.metabolites.get_by_any`, `model.metabolites.has_id`

The most important attributes of a metabolite object are: 

- id - the identifier of the metabolite, which also allows one to access the metabolite object from the model.
```
water = model.metabolites.cpd00001_c

```

If the database used contains ids that are not suitable as python variable names (as can happen with BiGG ids), the metabolite can be accessed with the `get_by_id` function of the `model.metabolites` object.

```
water = model.metabolites.get_by_id('cpd00001_c')
```

- name
`print(water.name)`

- formula
`print(water.formula)`

- elements
`print(water.elements)`

- charge
`print(water.charge)`

- compartment
A metabolite object is specific to a single compartment. If the same metabolite occurs in different compartments, a separate object is needed for each compartment in which it occurs.

The metabolite ids commonly have a '_c' or '_e' as a suffix, to indicate their compartments.

`print(water.compartment)`

- reactions
The set of reaction objects in which the metabolite is a reactant or product.

`print(water.reactions)`

#### Reactions
`model.reactions` is also an attribute of the model object. It consists of a list containing a reaction object for each of the model's reactions.

`print(len(model.reactions))`

Similar to the metabolite objects, reaction objects may be accessed directly by their ids or with the special functions:

`model.reactions.get_by_id` , `model.reactions.get_by_any`, `model.reactions.has_id`


`atpase = model.reactions.rxn08173`

or

`atpase = model.reactions.get_by_id('rxn08173')`

Some of the most important attributes of the reaction object are:

- name
`print(atpase.name)`

- compartments
While a metabolite is present in a single compartment, reactions may occur across compartments.
`print(atpase.compartments)`

- metabolites
A dictionary where the keys are the metabolite objects that participate in the reaction and the values are their stoichiometric coefficients. Reactants and products have, respectively, negative and positive values.

`print(atpase.metabolites)`

- reactants
A list with the metabolite objects of the reactants.
`print(atpase.reactants)`

- products
A list with the metabolite objects of the products.
`print(atpase.products)`

- build_reaction_string
`print(atpase.build_reaction_string())`

To see the metabolite names:
`print(atpase.build_reaction_string(use_metabolite_names=1))`



- lower_bound
The lowest flux value that is allowed to go through the reaction when solving the model.
`print(atpase.lower_bound)`

- upper_bound
The highest flux value that is allowed to go through the reaction when solving the model.
`print(atpase.upper_bound)`

**note:** the upper and lower bounds control the reaction directionality. 

```python

print(atpase.build_reaction_string(use_metabolite_names=1))
atpase.lower_bound = 0
print(atpase.build_reaction_string(use_metabolite_names=1))

```

They are also the easiest way to 'knockout' a reaction.

```python

atpase.lower_bound = 0
atpase.upper_bound = 0

```

Or to 'force' flux through a reaction (but in many cases this makes the model infeasible)

```python

atpase.lower_bound = 10
atpase.upper_bound = 1000

```

- flux
The flux that goes through a reaction is only accessible once the model is solved. 


`print(atpase.flux)`
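
Because the bounds determine the directionality, a small helper can turn a pair of bounds into a label. The function below is a hypothetical sketch (the name `directionality` and the four labels are choices made here, not part of cobrapy) and may be useful for the summary table in the first task.

```python
def directionality(lower_bound, upper_bound):
    """Classify a reaction from its flux bounds (labels are illustrative)."""
    if lower_bound == 0 and upper_bound == 0:
        return "blocked"      # knocked out: no flux allowed
    if lower_bound < 0 < upper_bound:
        return "reversible"   # flux may go in both directions
    if lower_bound >= 0:
        return "forward"      # only left-to-right flux allowed
    return "reverse"          # only right-to-left flux allowed


print(directionality(-1000, 1000))
print(directionality(0, 1000))
print(directionality(-30, -30))
print(directionality(0, 0))
```

With a reaction object it could be called as `directionality(atpase.lower_bound, atpase.upper_bound)`.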

##### Types of reactions

We can distinguish reactions based on the compartments of the metabolites


- Internal

Reactions that occur only in the cytosol. E.g.:

```python

fdh = model.reactions.FDH

print(fdh.compartments)

```


- Objective
The objective reaction that is 'maximized' (or minimized) by the model.
`print(model.objective_direction)`

```python

# list the reactions that contribute to the objective
for i in model.reactions:
    if i.objective_coefficient != 0:
        print(i.id)

```

- Transport

Reactions that take extracellular metabolites and transport them into the cell (i.e. convert them into cytosol metabolites)

```python

gluT = model.reactions.rxn05147

print(gluT.build_reaction_string(use_metabolite_names=0))

print(gluT.upper_bound)

print(gluT.lower_bound)

```


Notice that 'cpd00027_e' is converted to 'cpd00027_c'.

- Exchange

Reactions that make a metabolite available in the extracellular space. From the extracellular space it could be taken up by a transporter.

```python

print([i.id for i in model.exchanges])

gluExch = model.reactions.EX_cpd00027_e

print(gluExch.build_reaction_string(use_metabolite_names=1))

print(gluExch.lower_bound)

print(gluExch.upper_bound)

```

Notice that the inward flux is negative. If the flux is positive, it means the metabolite is being secreted.


- Sink 

Reactions that eliminate a dead-end metabolite (i.e. a metabolite that is not consumed by any further reaction and is not secreted)

```python

piSk = model.reactions.piSink

print(piSk.build_reaction_string(use_metabolite_names=1))

print(piSk.lower_bound)

print(piSk.upper_bound)

```


## Task: Model manipulation 1 (5-10 min)

Make a summary table containing all the reactions in the toy models with the following fields:

- Id

- Name
- Reaction string with metabolite names
- Lower bound
- Upper bound
- Directionality
- Type (internal, objective, transport, sink, exchange)
- SugarFermenter (1 if present in the sugar fermenter, 0 otherwise)
- ButyrateProducer (1 if present in the butyrate producer, 0 otherwise)
- Acetogen (1 if present in the acetogen, 0 otherwise)



#### Genes

The model object also contains genes.

```python
#change to the 'full' model

model = cobra.io.read_sbml_model(os.path.join(modelFolder,'acetogen_full_model.xml' ))


print(model.genes)

gene1 = model.genes[0]

print(gene1.name)

print(gene1.id)

print(gene1.reactions)

```
Note that a single gene can be associated with several reactions, and a reaction may be associated with one or more genes. Boolean rules describe whether the genes encode isozymes (OR) or subunits of a complex (AND).

```python
for i in gene1.reactions:
    print(i.gene_reaction_rule)

```
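
To make the Boolean logic concrete, the sketch below evaluates a rule string for a given set of knocked-out genes. It is a standalone illustration in plain python: cobrapy has its own machinery for gene knockouts, and the function name `rule_is_active` and the gene ids `g1`, `g2`, `g3` are hypothetical.

```python
import re


def rule_is_active(rule, knocked_out):
    """Return True if the rule can still be satisfied after the knockouts."""
    if not rule.strip():
        return True  # no genes associated: the reaction is unaffected
    expression = []
    for token in re.findall(r"\(|\)|[^\s()]+", rule):
        if token in ("(", ")"):
            expression.append(token)
        elif token.lower() in ("and", "or"):
            expression.append(token.lower())
        else:
            # A gene id becomes True if the gene is still present.
            expression.append(str(token not in knocked_out))
    # The expression now only contains parentheses, and/or, True and False.
    return eval(" ".join(expression))


rule = "(g1 and g2) or g3"
print(rule_is_active(rule, {"g1"}))        # g3 (isozyme) still carries the reaction
print(rule_is_active(rule, {"g1", "g3"}))  # complex broken and isozyme missing
```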

Gapfilled reactions and some others (sink, exchange) do not have genes associated with them:

```python
geneless = [i.id for i in model.reactions if i.genes == frozenset()]

```


```python
#change back to the 'toy' model

model = cobra.io.read_sbml_model(os.path.join(modelFolder,'acetogen_toy_model.xml' ))

```




**Note:** the toy models do not have any genes associated with them.


#### Solving the model

We can solve the model using FBA. 

```python
obj = model.optimize()

print(obj.objective_value)

print(obj.fluxes)

```

For algorithms that only need the objective value and need it fast (e.g. when the model is solved many times in a loop), use:

```python
objCoef = model.slim_optimize()

print(objCoef)

```



## Task: model manipulation 2 (10-15 min)

- Find reactions that are essential for the model to produce biomass

- Find reactions that are only essential in pairs

- Repeat the above, but now find reactions and reaction pairs that reduce the model’s biomass by up to an arbitrary threshold (e.g. 10%)

#### Changing the environment

The metabolites are made available to the models through the exchange reactions (see above).

To 'control' the composition, set the lower bound of the exchange reactions. But take into account that these are constraint-based models: the lower bound only sets the maximum uptake, and the model is free to consume less. To 'force' the model to consume a defined amount, set both the lower and the upper bound to this amount (this often leads to infeasible models).

```python
#provide a rich medium
for i in model.exchanges:
    i.lower_bound = -1000

obj = model.optimize()

print(obj.objective_value)


#remove glucose 
model.reactions.EX_cpd00027_e.lower_bound=0

obj = model.optimize()

print(obj.objective_value)

#force a glucose flux

model.reactions.EX_cpd00027_e.lower_bound=-30
model.reactions.EX_cpd00027_e.upper_bound=-30
obj = model.optimize()

print(obj.objective_value)



```
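
For experiments with many environments, it can help to describe each environment as a dictionary that maps exchange reaction ids to a maximum uptake flux. The helper below is a hypothetical sketch in plain python: the function name, its parameters and the example ids `EX_a`, `EX_b`, `EX_c` are assumptions made for illustration.

```python
import random


def random_medium(exchange_ids, max_uptake=10.0, p_present=0.5, seed=None):
    """Draw a random environment: exchange id -> maximum uptake flux."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    medium = {}
    for exchange_id in exchange_ids:
        # Each metabolite is present with probability p_present.
        if rng.random() < p_present:
            medium[exchange_id] = rng.uniform(0.0, max_uptake)
    return medium


example = random_medium(["EX_a", "EX_b", "EX_c"], seed=1)
print(example)
```

Such a dictionary has the same shape as `model.medium`. Assigning it with `model.medium = example` should open the listed exchanges up to the given uptake values and close the remaining ones, which is worth verifying on a toy model before relying on it.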

## Task: model manipulation 3 (10-15 min)

- Generate random environment compositions and associate them to the production of biomass by the three toy models

- Compare their growth rate distributions across these random environments (making box plots should be enough)

- Make the same comparison without glucose
