# **Introduction to COBRApy**

## Prepare your environment

> **NOTE**
> 
> This section is necessary only when you are running this notebook on Google Colab! 
> If you are running locally, then you should already have a running `conda` environment as described [here](./preparingYourEnvironment.ipynb).

Again, only if you are running on Google Colab, make the `Antony2025` folder of the repo your working directory. 

In [None]:
import os
os.chdir("/content/metabolic_toy_model/Antony2025")

## Our three gut species example case

We will first use three toy models for three human gut species that occupy important ecological niches in the gut and are known to metabolically interact with one another, as in the following diagram:

![three species](./files/figs/multistabilityPaper.png)

These ecological niches are based on the metabolism of the following three species:

![test](../files/figs/primeferm_but_acet.png)

## Building a metabolic toy model 

### The openCOBRA project

First, load all the necessary libraries for this section. 

In [None]:
import pandas as pd

We will now use a set of core-metabolism reactions to build a toy model, i.e. a model that does not represent the actual metabolism of a living species
but gives us an easy way to "jam" a little :sunglasses: 

After this section, you will (have to) be able to:

* parse the metabolites of a `cobra` model 
* parse the reactions of a `cobra` model 
* view and edit the medium of a `cobra` model 
* reach the objective function of the model and get its optimal solution -- we will further discuss this part and its background in [tomorrow's afternoon session](./computationalMethods.ipynb)

But before anything further, **what is `cobra`**? :thinking:

[openCOBRA](https://opencobra.github.io) is an open-source, community-developed code base for COnstraint-Based Reconstruction and Analysis.

Its name is also indicative of our workshop's program: 

* **Reconstruction:** 
  
  How can we represent a species' metabolism in terms of a programming object? How can we link a reaction to its metabolites and to the genes that code for the corresponding enzymes?  
  Such questions and more will be covered in this session. Its application at the genome scale will be shown in the [session that immediately follows](./reconstructingDraftGSMMs.ipynb).

* **Constraint-based Analysis**:

  The definition of constraints and the various types of analyses for metabolic models will be discussed during [tomorrow's afternoon session](./computationalMethods.ipynb). 
  During this last session, we will explore how COBRA methods are actually applied throughout the workshop sessions.


The openCOBRA project supports COBRA methods across several programming languages. For this workshop, we will be using COBRApy, although it's important to note that the project began with MATLAB, and many analyses and routines are still only available in the COBRA Toolbox. However, the Python interface of COBRA offers a more accessible approach for capturing the complexity of integrated biological networks and provides an integration framework for multiomics data in systems biology.

For more on COBRApy you may study its corresponding [publication](https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-7-74).

What you will certainly need for this workshop, and for your future metabolic modeling adventures in Python, is cobra's [documentation page](https://cobrapy.readthedocs.io/en/latest/).

### Building a metabolic toy-model

In the [`BT_metabolicReactions.txt`](files/BT_metabolicReactions.txt) we have brought together a set of reactions to highlight *Bacteroides thetaiotaomicron*'s (BT) nature as a primary fermenter.

In [None]:
genome_reactions = pd.read_csv("files/BT_metabolicReactions.txt", sep="\t") 
genome_reactions.head()

As shown above, this file has seven columns, of which the following three deserve a closer look:
* **BIGG id**

    [BiGG Models](http://bigg.ucsd.edu) is a knowledgebase of genome-scale metabolic network reconstructions that integrates metabolic models under a set of standardized identifiers called BiGG IDs.



* **SEED id**

    [ModelSEED](https://modelseed.org) is a resource for the reconstruction, exploration, comparison, and analysis of metabolic models.
    ModelSEED brings its own namespace for metabolites (`cpd` prefix, e.g. `cpd00001`) and reactions (`rxn` prefix, e.g. `rxn00225`). 
    You may [browse on ModelSEED](https://modelseed.org/biochem/reactions) even without being a user, yet by making an account you are able to reconstruct draft models on the fly. 
    Moreover, ModelSEED is integrated with [KBase](https://www.kbase.us),
    a community-driven research platform for systems biology that provides open science tools, data, and computing resources. 

* **Reaction**

    In this column, we can see three important things:
    - the products and the reactants participating in the reaction
    - the stoichiometry 
    - the **reversibility** of the reaction
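
To make these three pieces of information concrete, here is a minimal sketch that splits a reaction string into signed stoichiometric coefficients and a reversibility flag. It assumes `cobra`-style arrows (`<=>` for reversible, `-->` for irreversible), which may differ from the notation used in the `.txt` file; `parse_reaction` and the metabolite ids `A_c`, `B_c`, `C_c` are hypothetical names used only for illustration.

```python
import re


def parse_reaction(equation):
    """Split a reaction string into (stoichiometry, reversible).

    Assumption: '<=>' marks a reversible and '-->' an irreversible reaction.
    Reactants get negative coefficients, products positive ones.
    """
    if "<=>" in equation:
        arrow, reversible = "<=>", True
    elif "-->" in equation:
        arrow, reversible = "-->", False
    else:
        raise ValueError(f"no known arrow in: {equation!r}")

    left, right = equation.split(arrow)
    stoichiometry = {}
    for side, sign in ((left, -1.0), (right, 1.0)):
        for term in side.split(" + "):
            term = term.strip()
            if not term:
                continue  # one side may be empty (e.g. boundary reactions)
            # Optional leading number, then the metabolite id
            match = re.fullmatch(r"(?:(\d+(?:\.\d+)?)\s+)?(\S+)", term)
            if match is None:
                raise ValueError(f"cannot parse term: {term!r}")
            coefficient = float(match.group(1) or 1)
            stoichiometry[match.group(2)] = sign * coefficient
    return stoichiometry, reversible


stoich, reversible = parse_reaction("2 A_c + B_c <=> C_c")
print(stoich, reversible)
```
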

The rest of the columns are now easier to understand: the **name** column gives the name of the reaction according to the ModelSEED ontology, while the **exchange** column shows which metabolite is being exchanged between the cell itself (cytosol) and its environment. 
Last, **SEED link** points to the ModelSEED record of the reaction, and **Notes** gives some extra information on what a reaction is or why we have added it to this set.


In total, this file contains 23 reactions:

In [None]:
genome_reactions.shape

which will be the basis for the toy model. 
The reconstruction process for this toy model is beyond the scope of this workshop, but in case you are interested you may have a look at the corresponding [build_bt_model.py](../scripts/build_bt_model.py).

This script's output is a [Systems Biology Markup Language (SBML)](https://sbml.org) file. 
As mentioned on SBML's page, SBML is a **software data format for describing models** in biology. 
You can think of it as a structured way to denote certain entities like metabolites, reactions, genes and their relationships.

> **Hint!** 
> SBML is **not** tool-dependent or analysis-specific; rather, it serves as a standardized interface for building models that can be used across different system platforms and operating systems. Software and tools for metabolic model analysis typically rely on SBML as the primary format for receiving input, i.e., the metabolic model.
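
Because SBML is plain XML, its entities can in principle be read with any XML parser. The sketch below uses Python's standard library on a hand-written, heavily trimmed SBML-like snippet; the ids (`M_glc_e`, `R_glc_transport`, ...) are made up for illustration, and real model files carry many more attributes and extension packages. In practice we let `cobra` do this work for us.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified snippet for illustration only (not a complete SBML file)
SBML_SNIPPET = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="M_glc_e" name="glucose" compartment="e"/>
      <species id="M_glc_c" name="glucose" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R_glc_transport" reversible="false">
        <listOfReactants>
          <speciesReference species="M_glc_e" stoichiometry="1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="M_glc_c" stoichiometry="1"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}
root = ET.fromstring(SBML_SNIPPET.strip())

# Metabolites are stored as <species> elements
species = [(s.get("id"), s.get("compartment"))
           for s in root.findall(".//sbml:species", NS)]

# Each <reaction> lists its reactants and products as species references
reactions = {}
for rxn in root.findall(".//sbml:reaction", NS):
    reactants = [r.get("species") for r in
                 rxn.findall("sbml:listOfReactants/sbml:speciesReference", NS)]
    products = [p.get("species") for p in
                rxn.findall("sbml:listOfProducts/sbml:speciesReference", NS)]
    reactions[rxn.get("id")] = (reactants, products)

print(species)
print(reactions)
```
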

Let us now have a look at the SBML toy model we built for BT!

### Model's basic parts 

Let us have a look at its content.

In [None]:
%%bash
head -20 ../files/models/sugar_fermenter_toy_model.xml

Most likely, it does not look that helpful, especially considering that a genome-scale model comes with thousands of metabolites and reactions!

And this is the first instance where `cobra` comes to our rescue!

We will use `cobra` to *load* the model in an easily accessible way.

In [None]:
import cobra
bt_model = cobra.io.read_sbml_model("../files/models/sugar_fermenter_toy_model.xml")
bt_model

This is our model's summary! Our toy model is named `sugar_fermenter` and consists of 37 metabolites, 33 reactions, one of which is its **objective function** (we will discuss later what that means), and two compartments. 

So, let's check on those entities one-by-one to have a clear overview of the model! 

### Metabolites in `cobra`

`model.metabolites` is a list that contains an object for each metabolite in the model.

There are different functions to retrieve the desired metabolite object from the model, according to some query criteria:

`model.metabolites.get_by_id` , `model.metabolites.get_by_any`, `model.metabolites.has_id`

The most important attributes of a metabolite object are:

- `id`: the identifier through which the metabolite object can be accessed from the model.

In [None]:
water = bt_model.metabolites.cpd00001_c
water

If the database used contains ids that are not suitable as Python variable names (such as the BiGG ids), the metabolite can be accessed with the `get_by_id` function of the `model.metabolites` object.


In [None]:
water = bt_model.metabolites.get_by_id('cpd00001_c') #same result as above
water

Some attributes of the `metabolite` object:

In [None]:
#name
print('name: ', water.name)

#formula
print('formula: ', water.formula)

#elements
print('elements: ', water.elements)

#charge
print('charge: ', water.charge)

#compartment (The metabolite ids commonly have a '_c' or '_e' as a suffix, to indicate their compartments)
print('compartment: ', water.compartment)

#reactions (reaction objects where the metabolite is either a reactant or a product)
print('reactions: ', water.reactions)

### Reactions in `cobra`

Like metabolites, reactions are an attribute of the `cobra` model object. 

`model.reactions` is a list containing a reaction object for each of the model's reactions.

Let us first see what a reaction looks like:

In [None]:
bt_model.reactions[0]

Again, as with metabolites, each reaction has a unique id and a name. 

Yet, the most important attributes of a reaction in a model are its stoichiometry, which also shows its **reversibility**, and its **bounds**.

The bounds are the lowest and the highest value that the flux of the reaction can take.

In this case, the flux of the reaction ranges from `-1000` to `1000`, so it is effectively unconstrained and can go in either direction. 

> **TASK** :question:
>
> Could you think what the bounds of an irreversible reaction would look like?

In [None]:
#number of reactions in the model
print(len(bt_model.reactions))
for r in bt_model.reactions:
    print(r.id, r.build_reaction_string())  # use_metabolite_names=1

Similar to the metabolite objects, reaction objects may be accessed directly by their ids or with the special functions:


`model.reactions.get_by_id` , `model.reactions.get_by_any`, `model.reactions.has_id`


In [None]:
acetate_ptransferase = bt_model.reactions.rxn00225
print(acetate_ptransferase)
acetate_ptransferase = bt_model.reactions.get_by_id('rxn00225')
print(acetate_ptransferase)

Relevant attributes of the reaction object:

In [None]:
#name
print('name:', acetate_ptransferase.name, '\n')

#compartments (while a metabolite is present in a single compartment, reactions may occur across compartments)
print('compartments: ', acetate_ptransferase.compartments, '\n')

#metabolites
print('metabolites: ', acetate_ptransferase.metabolites)
print([i.name for i in acetate_ptransferase.metabolites], '\n')

#reactants
print('reactants: ', acetate_ptransferase.reactants)
print([i.name for i in acetate_ptransferase.reactants], '\n')

#products
print('products: ', acetate_ptransferase.products)
print([i.name for i in acetate_ptransferase.products], '\n')

#reaction
print('reaction: ', acetate_ptransferase.reaction, '\n')

#other ways to see the reaction
print(acetate_ptransferase.build_reaction_string())

print(acetate_ptransferase.build_reaction_string(use_metabolite_names=1))

In [None]:
#see the details of the metabolic reactions containing the metabolite
for i in water.reactions:
  print(i.id, '\t', i.build_reaction_string(use_metabolite_names=True), '\n')

Notice that the `reaction.metabolites` object returns a python dictionary where the keys are the metabolite objects that participate in the reactions and the values are their stoichiometries. Reactants and products have, respectively, negative and positive values.
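
These signed stoichiometries, combined with the `elements` attribute of each metabolite, are what make a mass-balance check possible: for a balanced reaction, the coefficient-weighted element counts sum to zero. Below is a minimal pure-Python sketch of that idea, using plain dictionaries in place of `cobra` objects; `element_imbalance` and the ids `glucose_c` and `lactate_c` are hypothetical, and charge is ignored. To our knowledge `cobra` offers a comparable built-in check via `reaction.check_mass_balance()`.

```python
from collections import defaultdict


def element_imbalance(stoichiometry, elements):
    """Return the elements whose weighted counts do not sum to zero.

    stoichiometry: metabolite id -> signed coefficient (reactants negative)
    elements:      metabolite id -> {element symbol: count}
    An empty result means the reaction is mass balanced.
    """
    totals = defaultdict(float)
    for met_id, coefficient in stoichiometry.items():
        for element, count in elements[met_id].items():
            totals[element] += coefficient * count
    return {el: total for el, total in totals.items() if abs(total) > 1e-9}


# Elemental composition of glucose (C6H12O6) and lactic acid (C3H6O3)
elements = {
    "glucose_c": {"C": 6, "H": 12, "O": 6},
    "lactate_c": {"C": 3, "H": 6, "O": 3},
}

balanced = element_imbalance({"glucose_c": -1, "lactate_c": 2}, elements)
unbalanced = element_imbalance({"glucose_c": -1, "lactate_c": 1}, elements)
print(balanced)
print(unbalanced)
```
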

Since this is a constraint-based model, reactions also have lower and upper bounds, which constrain the fluxes. Through these bounds we may control the direction of the reaction.

In [None]:
print('lower bound: ', acetate_ptransferase.lower_bound, '\n')

print('upper bound: ', acetate_ptransferase.upper_bound, '\n')

# Inspect the reaction as it comes with the model, before changing anything
print('Default reaction: ', acetate_ptransferase.build_reaction_string(use_metabolite_names=1), '\n')

print("Is my reaction reversible?", acetate_ptransferase.reversibility, '\n')

# Still, we can make it irreversible if we wish to: set the lower bound to zero
acetate_ptransferase.lower_bound = 0
print('updated lower bound: ', acetate_ptransferase.lower_bound, '\n')

print('irreversible reaction: ', acetate_ptransferase.build_reaction_string(use_metabolite_names=1))
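
The link between bounds and direction can be summarized in a few lines. The following is only a conceptual sketch in plain Python (`describe_direction` is a hypothetical helper, not part of `cobra`); it treats a reaction as reversible when its lower bound is negative and its upper bound is positive, which is how we understand `cobra`'s `reversibility` attribute to work.

```python
def describe_direction(lower_bound, upper_bound):
    """Describe which flux directions a pair of bounds allows."""
    if lower_bound < 0 < upper_bound:
        return "reversible"      # flux may be negative or positive
    if lower_bound >= 0 and upper_bound > 0:
        return "forward only"    # flux can only be zero or positive
    if lower_bound < 0 and upper_bound <= 0:
        return "reverse only"    # flux can only be zero or negative
    return "blocked"             # no non-zero flux is allowed


for bounds in [(-1000, 1000), (0, 1000), (-1000, 0), (0, 0)]:
    print(bounds, "->", describe_direction(*bounds))
```
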

### Compartments

We have already mentioned this concept, yet we have not defined it so far.

In a metabolic model, we need to know **where** a compound is available and **where** a reaction is taking place.

To this end, a model may have several **compartments** to simulate real-world conditions. 
In bacteria, things are quite straightforward in this respect, and most of the time we only consider two compartments:
a compound may be available within the cell, in the cytosol (`c`), or in its environment (`e`). 
Of course, you may see other compartments as well, e.g. the periplasm (`p`).

Transport reactions make sure that compounds can move from one compartment to the other. 
Annotating which transporters a cell has, i.e. which compounds it can take up from its environment, is quite a task, but this is a story for another time! 

### Reaction types

We can distinguish reactions based on the compartments of their metabolites.
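
As a rough sketch of this idea, the function below guesses a reaction's type from the compartment suffixes (`_c`, `_e`) of its metabolite ids. It is a simplification under stated assumptions: `reaction_type` is a hypothetical helper, ids are assumed to end in `_<compartment>`, and any reaction with a single metabolite is treated as a boundary reaction (exchange or sink).

```python
def reaction_type(stoichiometry):
    """Guess the reaction type from a dict of metabolite id -> coefficient.

    Assumption: every metabolite id ends with '_<compartment>', e.g. '_c' or '_e'.
    """
    compartments = {met_id.rsplit("_", 1)[1] for met_id in stoichiometry}
    if len(stoichiometry) == 1:
        return "boundary"    # a single metabolite enters or leaves the system
    if len(compartments) > 1:
        return "transport"   # metabolites from more than one compartment
    return "internal"        # everything happens in one compartment


print(reaction_type({"cpd00027_e": -1, "cpd00027_c": 1}))
print(reaction_type({"cpd00027_e": -1}))
```
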

#### Internal reactions

Reactions that occur only in the cytosol.

In [None]:
ldh = bt_model.reactions.rxn00499
ldh

#### Transport reactions

Reactions that take extracellular metabolites and transport them into the cell (i.e. convert them into cytosol metabolites)

In [None]:
gluT = bt_model.reactions.rxn05573
gluT

In [None]:
print(gluT.build_reaction_string(use_metabolite_names=True))
print(gluT.build_reaction_string(use_metabolite_names=False))

print(gluT.upper_bound)

print(gluT.lower_bound)

#Notice that 'cpd00027_e' is converted to 'cpd00027_c'.

#### Exchange reactions

Reactions that make a metabolite available in the extracellular space. From the extracellular space it can be taken up by a transporter.

In [None]:
gluExch = bt_model.reactions.EX_cpd00027_e
gluExch

In [None]:
print(gluExch.build_reaction_string(use_metabolite_names=1))

print(gluExch.lower_bound)

print(gluExch.upper_bound)

# Notice that the inward flux is negative. If the flux is positive, it means the metabolite is getting secreted.

#### Sink reactions

Reactions that eliminate a dead-end metabolite (i.e. a metabolite that is not consumed by any further reaction and is not secreted)

In [None]:
piSk = bt_model.reactions.piSink

print(piSk.build_reaction_string(use_metabolite_names=1))

print(piSk.lower_bound)

print(piSk.upper_bound)

#### Objective function

In the framework of metabolic models, the objective function is a mathematical expression that represents the biological *goal* of our model, meaning
what our model is trying to do best, i.e. what it is trying to either minimize or maximize.

It is typically defined as a weighted sum of reaction fluxes and is used as the target for optimization in constraint-based modeling approaches like Flux Balance Analysis (FBA). You can think of it as a list of weights, one per reaction of your model, indicating whether and how much each reaction contributes to the *goal* of your model. 
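
For readers who like to see it written out, the standard way to state this weighted sum, together with the constraints used in FBA, is given below. The symbols are our own notation and do not appear elsewhere in this notebook: $v_j$ is the flux of reaction $j$, $c_j$ its objective coefficient, $S$ the stoichiometric matrix, and $l_j$, $u_j$ the lower and upper bounds of reaction $j$.

$$
\begin{aligned}
\max_{v} \quad & Z = \sum_{j=1}^{n} c_j v_j \\
\text{subject to} \quad & S v = 0, \\
& l_j \le v_j \le u_j, \quad j = 1, \dots, n
\end{aligned}
$$

When a single reaction is the objective, exactly one $c_j$ equals 1 and all the others are 0, so $Z$ is simply the flux of that reaction.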

To check whether a reaction is part of the objective, we can use an attribute of `cobra`'s `reaction` that was not shown earlier, the `objective_coefficient`:

In [None]:
bt_model.reactions.rxn00545.objective_coefficient

Since it is 0, this reaction is not part of our objective.
We can use the `objective_coefficient` to find which reaction is about to be optimized by the model.

In [None]:
for i in bt_model.reactions:
    if i.objective_coefficient==1:
        print(i.id)

In [None]:
bt_model.reactions.rxn00225.objective_coefficient = 0

In [None]:
bt_model.reactions.biomass.objective_coefficient = 1

> **NOTE**
> 
> It is quite common to add a **pseudo** reaction in our model, called **biomass function**, that represents what compounds and how much of each of them the species requires in order to produce 1 gram of dry weight (gDW) of the organism. 
> If we set this biomass function as a model's objective function, we will be able to check:
> - whether our species grows and what would be its optimal growth rate 
> - what fluxes the reactions need to carry to achieve this optimal state, and thus what the limiting factors are

In this toy model, we have defined a biomass function of our own, which looks like this:

In [None]:
bt_model.reactions.biomass

What do you think about the bounds?

### `bt_model` check! 

The keen-eyed reader will notice that the number of reactions in the model (33) differs from the number in the `.txt` file (23).

With an easy check, we can also see that the last three reactions of the `.txt` file are **not** part of our model, meaning that a total of 13 reactions that are part of our model are not in the `.txt` file:
`RNF`, `biomass` and `rxn05654`.

In [None]:
genome_reactions.tail(3)

So, what are those added reactions?

So now, we have a model with several metabolites and reactions, which take place in both the cytosol (`c`) and out of the cell (`e`).



Also, there is an ***objective function***! 

Let's check on this!


In [None]:
bt_model.reactions.biomass

> **The extra mile**
> 
> This step involved a little cheating on our side, since there was no reason to already know which reaction is being used
> as the objective function. 
> And even if we noticed from the name that it has something to do with biomass, we would not know the actual id of the reaction.
> In an actual GEM, to find out which reaction is used as the objective function of the model, you would have to check the coefficient of each reaction in the objective function vector:
> 
> ```obj_index = [r.objective_coefficient for r in model.reactions].index(True)```
> 

In [None]:
# Getting the reaction id of the objective function in an actual case 
obj_index = [r.objective_coefficient for r in bt_model.reactions].index(True)
bt_model.reactions[obj_index].id

So, besides the stoichiometry of the biomass reaction, which indicates which compounds are required to produce biomass and which compounds are produced alongside it, it is interesting that the `Reaction` object has a **lower** and an **upper bound**.

These values are **essential** for the model, as they define its constraints. 

For the biomass function, the lower bound of 0 indicates that the model cannot have a negative biomass flux, i.e. consume biomass, while the upper bound can be considered free, since it is set to an extremely high value.

The full list of the **exchange** reactions present in a model, i.e. the reaction objects for the metabolites that could **potentially be exchanged** (in or out), is available by running:

In [None]:
bt_model.exchanges

*Exchange* reactions are a type of **boundary reactions**.
There are three different types of pre-defined **boundary reactions**. 
All of them are **unbalanced pseudo reactions**, meaning that they serve a modeling purpose by **adding metabolites to or removing them from** the model system but are **not based on real biology**. These reaction types include:

* **exchange:** reversible reactions that add an extracellular metabolite to, or remove it from, the extracellular compartment
* **demand:** irreversible reactions that each consume an intracellular metabolite
* **sink:** similar to an exchange but specifically for intracellular metabolites, i.e., a reversible reaction that adds or removes an intracellular metabolite
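
The three definitions above boil down to two questions: is the metabolite extracellular, and is the reaction reversible? Here is a minimal sketch of that decision logic in plain Python. `boundary_type` is a hypothetical helper that follows the definitions in this list; it is not how `cobra` is necessarily implemented internally.

```python
def boundary_type(compartment, lower_bound, upper_bound, external="e"):
    """Classify a boundary reaction from its metabolite's compartment and its bounds."""
    if compartment == external:
        return "exchange"   # acts on an extracellular metabolite
    if lower_bound < 0 < upper_bound:
        return "sink"       # intracellular and reversible
    return "demand"         # intracellular and irreversible


print(boundary_type("e", -10, 1000))
print(boundary_type("c", -1000, 1000))
print(boundary_type("c", 0, 1000))
```
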

Likewise, we can get the **sink** reactions:

In [None]:
bt_model.sinks

We can also get all the boundary reactions together:

In [None]:
bt_model.boundary

> Task! 
>
> Could you find out whether reactions other than the exchange ones are part of the boundary reactions of your model?
>
> Is there any reaction type that is missing from your boundary reactions?

### *In silico* environment (medium) in `cobra`

However, a metabolite can actually be used only if it is available in the `medium` of the model. 
`medium` maps the exchange reaction of each metabolite that is available to the model from the external compartment (media metabolites) to its maximum uptake flux, rather than to a concentration.


In [None]:
bt_model.medium

In [None]:
bt_model.metabolites.cpd00027_e

In [None]:
bt_model.metabolites.cpd00067_e

In [None]:
initial_medium = bt_model.medium
lim_glc_medium = initial_medium.copy()
lim_glc_medium["EX_cpd00027_e"] = 5
bt_model.medium = lim_glc_medium

In [None]:
bt_model.medium

For a human readable version of the same information, one may run:

In [None]:
for rxn in bt_model.medium:
    cobra_rxn = bt_model.reactions.get_by_id(rxn)
    cobra_met = cobra_rxn.metabolites
    print("reaction string:", cobra_rxn.build_reaction_string(), "metabolites:", [x.name for x in cobra_met])

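However, the medium in which your species grows can differ from the one the model comes with by default.
Yet, because `model.medium` returns a new dictionary every time it is accessed, editing that dictionary in place does not change the model; assign a modified copy back to `model.medium` instead, as we did above.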

In [None]:
print(bt_model.medium["EX_cpd00027_e"])
bt_model.medium["EX_cpd00027_e"] = 10 
print(bt_model.medium["EX_cpd00027_e"])
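
Why does the in-place edit have no effect? The sketch below mimics the behaviour with a tiny stand-in class: the `medium` getter builds a fresh dictionary on every access, so changes to that temporary dictionary are lost unless the dictionary is assigned back. `ToyModel` is purely hypothetical and much simpler than the real `cobra.Model`; it only illustrates the getter/setter pattern.

```python
class ToyModel:
    """Hypothetical stand-in that mimics a medium getter/setter pair."""

    def __init__(self):
        self._uptake_bounds = {"EX_cpd00027_e": 5.0}

    @property
    def medium(self):
        # A new dictionary is built on every access
        return dict(self._uptake_bounds)

    @medium.setter
    def medium(self, new_medium):
        # Only assignment actually updates the model
        self._uptake_bounds = dict(new_medium)


model = ToyModel()

# In-place edit: modifies a temporary copy that is discarded right away
model.medium["EX_cpd00027_e"] = 10
after_in_place_edit = model.medium["EX_cpd00027_e"]

# Copy, modify, assign back: this is the pattern that works
updated = model.medium
updated["EX_cpd00027_e"] = 10
model.medium = updated
after_assignment = model.medium["EX_cpd00027_e"]

print(after_in_place_edit, after_assignment)
```
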

### Model's solution 

Tomorrow we will discuss in depth several [metabolic modeling analysis methods and some core ideas on what is called *constraint-based analysis*](./computationalMethods.ipynb).

For now, we will think of the most trivial analysis, called Flux Balance Analysis (FBA), as a black box, in order to learn:

- how we can benefit out of it 
- how to get it and handle its findings using `cobra`.

The solution is computed with the model's `optimize()` function:

In [None]:
solution = bt_model.optimize()
solution

Let us now check what would change if our *Bt* model did not "care" about maximizing its growth, but about its ATP production!
ATP production is described by the ModelSEED reaction `rxn00148`.

In [None]:
print(bt_model.reactions.rxn00148.build_reaction_string())
for met in bt_model.reactions.rxn00148.metabolites:
    print(met.name)

The key point here is to observe which compounds are considered as **reactants** and which as **products**!

In [None]:
sol = bt_model.optimize()

In [None]:
sol.fluxes.loc["rxn00148"]

We need to change the objective function to the one for ATP production and then solve the linear program once again.

In [None]:
bt_model.objective = bt_model.reactions.rxn00148

In [None]:
sol_atp = bt_model.optimize()

In [None]:
bt_model.objective_direction = "min"
sol_atp_min = bt_model.optimize()

So, let us now see the optimal solution and the fluxes of the model for maximizing ATP production:

In [None]:
sol_atp_min

> :question: **Task**
>
> Try to apply the medium of *Bt* to *Ri* and optimize for growth. Does it grow? 
> 
> If not, could you suggest limiting factors?

> :question: **Task!**
> 
> How would you try to edit your medium? 
>
> Try to first edit the bound of an exchange reaction already part of the medium. 
>
> Then, try to remove one and add another. :rocket:

Now, let us have a quick overview of the different ways to analyze a metabolic model; as a first step, we will investigate how the growth check we did for the different media works! :tada:

## Bonus quest: exploring random media

1. Get the three toy models

In [None]:
# @title
sugar_fermenter = cobra.io.read_sbml_model("files/models/sugar_fermenter_toy_model.xml")
butyrate_producer = cobra.io.read_sbml_model("files/models/butyrate_producer_toy_model.xml")
acetogen = cobra.io.read_sbml_model("files/models/acetogen_toy_model.xml")

2. Find their exchange reactions

In [None]:
# @title
def findExchanges(model):
    for reaction in model.reactions:
        if 'EX_' in reaction.id: #alternatively use i in model.exchanges
            reac_string = reaction.build_reaction_string(use_metabolite_names =True)

            print(f"{reaction.id}\t{reac_string}")


print("Sugar fermenter:")
findExchanges(sugar_fermenter)

print("\nButyrate producer:")
findExchanges(butyrate_producer)

print("\nAcetogen:")
findExchanges(acetogen)

3. Make a medium based on their exchange reactions as a Python dictionary
`[exchange_reaction] : -10`

`-10` is an arbitrary lower bound for the exchange flux; its magnitude is the maximum amount of the compound that the strain can import from the environment.

In [None]:
# @title
def makeMediumFromModels(modelList):
    medium = {}

    for model in modelList:
        for reaction in model.exchanges:
            medium[reaction.id] = -10

    return medium

medium = makeMediumFromModels([sugar_fermenter, butyrate_producer, acetogen])
print(medium)

4. Make a function to generate a random medium for these exchange reactions

In [None]:
# @title
import numpy as np

def makeRandomMedium(medium_dict):
    return {reaction: -1*np.random.uniform() for reaction in medium_dict}



med1 = makeRandomMedium(medium)
med2 = makeRandomMedium(medium)
med3 = makeRandomMedium(medium)

print(f"{med1}\n\n{med2}\n\n{med3}")
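
Since the media are random, every run of the cell above gives different numbers. If you want reproducible media, one option is to use a seeded random generator. The sketch below does this with Python's standard `random` module; `make_seeded_random_medium` is a hypothetical variant of `makeRandomMedium`, and the two exchange ids are only examples. With NumPy, `np.random.default_rng(seed)` can serve the same purpose.

```python
import random


def make_seeded_random_medium(exchange_ids, seed):
    """Draw one uptake bound in (-1, 0] per exchange reaction, reproducibly."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    return {rxn_id: -rng.random() for rxn_id in exchange_ids}


exchange_ids = ["EX_cpd00027_e", "EX_cpd00067_e"]  # example ids
medium_a = make_seeded_random_medium(exchange_ids, seed=42)
medium_b = make_seeded_random_medium(exchange_ids, seed=42)

# The same seed yields the same medium
print(medium_a == medium_b)
```
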

5. Make a function to apply a medium to a model
   ```python
   def applyMedium(model, medium_dict):
    #return the growth rate of the model in this medium
   ```

In [None]:
# @title
def applyMedium(model, medium_dict):
    for reaction in medium_dict:
        if model.reactions.has_id(reaction):
            # the uptake limit of an exchange reaction is its (negative) lower bound
            model.reactions.get_by_id(reaction).lower_bound = medium_dict[reaction]

    solution = model.optimize()
    return solution.objective_value

print(f"sugar fermenter: {applyMedium(sugar_fermenter, med1)}")
print(f"\nbutyrate producer: {applyMedium(butyrate_producer, med1)}")
print(f"\nacetogen: {applyMedium(acetogen, med1)}")


6. Apply 1000 random media to each model, store their growth rates, and visualize them with a histogram

In [None]:
# @title
sf = np.zeros(1000)
bp = np.zeros(1000)
ac = np.zeros(1000)

for i in range(1000):
    # use a separate name so that the template `medium` is not overwritten
    random_medium = makeRandomMedium(medium)
    sf[i] = applyMedium(sugar_fermenter, random_medium)
    bp[i] = applyMedium(butyrate_producer, random_medium)
    ac[i] = applyMedium(acetogen, random_medium)

import matplotlib.pyplot as plt

plt.hist(sf, 25, density=True, color = 'red')
plt.show()
plt.hist(bp, 25,  density=True, color = 'blue')
plt.show()
plt.hist(ac, 25,  density=True, color = 'orange')
plt.show()