# Reminder of flux balance analysis

Flux balance analysis (FBA) is a mathematical and computational technique for estimating metabolic fluxes at a systems level. As you saw in the theory class, the general idea of the method is shown in the following illustration:

![FBA](Media/fba.gif)

In a given reconstruction of a metabolic network, we start by assuming that all fluxes $(v_1, v_2,...)$ (corresponding to reactions 1, 2,...) are possible, which gives an *unconstrained solution space*. Second, we impose that, if metabolism is operating at steady state, no metabolite can accumulate or get depleted. This translates into a set of linear constraints on the values of the fluxes, and therefore defines an *allowable solution space*. Third and last, among all solutions in the allowable solution space, we choose the one that is optimal in some way. Typically, we choose the solution that results in maximum biomass production for the metabolic network, because we assume that this solution is evolutionarily favourable.
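
To make the steady-state constraint concrete, here is a minimal sketch in plain Python (no COBRApy needed). The three-reaction network, the matrix `S`, and the helper names `net_production` and `is_steady_state` are all hypothetical and chosen only for illustration.

```python
# Toy network (hypothetical): v1: -> A, v2: A -> B, v3: B -> biomass.
# Rows are metabolites (A, B); columns are reactions (v1, v2, v3).
S = [
    [1, -1, 0],   # A is produced by v1 and consumed by v2
    [0, 1, -1],   # B is produced by v2 and consumed by v3
]

def net_production(S, v):
    """Return the net production rate of each metabolite, i.e. the product S . v."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """A flux vector is allowable if no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(S, v))

balanced = [10, 10, 10]    # every metabolite is consumed as fast as it is made
unbalanced = [10, 5, 5]    # A is made at 10 but consumed at 5, so it accumulates

print(net_production(S, balanced), is_steady_state(S, balanced))      # [0, 0] True
print(net_production(S, unbalanced), is_steady_state(S, unbalanced))  # [5, 0] False
```

Only flux vectors like `balanced` belong to the allowable solution space; the optimization step then picks, among those, the one with the largest objective flux.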

[`COBRApy`](https://cobrapy.readthedocs.io/) is a Python implementation of the COnstraint-Based Reconstruction and Analysis Toolbox [for MATLAB](https://opencobra.github.io/cobratoolbox/stable/), a software suite for quantitative prediction of cellular and multicellular biochemical networks with constraint-based modelling. As we will see in today's session, it can be used for modeling metabolic fluxes and simulating gene knockouts, among other things.

# Basic functioning of COBRApy

Let's import COBRApy (the `cobra` package) and take a look at some of its basic features. To begin with, we note that it comes with predefined models for *Salmonella* and *E. coli*, as well as a “textbook” model of the core metabolism of *E. coli*. Let's take a look at this textbook model:

In [None]:
import cobra
import cobra.test

# "textbook", "ecoli" and "salmonella" are valid arguments to create_test_model
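# Note: in recent COBRApy releases cobra.test is no longer available;
# cobra.io.load_model("textbook") is the replacement there.
model = cobra.test.create_test_model("textbook")
print('The model object is of type:', type(model))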
display(model)

***

#### Exercise

Describe the basic biological information about the `textbook` metabolic reconstruction. How many metabolites does it comprise? How many reactions? Which cellular compartments does it use?

*Write your answer here*

#### Exercise

Load the complete metabolic reconstruction for *E. coli* into a `Model` object called `model2` and describe its basic biological properties.

In [None]:
# Type your code here

*Type your answer here*

***

## Metabolites

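The metabolites in a given reconstruction are stored in a list-like container (a COBRApy `DictList`, which behaves like a `list` but also supports lookup by identifier) and can be accessed through the `.metabolites` attribute of the `Model` as follows: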

In [None]:
# Metabolites are stored in the attribute 'metabolites' of the Model, which can be accessed as
# model.metabolites. Each metabolite in this list can be accessed by its index, as in any other list
print('Access the first element in the .metabolites list:')
display(model.metabolites[0])

# Alternatively, metabolites can be accessed by their identifier
print()
print('Access the "atp_c" (cytosolic ATP) metabolite in the .metabolites list:')
atp = model.metabolites.get_by_id('atp_c')
display(atp)

Note that, by convention, in this reconstruction, the name of the metabolite is followed by `_c` for the metabolite in the cytosol compartment, and by `_e` for the metabolite in the extracellular compartment. 
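
As a small illustration of this naming convention, the sketch below splits an identifier into its base name and compartment suffix using only string operations. The helper `split_metabolite_id` is hypothetical (it is not part of COBRApy) and assumes that the compartment is whatever follows the last underscore, which holds for the identifiers used in this notebook but not necessarily for every reconstruction.

```python
def split_metabolite_id(metabolite_id):
    """Split an identifier such as 'atp_c' at its last underscore."""
    base, compartment = metabolite_id.rsplit("_", 1)
    return base, compartment

print(split_metabolite_id("atp_c"))     # ('atp', 'c')
print(split_metabolite_id("glc__D_e"))  # ('glc__D', 'e')
```

In practice, the `.compartment` attribute of a metabolite is the reliable way to get this information; the string split is only meant to show how the identifiers are built.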

Another convenient way to access a single metabolite is as if the metabolite itself were an attribute of `.metabolites`: 

In [None]:
display(model.metabolites.atp_c)

Great!! Let's focus on the `atp_c` metabolite and see what information we can extract from it through its attributes.

In [None]:
print('Name:', atp.name)
print('Compartment:', atp.compartment)
print('Charge:', atp.charge)
print('Formula:', atp.formula)
print('Elements:', atp.elements)
print('Weight:', atp.formula_weight)

Last but not least, the `summary()` method provides useful information about all the reactions in which a given metabolite participates:

In [None]:
display(atp.summary())

So let's now move on to reactions to understand better each of these entries.

## Reactions

Let's take a look at how reactions are stored and how to access and extract information from them. Similar to metabolites, reactions are stored in a `list` in the `.reactions` attribute of the `Model`, and can be accessed by their index in this list, by their ID, or as attributes of the `.reactions` list:

In [None]:
# Reactions are stored in the attribute '.reactions' of the Model, which can be accessed as
# model.reactions. Each reaction in this list can be accessed by its index, as in any other list
print('Access the first element in the .reactions list:')
display(model.reactions[0])

# Alternatively, reactions can be accessed by their identifier
print('\nAccess the "PYK" reaction in the .reactions list:')
pyk = model.reactions.get_by_id('PYK')
display(pyk)

# Or as attributes of the reactions list
print('\nAccess the "PYK" as an attribute of the .reactions list:')
display(model.reactions.PYK)

As for metabolites, a number of attributes of the reaction allow us to get all sorts of information:

In [None]:
print('Name:', pyk.name)
print('Reaction:', pyk.reaction)
print('Compartments:', pyk.compartments)
print('Reactants:', pyk.reactants)
print('Products:', pyk.products)

***
#### Exercise

Print the molecular weight of each of the reactants and each of the products of the PYK reaction.

In [None]:
# Write your code here

***

We can also get the stoichiometric coefficients of the reaction (that is, the column of the stoichiometric matrix that corresponds to this reaction) as follows:

In [None]:
display(pyk.metabolites)

So what is this? It is not a list, or any other data structure we have encountered so far. It is a Python `dictionary` and, although we have not encountered it in previous sessions, it is perhaps the most widely used and versatile Python data structure. Please, before continuing, **do read** [this short tutorial](https://www.w3schools.com/python/python_dictionaries.asp) on Python dictionaries.
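
As a quick illustration of how a dictionary can hold stoichiometric information, consider the following sketch. The reaction `A + 2 B -> C` is hypothetical, and the keys are plain strings here, whereas in COBRApy the keys are `Metabolite` objects. The sign convention (negative for consumed, positive for produced) is the one used throughout this notebook.

```python
# Hypothetical reaction A + 2 B -> C, written as a dictionary that maps each
# metabolite (here just a string) to its stoichiometric coefficient.
stoichiometry = {"A": -1, "B": -2, "C": 1}

# Look up one coefficient by key
print(stoichiometry["B"])  # -2

# Negative coefficients mark reactants, positive ones mark products
reactants = [m for m, coef in stoichiometry.items() if coef < 0]
products = [m for m, coef in stoichiometry.items() if coef > 0]
print(reactants, products)  # ['A', 'B'] ['C']
```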

The stoichiometric coefficient of each metabolite in a reaction can then be obtained as follows:

In [None]:
for metab in pyk.metabolites:
    print(metab, pyk.metabolites[metab])

`Reaction` objects are more complex than `Metabolite` objects because, of course, reactions are catalyzed by enzymes (which, in turn, are encoded by genes), have to fulfill certain conditions (e.g. conservation of atoms), and need to be balanced overall. Let's see how to handle all of these things.

### Regulation of reactions
Let's start with regulation: which enzymes are catalyzing the `pyk` reaction?

In [None]:
display(pyk.genes)

So there are two enzymes catalyzing the reaction, whose gene IDs are `b1676` and `b1854`. Now, these genes are encapsulated in a `frozenset`, which in some ways is similar to a `list` but with at least two important differences:
1. A `frozenset` does not contain repeated elements. If a `frozenset` is initialized with repeated elements, only one will be left.
2. The elements of a `frozenset` cannot be altered after its creation. In particular, we cannot add or remove elements to/from a `frozenset`.

You can read more about Python's built-in types, including sets and frozen sets, [here](https://docs.python.org/3/library/stdtypes.html).
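
The two properties listed above can be checked directly with Python's built-in `frozenset`. This is a minimal sketch; the gene identifier `b0001` is used only as a placeholder for "some other gene".

```python
# Repeated elements are kept only once
genes = frozenset(["b1676", "b1854", "b1676"])
print(len(genes))  # 2

# A frozenset has no methods for adding or removing elements
try:
    genes.add("b0001")
    could_modify = True
except AttributeError:
    could_modify = False
print(could_modify)  # False
```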

In any case, we can iterate over the genes in the `frozenset` as we would in a list:

In [None]:
for g in pyk.genes:
    display(g)

OK. So this reaction is catalyzed by two enzymes, but how? For example, it may be the case that the two proteins encoded by genes pykF and pykA form a complex, which is responsible for catalyzing the reaction. In this case, we would need both proteins to carry out the reaction. By contrast, it could be that pykF and pykA encode very similar proteins, each of which can catalyze the reaction independently of the other. To see which of these is true (that is, to see whether we need both proteins at the same time or either one of them), COBRA uses "reaction rules":

In [None]:
# Show reaction rule using gene ids and gene names
display(pyk.gene_reaction_rule)
display(pyk.gene_name_reaction_rule)

The rule indicates that we need one protein OR the other, so the reaction will take place if either of the proteins is present (or both are present, of course). By contrast, 'pykA and pykF' would indicate that BOTH proteins are needed to carry out the reaction.

Arbitrarily complex boolean reaction rules are possible. For example '(A and B) or C' would indicate that the reaction can be carried out if C is present OR if A AND B are present at the same time (but not if A is present but B and C are missing, for example).
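
To see how such a rule can be evaluated, here is a minimal sketch that parses a rule with Python's standard `ast` module and checks it against a set of genes that are present. The function `rule_is_satisfied` is hypothetical and is not how COBRApy necessarily evaluates rules internally; it only supports `and`, `or`, parentheses, and gene names that are valid Python identifiers.

```python
import ast

def rule_is_satisfied(rule, present_genes):
    """Evaluate a boolean rule such as '(A and B) or C' for a set of present genes."""
    def evaluate(node):
        if isinstance(node, ast.BoolOp):
            values = [evaluate(value) for value in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            # A gene name is "true" when the gene is present
            return node.id in present_genes
        raise ValueError("Unsupported element in rule: " + ast.dump(node))
    return evaluate(ast.parse(rule, mode="eval").body)

rule = "(A and B) or C"
print(rule_is_satisfied(rule, {"C"}))       # True: C alone is enough
print(rule_is_satisfied(rule, {"A", "B"}))  # True: A and B together are enough
print(rule_is_satisfied(rule, {"A"}))       # False: A alone is not enough
```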

***

#### Exercise

Get the reaction rule for the ATPS4r reaction and interpret it in biological terms.

In [None]:
# Write your code here

*Write your answer here*

***

Other than through the associated reactions, genes can be accessed directly from the model just as metabolites or reactions:

In [None]:
display(model.genes.get_by_id('b1854'))

### Mass balance 

We can also ensure the reaction is mass balanced with the `.check_mass_balance()` method. This function will return elements which violate mass balance. If it comes back empty, then the reaction is mass balanced.

In [None]:
# Check mass balance
pyk.check_mass_balance()

In order to add a metabolite to a reaction, we pass in a dictionary with the metabolite object and its coefficient. For instance, imagine that we want to add an extra hydrogen to the list of reactants for this reaction:

In [None]:
pyk.add_metabolites({model.metabolites.get_by_id("h_c"): -1})
display(pyk)

Now, if we check for mass balance, we will find that, as expected, the reaction is no longer mass balanced:

In [None]:
pyk.check_mass_balance()

We can remove the extra H, and the reaction will be balanced once again:

In [None]:
pyk.add_metabolites({model.metabolites.get_by_id("h_c"): +1})
pyk.check_mass_balance()
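
The element bookkeeping behind a mass-balance check can be sketched in a few lines of plain Python. This is an illustration under simple assumptions, not COBRApy's implementation: the `FORMULAS` table and the function `mass_imbalance` are hypothetical, charge is ignored, and the example uses the formation of water rather than a reaction from the model.

```python
from collections import Counter

# Elemental composition of each metabolite in the toy example
FORMULAS = {
    "H2": {"H": 2},
    "O2": {"O": 2},
    "H2O": {"H": 2, "O": 1},
    "H": {"H": 1},
}

def mass_imbalance(stoichiometry):
    """Return the elements whose atoms are not conserved (empty dict if balanced)."""
    totals = Counter()
    for metabolite, coefficient in stoichiometry.items():
        for element, count in FORMULAS[metabolite].items():
            totals[element] += coefficient * count
    return {element: total for element, total in totals.items() if total != 0}

# 2 H2 + O2 -> 2 H2O is balanced
water_formation = {"H2": -2, "O2": -1, "H2O": 2}
print(mass_imbalance(water_formation))  # {}

# Adding one extra H as a reactant breaks the balance
with_extra_h = {**water_formation, "H": -1}
print(mass_imbalance(with_extra_h))  # {'H': -1}
```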

# Flux Balance Analysis

So now that we understand how metabolites, reactions, and enzymes are handled by COBRApy, let us move on to flux balance analysis (FBA).

## Reaction bounds

In FBA, we need to specify bounds on the flux of each reaction. It is customary to measure fluxes in mmol/g DW/h (millimoles per gram of dry weight per hour), so these are the units in most reconstructions. The bounds are stored in the `lower_bound` and `upper_bound` attributes of the reaction:

In [None]:
print(pyk.lower_bound, "< pyk <", pyk.upper_bound)
print(pyk.reversibility)

So we see that this reaction has a minimum flux of 0 and a maximum flux of 1000. Since the flux can only be positive (or zero) the reaction is irreversible. We could alter this by changing the lower bound of the flux and make it negative:

In [None]:
pyk.lower_bound = -1000
print(pyk.lower_bound, "< pyk <", pyk.upper_bound)
print(pyk.reversibility)

A negative flux indicates a reaction where the "products" are being transformed into the "reactants", that is, a reaction that is flowing "backwards". Of course, for reversible reactions, what is forward and what is backwards is arbitrary, but in COBRApy it always refers to whatever is specified as reactant and product: reactants to products is forward (positive flux), and products to reactants is backwards (negative flux). Now, let's make our reaction irreversible again!

In [None]:
pyk.lower_bound = 0
print(pyk.lower_bound, "< pyk <", pyk.upper_bound)
print(pyk.reversibility)
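
The sign conventions used in this section can be summarized in a short sketch. The two helper functions are hypothetical and only restate the conventions described in the text: a reaction is treated as reversible when its bounds allow both negative and positive flux.

```python
def is_reversible(lower_bound, upper_bound):
    """A reaction is reversible if flux may run in both directions."""
    return lower_bound < 0 < upper_bound

def flux_direction(flux):
    """Translate the sign of a flux into the direction of the reaction."""
    if flux > 0:
        return "forward (reactants -> products)"
    if flux < 0:
        return "backward (products -> reactants)"
    return "no net flux"

print(is_reversible(0, 1000))      # False
print(is_reversible(-1000, 1000))  # True
print(flux_direction(-2.5))        # backward (products -> reactants)
```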

### Interlude: Using models as contexts

What we just did was a bit annoying! After changing the reversibility of the reaction, we had to go back and manually make it irreversible again. Now imagine that we want to make changes to each individual reaction in the network and, every time, we have to undo the changes manually. That could be a nightmare. Using models as *contexts* avoids this problem. Take a look at this code:

In [None]:
print('Outside context:', pyk.reversibility)
# Create a context, where the model is called "local_model"
with model as local_model:
    pyk.lower_bound = -1000
    print('Inside context:', pyk.reversibility)
# Leave the context
print('Outside context:', pyk.reversibility)

What happens here is the following. At the beginning, the PYK reaction is irreversible, just as it should be. Then we create a "context" with the `with` keyword. Within this context, the `model` is called `local_model`, and anything we do to it here will only take place within the context, and will be "forgotten" as soon as we leave the context. Therefore, inside the context we can make the reaction reversible, but when we leave the context the reaction is irreversible again, as it was before we entered the context.
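
If you are curious about how a context can "forget" changes, the following sketch reproduces the idea with a plain dictionary and Python's `contextlib`. It is only an analogy: the function `temporary_changes` is hypothetical, and it takes a snapshot and restores it on exit, which is not necessarily how COBRApy tracks and reverts changes internally.

```python
from contextlib import contextmanager

@contextmanager
def temporary_changes(settings):
    """Let the caller modify `settings` and restore the original values on exit."""
    snapshot = dict(settings)
    try:
        yield settings
    finally:
        settings.clear()
        settings.update(snapshot)

bounds = {"lower_bound": 0, "upper_bound": 1000}

with temporary_changes(bounds) as local_bounds:
    local_bounds["lower_bound"] = -1000
    inside = local_bounds["lower_bound"]
after = bounds["lower_bound"]

print("Inside context:", inside)   # Inside context: -1000
print("Outside context:", after)   # Outside context: 0
```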

## Objective reaction

After the interlude, let's go back to FBA. As you saw in class, and as we revisited at the beginning of this notebook, FBA requires that one specifies an objective reaction (or, more rarely, more than one). Among all sets of fluxes that satisfy mass conservation at each metabolite, FBA will choose those fluxes that maximize the objective. Very often, the objective is a "biomass reaction" that specifies the necessary metabolites for biomass production, the assumption being that, through evolution, organisms will become efficient at growing and multiplying.

Let's see what the objective of our `model` is:

In [None]:
print(model.objective.expression)
print(model.objective.direction)

This means that FBA will maximize the flux through the reaction `Biomass_Ecoli_core`. So let's look at this reaction:

In [None]:
display(model.reactions.Biomass_Ecoli_core)

***

#### Exercise

Describe the "ingredients" that are necessary for growth, according to the biomass reaction we have just seen.

In [None]:
# Write your code here

*Write your answer here*

***

## Medium (exchange fluxes)

Finally, we need to specify the "medium", that is, the set of nutrients that the organism is allowed to uptake from the surroundings. This, as everything else in FBA, is specified as a set of reactions with their corresponding fluxes. Let's look at them:

In [None]:
display(model.medium)

So this tells us that, for example, there is an exchange reaction called `EX_glc__D_e` whose maximum uptake flux is 10. Let's look at this reaction in more detail:

In [None]:
display(model.reactions.EX_glc__D_e)

***

#### Exercise

1. What are the reactants in this reaction?
2. What are the products?
3. Is this reaction mass-balanced?

In [None]:
# Write whatever code you need here

*Write your answers here*

***

So, what we have here is a reaction that can destroy or create D-Glucose from nothing. The destruction of D-Glucose can occur at a maximum rate of 1000 mmol/g DW/h, whereas the creation can take place, at most, at 10 mmol/g DW/h. Chemically speaking, of course, this reaction does not make any sense. Biologically, it just represents the fact that the system can uptake D-Glucose from the extracellular medium.
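
The relation between the bounds of an exchange reaction and the uptake and secretion limits can be written down explicitly. The sketch below assumes, as described above, that the exchange reaction is written with the metabolite on the reactant side and nothing on the product side, so that negative flux means uptake. The function `exchange_limits` is hypothetical; the bounds -10 and 1000 are the ones discussed in the text.

```python
def exchange_limits(lower_bound, upper_bound):
    """Return (max uptake, max secretion) for an exchange reaction 'metabolite -->'."""
    max_uptake = max(0, -lower_bound)    # backward flux creates the metabolite
    max_secretion = max(0, upper_bound)  # forward flux removes the metabolite
    return max_uptake, max_secretion

print(exchange_limits(-10, 1000))  # (10, 1000)
print(exchange_limits(0, 1000))    # (0, 1000): no uptake allowed
```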

## Flux calculation via optimization

With all the necessary elements in place (reactions, objective, and medium) it is just a matter of running the optimization to solve the model, that is, to find the set of fluxes that satisfy mass conservation for each metabolite and, at the same time, maximize the objective (biomass production):

In [None]:
solution = model.optimize()
display(solution)

So the optimized biomass production is 0.874 g/g DW/h, and the optimal fluxes are as listed underneath. To analyze them more carefully, you can use the `.fluxes` attribute of the `solution`: 

In [None]:
display(solution.fluxes)

And for a summary of in-fluxes (medium uptakes) and out-fluxes, we can use the `.summary()` method of the `model`:

In [None]:
model.summary()

***

#### Exercise

What is the biological meaning of the out-fluxes?

*Write your answer here*

***

In addition, the input-output behavior of individual metabolites can also be inspected using summary methods. For instance, the following commands can be used to examine the overall redox balance of the model:

In [None]:
model.metabolites.nadh_c.summary()

...or to get a sense of the main energy production and consumption reactions:

In [None]:
model.metabolites.atp_c.summary()
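
To build some intuition for what the optimization does, consider the simplest possible case: an unbranched pathway. At steady state every reaction in such a chain must carry the same flux, so the maximum biomass flux is simply the tightest upper bound along the chain. The sketch below assumes all lower bounds are 0; the pathway, its bounds, and the function name are hypothetical, and real models require a linear programming solver, which is what `model.optimize()` calls for you.

```python
# Hypothetical linear pathway: uptake -> A -> B -> biomass (all lower bounds are 0)
upper_bounds = {"uptake": 10.0, "A_to_B": 1000.0, "biomass": 1000.0}

def max_biomass_flux_linear(upper_bounds):
    """Optimal flux through an unbranched chain at steady state."""
    # Steady state forces all fluxes in the chain to be equal, so the largest
    # feasible common value is the smallest upper bound.
    return min(upper_bounds.values())

optimum = max_biomass_flux_linear(upper_bounds)
fluxes = {reaction: optimum for reaction in upper_bounds}
print(optimum)  # 10.0
print(fluxes)

# Blocking any step of the chain (upper bound 0) abolishes biomass production
blocked = {**upper_bounds, "A_to_B": 0.0}
print(max_biomass_flux_linear(blocked))  # 0.0
```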

# Reaction and gene knock-out

To finish up, we will see how we can use FBA to predict the effect on growth of knocking out certain reactions or genes. To knock out a reaction, we can just use the `.knock_out()` method of the reaction:

In [None]:
print('Complete model - PYK max:', model.reactions.PYK.upper_bound)
print('Complete model - Biomass: ', model.optimize().objective_value)
with model as local_model:
    local_model.reactions.PYK.knock_out()
    print('PYK KO model - PYK max:', local_model.reactions.PYK.upper_bound)
    print('PYK KO model - Biomass: ', local_model.optimize().objective_value)

We see that, when the PYK reaction is knocked out, the maximum allowed flux of the reaction is 0. This is what it means to have a reaction knocked out: no flux is allowed through it. More substantially, we also see that the biomass production is slightly lower than when the reaction is allowed.

***

#### Exercise

Could the biomass production be higher when a reaction is knocked out? Why?

*Write your answer here*

***

Removing entire reactions, however, is not very realistic in most biologically relevant situations. For evaluating genetic manipulation strategies, it is more interesting to examine what happens when given genes are knocked out. A gene knock-out may affect no reactions at all (when another gene provides redundancy) or several reactions (when the gene participates in more than one). Gene knock-outs are otherwise similar to reaction knock-outs, except that they act on genes rather than reactions.

Let's see how this works on the genes associated with the PYK reaction. Remember that this reaction can be catalyzed by either of two enzymes:
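
The way a gene knock-out propagates to reactions can be sketched with a toy set of rules. Everything here is hypothetical (the reactions `R1` to `R3`, the genes `g1` to `g3`, and the function `disabled_reactions`). Each rule is written as a list of alternative enzyme complexes, and a reaction stays active as long as at least one complex has all of its genes intact.

```python
# Each rule is a list of alternative complexes; a complex needs all of its genes
RULES = {
    "R1": [{"g1"}, {"g2"}],   # g1 OR g2 (redundant enzymes)
    "R2": [{"g2", "g3"}],     # g2 AND g3 (a two-protein complex)
    "R3": [{"g3"}],           # g3 only
}

def disabled_reactions(rules, knocked_out):
    """Return the reactions that lose every functional complex."""
    disabled = []
    for reaction, complexes in rules.items():
        still_active = any(genes.isdisjoint(knocked_out) for genes in complexes)
        if not still_active:
            disabled.append(reaction)
    return disabled

print(disabled_reactions(RULES, {"g1"}))        # []: g2 provides redundancy
print(disabled_reactions(RULES, {"g3"}))        # ['R2', 'R3']: one gene, two reactions
print(disabled_reactions(RULES, {"g1", "g2"}))  # ['R1', 'R2']
```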

In [None]:
print(model.reactions.PYK.gene_name_reaction_rule)
print(model.reactions.PYK.gene_reaction_rule)

In [None]:
print('complete model: ', model.optimize().objective_value)

# Knock out both genes at the same time (inside a context, so the changes are reverted afterwards)
with model as local_model:
    local_model.genes.b1854.knock_out()
    local_model.genes.b1676.knock_out()
    print('pykA and pykF knocked out: ', local_model.optimize().objective_value)

***

# Final exercise - Essentiality and synthetic lethality in *E. coli*

1. Load the model for the complete *E. coli* metabolism into an object called `model2`.
2. Simulate gene knock-outs for each gene individually. Hint: you can do this using a `for` loop over all the genes in the model, or you can take a look at the [COBRApy documentation on deletions](https://cobrapy.readthedocs.io/en/latest/deletions.html) and find a more efficient way.
3. Using `matplotlib`, make a histogram of the biomass productions that you get with all the individual gene knock-outs in 2. 
4. Essential genes are those whose removal results in a biomass production of 0 (no growth), that is, in unviable organisms. List all essential genes in the *E. coli* model, and count how many essential genes there are. What is the fraction of essential genes in *E. coli* based on this analysis?
5. Synthetic lethality is a phenomenon in which two genes that are not essential individually (as in question 4) are essential together, that is, they lead to no biomass production when removed simultaneously. Identify at least one pair of genes in *E. coli* that shows synthetic lethality. Hint: As in 2, you may want to look at the [COBRApy documentation on deletions](https://cobrapy.readthedocs.io/en/latest/deletions.html).

In [None]:
# Write your code here