# Part 1: FBA model reconstruction

In this tutorial, you will use the platform https://modelseed.org to perform a metabolic network reconstruction of *Acetobacter aceti*.

Overall, you should follow these steps:

1. Obtain gene sequence data from a database
2. Identify genes with a metabolic function
3. Perform automatic model reconstruction on https://modelseed.org
4. Check gene assignments in the automatic reconstruction
5. Evaluate the FBA solution

## General instructions

* You generally need to run all the code cells below in sequence. Some of them may be incomplete or empty; follow the instructions to work out a code solution for them.
* Explanatory text comes in markup text cells that have already been formatted; you do not need to run these.

## Obtaining gene sequence data

In this exercise, you are constructing a metabolic network model for the Gram-negative bacterium *Acetobacter aceti*, which is used in the food industry to produce vinegar from alcohol.

We are using the gene sequence available at the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) under the identifier 435.5 (Acetobacter aceti strain TMW2.1153).
You can obtain the sequence information from the database by the following steps:

- Go to [www.bv-brc.org](https://www.bv-brc.org/).
- Enter the strain name in the search field, and click on the returned Genome match which should have the ID 435.5.
- Press the button labelled "Download" in the right margin, tick "Protein Sequences in FASTA (*.faa)" in the popup window, and press "Download".
- The downloaded zip file will contain a file called "435.5.PATRIC.faa"; that is the one we need to work with.

For the following code examples to work, rename the downloaded file to `acetobacter_aceti.txt` and add it to the working directory of this notebook.
In Google Colaboratory, click on the folder symbol to the left and upload it there.

## Identify genes with metabolic function

The objective in this step is to identify, by means of a few examples, genes with important metabolic functions in the *Acetobacter aceti* genome.

1. Look in the FASTA file for gene annotations that indicate a metabolic function. To see just the gene identifiers and annotations, you can run the following code cell:

In [None]:
! grep '>' acetobacter_aceti.txt

2. Important enzymes for the oxidation of ethanol to acetic acid are alcohol dehydrogenase (EC 1.1.1.1) and either acetate-coenzyme A ligase / acetyl-CoA synthetase (EC 6.2.1.1) or acetate kinase (EC 2.7.2.1).
   1. Check whether you can find genes with an annotation that indicates a function as one of these enzymes. Search the sequence file for "alcohol", "acetate", or "coenzyme A", and select some of the resulting protein sequences to run through a BLAST homology search.
3. Determine the reaction equation for each selected gene either directly from uniprot.org or by looking up the EC number / gene name on https://biocyc.org.
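The keyword search described in the steps above can also be done in Python instead of `grep`. The following is a minimal sketch: the helper name `find_annotated_headers` is an assumption of this example, and the in-memory sample only stands in for the real file (its second and third headers are made up for illustration).

```python
from typing import Iterable, List


def find_annotated_headers(lines: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return FASTA header lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        text = line.strip()
        if any(k in text.lower() for k in lowered):
            hits.append(text)
    return hits


# Small stand-in for the real file; only the first header is taken from the tutorial.
sample = [
    ">fig|435.5.peg.1340|A0U92_05840   Alcohol dehydrogenase (EC 1.1.1.1)   "
    "[Acetobacter aceti strain TMW2.1153 | 435.5]",
    "MAGKMKAAVAHEFNKPLTIEELDIPTINQNQILVKMDACGVCHTDLHAVRGDWPVKPTLP",
    ">hypothetical_id_1   Acetate kinase (EC 2.7.2.1)",
    "MKV",
    ">hypothetical_id_2   Hypothetical protein",
    "MAA",
]

hits = find_annotated_headers(sample, ["alcohol", "acetate", "coenzyme A"])
for header in hits:
    print(header)

# With the real file (assumes it is in the working directory):
# with open("acetobacter_aceti.txt") as handle:
#     hits = find_annotated_headers(handle, ["alcohol", "acetate", "coenzyme A"])
```

A keyword hit only shows that the annotation text mentions the term; the homology search is still needed to support the assignment.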

### Example

The evaluation of a metabolic function will be shown for one example gene.

1. Looking at the gene identifier "fig|435.5.peg.1340|A0U92_05840" (line 8437 in the obtained FASTA file), we find that it is annotated as "Alcohol dehydrogenase (EC 1.1.1.1) [Acetobacter aceti strain TMW2.1153 | 435.5]", which indicates that this protein may act as an alcohol dehydrogenase. The full snippet of the FASTA file for this gene is as follows:

        >fig|435.5.peg.1340|A0U92_05840   Alcohol dehydrogenase (EC 1.1.1.1)   [Acetobacter aceti strain TMW2.1153 | 435.5]
        MAGKMKAAVAHEFNKPLTIEELDIPTINQNQILVKMDACGVCHTDLHAVRGDWPVKPTLP
        FIPGHEGVGHVVQVGSNVNWVKEGDYVGVPYLYSACGHCLHCLGGWETLCEKQEDTGYTV
        NGCFAEYVVADPNYVAHIPKGADPLQVAPVLCAGLTVYKALKMTDTKPGDWVAVSGVGGL
        GQMAMQYGVAMGKNMIAVDIDDEKLATAKKLGAALTVNARDTDPAAFIQKEVGGAQGVVV
        TAVSRIAFSQAMGYARRGGTIVLNGLPPGDFPVSIFDMVMNGTTVRGSIVGTRLDMIEAL
        SFFADGKVHSVVKPDKLENINRIFDDLENGRIDGRVVLDFRN

2. We enter this sequence in the search window at https://www.uniprot.org/blast, wait a moment for the algorithm to run, and look through the matches that were found.
3. In this case, there are many matches, which are indeed labelled as "Alcohol dehydrogenase". If we click on the one from *Acetobacter estunensis*, which has 90.9% sequence similarity, we can see the protein description, which also includes the EC number 1.1.1.1. Looking this number up on e.g. https://biocyc.org shows that it corresponds to the reaction equation:

    ethanol + NAD+ <-> acetaldehyde + NADH + H+
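To avoid copying the sequence from step 1 by hand, a record can be pulled out of the FASTA file by its identifier. This is a sketch under stated assumptions: `extract_sequence` is a hypothetical helper name, and the sample below is truncated to the first two sequence lines of the example gene, followed by a made-up second record.

```python
from typing import Iterable, Optional


def extract_sequence(lines: Iterable[str], identifier: str) -> Optional[str]:
    """Return the joined sequence of the first record whose header contains identifier."""
    collecting = False
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if collecting:
                break  # the next record starts here, so we are done
            collecting = identifier in line
        elif collecting and line:
            parts.append(line)
    return "".join(parts) if collecting else None


line1 = "MAGKMKAAVAHEFNKPLTIEELDIPTINQNQILVKMDACGVCHTDLHAVRGDWPVKPTLP"
line2 = "FIPGHEGVGHVVQVGSNVNWVKEGDYVGVPYLYSACGHCLHCLGGWETLCEKQEDTGYTV"
sample = [
    ">fig|435.5.peg.1340|A0U92_05840   Alcohol dehydrogenase (EC 1.1.1.1)",
    line1,
    line2,  # the real record continues; truncated here for brevity
    ">hypothetical_id   Hypothetical protein",
    "MAA",
]

seq = extract_sequence(sample, "A0U92_05840")
print(seq)

# With the real file (assumes it is in the working directory):
# with open("acetobacter_aceti.txt") as handle:
#     seq = extract_sequence(handle, "A0U92_05840")
```

The returned string can be pasted directly into the BLAST search window.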


## Automatic model reconstruction with modelseed.org

Instead of going through all genes manually, we will now upload the genome sequence file to the platform https://modelseed.org to perform an automated reconstruction.

1. In a browser, go to https://modelseed.org, log in, and select the tab "Build Model" in the top row.
2. Select "Upload microbes FASTA", select your sequence file, choose the genome type "Protein sequences" and the template for gram-negative microbes.
3. Press the button "Build model".
4. After some time, the build process should be completed. For the subsequent steps, you should download the resulting model as SBML file. For the code in the following steps, it is assumed that you save this model in a file named `a_aceti.sbml`.

## Check gene assignments in the automatic reconstruction

In this step, you use the cobrapy toolbox to check which genes have been assigned to reactions in the metabolic network reconstruction, and try to verify that these assignments match your previous BLAST results.

In [None]:
import cobra
aa_model = cobra.io.read_sbml_model("a_aceti.sbml")

Look at reactions in the model. They are stored in the SBML object "ListOfReactions". Some will have an attribute `gene_reaction_rule` that refers to the genes which have been assigned to this reaction. Searching through the model for our previously analysed gene identifier "A0U92_05840", we find multiple reactions which modelseed has assigned to that gene. One of them is:

In [None]:
r = aa_model.reactions.get_by_id("rxn00543_c0")
display(r)
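The search for reactions assigned to a given gene can be sketched as a simple substring match on `gene_reaction_rule`. The helper name `reactions_for_gene` is an assumption of this example, and the stand-in objects below only mimic the two attributes used (`id` and `gene_reaction_rule`) so that the snippet runs without a model file. How modelseed writes gene identifiers into the SBML file is not confirmed here, so you may need to adapt the search string.

```python
from types import SimpleNamespace
from typing import Iterable, List


def reactions_for_gene(reactions: Iterable, gene_fragment: str) -> List[str]:
    """Return ids of reactions whose gene_reaction_rule mentions gene_fragment."""
    return [
        rxn.id
        for rxn in reactions
        if gene_fragment in (rxn.gene_reaction_rule or "")
    ]


# Stand-in reactions; the second gene name and second reaction are made up.
demo_reactions = [
    SimpleNamespace(id="rxn00543_c0", gene_reaction_rule="A0U92_05840 or hypothetical_gene"),
    SimpleNamespace(id="hypothetical_rxn", gene_reaction_rule=""),
]

matches = reactions_for_gene(demo_reactions, "A0U92_05840")
print(matches)

# With the loaded model:
# matches = reactions_for_gene(aa_model.reactions, "A0U92_05840")
```

Each returned id can then be inspected with `aa_model.reactions.get_by_id(...)` as in the cell above.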

In [None]:
aa_model.metabolites.cpd00029_c0

To check which reactions for the conversion to acetate have been included in the model, we can look at all reactions in the model that involve acetate. Intracellular acetate has the identifier `cpd00029_c0` in modelseed, so the following code will print the information for all reactions that involve intracellular acetate:

In [None]:
for r in aa_model.metabolites.cpd00029_c0.reactions:
    display(r)

## Evaluating FBA results

A first FBA solution can be obtained by calling the `optimize` method from cobrapy on the loaded model, and printing the resulting model summary.

In [None]:
# Code to perform analysis of the FBA model
sol = aa_model.optimize()
print(aa_model.summary())

Since no medium constraints (nutrient uptake) have been defined during the reconstruction, we get a very high growth rate and some very high fluxes for different exchange reactions. You can inspect some of them with cobrapy:

In [None]:
aa_model.reactions.EX_cpd00100_e0
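Rather than inspecting exchange reactions one at a time, the largest ones can be ranked by absolute flux. The sketch below assumes that exchange reaction ids start with `EX_`, as in the cell above; the helper name `largest_exchange_fluxes` and the demo values are made up for illustration. In cobrapy, a negative exchange flux means uptake and a positive one means secretion.

```python
from typing import List, Mapping, Tuple


def largest_exchange_fluxes(
    fluxes: Mapping[str, float], n: int = 10, prefix: str = "EX_"
) -> List[Tuple[str, float]]:
    """Return the n exchange reactions with the largest absolute flux."""
    exchange = [(rid, value) for rid, value in fluxes.items() if rid.startswith(prefix)]
    exchange.sort(key=lambda item: abs(item[1]), reverse=True)
    return exchange[:n]


# Made-up flux values standing in for a real FBA solution.
demo_fluxes = {
    "EX_cpd00100_e0": -1000.0,
    "EX_hypothetical_e0": 12.5,
    "rxn00543_c0": 500.0,  # not an exchange reaction, so it is ignored
}

top = largest_exchange_fluxes(demo_fluxes, n=2)
for reaction_id, flux in top:
    print(reaction_id, flux)

# With the solution from the cell above:
# top = largest_exchange_fluxes(sol.fluxes.to_dict())
```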

To get a more realistic FBA model from this automatic reconstruction, a number of additional steps would have to be performed:
* Add uptake / exchange constraints on extracellular metabolites to represent a realistic medium composition and uptake rate limitations.
* Verify that the intracellular reactions added by modelseed are a realistic representation of the organism's physiology, and add or remove reactions where they are not.
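The first of these steps can be sketched as follows. The helper `restrict_uptake` is hypothetical, and the uptake limit of 10 is a placeholder rather than a measured value; a realistic medium would need several more entries (carbon, nitrogen, oxygen, salts), otherwise the model will likely not grow at all. The stand-in objects only mimic the `id` and `lower_bound` attributes of cobrapy reactions so that the snippet runs without a model file.

```python
from types import SimpleNamespace
from typing import Iterable, Mapping


def restrict_uptake(exchanges: Iterable, max_uptake: Mapping[str, float]) -> None:
    """Block uptake on all exchange reactions except those listed in max_uptake.

    Uptake corresponds to negative flux, so the limit is written to lower_bound.
    """
    for rxn in exchanges:
        limit = max_uptake.get(rxn.id, 0.0)
        rxn.lower_bound = -limit if limit > 0 else 0.0


# Stand-in exchange reactions with wide-open default bounds.
demo_exchanges = [
    SimpleNamespace(id="EX_cpd00100_e0", lower_bound=-1000.0, upper_bound=1000.0),
    SimpleNamespace(id="EX_hypothetical_e0", lower_bound=-1000.0, upper_bound=1000.0),
]

restrict_uptake(demo_exchanges, {"EX_cpd00100_e0": 10.0})  # placeholder rate
for rxn in demo_exchanges:
    print(rxn.id, rxn.lower_bound)

# With the loaded model:
# restrict_uptake(aa_model.exchanges, {"EX_cpd00100_e0": 10.0})
# print(aa_model.optimize().objective_value)
```

cobrapy also provides a `medium` attribute on the model for the same purpose; the explicit loop is used here only to make the effect on the bounds visible.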