# Verifying the Accuracy of iGEM Toronto's Flux Scanning Based on Enforced Objective Flux (FSEOF) Implementation

As part of our 2023 project, we (the Dry Lab sub-team of the University of Toronto's iGEM team) developed a COBRApy implementation of the FSEOF (Flux Scanning based on Enforced Objective Flux) algorithm [(Choi et al., 2010)](https://doi.org/10.1128/AEM.00115-10). FSEOF identifies candidate genes whose overexpression could improve a metabolic engineering objective, such as the production of a target compound. In Choi et al. (2010), FSEOF was used to identify gene overexpression targets in *E. coli* for increased lycopene production. To validate our implementation, we attempt to replicate their findings. Throughout, we refer to our implementation as cobra-fseof and to the original algorithm of Choi et al. simply as FSEOF.
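The sketch below makes the target-selection idea concrete. It is a minimal pure-Python illustration that assumes the criterion as we understand it from Choi et al. (2010): a reaction is a candidate when the magnitude of its flux rises at every step as the enforced product flux is increased, and its direction never changes. The helper `fseof_targets`, the reaction names `R1` to `R5`, and all flux values are hypothetical. They do not come from cobra-fseof's API or its results.

```python
from typing import Dict, List, Set


def fseof_targets(flux_scan: Dict[str, List[float]], tol: float = 1e-9) -> Set[str]:
    """Return reactions whose |flux| rises at every enforced step (hypothetical helper).

    flux_scan maps a reaction ID to its fluxes at increasing enforced product flux.
    A reaction qualifies if its flux never changes sign and its absolute value
    increases strictly from one step to the next.
    """
    targets = set()
    for rxn_id, fluxes in flux_scan.items():
        same_sign = all(f >= -tol for f in fluxes) or all(f <= tol for f in fluxes)
        magnitudes = [abs(f) for f in fluxes]
        increasing = all(b > a + tol for a, b in zip(magnitudes, magnitudes[1:]))
        if same_sign and increasing:
            targets.add(rxn_id)
    return targets


# Hypothetical fluxes (mmol / [gDW h]) at four increasing enforced product fluxes
toy_scan = {
    "R1": [0.10, 0.20, 0.30, 0.40],  # rises steadily -> candidate
    "R2": [-1.0, -1.5, -2.0, -2.5],  # magnitude rises, direction fixed -> candidate
    "R3": [5.0, 4.0, 3.0, 2.0],      # falls -> not a candidate
    "R4": [0.5, 0.5, 0.5, 0.5],      # constant -> not a candidate
    "R5": [-0.2, 0.1, 0.3, 0.6],     # changes direction -> not a candidate
}
print(sorted(fseof_targets(toy_scan)))  # ['R1', 'R2']
```

The sketch uses a strict increase and a small tolerance. Both are illustrative choices, and real implementations may use looser criteria.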


## Construction of the lycopene-producing *E. coli* model in COBRApy

The parental strain that Choi et al. used to experimentally validate the FSEOF targets is a recombinant *E. coli* DH5α strain carrying the *Erwinia uredovora crtEIB* (lycopene biosynthesis) genes.
For the *in-silico* metabolic modeling (through MetaFluxNet) and FSEOF simulations of this strain, Choi et al. used the genome-scale metabolic model (GSMM) EcoMBEL979, expanded to include the exogenous lycopene biosynthetic pathway and its associated genes.

Instead of the expanded EcoMBEL979 GSMM, we used the already available BiGG model for *E. coli* DH5α, `iEC1368_DH5a`, for the *in-silico* metabolic modeling and cobra-fseof simulations. We expanded this model to include the lycopene biosynthetic pathway and the genes that mediate it (*crtE*, *crtB*, and *crtI*). We also added a lycopene demand reaction. The exact names of the added genes, pathways, and metabolites are given in Supplementary file 3, Table S3A and B of (Choi et al., 2010).

First, we will load the base COBRApy *E. coli* DH5α model.

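```python
import sys

# Make packages in the parent directory (such as cobra_fseof, imported later) importable
sys.path.append("..")

import cobra

# Base E. coli DH5α model (BiGG model iEC1368_DH5a, SBML file in the working directory)
model = cobra.io.read_sbml_model("iEC1368_DH5a.xml")
```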

Next, we add the lycopene biosynthetic pathway and the genes that mediate it (*crtE*, *crtB*, and *crtI*).

```python
def add_single_gene_reaction_pair_lyc(
    model,
    gene_id,
    reaction_id,
    reaction_name,
    metabolites,
    gene_name=None,
):
    """Add a new reaction, catalyzed by a single new gene, to `model`.

    `metabolites` maps metabolite IDs (which must already exist in the model)
    to stoichiometric coefficients.
    """
    # Refuse to overwrite an existing gene or reaction with the same ID
    assert not model.genes.query(lambda k: k == gene_id, attribute="id")
    assert not model.reactions.query(lambda k: k == reaction_id, attribute="id")

    rxn = cobra.Reaction(id=reaction_id)

    if gene_name is None:
        gene_name = gene_id
    gene = cobra.Gene(gene_id, name=gene_name)

    model.add_reactions([rxn])
    model.genes.add(gene)

    rxn.name = reaction_name
    rxn.bounds = (-1000, 1000)  # added as a reversible reaction
    rxn.add_metabolites(metabolites)
    # GPR rule: the reaction is catalyzed by this single gene
    rxn.gene_reaction_rule = gene_id


# New cytosolic metabolites of the lycopene pathway
ggpp = cobra.Metabolite(
    id="ggpp",
    formula="C20H33O7P2",
    name="geranylgeranyl diphosphate",
    charge=-3,
    compartment="c",
)
model.add_metabolites([ggpp])

phyto = cobra.Metabolite(
    id="phyto",
    formula="C40H64",
    name="phytoene",
    charge=0,
    compartment="c",
)
model.add_metabolites([phyto])

lyco = cobra.Metabolite(
    id="lyco",
    formula="C40H56",
    name="lycopene",
    charge=0,
    compartment="c",
)
model.add_metabolites([lyco])

# crtE: isopentenyl diphosphate + farnesyl diphosphate -> GGPP + PPi
add_single_gene_reaction_pair_lyc(
    model=model,
    gene_id="crtE",
    reaction_id="ZCRTE",
    reaction_name="Synthesis of geranylgeranyl pyrophosphate",
    metabolites={
        "ipdp_c": -1.0,
        "frdp_c": -1.0,
        "ggpp": 1.0,
        "ppi_c": 1.0
    },
)

# crtB: 2 GGPP -> phytoene + PPi
# Note: assuming the BiGG formula HO7P2 (charge -3) for ppi_c, this is not mass- or
# charge-balanced; 2 GGPP -> phytoene + 2 PPi is. Check with
# model.reactions.get_by_id("ZCRTB").check_mass_balance()
add_single_gene_reaction_pair_lyc(
    model=model,
    gene_id="crtB",
    reaction_id="ZCRTB",
    reaction_name="Synthesis of phytoene",
    metabolites={
        "ggpp": -2.0,
        "phyto": 1.0,
        "ppi_c": 1.0
    },
)

# crtI: phytoene -> lycopene by desaturation, with FAD as electron acceptor
add_single_gene_reaction_pair_lyc(
    model=model,
    gene_id="crtI",
    reaction_id="ZCRTI",
    reaction_name="Synthesis of lycopene from phytoene (dehydrogenation reaction)",
    metabolites={
        "phyto": -1.0,
        "fad_c": -8.0,
        "lyco": 1.0,
        "fadh2_c": 8.0
    },
)

# Lycopene demand reaction: irreversible drain of lycopene out of the system
# https://cnls.lanl.gov/external/qbio2018/Slides/FBA%202/qBio-FBA-lab-slides.pdf (slide 21)
lyco_dem = cobra.Reaction("LYCOdem")
model.add_reactions([lyco_dem])
lyco_dem.name = "Lycopene demand reaction"
lyco_dem.lower_bound = 0
lyco_dem.upper_bound = 1000
lyco_dem.add_metabolites({"lyco": -1.0})
```
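Before scanning, it is worth confirming that the added reactions are element- and charge-balanced. In COBRApy, `Reaction.check_mass_balance()` runs this check on the model itself. The stdlib-only sketch below mimics that check for the crtE and crtB stoichiometries used above. It assumes BiGG formulas and charges for `ipdp_c` (C5H9O7P2, -3), `frdp_c` (C15H25O7P2, -3), and `ppi_c` (HO7P2, -3); verify these against the loaded model. The helpers `parse_formula` and `imbalance` are hypothetical illustrations.

```python
import re
from collections import Counter
from typing import Dict

# Assumed formulas and charges: ggpp and phyto as defined above; ipdp_c, frdp_c
# and ppi_c as in BiGG (check with model.metabolites.get_by_id(...).formula)
FORMULAS: Dict[str, str] = {
    "ipdp_c": "C5H9O7P2",
    "frdp_c": "C15H25O7P2",
    "ggpp": "C20H33O7P2",
    "ppi_c": "HO7P2",
    "phyto": "C40H64",
}
CHARGES: Dict[str, int] = {"ipdp_c": -3, "frdp_c": -3, "ggpp": -3, "ppi_c": -3, "phyto": 0}


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C20H33O7P2' (no parentheses)."""
    atoms: Counter = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(count) if count else 1
    return atoms


def imbalance(stoichiometry: Dict[str, float]) -> Dict[str, float]:
    """Net atoms and charge (products minus substrates); an empty dict means balanced."""
    net: Counter = Counter()
    charge = 0.0
    for met_id, coeff in stoichiometry.items():
        for element, count in parse_formula(FORMULAS[met_id]).items():
            net[element] += coeff * count
        charge += coeff * CHARGES[met_id]
    result = {element: n for element, n in net.items() if n != 0}
    if charge != 0:
        result["charge"] = charge
    return result


# Stoichiometries passed to add_single_gene_reaction_pair_lyc above, plus a variant
crtE = {"ipdp_c": -1, "frdp_c": -1, "ggpp": 1, "ppi_c": 1}
crtB = {"ggpp": -2, "phyto": 1, "ppi_c": 1}
crtB_two_ppi = {"ggpp": -2, "phyto": 1, "ppi_c": 2}

print(imbalance(crtE))          # {}
print(imbalance(crtB))          # {'H': -1, 'O': -7, 'P': -2, 'charge': 3.0}
print(imbalance(crtB_two_ppi))  # {}
```

Under these assumed formulas, crtE balances. crtB with one PPi is short one PPi on the product side, and adding a second PPi balances it.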

## Media setup

After creating the recombinant *E. coli* DH5α COBRApy model, we simulate the growth medium in COBRApy. Choi et al. grew the recombinant *E. coli* DH5α cells in [2xYT medium](https://sharebiology.com/2x-yt-medium/). We could not find all the nutrients/metabolites (exchange reactions) needed to simulate 2xYT medium in COBRApy. Instead, we simulated the closely related LB medium, whose full composition we found in the [LB medium](https://github.com/cdanielmachado/carveme/blob/master/carveme/data/benchmark/media_db.tsv) entry of CarveMe's media database. Exact fluxes for the nutrient exchange reactions were not available, so we set an uptake flux of 3 mmol / [gDW h] for each.

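```python
# LB medium composition (BiGG metabolite IDs), from CarveMe's media database
LB_MEDIA_COMP = [
    "adn", "ala__L", "amp", "arg__L",
    "aso3", "asp__L", "ca2", "cbl1",
    "cd2", "cl", "cmp", "cobalt2",
    "cro4", "cu2", "cys__L", "dad_2",
    "dcyt", "fe2", "fe3", "fol", "glc__D",
    "glu__L", "gly", "gmp", "gsn", "h2o",
    "h2s", "h", "hg2", "his__L", "hxan",
    "ile__L", "ins", "k", "leu__L", "lipoate",
    "lys__L", "met__L", "mg2", "mn2", "mobd",
    "na1", "nac", "nh4", "ni2", "o2",
    "phe__L", "pheme", "pi", "pnto__R",
    "pro__L", "pydx", "ribflv", "ser__L",
    "so4", "thm", "thr__L", "thymd", "trp__L",
    "tyr__L", "ump", "ura", "uri", "val__L",
    "zn2",
]

# `model.medium` returns a new dict on every access, so item assignment on it
# does not change the model. Edit a copy and assign it back instead.
medium = model.medium
missing = []
for metabolite in LB_MEDIA_COMP:
    exchange_id = f"EX_{metabolite}_e"
    if exchange_id in model.reactions:
        medium[exchange_id] = 3  # uptake flux, mmol / [gDW h]
    else:
        missing.append(exchange_id)
model.medium = medium

if missing:
    print(f"Exchange reactions not found in the model: {missing}")
```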
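The effect of assigning a medium can be sketched without COBRApy. As we understand the `Model.medium` setter, each listed exchange has its uptake opened to the given value, and every exchange not listed is closed to uptake. The function `apply_medium` and the exchange bounds below are hypothetical and simplified; this is not COBRApy's actual code. The sketch assumes every exchange is written as `met <=>`, so that negative flux means uptake.

```python
from typing import Dict, Tuple

Bounds = Tuple[float, float]


def apply_medium(exchanges: Dict[str, Bounds], medium: Dict[str, float]) -> Dict[str, Bounds]:
    """Return new exchange bounds after applying `medium` (simplified, hypothetical helper).

    Listed exchanges get lower bound -uptake; unlisted exchanges get lower bound 0
    (no uptake). Upper bounds (secretion) are left unchanged.
    """
    unknown = set(medium) - set(exchanges)
    if unknown:
        raise KeyError(f"Not exchange reactions: {sorted(unknown)}")
    new_bounds: Dict[str, Bounds] = {}
    for rxn_id, (_, upper) in exchanges.items():
        lower = -medium[rxn_id] if rxn_id in medium else 0.0
        new_bounds[rxn_id] = (lower, upper)
    return new_bounds


# Hypothetical exchange reactions with their current (lower, upper) bounds
exchanges = {
    "EX_glc__D_e": (-10.0, 1000.0),
    "EX_o2_e": (-1000.0, 1000.0),
    "EX_ac_e": (0.0, 1000.0),
}

# Copy-modify-assign: read the current uptakes, change one, apply the result
medium = {rxn_id: -lower for rxn_id, (lower, _) in exchanges.items() if lower < 0}
medium["EX_glc__D_e"] = 3.0
print(apply_medium(exchanges, medium))
# {'EX_glc__D_e': (-3.0, 1000.0), 'EX_o2_e': (-1000.0, 1000.0), 'EX_ac_e': (0.0, 1000.0)}

# Applying a medium that lists only glucose closes oxygen uptake
print(apply_medium(exchanges, {"EX_glc__D_e": 3.0}))
# {'EX_glc__D_e': (-3.0, 1000.0), 'EX_o2_e': (0.0, 1000.0), 'EX_ac_e': (0.0, 1000.0)}
```

The second call shows why starting from the current medium matters. Any exchange missing from the assigned dict loses its uptake.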

## Cobra-FSEOF simulations

We ran FSEOF from cobra-fseof with 9 steps on the recombinant *E. coli* DH5α COBRApy model above. Lycopene production (`LYCOdem`) was the enforced objective, biomass (`BIOMASS_Ec_iJO1366_core_53p95M`) was the main objective, and the enforced direction was `"max"`. The goal was to check whether the reactions chosen as overexpression candidates match those reported in (Choi et al., 2010).

```python
import cobra_fseof

# Arguments: model, number of steps, enforced objective (lycopene demand),
# main objective (biomass), enforced direction
results = cobra_fseof.fseof(model, 9, "LYCOdem", "BIOMASS_Ec_iJO1366_core_53p95M", "max")
```

```
FSEOF; Scanning: 100%|████████████████████████████| 9/9 [00:00<00:00, 26.96it/s]
FSEOF; Running FVA: 100%|█████████████████████████| 9/9 [02:24<00:00, 16.08s/it]
```
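FSEOF raises the enforced product flux in steps. At each step, it re-optimizes the main objective (here biomass) with the product flux held at the enforced level and records the resulting fluxes. The sketch below shows only how such step levels could be generated. It assumes the levels are evenly spaced from the product flux at maximal growth up to the product's theoretical maximum; how cobra-fseof actually places its 9 steps is not described here. The helper `enforced_levels` and both input values are hypothetical.

```python
from typing import List


def enforced_levels(v_initial: float, v_max: float, n_steps: int) -> List[float]:
    """Evenly spaced enforced fluxes after v_initial, ending at v_max (hypothetical helper)."""
    if n_steps < 1:
        raise ValueError("n_steps must be >= 1")
    step = (v_max - v_initial) / n_steps
    return [v_initial + k * step for k in range(1, n_steps + 1)]


# Hypothetical values: lycopene flux 0.0 at maximal growth, theoretical maximum 0.9
levels = enforced_levels(0.0, 0.9, 9)
print([round(v, 2) for v in levels])  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# At each level, an FSEOF run would fix the product flux to that value, maximize
# biomass, and store the flux distribution for the target-selection step.
```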


Here, we print the reactions that cobra-fseof identified as overexpression targets.

```python
# Keep the reactions whose boolean `target` flag is set in the scan results
scan = results.scan
targets = set(scan[scan.target].index)
targets
```

```
{'AKGDH',
 'CDPMEK',
 'CYTK1',
 'DMATT',
 'DXPRIi',
 'DXPS',
 'FBA3',
 'FE3Ri',
 'FESD1s',
 'FESR',
 'GRTT',
 'IPDDI',
 'IPDPS',
 'MECDPDH2',
 'MECDPDH5',
 'MECDPS',
 'MEPCT',
 'MOX',
 'NDPK3',
 'PFK_3',
 'PGI',
 'PIt2rpp',
 'PItex',
 'POR5',
 'PYK',
 'RPE',
 'TKT1',
 'TKT2',
 'TPI',
 'ZCRTB',
 'ZCRTE',
 'ZCRTI'}
```

Below, we list every overexpression target for increased lycopene production that FSEOF identified, grouped by main pathway. The data are taken from Supplemental Table 4A of (Choi et al., 2010). We matched each reaction in this table to the corresponding reaction in our COBRApy model, with the following exceptions:

* From the TCA cycle, `FUM_rxn` was not found in the COBRApy model, so it was replaced with `FUM`.

* From the lycopene biosynthetic pathway, `MECHPDH` was not found in the COBRApy model. However, `MECDPDH2` and `MECDPDH5`, which correspond to the same reaction, were found and are listed instead.

* From the glycolysis pathway, `FBA` was replaced with `FBA3`, and `PFK` with `PFK_3`.

* Of the remaining reactions, `CACTP` and `CRNt8pp` were not found in the COBRApy model.
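The ID reconciliation in the list above can be written as a lookup table. The sketch applies the listed substitutions to reaction IDs from Choi et al.'s table and reports IDs with no counterpart in the model. The names `CHOI_TO_MODEL`, `translate_ids`, and the small `model_ids` set are hypothetical. With the real model, membership would be tested against `model.reactions` instead.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Substitutions from the list above: Choi et al. ID -> ID(s) in the COBRApy model
CHOI_TO_MODEL: Dict[str, List[str]] = {
    "FUM_rxn": ["FUM"],
    "MECHPDH": ["MECDPDH2", "MECDPDH5"],
    "FBA": ["FBA3"],
    "PFK": ["PFK_3"],
}


def translate_ids(choi_ids: Iterable[str], model_ids: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Map Choi et al. IDs to model IDs; return (translated, not_found)."""
    translated: Set[str] = set()
    not_found: Set[str] = set()
    for rxn_id in choi_ids:
        # IDs without an explicit substitution are looked up unchanged
        candidates = CHOI_TO_MODEL.get(rxn_id, [rxn_id])
        present = [c for c in candidates if c in model_ids]
        if present:
            translated.update(present)
        else:
            not_found.add(rxn_id)
    return translated, not_found


# Hypothetical subset of model reaction IDs, for illustration only
model_ids = {"FUM", "MECDPDH2", "MECDPDH5", "FBA3", "PFK_3", "PGI", "TPI"}
translated, not_found = translate_ids(["FUM_rxn", "MECHPDH", "FBA", "PGI", "CACTP"], model_ids)
print(sorted(translated))  # ['FBA3', 'FUM', 'MECDPDH2', 'MECDPDH5', 'PGI']
print(sorted(not_found))   # ['CACTP']
```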

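```python
# Overexpression targets reported by Choi et al. (2010), mapped to model reaction IDs.
# CACTP and CRNt8pp are absent from the model, so cobra-fseof cannot recover them;
# they remain in the set and therefore count as not recovered.

# TCA cycle reactions
targets_tca = {"ACONT", "CS", "FUM", "ICDHyr", "MDH", "SUCDli", "SUCOAS", "AKGDH", "SUCD4"}

# Lycopene biosynthetic pathway
targets_lyc = {"CDPMEK", "DMATT", "DXPRIi", "DXPS", "GRTT", "IPDDI", "IPDPS", "MECDPDH5", "MECDPDH2", "MECDPS", "MEPCT", "ZCRTE", "ZCRTB", "ZCRTI"}

# Glycolysis
targets_glyc = {"FBA3", "PFK_3", "PGI", "TPI"}

# Misc.
targets_other = {"ADK1", "ADK4", "CYTK1", "CACTP", "CRNt8pp", "PPA", "CO2t", "H2Ot", "PIt2rpp"}
```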

Finally, we check how many of the above reactions cobra-fseof recovered.

```python
# All reactions reported by Choi et al.
targets_choi = targets_tca | targets_lyc | targets_glyc | targets_other

print(f"Number of reactions cobra-fseof identified: {len(targets)}")
print(f"Number of reactions Choi et al. identified: {len(targets_choi)}")
print(
    f"Percentage of reactions Choi et al. identified that cobra-fseof also identified: "
    f"{len(targets & targets_choi) / len(targets_choi):0.2%}"
)
print(
    f"Percentage of lycopene biosynthetic pathway reactions Choi et al. identified that cobra-fseof also identified: "
    f"{len(targets & targets_lyc) / len(targets_lyc):0.2%}"
)
```

```
Number of reactions cobra-fseof identified: 32
Number of reactions Choi et al. identified: 36
Percentage of reactions Choi et al. identified that cobra-fseof also identified: 58.33%
Percentage of lycopene biosynthetic pathway reactions Choi et al. identified that cobra-fseof also identified: 100.00%
```
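The same two sets can also show which of Choi et al.'s reactions were missed in each pathway. They also give the reverse ratio: the share of cobra-fseof's targets that Choi et al. also reported, which complements the percentage printed above. The sketch copies both sets from this notebook so it runs on its own. The per-pathway breakdown and the extra ratio are our own additions for illustration.

```python
from typing import Dict, Set

# cobra-fseof targets, copied from the scan output above
targets: Set[str] = {
    "AKGDH", "CDPMEK", "CYTK1", "DMATT", "DXPRIi", "DXPS", "FBA3", "FE3Ri",
    "FESD1s", "FESR", "GRTT", "IPDDI", "IPDPS", "MECDPDH2", "MECDPDH5", "MECDPS",
    "MEPCT", "MOX", "NDPK3", "PFK_3", "PGI", "PIt2rpp", "PItex", "POR5",
    "PYK", "RPE", "TKT1", "TKT2", "TPI", "ZCRTB", "ZCRTE", "ZCRTI",
}

# Choi et al. targets by pathway, as mapped to model IDs above
choi_by_pathway: Dict[str, Set[str]] = {
    "TCA cycle": {"ACONT", "CS", "FUM", "ICDHyr", "MDH", "SUCDli", "SUCOAS", "AKGDH", "SUCD4"},
    "Lycopene biosynthesis": {
        "CDPMEK", "DMATT", "DXPRIi", "DXPS", "GRTT", "IPDDI", "IPDPS",
        "MECDPDH5", "MECDPDH2", "MECDPS", "MEPCT", "ZCRTE", "ZCRTB", "ZCRTI",
    },
    "Glycolysis": {"FBA3", "PFK_3", "PGI", "TPI"},
    "Other": {"ADK1", "ADK4", "CYTK1", "CACTP", "CRNt8pp", "PPA", "CO2t", "H2Ot", "PIt2rpp"},
}

# Per-pathway recovery and the reactions that were missed
for pathway, choi in choi_by_pathway.items():
    found = targets & choi
    print(f"{pathway}: {len(found)}/{len(choi)} recovered; missed {sorted(choi - targets)}")

targets_choi = set().union(*choi_by_pathway.values())
# Share of cobra-fseof targets that Choi et al. also reported
precision = len(targets & targets_choi) / len(targets)
print(f"Share of cobra-fseof targets also reported by Choi et al.: {precision:0.2%}")
print(f"cobra-fseof targets not reported by Choi et al.: {sorted(targets - targets_choi)}")
```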
