# Using_an_SBML_model (python3)

This is a copy of Using_an_SBML_model (Python2) but built with a Python3 kernel to make sure everything works.

# Using an SBML model

## Getting started

### Installing libraries

Before you start, you will need to install a couple of libraries:
   
The [ModelSeedDatabase](https://github.com/ModelSEED/ModelSEEDDatabase) has all the biochemistry we'll need. You can install that with `git clone`.
   
The [PyFBA](http://linsalrob.github.io/PyFBA) library has detailed [installation instructions](http://linsalrob.github.io/PyFBA/installation.html). Don't be scared, it's mostly just `pip install`.

(Optional) Also, get the [SEED Servers](https://github.com/linsalrob/SEED_Servers_Python), which give you access to a lot of additional information. You can install the Python repository from GitHub with `git clone`. Make sure that SEED_Servers_Python is in your PYTHONPATH.
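If you prefer not to edit your shell's PYTHONPATH, a minimal alternative is to add the checkout to `sys.path` at the top of the notebook. The directory below (`~/SEED_Servers_Python`) is a hypothetical location; change it to wherever you cloned the repository.

```python
import os
import sys

# Hypothetical location: change this to wherever you cloned SEED_Servers_Python
seed_servers_dir = os.path.expanduser("~/SEED_Servers_Python")

# Put the checkout at the front of the module search path (only once)
if seed_servers_dir not in sys.path:
    sys.path.insert(0, seed_servers_dir)

# Warn early if the directory is not there, rather than failing at import time
if not os.path.isdir(seed_servers_dir):
    print(f"Warning: {seed_servers_dir} not found; the SEED Servers modules will not import")
```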

We start by importing some modules that we are going to use.

We import *sys* so that we can use standard out and standard error if we have some error messages.<br>
We import *copy* so that we can make a deep copy of data structures for later comparisons.<br>
We import *pickle* so that we can save the model components to files and load them again later.<br>
Then we import the *PyFBA* module to get started.

In [1]:
import sys
import os
import copy
import PyFBA
import pickle

We are using /home/redwards/.local/lib/python3.9/site-packages/PyFBA-2.0-py3.9.egg/PyFBA/Biochemistry/ModelSEEDDatabase for our data


## Running an SBML model

If you have run your genome through RAST, you can download the [SBML](http://www.sbml.org/) model and use that directly.

We have provided an [SBML model of *Citrobacter sedlakii*](https://raw.githubusercontent.com/linsalrob/PyFBA/master/example_data/Citrobacter/Citrobacter_sedlakii.sbml) that you can download and use. You can right-click (or ctrl-click) on this link and save the SBML file in the same location you are running this iPython notebook. The code below reads it from `../example_data/Citrobacter/`, so change the path if you saved it somewhere else.

We use this SBML model to demonstrate the key points of the FBA approach: defining the reactions, including the boundary, or drainflux, reactions; the compounds, including the drain compounds; the media; and the reaction bounds. 
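For reference, these pieces fit together in the standard FBA linear program shown below. This is the textbook formulation; PyFBA's own solver setup may differ in detail. Here $S$ is the stoichiometric matrix with one row per compound and one column per reaction, $\mathbf{v}$ is the vector of reaction fluxes, the objective is the flux through the biomass equation, and the media and the uptake/secretion reactions enter through the lower and upper bounds $\mathbf{lb}$ and $\mathbf{ub}$:

$$
\max_{\mathbf{v}} \; v_{\text{biomass}}
\quad \text{subject to} \quad
S\,\mathbf{v} = \mathbf{0}, \qquad
\mathbf{lb} \le \mathbf{v} \le \mathbf{ub}
$$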

We'll take it step by step!

We start by parsing the model:

In [2]:
sbml = PyFBA.parse.parse_sbml_file("../example_data/Citrobacter/Citrobacter_sedlakii.sbml")

We are logging to /home/redwards/GitHubsLinux/PyFBA/iPythonNotebooks/PyFBA.2021-06-08T14:52:32.478329.log
Added compound cpd00060 | L_Methionine: cpd00060 and L_Methionine
Added compound cpd00067 | H: cpd00067 and H
Added compound cpd00001 | H2O: cpd00001 and H2O
Added compound cpd00035 | L_Alanine: cpd00035 and L_Alanine
Added compound cpd11590 | met_L_ala_L: cpd11590 and met_L_ala_L
Added compound cpd00161 | L_Threonine: cpd00161 and L_Threonine
Added compound cpd11582 | ala_L_Thr_L: cpd11582 and ala_L_Thr_L
Added compound cpd11589 | gly_asp_L: cpd11589 and gly_asp_L
Added compound cpd00041 | L_Aspartate: cpd00041 and L_Aspartate
Added compound cpd00033 | Glycine: cpd00033 and Glycine
Added compound cpd00084 | L_Cysteine: cpd00084 and L_Cysteine
Added compound cpd15603 | Gly_Cys: cpd15603 and Gly_Cys
Added compound cpd00023 | L_Glutamate: cpd00023 and L_Glutamate
Added compound cpd11586 | ala_L_glu_L: cpd11586 and ala_L_glu_L
Added compound cpd00132 | L_Asparagine: cpd00132 and L_Asp

Added compound cpd00637_b | D_Methionine_b: cpd00637_b and D_Methionine_b
Added compound cpd00098_b | Choline_b: cpd00098_b and Choline_b
Added compound cpd00971_b | Na_b: cpd00971_b and Na_b
Added compound cpd00023_b | L_Glutamate_b: cpd00023_b and L_Glutamate_b
Added compound cpd00396_b | L_Rhamnose_b: cpd00396_b and L_Rhamnose_b
Added compound cpd03424_b | Vitamin_B12_b: cpd03424_b and Vitamin_B12_b
Added compound cpd00423_b | Vitamin_B12r_b: cpd00423_b and Vitamin_B12r_b
Added compound cpd00246_b | Inosine_b: cpd00246_b and Inosine_b
Added compound cpd03279_b | Deoxyinosine_b: cpd03279_b and Deoxyinosine_b
Added compound cpd00367_b | Cytidine_b: cpd00367_b and Cytidine_b
Added compound cpd00277_b | Deoxyguanosine_b: cpd00277_b and Deoxyguanosine_b
Added compound cpd00249_b | Uridine_b: cpd00249_b and Uridine_b
Added compound cpd00184_b | Thymidine_b: cpd00184_b and Thymidine_b
Added compound cpd00412_b | Deoxyuridine_b: cpd00412_b and Deoxyuridine_b
Added compound cpd00182_b | Aden

### Find all the reactions and identify those that are boundary reactions

We need a set of reactions to run in the model. In this case, we are going to run all the reactions in our SBML file. However, you can change this set if you want to knock out reactions, add reactions, or generally modify the model. We store those in the `reactions_to_run` set.

Boundary reactions involve compounds that are taken up or secreted, so we keep them out of the `reactions_to_run` set. We usually include an open-ended consumption of those compounds, as if they are draining away. We store those reactions separately in the `uptake_secretion_reactions` dictionary.


In [3]:
# Get a dict of reactions.
# The key is the reaction ID, and the value is a metabolism.reaction.Reaction object
reactions = sbml.reactions
reactions_to_run = set()
uptake_secretion_reactions = {}
biomass_equation = None
for r in reactions:
    # The biomass equation is the objective, so keep it separate
    if r == 'biomass_equation':
        biomass_equation = reactions[r]
        print(f"Our biomass equation is {biomass_equation.readable_name}")
        continue
    # A reaction is a boundary reaction if any of its compounds is an
    # uptake/secretion (boundary) compound
    is_boundary = False
    for c in reactions[r].all_compounds():
        if c.uptake_secretion:
            is_boundary = True
            break
    if is_boundary:
        reactions[r].is_uptake_secretion = True
        uptake_secretion_reactions[r] = reactions[r]
    else:
        reactions_to_run.add(r)


Our biomass equation is Citrobacter_sedlakii_119_auto_biomass
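The same split can be shown on a toy model. This is a minimal sketch, not PyFBA code: `ToyCompound`, `ToyReaction` and `split_reactions` are hypothetical stand-ins that keep only the `uptake_secretion` flag and the set of compounds in each reaction.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyCompound:
    """Hypothetical stand-in for a compound, keeping only what we need."""
    name: str
    uptake_secretion: bool = False


@dataclass
class ToyReaction:
    """Hypothetical stand-in for a reaction and the compounds it involves."""
    compounds: set = field(default_factory=set)
    is_uptake_secretion: bool = False


def split_reactions(reactions, biomass_id="biomass_equation"):
    """Return (biomass reaction, IDs of reactions to run, boundary reactions)."""
    biomass = None
    to_run = set()
    boundary = {}
    for rid, rxn in reactions.items():
        if rid == biomass_id:
            biomass = rxn
            continue
        # Any boundary compound makes this an uptake/secretion reaction
        if any(c.uptake_secretion for c in rxn.compounds):
            rxn.is_uptake_secretion = True
            boundary[rid] = rxn
        else:
            to_run.add(rid)
    return biomass, to_run, boundary


glc_e = ToyCompound("glucose_e")
glc_b = ToyCompound("glucose_b", uptake_secretion=True)
glc_c = ToyCompound("glucose_c")
toy_reactions = {
    "rxn_transport": ToyReaction({glc_e, glc_c}),
    "rxn_drain": ToyReaction({glc_e, glc_b}),
    "biomass_equation": ToyReaction({glc_c}),
}
toy_biomass, toy_to_run, toy_boundary = split_reactions(toy_reactions)
print(sorted(toy_to_run))    # ['rxn_transport']
print(sorted(toy_boundary))  # ['rxn_drain']
```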


At this point, we can take a look at how many reactions are in the model. Note that the total count includes the biomass reaction, but the reactions to be run do not:

In [4]:
print(f"The biomass equation is {biomass_equation}")
print(f"There are {len(reactions)} reactions in the model")
print(f"There are {len(uptake_secretion_reactions)} uptake/secretion reactions in the model")
print(f"There are {len(reactions_to_run)} reactions to be run in the model")

The biomass equation is biomass_equation: Citrobacter_sedlakii_119_auto_biomass
There are 1574 reactions in the model
There are 0 uptake/secretion reactions in the model
There are 1573 reactions to be run in the model


In [5]:
# Write the reaction IDs to a text file, one per line
# (pickle was already imported in the first cell)
with open('rgood.txt', 'w') as out:
    for r in reactions:
        out.write(f"{r}\n")

### Find all the compounds in the model, and filter out those that are secreted

We need to filter out uptake and secretion compounds from our list of all compounds before we can make a stoichiometric matrix.

In [6]:
# Get the compounds in the model. Each element is a
# metabolism.compound.Compound object
all_compounds = sbml.compounds
# Filter for compounds that are boundary compounds
filtered_compounds = set()
for c in all_compounds:
    if not c.uptake_secretion:
        filtered_compounds.add(c)

Again, we can see how many compounds there are in the model.

In [7]:
print(f"There are {len(all_compounds)} total compounds in the model")
print(f"There are {len(filtered_compounds)} compounds that are not involved in uptake and secretion")

There are 1475 total compounds in the model
There are 1301 compounds that are not involved in uptake and secretion


And now we have the size of our stoichiometric matrix! Notice that the stoichiometric matrix is composed of the reactions that we are going to run and the compounds that are in those reactions (but not the uptake/secretion reactions and compounds).

In [8]:
print(f"The stoichiometric matrix will be {len(reactions_to_run):,} reactions by {len(filtered_compounds):,} compounds")

The stoichiometric matrix will be 1,573 reactions by 1,301 compounds
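To make the shape of the matrix concrete, here is a minimal pure-Python sketch on a hypothetical three-reaction model. It assumes each reaction is a dict of `{compound: coefficient}` (negative for substrates, positive for products) and, as above, leaves the boundary compounds out of the rows. Multiplying the matrix by a flux vector gives the net production of each compound, which must be zero at steady state. PyFBA builds its own matrix inside `run_fba`, which may differ in detail, so treat this as an illustration only.

```python
def stoichiometric_matrix(reactions, boundary_compounds):
    """Build S as a list of rows: one row per compound, one column per reaction.

    reactions maps a reaction ID to {compound: coefficient}; boundary
    compounds are left out of the rows.
    """
    rxn_ids = sorted(reactions)
    all_cpds = {c for stoich in reactions.values() for c in stoich}
    cpds = sorted(all_cpds - set(boundary_compounds))
    matrix = [[reactions[r].get(c, 0) for r in rxn_ids] for c in cpds]
    return cpds, rxn_ids, matrix


def net_production(matrix, fluxes):
    """Compute S.v; a steady-state flux vector gives all zeros."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in matrix]


toy_reactions = {
    "R1": {"A_e": -1, "A_c": 1},  # transport A into the cell
    "R2": {"A_c": -1, "B_c": 1},  # convert A to B
    "R3": {"B_c": -1, "B_e": 1},  # secrete B
}
cpds, rxn_ids, S = stoichiometric_matrix(toy_reactions, boundary_compounds={"A_e", "B_e"})
print(f"{len(cpds)} compounds x {len(rxn_ids)} reactions")  # 2 compounds x 3 reactions
print(net_production(S, [1.0, 1.0, 1.0]))                    # [0.0, 0.0]
```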


### Read the media file, and correct the media names

In our [media](https://github.com/linsalrob/PyFBA/tree/master/media) directory, we have a lot of different media formulations, most of which we use with the Genotype-Phenotype project. For this example, we are going to use Lysogeny Broth (LB). There are many different formulations of LB, but we have included the recipe created by the folks at Argonne so that it is comparable with their analysis. You can download [ArgonneLB.txt](https://raw.githubusercontent.com/linsalrob/PyFBA/master/media/ArgonneLB.txt) and put it in the same directory as this iPython notebook to run it.

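Once we have read the file, we need to correct the compound names. Compound names are sometimes modified slightly when they are exported to the SBML file (for example, `L-Methionine` appears as `L_Methionine`). This step just corrects those names so that the media compounds match the compounds in the model.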

In [9]:
# Read the media file. Download ArgonneLB.txt into the same directory as this
# notebook, or change this path to point at your copy
media = PyFBA.parse.read_media_file("ArgonneLB.txt")
# Correct the names
media = sbml.correct_media(media)
print(f"The media has {len(media)} compounds")

The media has 65 compounds


Checking media compounds: Our compounds do not include  L-Cystine
Checking media compounds: Our compounds do not include  chromate
Checking media compounds: Our compounds do not include  Ni2+
Checking media compounds: Our compounds do not include  Thiamine phosphate
Checking media compounds: Our compounds do not include  Vitamin B12
Checking media compounds: Our compounds do not include  Molybdate
It just means that we did not find that compound anywhere in the reactions, and so it is unlikely to be
needed or used. We typically see a few of these in rich media.
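A minimal sketch of the kind of name matching involved. It assumes (hypothetically; this is not PyFBA's `correct_media`) that export replaces every character other than a letter, digit or underscore with an underscore. Media compounds whose normalised name has no match in the model come back as `None`, much like the compounds reported as missing above.

```python
import re


def sbml_style(name):
    """Assumed normalisation: every character other than a letter, digit or
    underscore becomes an underscore (e.g. 'L-Methionine' -> 'L_Methionine')."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def match_media_names(media_names, model_names):
    """Map each media compound name to a model compound name, or None if absent."""
    lookup = {sbml_style(n): n for n in model_names}
    return {m: lookup.get(sbml_style(m)) for m in media_names}


toy_model = ["L_Methionine", "Vitamin_B12r", "H2O"]
toy_media = ["L-Methionine", "H2O", "Ni2+"]
matches = match_media_names(toy_media, toy_model)
missing = [m for m, hit in matches.items() if hit is None]
print(matches)  # {'L-Methionine': 'L_Methionine', 'H2O': 'H2O', 'Ni2+': None}
print(missing)  # ['Ni2+']
```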


In [10]:
print(f"There are {len(filtered_compounds):,} filtered compounds, {len(all_compounds):,} all compounds, and {len(reactions):,} reactions")

There are 1,301 filtered compounds, 1,475 all compounds, and 1,574 reactions


### Set the reaction bounds for uptake/secretion compounds

The uptake and secretion compounds typically have reaction bounds that allow them to be consumed (i.e. diffuse away from the cell) but not produced. However, our media components must also be able to increase in concentration (i.e. diffuse to the cell), so we leave the bounds of their reactions unchanged and set the lower bound to 0.0 only for the uptake/secretion reactions that do not involve any media compound. Whenever you change the growth media, you also need to adjust the reaction bounds to ensure that the media can be consumed!


In [11]:
# Adjust the lower bounds of uptake secretion reactions
# for things that are not in the media
for u in uptake_secretion_reactions:
    is_media_component = False
    for c in uptake_secretion_reactions[u].all_compounds():
        if c in media:
            is_media_component = True
    if not is_media_component:
        reactions[u].lower_bound = 0.0
        uptake_secretion_reactions[u].lower_bound = 0.0
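The same bound adjustment on a toy example. This is a minimal sketch with a hypothetical `BoundedReaction` class; the default bounds of -1000.0 and 1000.0 are an assumption for illustration, not values taken from PyFBA.

```python
from dataclasses import dataclass


@dataclass
class BoundedReaction:
    """Hypothetical stand-in for a reaction: its compounds and flux bounds."""
    compounds: frozenset
    lower_bound: float = -1000.0  # assumed default, for illustration only
    upper_bound: float = 1000.0   # assumed default, for illustration only


def close_non_media_uptake(uptake_secretion, media):
    """Set lower_bound to 0.0 for every uptake/secretion reaction that
    does not involve any compound in the media."""
    for rxn in uptake_secretion.values():
        if not any(c in media for c in rxn.compounds):
            rxn.lower_bound = 0.0
    return uptake_secretion


toy_media = {"glucose"}
toy_boundary = {
    "EX_glucose": BoundedReaction(frozenset({"glucose"})),
    "EX_acetate": BoundedReaction(frozenset({"acetate"})),
}
close_non_media_uptake(toy_boundary, toy_media)
for rid, rxn in toy_boundary.items():
    print(rid, rxn.lower_bound, rxn.upper_bound)
# EX_glucose -1000.0 1000.0
# EX_acetate 0.0 1000.0
```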

### Run the FBA

Now that we have constructed our model, we can run the FBA!

In [12]:
status, value, growth = PyFBA.fba.run_fba(filtered_compounds, reactions,
                                          reactions_to_run, media, biomass_equation,
                                          uptake_secretion_reactions, verbose=True)
print("The FBA completed with a flux value of {} --> growth: {}".format(value, growth))


create_stoichiometric_matrix is adding compound cpd00028: Heme (location: e) from media to compounds
create_stoichiometric_matrix is adding compound cpd00091: UMP (location: e) from media to compounds
create_stoichiometric_matrix is adding compound cpd00215: Pyridoxal (location: e) from media to compounds
create_stoichiometric_matrix is adding compound Media046: L-Cystine (location: e) from media to compounds
create_stoichiometric_matrix is adding compound cpd00066: L_Phenylalanine (location: e) from media to compounds
create_stoichiometric_matrix is adding compound cpd00239: H2S (location: e) from media to compounds
create_stoichiometric_matrix is adding compound Media047: chromate (location: e) from media to compounds
create_stoichiometric_matrix is adding compound cpd00311: Guanosine (location: e) from media to compounds
create_stoichiometric_matrix is adding compound Media064: Ni2+ (location: e) from media to compounds
create_stoichiometric_matrix is adding compound cpd00018: AMP (

The FBA completed with a flux value of 486.54008183384235 --> growth: True


Length of the media: 65
Number of reactions to run: 1573
Number of compounds in SM: 1163
Number of reactions in SM: 1758
Revised number of total reactions: 1758
Number of total compounds: 1319
SMat dimensions: 1163 x 1758


## Export the components of the model

I am trying to compare this model to one built just from reactions, so to see what's working, I'm going to export all the components and then import them as needed.

I went down a big rabbit hole on this one because `reactions` and `compounds` are both recursive (`reactions` contain `compounds`, and `compounds` are involved in `reactions`). The following code block fails for some compounds with an `AttributeError`:

```
for c in filtered_compounds:
    t = copy.deepcopy(c)
```

The same error occurs during pickling.

We initially could not solve this with a shallow copy of the data either, mainly because `reaction.left_compounds` and `reaction.left_compound_abundance` end up out of sync. I solved that by implementing `__getstate__` and `__setstate__`, and now we can pickle and unpickle the data and the FBA still solves.
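A minimal sketch of the problem and one way around it, using hypothetical toy `Compound` and `Reaction` classes rather than PyFBA's own. When `__hash__` depends on an attribute and objects sit in sets that point back at each other, pickle and `copy.deepcopy` can try to hash an object before its attributes have been restored, which raises an `AttributeError`. Here `Compound.__getstate__` drops the back-references and `Reaction.__setstate__` rebuilds them after loading; PyFBA's actual `__getstate__`/`__setstate__` may work differently.

```python
import copy
import pickle


class Compound:
    """Toy compound (hypothetical, not PyFBA's class). Its hash depends on an
    attribute, and it keeps back-references to the reactions it is in."""

    def __init__(self, name):
        self.name = name
        self.reactions = set()

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, Compound) and self.name == other.name

    def __getstate__(self):
        # Drop the back-references so pickling a compound does not recurse
        # into its reactions (and from there back into this compound)
        state = self.__dict__.copy()
        state["reactions"] = set()
        return state


class Reaction:
    """Toy reaction (hypothetical) holding a set of Compound objects."""

    def __init__(self, name, compounds):
        self.name = name
        self.compounds = set(compounds)
        for c in self.compounds:
            c.reactions.add(self)

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, Reaction) and self.name == other.name

    def __setstate__(self, state):
        # Restore our attributes first, then rebuild the back-references
        # that Compound.__getstate__ dropped
        self.__dict__.update(state)
        for c in self.compounds:
            c.reactions.add(self)


rxn = Reaction("R1", [Compound("ATP"), Compound("ADP")])
restored = pickle.loads(pickle.dumps(rxn, protocol=pickle.HIGHEST_PROTOCOL))
duplicate = copy.deepcopy(rxn)
print(sorted(c.name for c in restored.compounds))                # ['ADP', 'ATP']
print(all(restored in c.reactions for c in restored.compounds))  # True
```

Without the two special methods, both the `pickle.loads` and the `copy.deepcopy` calls raise `AttributeError`, because an empty, partially restored `Reaction` is added to a set while its `name` attribute is still missing.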

In [13]:
# Round-trip each compound through pickle to find any that fail, and record
# the state of the Myristic_acid compound(s) whether or not they pickle
ma = []
for c in filtered_compounds:
    try:
        with open('testcpd.pickle', 'wb') as out:
            pickle.dump(c, out, protocol=pickle.HIGHEST_PROTOCOL)
        with open('testcpd.pickle', 'rb') as f:
            pickle.load(f)
        result = 'passed'
    except AttributeError:
        print(f"Error with {c}: |{c.name}|")
        result = 'failed'
    if c.name == 'Myristic_acid':
        d = c.__dict__.copy()
        d['pickle'] = result
        ma.append(d)

print(ma)
    

[{'id': 'cpd03847', 'name': 'Myristic_acid', 'reactions': set(), 'model_seed_id': 'cpd03847', 'alternate_seed_ids': set(), 'abbreviation': 'cpd03847_c0', 'aliases': None, 'formula': None, 'mw': 0, 'common': False, 'charge': '0', 'is_cofactor': False, 'linked_compound': False, 'pka': 0, 'pkb': 0, 'is_obsolete': False, 'abstract_compound': False, 'uptake_secretion': False, 'is_core': False, 'inchikey': 0, 'location': 'c', 'pickle': 'passed'}, {'id': 'cpd03847', 'name': 'Myristic_acid', 'reactions': {<PyFBA.metabolism.reaction.Reaction object at 0x7f19f1e34af0>}, 'model_seed_id': 'cpd03847', 'alternate_seed_ids': set(), 'abbreviation': 'cpd03847_e0', 'aliases': None, 'formula': None, 'mw': 0, 'common': False, 'charge': '0', 'is_cofactor': False, 'linked_compound': False, 'pka': 0, 'pkb': 0, 'is_obsolete': False, 'abstract_compound': False, 'uptake_secretion': False, 'is_core': False, 'inchikey': 0, 'location': 'e', 'pickle': 'passed'}]


In [14]:
# Save each model component to its own pickle file
components = {
    'compounds.pickle': filtered_compounds,
    'reactions.pickle': reactions,
    'reactions_to_run.pickle': reactions_to_run,
    'media.pickle': media,
    'sbml_biomass.pickle': biomass_equation,
    'uptake_secretion_reactions.pickle': uptake_secretion_reactions,
}
for filename, obj in components.items():
    with open(filename, 'wb') as out:
        pickle.dump(obj, out, protocol=pickle.HIGHEST_PROTOCOL)

In [15]:
def load_pickle(filename):
    """Load a single object from a pickle file."""
    with open(filename, 'rb') as f:
        return pickle.load(f)

sbml_filtered_compounds = load_pickle('compounds.pickle')
sbml_reactions = load_pickle('reactions.pickle')
sbml_reactions_to_run = load_pickle('reactions_to_run.pickle')
sbml_media = load_pickle('media.pickle')
sbml_biomass_equation = load_pickle('sbml_biomass.pickle')
sbml_uptake_secretion_reactions = load_pickle('uptake_secretion_reactions.pickle')

In [16]:
status, value, growth = PyFBA.fba.run_fba(sbml_filtered_compounds, sbml_reactions,
                                          sbml_reactions_to_run, sbml_media, sbml_biomass_equation,
                                          sbml_uptake_secretion_reactions, verbose=True)
print("The FBA completed with a flux value of {} --> growth: {}".format(value, growth))


create_stoichiometric_matrix found 184 reactions
In the model there are : 1163 compounds and 1758 reactions
We are loading 1163 rows and 1758 columns


The FBA completed with a flux value of 486.54008183384235 --> growth: True


Length of the media: 65
Number of reactions to run: 1573
Number of compounds in SM: 1163
Number of reactions in SM: 1758
Revised number of total reactions: 1758
Number of total compounds: 1319
SMat dimensions: 1163 x 1758


In [17]:
# Print the name of every compound, in every reaction, that contains an underscore
for r in sbml_reactions:
    for c in sbml_reactions[r].all_compounds():
        if '_' in c.name:
            print(c.name)

L_Methionine
met_L_ala_L
L_Alanine
L_Alanine
ala_L_Thr_L
L_Threonine
gly_asp_L
L_Aspartate
Gly_Cys
L_Cysteine
L_Glutamate
L_Alanine
ala_L_glu_L
L_Asparagine
gly_asn_L
L_Leucine
Gly_Leu
L_alanylglycine
L_Alanine
L_Alanine
ala_L_asp_L
L_Aspartate
Cys_Gly
L_Cysteine
Gly_Gln
L_Glutamine
Gly_Phe
L_Phenylalanine
Ala_His
L_Histidine
L_Alanine
L_Glutamate
gly_glu_L
L_Alanine
Ala_Gln
L_Glutamine
Gly_Tyr
L_Tyrosine
gly_pro_L
L_Proline
L_Alanine
Ala_Leu
L_Leucine
L_Methionine
Gly_Met
Glucose_1_phosphate
Butyryl_CoA
Crotonyl_CoA
Isobutyryl_CoA
Methacrylyl_CoA
2_Methylbutyryl_CoA
Tiglyl_CoA
Acetyl_CoA
N_Acetyl_D_glucosamine1_phosphate
D_Glucosamine1_phosphate
N_Acetyl_D_glucosamine1_phosphate
UDP_N_acetylglucosamine
L_Argininosuccinate
L_Arginine
4_Carboxymuconolactone
3_oxoadipate_enol_lactone
2_Demethylmenaquinol_8
2_Demethylmenaquinone_8
Ubiquinone_8
Ubiquinol_8
Menaquinol_8
Menaquinone_8
1_anteisoheptadecanoyl_sn_glycerol_3_phosphate
1_2_dianteisoheptadecanoyl_sn_glycerol_3_phosphate
1_2_diisoh

S_Succinyldihydrolipoamide
Succinyl_CoA
4_Hydroxybenzoate
3_Octaprenyl_4_hydroxybenzoate
UDP_MurNAc
UDP_N_acetylglucosamine_enolpyruvate
kdo2_lipid_a
ADP_L_glycero_D_manno_heptose
heptosyl_kdo2_lipidA
L_Threonine
L_2_Amino_acetoacetate
L_Serine
6_phospho_D_glucono_1_5_lactone
6_Phospho_D_gluconate
dTDP_rhamnose
dTDP_4_oxo_L_rhamnose
Peptide_L_methionine
Peptide_L_methionine_R_S_oxide
Glycerone_phosphate
L_Rhamnulose_1_phosphate
L_Lactaldehyde
L_Xylulose_5_phosphate
L_ribulose_5_phosphate
2_C_methyl_D_erythritol4_phosphate
4__cytidine5_diphospho_2_C_methyl_D_erythritol
D_Tagatose_1_6_biphosphate
Glycerone_phosphate
Glyceraldehyde3_phosphate
L_Arabinono_1_4_lactone
L_Arabinose
Ala_Ala
D_Alanine
beta_Alanine
L_Aspartate
L_Tyrosine
L_Aspartate
L_Glutamate
2_isopropyl_3_oxosuccinate
R_Allantoin
5_Hydroxyisourate
5_Hydroxyisourate
L_Glutamate5_semialdehyde
1_Pyrroline_5_carboxylate
3_Amino_2_oxopropyl_phosphate
2_Amino_3_oxo_4_phosphonooxybutyrate
L_2_Amino_6_oxopimelate
L_2_Amino_acetoaceta