## Porting genome-scale metabolic models for metabolomics (AGORA)

- to make formats compatible with mummichog
- to link to a common compound table
- to generate predicted mass peaks from the formulae in the compound table
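Generating predicted mass peaks from a formula comes down to summing monoisotopic element masses and then applying adduct shifts. The sketch below only illustrates that idea and is not mummichog's implementation. It assumes a neutral formula, knows just C, H, N, O, P and S, returns `None` for formulas that contain generic groups such as the `R` in AGORA's `C27H51N2O8PRS`, and computes only the [M+H]+ and [M-H]- ions. The helper names are hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope; deliberately a small subset.
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON = 1.007276  # proton mass in Da


def parse_formula(formula):
    """Turn 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def neutral_mono_mass(formula):
    """Monoisotopic mass of a neutral formula, or None if it cannot be computed."""
    counts = parse_formula(formula)
    if not counts or any(e not in MONO_MASS for e in counts):
        return None  # empty formula, or unknown symbols such as the generic 'R'
    return sum(MONO_MASS[e] * n for e, n in counts.items())


def predicted_peaks(formula):
    """m/z of the singly charged [M+H]+ and [M-H]- ions of a neutral formula."""
    m = neutral_mono_mass(formula)
    if m is None:
        return {}
    return {"M+H[1+]": m + PROTON, "M-H[1-]": m - PROTON}
```

For glucose, `predicted_peaks("C6H12O6")` gives roughly 181.0707 for [M+H]+ and 179.0561 for [M-H]-.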

As mummichog 3 is under development, treat this work as part of that development.

*Use cobra to parse SBML models where applicable*

Not all models conform to the formats cobra expects. Models from the UCSD and Thiele labs should.

*Base our code on metDataModel*

Each model needs a list of Reactions, a list of Pathways, and a list of Compounds. It is important to include Compounds with all their linked identifiers to other databases (HMDB, PubChem, etc.) and with formulae when available (these models usually give the charged form). The data can always be updated later; e.g., neutral formulae can be retrieved from HMDB when a compound is linked. Save the result as a Python pickle and as JSON.
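When only the charged formula is available, a common approximation is to assume the charge comes from protonation or deprotonation: add one H per negative charge and remove one per positive charge. The sketch below applies that rule and writes the result in Hill order (C, H, then the other elements alphabetically). It is wrong for metal ions and other charges not carried by protons, so a neutral formula from HMDB should take precedence when one is linked. The function names are illustrative.

```python
import re


def parse_formula(formula):
    """Turn 'C25H38N7O18P3S' into {'C': 25, 'H': 38, 'N': 7, ...}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def to_hill(counts):
    """Format element counts in Hill order: C, H, then the rest alphabetically."""
    if "C" in counts:
        order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)  # no carbon: purely alphabetical
    return "".join(
        e + (str(counts[e]) if counts[e] > 1 else "")
        for e in order
        if counts.get(e, 0) > 0
    )


def neutralize(charged_formula, charge):
    """Approximate a neutral formula, assuming the charge is carried by protons."""
    counts = parse_formula(charged_formula)
    counts["H"] = counts.get("H", 0) - charge  # charge -1 means one proton was lost
    if counts["H"] < 0:
        raise ValueError(f"charge {charge} cannot be explained by (de)protonation")
    return to_hill(counts)
```

With the values exported further down for `3hbcoa_R`, `neutralize("C25H38N7O18P3S", -4)` returns `C25H42N7O18P3S`.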

Georgi Kolishovski, 2021-05-12

In [1]:
pip install cobra




In [2]:
pip install --upgrade metDataModel

Requirement already up-to-date: metDataModel in c:\users\kolisg\anaconda3\lib\site-packages (0.3.1)
Note: you may need to restart the kernel to use updated packages.


In [3]:
# https://cobrapy.readthedocs.io/en/latest/io.html#SBML
import cobra

from metDataModel.core import Compound, Reaction, Pathway, metabolicModel

In [4]:
# cloned on 2021-05-12 from https://github.com/VirtualMetabolicHuman

# source data directory: change on your instance
import os

source_dir = os.path.join("AGORA", "CurrentVersion", "AGORA_1_03", "AGORA_1_03_sbml")
agora = os.path.join(source_dir, "Fusobacterium_nucleatum_subsp_animalis_3_1_33.xml")

model = cobra.io.read_sbml_model(agora)
model

| Attribute | Value |
|---|---|
| Name | `M_Fusobacterium_nucleatum_subsp_animalis_3_1_33__44____32__AGORA__32__version__32__1__46__03` |
| Memory address | 0x02466d114bb0 |
| Number of metabolites | 917 |
| Number of reactions | 973 |
| Number of groups | 69 |
| Objective expression | `1.0*biomass525 - 1.0*biomass525_reverse_5c178` |
| Compartments | Cytoplasm, Extracellular |
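Not every model parses cleanly with cobra, so when porting a whole AGORA directory it helps to record failures rather than stop at the first one. A minimal sketch, assuming all models are `*.xml` files directly under `source_dir`; `find_sbml_files` and `iter_models` are hypothetical helpers. The generator yields one model at a time, so each model can be ported and exported inside the loop instead of holding hundreds of them in memory.

```python
from pathlib import Path


def find_sbml_files(source_dir):
    """Return the *.xml files directly under source_dir, sorted by name."""
    return sorted(Path(source_dir).glob("*.xml"))


def iter_models(source_dir, failures):
    """Yield (name, cobra model) pairs; record files that fail to parse in `failures`.

    cobrapy is imported lazily so that find_sbml_files works without it.
    """
    import cobra

    for path in find_sbml_files(source_dir):
        try:
            model = cobra.io.read_sbml_model(str(path))
        except Exception as err:  # broad on purpose: report the file and move on
            failures[path.name] = repr(err)
            continue
        yield path.stem, model
```

Typical use: create `failures = {}`, loop with `for name, m in iter_models(source_dir, failures):` to port each model, then inspect `failures` afterwards.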


In [5]:
[
    model.name,
    model.metabolites[33].formula,
    model.metabolites[33].charge,
    model.metabolites[33].name,
    model.metabolites[33].id,
    model.metabolites[33]._id,
    model.metabolites[33].annotation
]

['Fusobacterium nucleatum subsp. animalis 3_1_33',
 'C27H51N2O8PRS',
 -1,
 '14-methyl-pentadecanoyl-ACP',
 '14mpentdecACP[c]',
 '14mpentdecACP[c]',
 {}]

In [6]:
model.reactions[33]

| Attribute | Value |
|---|---|
| Reaction identifier | 3HAD80 |
| Name | 3-hydroxyacyl-[acyl-carrier-protein] dehydratase (n-C8:0) |
| Memory address | 0x02466d2eb8e0 |
| Stoichiometry | `3hoctACP[c] --> h2o[c] + toct2eACP[c]`; `(R)-3-Hydroxyoctanoyl-[acyl-carrier protein] --> Water + trans-Oct-2-enoyl-[acyl-carrier protein]` |
| GPR | 469603.3.peg.782 |
| Lower bound | 0.0 |
| Upper bound | 1000.0 |


In [7]:
def metabolite2compound(M):
    # convert cobra Metabolite to metDataModel Compound
    Cpd = Compound()
    Cpd.src_id = M.id
    Cpd.id = M.id.split("[")[0]
    Cpd.name = M.name
    Cpd.charge = M.charge
    Cpd.charged_formula = M.formula
    Cpd.db_ids = M.annotation
    return Cpd

metabolite2compound(model.metabolites[33]).id

'14mpentdecACP'

In [8]:
# list of Compounds
myCpds = []
anno = {}
for M in model.metabolites:
    anno[M.id.split("[")[0]] = M.annotation
    myCpds.append(metabolite2compound(M))
    
print("total, ", len(myCpds), len(anno))

total,  917 816


In [9]:
# de-compartmentalize metabolites

# ids repeat across compartments; later copies overwrite earlier ones
myDict = {}
for M in myCpds: myDict[M.id] = M
print(len(myDict))

## Compounds to port
AGORA_Compounds = list(myDict.values()) 

816
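The dictionary above keeps whichever compartment copy comes last, so if the copies carry different annotations, anything present only on an earlier copy is lost. If that matters, the copies can be merged instead. The sketch below uses a hypothetical stand-in dataclass `Cpd` rather than metDataModel's `Compound`; it keeps the first non-empty name and formula and unions the identifier dictionaries, with the first value per database winning.

```python
from dataclasses import dataclass, field


@dataclass
class Cpd:
    """Hypothetical stand-in for metDataModel's Compound (illustration only)."""
    id: str
    name: str = ""
    charged_formula: str = ""
    db_ids: dict = field(default_factory=dict)


def merge_by_id(compounds):
    """Collapse compartment copies that share an id instead of overwriting them."""
    merged = {}
    for c in compounds:
        kept = merged.get(c.id)
        if kept is None:
            # copy db_ids so that later merging does not mutate the inputs
            merged[c.id] = Cpd(c.id, c.name, c.charged_formula, dict(c.db_ids))
            continue
        kept.name = kept.name or c.name
        kept.charged_formula = kept.charged_formula or c.charged_formula
        for db, value in c.db_ids.items():
            kept.db_ids.setdefault(db, value)  # first value per database wins
    return merged
```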


In [10]:
## this is model dependent, as some models use symbols other than "[" !!!
def mclean(x): return x.split("[")[0]

# port reactions
def port_reaction(R):
    new = Reaction()
    new.id = R.id
    new.reactants = [mclean(m.id) for m in R.reactants] 
    new.products = [mclean(m.id) for m in R.products] 
    return new

test199 = port_reaction(model.reactions[199])
[test199.id,
 test199.reactants,
 test199.products,
]

['AMOPBHL', ['amopbut_L', 'h'], ['3a2opp', 'co2']]
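Because `mclean` only handles the `[c]` style, a cleaner built from the model's own compartment ids may carry over better to other models. This sketch assumes compartments appear in metabolite ids as `[c]`, `(c)` or a BiGG-style `_c` suffix, and that the compartment ids match those suffixes; `make_mclean` is a hypothetical helper. Matching only known compartment ids keeps stereo tags such as `_L` in `amopbut_L` or `_S` in `12ppd_S` intact.

```python
import re


def make_mclean(compartment_ids):
    """Build an id cleaner that strips one trailing compartment tag.

    Handles '[c]', '(c)' and '_c' suffixes, but only for the given compartment
    ids, so tags such as '_L' or '_S' are left alone.
    """
    # longest ids first, so that e.g. 'c0' is tried before 'c'
    codes = "|".join(re.escape(c) for c in sorted(compartment_ids, key=len, reverse=True))
    pattern = re.compile(rf"(?:\[(?:{codes})\]|\((?:{codes})\)|_(?:{codes}))$")

    def mclean(met_id):
        return pattern.sub("", met_id)

    return mclean
```

With cobra, `mclean = make_mclean(model.compartments)` would work, since iterating the `model.compartments` dictionary yields the compartment ids.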

In [11]:
# this is the compartmentalized version of reactions
# Reactions to port
AGORA_Reactions = []
for R in model.reactions:
    AGORA_Reactions.append(port_reaction(R))

print(len(AGORA_Reactions))

973
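Since reaction participants are decompartmentalized, pure transport reactions collapse into identities; in the exported output, `15DAPt` has reactants `['15dap']` and products `['15dap']`. Such reactions describe no chemical conversion and may be worth separating. The sketch below works on dicts shaped like the `Reaction.serialize()` output and flags reactions whose two sides contain the same compound set; `split_transport` is a hypothetical name.

```python
def split_transport(reactions):
    """Separate reactions whose decompartmentalized sides are identical.

    `reactions` are dicts like {'id': ..., 'reactants': [...], 'products': [...]}.
    Returns (transport, metabolic) lists, preserving input order.
    """
    transport, metabolic = [], []
    for r in reactions:
        if set(r["reactants"]) == set(r["products"]):
            transport.append(r)  # same compounds on both sides: only moves them
        else:
            metabolic.append(r)
    return transport, metabolic
```

Applied here, that would be `transport, metabolic = split_transport([R.serialize() for R in AGORA_Reactions])`.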


In [12]:
# pathways: AGORA groups are used as pathways. Other models may use subsystems etc.

def port_pathway(P):
    new = Pathway()
    new.id = P.id
    new.source = ['AGORA',]
    new.name = P.name
    new.list_of_reactions = [x.id for x in P.members]
    return new

p = port_pathway(model.groups[33])

[p.id, p.name, p.list_of_reactions]

['group34', 'Nitrogen metabolism', ['CBMTHL2']]

In [13]:
## Pathways to port
AGORA_Pathways = []
for P in model.groups:
    AGORA_Pathways.append(port_pathway(P))

len(AGORA_Pathways)

69
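The ported pathways list reactions, while pathway analysis on metabolomics data works with the compounds in each pathway. A minimal sketch for deriving compound sets from the serialized pathways and reactions; `pathway_compounds` is a hypothetical helper, and reaction ids missing from the reaction list are skipped. Currency metabolites such as `h` and `h2o` will show up in many pathways and may need filtering.

```python
def pathway_compounds(pathways, reactions):
    """Map each pathway id to the set of compound ids its reactions touch.

    Both inputs are lists of serialized dicts, as in MM.list_of_pathways and
    MM.list_of_reactions.
    """
    by_id = {r["id"]: r for r in reactions}
    result = {}
    for p in pathways:
        cpds = set()
        for rid in p["list_of_reactions"]:
            r = by_id.get(rid)
            if r is None:
                continue  # reaction not in the list; skip rather than fail
            cpds.update(r["reactants"])
            cpds.update(r["products"])
        result[p["id"]] = cpds
    return result
```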

In [15]:
note = """AGORA cloned from https://github.com/VirtualMetabolicHuman, 2021-05-12.
Compounds are decompartmentalized, but Reactions are not. 
The redundant metabolites will be merged ad hoc when pathways and reactions are pulled.
"""

## metabolicModel to export
MM = metabolicModel()
MM.id = f"az_AGORA_20210512_{model.name}"
MM.meta_data = {
            'species': model.name,
            'version': '',
            'sources': ['https://github.com/VirtualMetabolicHuman, retrieved 2021-05-12'],
            'status': '',
            'last_update': '20210512',
            'note': note,
        }
MM.list_of_pathways = [P.serialize() for P in AGORA_Pathways]
MM.list_of_reactions = [R.serialize() for R in AGORA_Reactions]
MM.list_of_compounds = [C.serialize() for C in AGORA_Compounds]

In [18]:
# check output
[
    MM.id,
    MM.list_of_pathways[:2], 
    MM.list_of_reactions[:2], 
    MM.list_of_compounds[100:102],
]

['az_AGORA_20210512_Fusobacterium nucleatum subsp. animalis 3_1_33',
 [{'id': 'group1',
   'name': 'Alanine and aspartate metabolism',
   'list_of_reactions': ['ALAALA',
    'ALAR',
    'ARGSL',
    'ARGSSr',
    'ASNS2',
    'ASPT',
    'ASPTA',
    'r0127']},
  {'id': 'group2',
   'name': 'Aminosugar metabolism',
   'list_of_reactions': ['AGDC',
    'G6PDA',
    'GF6PTA',
    'UAG4E',
    'UAGAAT',
    'UAGCVT_r',
    'UAGDP']}],
 [{'id': '12PPDSDH', 'reactants': ['12ppd_S'], 'products': ['h2o', 'ppal']},
  {'id': '15DAPt', 'reactants': ['15dap'], 'products': ['15dap']}],
 [{'id': '3hbcoa_R',
   'name': '(R)-3-hydroxybutanoyl-CoA(4-)',
   'identifiers': {'hmdb': 'HMDB01166',
    'kegg.compound': 'C03561',
    'pubchem.compound': '11966146'},
   'neutral_formula': '',
   'charge': -4,
   'charged_formula': 'C25H38N7O18P3S',
   'neutral_mono_mass': 0.0,
   'SMILES': '',
   'inchi': ''},
  {'id': '3hdcoa',
   'name': '(S)-3-Hydroxydecanoyl-CoA',
   'identifiers': {'hmdb': 'HMDB03938',
 

In [None]:
import pickle

# the pickled dict can be reloaded later, e.g. for database upload
with open('metabolicModel_AGORA_20210512.pickle', 'wb') as f:
    pickle.dump(MM.serialize(), f, pickle.HIGHEST_PROTOCOL)

In [None]:
import json

with open("metabolicModel_AGORA_20210512.json", "w", encoding="utf-8") as O:
    json.dump(MM.serialize(), O)
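A quick round-trip check confirms that both files reload to the same content. The sketch below takes any dict (for example `MM.serialize()`) and an output directory; `export_and_check` is a hypothetical helper. JSON is stricter than pickle, turning tuples into lists and non-string keys into strings, so a `False` result usually points at such values.

```python
import json
import pickle
from pathlib import Path


def export_and_check(model_dict, stem, out_dir):
    """Write model_dict as pickle and JSON, reload both, and compare with the input."""
    out_dir = Path(out_dir)
    pkl_path = out_dir / f"{stem}.pickle"
    json_path = out_dir / f"{stem}.json"
    with open(pkl_path, "wb") as f:
        pickle.dump(model_dict, f, pickle.HIGHEST_PROTOCOL)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(model_dict, f)
    with open(pkl_path, "rb") as f:
        from_pickle = pickle.load(f)
    with open(json_path, encoding="utf-8") as f:
        from_json = json.load(f)
    # True only if both formats reproduce the original exactly
    return from_pickle == model_dict and from_json == model_dict
```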

## Summary

This notebook ports reactions, pathways and compounds. Gene and enzyme information is not included yet; it should be added when someone has time to do it.

The exported pickle can easily be re-imported and uploaded to a database.

This notebook, the pickle file and the JSON file go to GitHub repo (https://github.com/shuzhao-li/Azimuth).