# Porting genome-scale metabolic models for metabolomics

**Human-GEM as default human model, for better compatibility**

https://github.com/SysBioChalmers/Human-GEM

mummichog 3 is still under development, so treat this port as part of that development.

**Use cobra to parse SBML models where applicable**

Not all models comply with the formats that cobra expects. Models from the UCSD and Thiele labs should comply.

**Base our code on metDataModel**

Each model needs a list of Reactions, list of Pathways, and a list of Compounds.
It's important to include Compounds with all linked identifiers to other DBs (HMDB, PubChem, etc.), and with formulae (usually the charged form in these models) when available.
We can always update the data later. E.g., the neutral formulae can be retrieved from HMDB if linked.
Save in Python pickle and in JSON.
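
When no HMDB link is available, a neutral formula can sometimes be approximated from the charged one. The sketch below is not part of metDataModel; `neutral_formula` is a hypothetical helper that assumes a flat formula string (no parentheses or isotopes) and that the charge comes only from gained or lost protons. The example input is the formula and charge printed later in this notebook for `MAM00012c`.

```python
import re

def neutral_formula(charged_formula, charge):
    """Approximate a neutral formula by shifting the H count by the charge.

    Hypothetical helper: assumes a flat formula such as 'C41H64N7O17P3S'
    and that each unit of charge corresponds to one proton.
    """
    if not charge:
        # charge is 0 or None: nothing to adjust
        return charged_formula
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", charged_formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    # A charge of -4 means four protons are missing, so add four H back.
    counts["H"] = counts.get("H", 0) - charge
    if counts["H"] < 0:
        raise ValueError("not enough H atoms to neutralize " + charged_formula)
    # Keep the original element order; omit the count when it is 1.
    return "".join(
        f"{element}{n if n != 1 else ''}"
        for element, n in counts.items() if n > 0
    )

print(neutral_formula("C41H64N7O17P3S", -4))  # C41H68N7O17P3S
```

Formulae retrieved from a curated database should still take precedence over this approximation.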

**Compartmentalized for now**

This notebook keeps the compartments. Removing compartments will be done as a separate model.
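
A minimal sketch of what removing compartments could look like for metabolite IDs. It assumes the ID pattern seen in this notebook (`MAM`, digits, then a single compartment letter, e.g. `MAM00599m`); `strip_compartment` and `merge_compartments` are hypothetical names, and other models use different conventions such as a `[c]` suffix.

```python
import re

# Assumption: Human-GEM metabolite IDs look like 'MAM00599m',
# i.e. 'MAM' + digits + one lowercase compartment letter.
_ID_WITH_COMPARTMENT = re.compile(r"^(MAM\d+)([a-z])$")

def strip_compartment(metabolite_id):
    """Return the ID without its compartment letter; leave other IDs unchanged."""
    match = _ID_WITH_COMPARTMENT.match(metabolite_id)
    return match.group(1) if match else metabolite_id

def merge_compartments(metabolite_ids):
    """Return unique compartment-free IDs, preserving first-seen order."""
    return list(dict.fromkeys(strip_compartment(i) for i in metabolite_ids))

# Illustrative input: the same metabolite listed in two compartments.
print(merge_compartments(["MAM00599m", "MAM00599c", "MAM01285c"]))
# ['MAM00599', 'MAM01285']
```

Reactions would need the same treatment, and transport reactions whose reactants and products become identical after stripping would have to be dropped.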

Shuzhao Li, 2021-10-21

In [1]:
! pip install cobra



In [2]:
# https://github.com/shuzhao-li/metDataModel/ 
!pip install --upgrade metDataModel



In [3]:
from metDataModel.core import Compound, Reaction, Pathway, MetabolicModel

In [4]:
!pip install --upgrade numpy pandas



In [5]:
# https://cobrapy.readthedocs.io/en/latest/io.html#SBML
import cobra

In [6]:
xmlFile = 'Human-GEM/model/Human-GEM.xml'
model = cobra.io.read_sbml_model(xmlFile)

Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled


In [7]:
model

0,1
Name,HumanGEM
Memory address,0x07f9e230bc490
Number of metabolites,8370
Number of reactions,13078
Number of groups,142
Objective expression,1.0*MAR13082 - 1.0*MAR13082_reverse_11d67
Compartments,"Cytosol, Extracellular, Lysosome, Endoplasmic reticulum, Mitochondria, Peroxisome, Golgi apparatus, Nucleus, Inner mitochondria"


In [8]:
model.metabolites[990]

0,1
Metabolite identifier,MAM00599m
Name,20-OH-LTB4
Memory address,0x07f9e22f3a940
Formula,C20H31O5
Compartment,m
In 2 reaction(s),"MAR01127, MAR01130"


In [9]:
model.reactions[33]

0,1
Reaction identifier,MAR07747
Name,
Memory address,0x07f9e217806d0
Stoichiometry,MAM01285c + MAM01965c --> MAM01334c + MAM01968c + MAM02039c  ADP + glucose --> AMP + glucose-6-phosphate + H+
GPR,ENSG00000159322
Lower bound,0.0
Upper bound,1000.0


In [10]:
model.reactions[33].genes

frozenset({<Gene ENSG00000159322 at 0x7f9e21b60850>})

In [11]:
[model.groups[11].name, model.groups[11].id, model.groups[11].members]

['Beta oxidation of branched-chain fatty acids (mitochondrial)',
 'group12',
 [<Reaction MAR03522 at 0x7f9e2010e610>,
  <Reaction MAR03523 at 0x7f9e2010ee50>,
  <Reaction MAR03524 at 0x7f9e2010e9a0>,
  <Reaction MAR03525 at 0x7f9e2010ef10>,
  <Reaction MAR03526 at 0x7f9e2010e790>,
  <Reaction MAR03527 at 0x7f9e2010eee0>,
  <Reaction MAR03528 at 0x7f9e2010efa0>,
  <Reaction MAR03529 at 0x7f9e2010ef40>,
  <Reaction MAR03530 at 0x7f9e20122730>,
  <Reaction MAR03531 at 0x7f9e20122dc0>,
  <Reaction MAR03532 at 0x7f9e20122f40>,
  <Reaction MAR03533 at 0x7f9e20122f10>,
  <Reaction MAR03534 at 0x7f9e20122940>]]

In [12]:
[model.metabolites[33].formula,
model.metabolites[33].charge,
 model.metabolites[33].name,
 model.metabolites[33].id,
 model.metabolites[33]._id,
 model.metabolites[33].annotation
]

['C41H64N7O17P3S',
 -4,
 '(11Z,14Z,17Z)-eicosatrienoyl-CoA',
 'MAM00012c',
 'MAM00012c',
 {'sbo': 'SBO:0000247',
  'kegg.compound': 'C16179',
  'lipidmaps': 'LMFA07050044',
  'vmhmetabolite': 'M00012',
  'metanetx.chemical': ['MNXM162872', 'MNXM6497']}]

In [13]:
def metabolite2compound(M):
    # convert cobra Metabolite to metDataModel Compound
    Cpd = Compound()
    Cpd.src_id = M.id
    Cpd.id = M.id               #.split("[")[0]
    Cpd.name = M.name
    Cpd.charge = M.charge
    Cpd.charged_formula = M.formula
    Cpd.db_ids = M.annotation
    return Cpd

metabolite2compound(model.metabolites[990]).db_ids

{'sbo': 'SBO:0000247',
 'bigg.metabolite': 'leuktrB4woh',
 'kegg.compound': 'C04853',
 'hmdb': 'HMDB01509',
 'chebi': 'CHEBI:15646',
 'pubchem.compound': '5280745',
 'lipidmaps': 'LMFA03020018',
 'vmhmetabolite': 'leuktrB4woh',
 'metanetx.chemical': ['MNXM1169', 'MNXM92716']}
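
The annotation dictionaries above mix single strings (`'hmdb': 'HMDB01509'`) with lists (`'metanetx.chemical': [...]`). If downstream code prefers a uniform shape, one option is to wrap every value in a list. This is only a sketch; `normalize_db_ids` is a hypothetical helper and not part of cobra or metDataModel.

```python
def normalize_db_ids(annotation):
    """Return a copy of an annotation dict in which every value is a list of strings."""
    normalized = {}
    for database, value in annotation.items():
        if isinstance(value, (list, tuple)):
            normalized[database] = [str(v) for v in value]
        else:
            normalized[database] = [str(value)]
    return normalized

# A subset of the annotation printed above for MAM00599m.
example = {
    "kegg.compound": "C04853",
    "hmdb": "HMDB01509",
    "metanetx.chemical": ["MNXM1169", "MNXM92716"],
}
print(normalize_db_ids(example))
```

With uniform values, a lookup such as "all compounds that carry a given HMDB ID" no longer has to check the type of each entry.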

In [14]:
myCpds = []
for M in model.metabolites:
    myCpds.append(metabolite2compound(M))

In [15]:
model.reactions[6].annotation

{'sbo': 'SBO:0000176',
 'ec-code': ['1.2.4.1', '2.3.1.12', '1.8.1.4', '1.2.1.51'],
 'kegg.reaction': 'R00209',
 'bigg.reaction': 'PDHm',
 'vmhreaction': 'PDHm',
 'metanetx.reaction': 'MNXR102425',
 'rhea': ['28043', '28042']}

In [16]:
## This is model dependent, as some models use a symbol other than "[" to mark compartments.
## def mclean(x): return x.split("[")[0]

# port reactions, to include genes and enzymes
def port_reaction(R):
    new = Reaction()
    new.id = R.id
    new.reactants = [m.id for m in R.reactants] 
    new.products = [m.id for m in R.products] 
    new.genes = [g.id for g in R.genes]
    ecs = R.annotation.get('ec-code', [])
    if isinstance(ecs, list):
        new.enzymes = ecs
    else:
        new.enzymes = [ecs]       # this version of human-GEM may have it as string
    return new

test99 = port_reaction(model.reactions[199])
[test99.id,
 test99.reactants,
 test99.products,
 test99.genes,
 test99.enzymes
]

['MAR04501',
 ['MAM01761c', 'MAM02845c'],
 ['MAM01939c', 'MAM02884c'],
 ['ENSG00000007350', 'ENSG00000163931', 'ENSG00000151005'],
 ['2.2.1.1']]

In [17]:
# this is the compartmentalized version of reactions
## Reactions to port
myRxns = []
for R in model.reactions:
    myRxns.append( port_reaction(R) )
    
print(len(myRxns))

13078


In [18]:
# pathways, using group as pathway. Other models may use subsystem etc.

def port_pathway(P):
    new = Pathway()
    new.id = P.id
    new.source = ['Human-GEM v1.10.0',]
    new.name = P.name
    new.list_of_reactions = [x.id for x in P.members]
    return new

p = port_pathway(model.groups[33])

[p.id, p.name, p.list_of_reactions[:5]]

['group34',
 'Carnitine shuttle (endoplasmic reticular)',
 ['MAR02778', 'MAR02780', 'MAR02783', 'MAR02785', 'MAR02787']]

In [19]:
## Pathways to port
myPathways = []
for P in model.groups:
    myPathways.append(port_pathway(P))

len(myPathways)

142

## Collected data; now output

In [20]:
note = """Human-GEM compartmentalized, with genes and ECs."""

## metabolicModel to export
MM = MetabolicModel()
MM.id = 'az_HumanGEM_20211021'
MM.meta_data = {
            'species': 'human',
            'version': '',
            'sources': ['https://github.com/SysBioChalmers/Human-GEM, retrieved 2021-10-07'],
            'status': '',
            'last_update': '20211021',
            'note': note,
        }
MM.list_of_pathways = [P.serialize() for P in myPathways]
MM.list_of_reactions = [R.serialize() for R in  myRxns]
MM.list_of_compounds = [C.serialize() for C in myCpds]
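
Before exporting, it may be worth checking that the three lists refer to each other consistently. The sketch below works on plain dictionaries and assumes the keys `id`, `list_of_reactions`, `reactants` and `products`; that compounds serialize with an `id` key is an assumption. `find_dangling_references` is a hypothetical helper, shown here on toy data.

```python
def find_dangling_references(pathways, reactions, compounds):
    """Return (missing_reaction_ids, missing_compound_ids) as sorted lists."""
    reaction_ids = {r["id"] for r in reactions}
    compound_ids = {c["id"] for c in compounds}
    # Reactions named by a pathway but absent from the reaction list.
    used_reactions = {rid for p in pathways for rid in p["list_of_reactions"]}
    # Compounds named by a reaction but absent from the compound list.
    used_compounds = {
        cid for r in reactions for cid in r["reactants"] + r["products"]
    }
    return (
        sorted(used_reactions - reaction_ids),
        sorted(used_compounds - compound_ids),
    )

# Toy data; the IDs are illustrative only.
toy_pathways = [{"id": "group1", "list_of_reactions": ["R1", "R2"]}]
toy_reactions = [{"id": "R1", "reactants": ["A"], "products": ["B"]}]
toy_compounds = [{"id": "A"}]
print(find_dangling_references(toy_pathways, toy_reactions, toy_compounds))
# (['R2'], ['B'])
```

On the real model this would be called with `MM.list_of_pathways`, `MM.list_of_reactions` and `MM.list_of_compounds`; two empty lists would indicate that nothing is missing.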

In [21]:
# check output
[
MM.list_of_pathways[2],
MM.list_of_reactions[:2],
MM.list_of_compounds[100:102],
]

[{'id': 'group3',
  'name': 'Alanine, aspartate and glutamate metabolism',
  'list_of_reactions': ['MAR03802',
   'MAR03804',
   'MAR03811',
   'MAR03813',
   'MAR03822',
   'MAR03827',
   'MAR03829',
   'MAR03831',
   'MAR03862',
   'MAR03865',
   'MAR03870',
   'MAR03873',
   'MAR08654',
   'MAR03890',
   'MAR03892',
   'MAR09802',
   'MAR03899',
   'MAR03903',
   'MAR04109',
   'MAR04114',
   'MAR04115',
   'MAR04118',
   'MAR04172',
   'MAR04196',
   'MAR04197',
   'MAR04287',
   'MAR04690',
   'MAR04693',
   'MAR06780',
   'MAR06968',
   'MAR06969',
   'MAR06970',
   'MAR06971',
   'MAR06972',
   'MAR07641',
   'MAR07642',
   'MAR08626',
   'MAR08628',
   'MAR04285',
   'MAR11565']},
 [{'id': 'MAR03905',
   'reactants': ['MAM01796c', 'MAM02552c'],
   'products': ['MAM01249c', 'MAM02039c', 'MAM02553c'],
   'genes': ['ENSG00000248144',
    'ENSG00000197894',
    'ENSG00000172955',
    'ENSG00000198099',
    'ENSG00000187758',
    'ENSG00000196616',
    'ENSG00000147576',
    'ENSG00

In [22]:
import pickle

# pickled object can be imported later, and for Database upload
with open('MetabolicModel_az_HumanGEM_20211021.pickle', 'wb') as f:
    pickle.dump(MM.serialize(), f, pickle.HIGHEST_PROTOCOL)

In [23]:
import json

with open("metabolicModel_az_HumanGEM_20211021.json", "w") as f:
    json.dump(MM.serialize(), f)

## Summary

This ports reactions, pathways and compounds. Gene and enzyme information is now included. 

The exported pickle can be re-imported and uploaded to Database easily.
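
Re-import can be checked with a round trip through both formats. The sketch below uses a small dictionary as a stand-in for the serialized model and writes to a temporary directory; `roundtrip_matches` and the file names inside it are hypothetical.

```python
import json
import os
import pickle
import tempfile

def roundtrip_matches(serialized_model, directory):
    """Write the dict as pickle and JSON, read both back, and compare to the input."""
    pickle_path = os.path.join(directory, "model.pickle")
    json_path = os.path.join(directory, "model.json")
    with open(pickle_path, "wb") as f:
        pickle.dump(serialized_model, f, pickle.HIGHEST_PROTOCOL)
    with open(json_path, "w") as f:
        json.dump(serialized_model, f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)
    with open(json_path) as f:
        from_json = json.load(f)
    return from_pickle == serialized_model and from_json == serialized_model

# Stand-in for the serialized model; the real one is much larger.
toy_model = {
    "id": "toy_model",
    "list_of_pathways": [{"id": "group1", "list_of_reactions": ["R1"]}],
    "list_of_reactions": [{"id": "R1", "reactants": ["A"], "products": ["B"]}],
    "list_of_compounds": [{"id": "A"}, {"id": "B"}],
}
with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip_matches(toy_model, tmp))  # True
```

JSON turns tuples into lists and non-string keys into strings, so the comparison only holds when the serialized model contains plain strings, numbers, lists and string-keyed dictionaries.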

This notebook, the pickle file and the JSON file go to the GitHub repo (https://github.com/shuzhao-li/Azimuth).