# Porting genome-scale metabolic models for metabolomics

**Human-GEM as default human model, for better compatibility**
https://github.com/SysBioChalmers/Human-GEM

**Use cobra to parse SBML models where applicable**

Not all models comply with the formats cobra supports. Models from the UCSD and Thiele labs should comply.

**Base our code on metDataModel**

Each model needs a list of Reactions, a list of Pathways, and a list of Compounds.
It's important to include Compounds with all linked identifiers to other databases (HMDB, PubChem, etc.) and with formulae (usually the charged form in these models) when available.
We can always update the data later. For example, neutral formulae can be inferred from the charged formulae or retrieved from a public metabolite database (e.g., HMDB) if linked.
Save the result in both Python pickle and JSON formats.
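
As an illustration of inferring a neutral formula from a charged one, here is a minimal sketch. It assumes the net charge comes only from gained or lost protons, so the hydrogen count is shifted by the charge. `parse_formula` and `neutralize_formula` are hypothetical helpers written for this example; they are not the functions imported from `jms` or `mass2chem`, and they only handle flat formulae without brackets or isotopes.

```python
import re

def parse_formula(formula):
    """Count atoms in a flat formula such as 'C20H31O5' (no brackets or isotopes)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def neutralize_formula(formula, charge):
    """Hypothetical helper: offset the net charge by adding or removing protons."""
    counts = parse_formula(formula)
    # Assumption: each unit of charge corresponds to one gained or lost H.
    counts["H"] = counts.get("H", 0) - charge
    if counts["H"] < 0:
        raise ValueError(f"cannot offset charge {charge} in {formula}")
    # Write C and H first, then the remaining elements alphabetically.
    order = [e for e in ("C", "H") if counts.get(e)]
    order += sorted(e for e in counts if e not in ("C", "H") and counts[e])
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)

# A charged formula with charge -1 gains one H when neutralized.
print(neutralize_formula("C20H31O5", -1))  # C20H32O5
```

Metal-containing or zwitterionic species may not follow the proton assumption, so a lookup in a linked database remains the safer route where available.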

**No compartmentalization**
- After decompartmentalization,
  - transport reactions can be removed; they are identified by having identical reactants and products.
  - redundant reactions can be merged: the same reaction in different compartments becomes one.
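
The two rules above can be sketched on plain dictionaries. This is only an illustration under stated assumptions: metabolite IDs end in a one-letter compartment code, reactions are compared as sets of metabolite IDs (stoichiometry and direction are ignored), and `strip_compartment` and `decompartmentalize` are hypothetical names, not the helpers from `jms.utils.gems`. The reaction IDs `R1` to `R3` are made up.

```python
def strip_compartment(met_id, suffix_len=1):
    """Drop the trailing compartment code, e.g. 'MAM01965c' -> 'MAM01965'."""
    return met_id[:-suffix_len]

def decompartmentalize(reactions):
    """Remove transport reactions and merge reactions that differ only by compartment.

    `reactions` is a list of dicts with 'id', 'reactants' and 'products'.
    """
    merged = {}
    for rxn in reactions:
        reactants = frozenset(strip_compartment(m) for m in rxn["reactants"])
        products = frozenset(strip_compartment(m) for m in rxn["products"])
        if reactants == products:
            continue  # transport: same compounds on both sides
        key = (reactants, products)
        if key not in merged:
            # The first reaction seen becomes the representative.
            merged[key] = {
                "id": rxn["id"],
                "reactants": sorted(reactants),
                "products": sorted(products),
                "merged_ids": [],
            }
        merged[key]["merged_ids"].append(rxn["id"])
    return list(merged.values())

example_reactions = [
    {"id": "R1", "reactants": ["MAM01965c"], "products": ["MAM01965m"]},
    {"id": "R2", "reactants": ["MAM01285c", "MAM01965c"],
     "products": ["MAM01334c", "MAM01968c"]},
    {"id": "R3", "reactants": ["MAM01285m", "MAM01965m"],
     "products": ["MAM01334m", "MAM01968m"]},
]
merged = decompartmentalize(example_reactions)
print(len(merged), merged[0]["merged_ids"])  # 1 ['R2', 'R3']
```

Keeping `merged_ids` is a design choice for traceability; it lets a merged reaction be mapped back to its compartment-specific originals.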

Shuzhao Li, 2021-10-21
Minghao Gong, 2022-04-19

In [1]:
# !pip install cobra --user --ignore-installed ruamel.yaml
# !pip install --upgrade metDataModel # https://github.com/shuzhao-li/metDataModel/ 
# !pip install --upgrade numpy pandas

In [2]:
import cobra # https://cobrapy.readthedocs.io/en/latest/io.html#SBML
from metDataModel.core import Compound, Reaction, Pathway, MetabolicModel
import requests
import sys

sys.path.append("/Users/gongm/Documents/projects/mass2chem/")
sys.path.append("/Users/gongm/Documents/projects/JMS/JMS/JMS")
from mass2chem.formula import *
from jms.formula import *
from jms.utils.gems import *

In [3]:
# Download the latest Human-GEM.xml from the main branch
HG_xml_path = "../testdata/HumanGEM/Human-GEM.xml"
r = requests.get('https://github.com/SysBioChalmers/Human-GEM/blob/main/model/Human-GEM.xml?raw=true')
r.raise_for_status()  # stop here rather than write an error page to disk
with open(HG_xml_path, 'w', encoding='utf-8') as f:
    f.write(r.text)

In [4]:
# Read the model via cobra
xmlFile = HG_xml_path
model = cobra.io.read_sbml_model(xmlFile)

Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled


In [5]:
model

0,1
Name,HumanGEM
Memory address,0x07fc28e428190
Number of metabolites,8366
Number of reactions,13069
Number of groups,142
Objective expression,1.0*MAR13082 - 1.0*MAR13082_reverse_11d67
Compartments,"Cytosol, Extracellular, Lysosome, Endoplasmic reticulum, Mitochondria, Peroxisome, Golgi apparatus, Nucleus, Inner mitochondria"


In [6]:
# metabolite entries, readily convert to list of metabolites
model.metabolites[990] 

0,1
Metabolite identifier,MAM00599r
Name,20-OH-LTB4
Memory address,0x07fc2924ac0d0
Formula,C20H31O5
Compartment,r
In 6 reaction(s),"MAR01138, MAR10382, MAR01575, MAR01131, MAR08561, MAR01128"


In [7]:
# reaction entries, Readily convert to list of reactions
model.reactions[33] 

0,1
Reaction identifier,MAR07747
Name,
Memory address,0x07fc252f772b0
Stoichiometry,MAM01285c + MAM01965c --> MAM01334c + MAM01968c + MAM02039c  ADP + glucose --> AMP + glucose-6-phosphate + H+
GPR,ENSG00000159322
Lower bound,0.0
Upper bound,1000.0


In [8]:
# Groups play the role of pathways here and convert readily to a list of pathways
model.groups[11].__dict__

{'_id': 'group12',
 'name': 'Beta oxidation of branched-chain fatty acids (mitochondrial)',
 'notes': {},
 '_annotation': {'sbo': 'SBO:0000633'},
 '_members': [<Reaction MAR03522 at 0x7fc254882700>,
  <Reaction MAR03523 at 0x7fc254882eb0>,
  <Reaction MAR03524 at 0x7fc254894bb0>,
  <Reaction MAR03525 at 0x7fc254894fd0>,
  <Reaction MAR03526 at 0x7fc254882ee0>,
  <Reaction MAR03527 at 0x7fc254882880>,
  <Reaction MAR03528 at 0x7fc254894eb0>,
  <Reaction MAR03529 at 0x7fc254894fa0>,
  <Reaction MAR03530 at 0x7fc254894cd0>,
  <Reaction MAR03531 at 0x7fc254894b20>,
  <Reaction MAR03532 at 0x7fc254882af0>,
  <Reaction MAR03533 at 0x7fc254894910>,
  <Reaction MAR03534 at 0x7fc254894ac0>],
 '_kind': 'partonomy',
 '_model': <Model HumanGEM at 0x7fc28e428190>}

## Port metabolite

In [9]:
def port_metabolite(M):
    # convert cobra Metabolite to metDataModel Compound
    Cpd = Compound()
    Cpd.src_id = remove_compartment_by_substr(M.id,1)
    Cpd.id = remove_compartment_by_substr(M.id,1)              # temporarily the same as the source id
    Cpd.name = M.name
    Cpd.charge = M.charge
    Cpd.neutral_formula = adjust_charge_in_formula(M.formula,M.charge)
    Cpd.neutral_mono_mass = neutral_formula2mass(Cpd.neutral_formula)
    Cpd.charged_formula = M.formula
    Cpd.db_ids = [['humanGEM',Cpd.src_id]] # using src_id to also reference humanGEM ID in db_ids field
    for k,v in M.annotation.items():
        if k != 'sbo':
            if isinstance(v,list):
                Cpd.db_ids.extend([k,x] for x in v)   # keep db_ids a flat list of [db, id] pairs
            else: 
                if ":" in v:
                    Cpd.db_ids.append([k,v.split(":")[1]])
                else:
                    Cpd.db_ids.append([k,v])
    
    inchi_list = [x[1].split('=')[1] for x in Cpd.db_ids if x[0] == 'inchi']
    if len(inchi_list) ==1:
        Cpd.inchi = inchi_list[0]
    elif len(inchi_list) >1:
        Cpd.inchi = inchi_list
    
    return Cpd
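
A flat list of `[database, identifier]` pairs is a convenient shape for `db_ids`, because every entry can then be filtered the same way (for example, picking out the `inchi` entries). The sketch below shows one way to flatten an annotation dictionary whose values may be strings or lists. `flatten_annotation` is a hypothetical helper, the annotation values are illustrative only, and identifier prefixes are left untouched here.

```python
def flatten_annotation(annotation, skip=("sbo",)):
    """Turn {'db': 'id' or ['id1', 'id2']} into a flat list of [db, id] pairs."""
    pairs = []
    for db, value in annotation.items():
        if db in skip:
            continue  # ontology terms are not cross-references
        # Treat a single string as a one-element list so both cases share one path.
        values = value if isinstance(value, list) else [value]
        for identifier in values:
            pairs.append([db, identifier])
    return pairs

# Illustrative annotation, not copied from the model file.
example_annotation = {
    "sbo": "SBO:0000247",
    "kegg.compound": "C00031",
    "chebi": ["CHEBI:4167", "CHEBI:17234"],
}
pairs = flatten_annotation(example_annotation)
print(pairs)
```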

In [10]:
myCpds = [port_metabolite(M) for M in model.metabolites]

In [11]:
len(myCpds)

8366

In [12]:
# remove duplicated compounds
myCpds = remove_duplicate_cpd(myCpds)

In [13]:
len(myCpds)

4112
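
The count drops because the same compound appears once per compartment before the compartment suffix is stripped. A minimal sketch of that deduplication, assuming the first occurrence of each ID is kept and order is preserved, is shown here on plain dictionaries. `dedupe_by_id` is a hypothetical stand-in and may differ from what `remove_duplicate_cpd` does internally.

```python
def dedupe_by_id(compounds):
    """Keep the first compound seen for each 'id', preserving input order."""
    unique = {}
    for cpd in compounds:
        # setdefault only stores the compound if the id is not present yet.
        unique.setdefault(cpd["id"], cpd)
    return list(unique.values())

# After stripping compartments, glucose from two compartments shares one id.
example_compounds = [
    {"id": "MAM01965", "name": "glucose"},
    {"id": "MAM01285", "name": "ADP"},
    {"id": "MAM01965", "name": "glucose"},
]
unique_compounds = dedupe_by_id(example_compounds)
print([c["id"] for c in unique_compounds])  # ['MAM01965', 'MAM01285']
```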

## Port reactions

In [14]:
# port reactions, to include genes and enzymes
def port_reaction(R):
    new = Reaction()
    new.id = R.id
    new.reactants = [remove_compartment_by_substr(m.id,1) for m in R.reactants] # decompartmentalization
    new.products = [remove_compartment_by_substr(m.id,1) for m in R.products]   # decompartmentalization
    new.genes = [g.id for g in R.genes]
    ecs = R.annotation.get('ec-code', [])
    if isinstance(ecs, list):
        new.enzymes = ecs
    else:
        new.enzymes = [ecs]       # this version of human-GEM may have it as string
    return new

test99 = port_reaction(model.reactions[199])
[test99.id,
 test99.reactants,
 test99.products,
 test99.genes,
 test99.enzymes
]

['MAR04501',
 ['MAM01761', 'MAM02845'],
 ['MAM01939', 'MAM02884'],
 ['ENSG00000151005', 'ENSG00000007350', 'ENSG00000163931'],
 ['2.2.1.1']]

In [15]:
## Reactions to port
myRxns = []
for R in model.reactions:
    myRxns.append( port_reaction(R) )
    
print(len(myRxns))

13069


In [16]:
# remove duplicated reactions after decompartmentalization
myRxns = remove_duplicate_rxn(myRxns)

In [17]:
len(myRxns)

8876

## Port pathway

In [18]:
# pathways, using group as pathway. Other models may use subsystem etc.

def port_pathway(P):
    new = Pathway()
    new.id = P.id
    new.source = ['Human-GEM v1.10.0',]
    new.name = P.name
    new.list_of_reactions = [x.id for x in P.members]
    return new

p = port_pathway(model.groups[12])

[p.id, p.name, p.list_of_reactions[:5]]

['group13',
 'Beta oxidation of di-unsaturated fatty acids (n-6) (mitochondrial)',
 ['MAR03275', 'MAR03277', 'MAR03278', 'MAR03279', 'MAR03280']]

In [19]:
## Pathways to port
myPathways = []
for P in model.groups:
    myPathways.append(port_pathway(P))

len(myPathways)

142

In [20]:
# retain the valid reactions in list of pathway
myPathways = retain_valid_Rxns_in_Pathways(myPathways,myRxns)

In [21]:
# check that the number of unique reactions across pathways matches the length of the decompartmentalized reaction list
test_list_Rxns = []
for pathway in myPathways:
    for y in pathway.list_of_reactions:
        test_list_Rxns.append(y)

len(set(test_list_Rxns))

8876
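
The filtering and the coverage check can be illustrated on plain dictionaries. This sketch assumes a pathway simply drops any reaction ID that no longer exists after deduplication and that pathways are kept even if they end up empty. `retain_valid_reactions` is a hypothetical name and is not the `retain_valid_Rxns_in_Pathways` helper used above; all IDs are made up.

```python
def retain_valid_reactions(pathways, valid_ids):
    """Return copies of the pathways with only the reaction IDs found in valid_ids."""
    valid = set(valid_ids)  # set membership keeps the filter fast
    return [
        {**p, "list_of_reactions": [r for r in p["list_of_reactions"] if r in valid]}
        for p in pathways
    ]

example_pathways = [
    {"id": "group1", "name": "demo pathway A", "list_of_reactions": ["R1", "R2", "R3"]},
    {"id": "group2", "name": "demo pathway B", "list_of_reactions": ["R3", "R4"]},
]
valid_ids = ["R2", "R3"]
filtered = retain_valid_reactions(example_pathways, valid_ids)

# Coverage check: every surviving reaction should belong to at least one pathway.
covered = {r for p in filtered for r in p["list_of_reactions"]}
print(covered == set(valid_ids))  # True
```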

## Collected data; now output

In [22]:
from datetime import datetime
today =  str(datetime.today()).split(" ")[0]

In [23]:
today

'2022-04-23'

In [24]:
note = """Human-GEM decompartmentalized, with genes and ECs."""

## metabolicModel to export
MM = MetabolicModel()
MM.id = f'az_HumanGEM_{today}' #
MM.meta_data = {
            'species': 'human',
            'version': '',
            'sources': [f'https://github.com/SysBioChalmers/Human-GEM, retrieved {today}'], #
            'status': '',
            'last_update': today,  #
            'note': note,
        }
MM.list_of_pathways = [P.serialize() for P in myPathways]
MM.list_of_reactions = [R.serialize() for R in  myRxns]
MM.list_of_compounds = [C.serialize() for C in myCpds]

In [25]:
# check output
[
MM.list_of_pathways[2],
MM.list_of_reactions[:2],
MM.list_of_compounds[100:102],
]

[{'id': 'group3',
  'name': 'Alanine, aspartate and glutamate metabolism',
  'list_of_reactions': ['MAR03802',
   'MAR03804',
   'MAR03811',
   'MAR03813',
   'MAR03822',
   'MAR03827',
   'MAR03829',
   'MAR03831',
   'MAR03862',
   'MAR03865',
   'MAR03870',
   'MAR03873',
   'MAR08654',
   'MAR03890',
   'MAR03892',
   'MAR09802',
   'MAR03899',
   'MAR03903',
   'MAR04109',
   'MAR04114',
   'MAR04115',
   'MAR04118',
   'MAR04172',
   'MAR04196',
   'MAR04197',
   'MAR04287',
   'MAR04690',
   'MAR04693',
   'MAR06780',
   'MAR06968',
   'MAR06969',
   'MAR06970',
   'MAR06971',
   'MAR06972',
   'MAR07641',
   'MAR07642',
   'MAR08626',
   'MAR08628',
   'MAR04285',
   'MAR11565']},
 [{'id': 'MAR03905',
   'reactants': ['MAM01796', 'MAM02552'],
   'products': ['MAM01249', 'MAM02039', 'MAM02553'],
   'genes': ['ENSG00000196616',
    'ENSG00000187758',
    'ENSG00000248144',
    'ENSG00000196344',
    'ENSG00000147576',
    'ENSG00000172955',
    'ENSG00000180011',
    'ENSG0000019

In [26]:
output_fdr = "../testdata/HumanGEM/"

In [27]:
import pickle
import os

# Write pickle file
export_pickle(os.path.join(output_fdr,f'{MM.id}.pickle'), MM)

In [28]:
# Write json file
export_json(os.path.join(output_fdr,f'{MM.id}.json'), MM)

In [29]:
# Write dataframe 
import pandas as pd
export_table(os.path.join(output_fdr,f'{MM.id}_list_of_compounds.csv'),MM, 'list_of_compounds')
export_table(os.path.join(output_fdr,f'{MM.id}_list_of_reactions.csv'),MM, 'list_of_reactions')
export_table(os.path.join(output_fdr,f'{MM.id}_list_of_pathways.csv'),MM, 'list_of_pathways')

## Summary

This notebook ports reactions, pathways and compounds from Human-GEM. Gene and enzyme information is now included.

The exported pickle can be re-imported and uploaded to a database easily.
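
To illustrate the re-import, here is a minimal round-trip sketch using only the standard library. It assumes the exported model is a plain dictionary of lists and strings, which is an assumption about the serialized form; `round_trip` and `demo_model` are hypothetical and stand in for the real export helpers and model.

```python
import json
import os
import pickle
import tempfile

def round_trip(model, folder):
    """Write `model` as pickle and JSON into `folder`, then read both back."""
    pickle_path = os.path.join(folder, f"{model['id']}.pickle")
    json_path = os.path.join(folder, f"{model['id']}.json")
    with open(pickle_path, "wb") as f:
        pickle.dump(model, f)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(model, f, indent=2)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)
    with open(json_path, encoding="utf-8") as f:
        from_json = json.load(f)
    return from_pickle, from_json

# Tiny stand-in for a serialized model; the values are illustrative only.
demo_model = {
    "id": "demo_model",
    "meta_data": {"species": "human", "note": "illustration"},
    "list_of_compounds": [{"id": "MAM01965", "name": "glucose"}],
    "list_of_reactions": [{"id": "R2", "reactants": ["MAM01965"], "products": ["MAM01968"]}],
    "list_of_pathways": [{"id": "group1", "list_of_reactions": ["R2"]}],
}
with tempfile.TemporaryDirectory() as tmp:
    from_pickle, from_json = round_trip(demo_model, tmp)
print(from_pickle == demo_model, from_json == demo_model)  # True True
```

JSON is readable from any language, while pickle restores Python objects directly; pickle files should only be loaded from trusted sources.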

This notebook, the pickle file and the JSON file go to the GitHub repo (https://github.com/shuzhao-li/Azimuth).