# Porting genome scale metabolic models for metabolomics

**Mouse-GEM as the default mouse model, for better compatibility**
https://github.com/SysBioChalmers/Mouse-GEM

**Use cobra to parse SBML models wherever applicable**

Not all models comply with the formats cobra expects. Models from the UCSD and Thiele labs should comply.

**Base our code on metDataModel**

Each model needs a list of Reactions, a list of Pathways, and a list of Compounds.
It's important to include Compounds with all linked identifiers to other databases (HMDB, PubChem, etc.), and with formulae when available (in these models, usually the charged form).
We can always update the data later. E.g., the neutral formulae can be inferred from the charged formulae, or retrieved from a public metabolite database (e.g., HMDB) if the compound is linked.
Save the result as a Python pickle and as JSON.
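The neutral-formula inference mentioned above can be sketched as follows. This is a minimal illustration, not the `adjust_charge_in_formula` function from mass2chem that the notebook actually calls; the helper name `neutral_from_charged` is hypothetical. It assumes the net charge comes entirely from gained or lost protons (one H is added back per negative charge, one removed per positive charge) and that the formula is a simple element/count string without parentheses or isotope labels.

```python
import re

def neutral_from_charged(formula, charge):
    """Hypothetical helper: infer a neutral formula from a charged one by
    adding (charge < 0) or removing (charge > 0) one H per unit of charge.
    Assumes the charge comes only from protonation/deprotonation."""
    charge = charge or 0                      # cobra may report a missing charge as None
    counts = {}
    for element, n in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    counts['H'] = counts.get('H', 0) - charge
    if counts['H'] < 0:
        raise ValueError(f'cannot neutralize {formula} with charge {charge}')
    # rebuild in the original element order, omitting counts of 1 and 0
    return ''.join(el + (str(n) if n > 1 else '') for el, n in counts.items() if n > 0)

# charged formula and charge reported for (2E)-hexadecenoyl-CoA later in this notebook
print(neutral_from_charged('C37H60N7O17P3S', -4))   # C37H64N7O17P3S
print(neutral_from_charged('C2H3O2', -1))           # C2H4O2 (acetate -> acetic acid)
```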

**No compartmentalization**
- After decompartmentalization,
  - transport reactions can be removed; they are identified by their reactants and products being the same.
  - redundant reactions can be merged; the same reaction in different compartments becomes one.
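Both rules can be sketched on toy data. This assumes each metabolite ID ends in a one-character compartment suffix (as in `MAM00600c`); the function name and the reaction IDs below are hypothetical. The notebook itself relies on `remove_compartment_by_substr` and `remove_duplicate_rxn` from JMS, whose exact rules may differ.

```python
def decompartmentalize(reactions):
    """reactions: list of dicts with 'id', 'reactants', 'products', where
    metabolite IDs carry a one-character compartment suffix (e.g. 'MAM01796c').
    Returns reactions with suffixes stripped, transports dropped, and
    compartment copies merged (the first copy's id is kept)."""
    kept, seen = [], set()
    for rxn in reactions:
        reactants = sorted({m[:-1] for m in rxn['reactants']})
        products = sorted({m[:-1] for m in rxn['products']})
        if reactants == products:
            continue                      # transport: same species on both sides
        signature = (tuple(reactants), tuple(products))
        if signature in seen:
            continue                      # same reaction already kept from another compartment
        seen.add(signature)
        kept.append({'id': rxn['id'], 'reactants': reactants, 'products': products})
    return kept

# toy input; reaction IDs are hypothetical
toy_reactions = [
    {'id': 'R1', 'reactants': ['MAM01796c', 'MAM02552c'],
     'products': ['MAM01249c', 'MAM02039c', 'MAM02553c']},
    {'id': 'R1_copy', 'reactants': ['MAM01796x', 'MAM02552x'],
     'products': ['MAM01249x', 'MAM02039x', 'MAM02553x']},
    {'id': 'T1', 'reactants': ['MAM02039c'], 'products': ['MAM02039x']},
]
print([r['id'] for r in decompartmentalize(toy_reactions)])   # ['R1']
```

This signature ignores stoichiometric coefficients and treats A --> B and B --> A as distinct reactions; whether that is the right merging rule depends on the model.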

Shuzhao Li, 2021-10-21
Minghao Gong, 2022-04-26

In [1]:
# !pip install cobra --user --ignore-installed ruamel.yaml
# !pip install --upgrade metDataModel # https://github.com/shuzhao-li/metDataModel/ 
# !pip install --upgrade numpy pandas

In [2]:
import cobra # https://cobrapy.readthedocs.io/en/latest/io.html#SBML
from metDataModel.core import Compound, Reaction, Pathway, MetabolicModel
import requests
import os
import sys

# local checkouts of mass2chem and JMS; adjust these paths to your environment
sys.path.append("/Users/gongm/Documents/projects/mass2chem/")
sys.path.append("/Users/gongm/Documents/projects/JMS/JMS/JMS")
from mass2chem.formula import *
from jms.formula import *
from jms.utils.gems import *

In [3]:
# download the most recent Mouse-GEM.xml
model_name = 'Mouse-GEM'
xml_url = f'https://github.com/SysBioChalmers/{model_name}/blob/main/model/{model_name}.xml'
local_path = output_fdr = f'../testdata/{model_name}/'

# create the output folder (and any missing parents) if it does not exist yet
os.makedirs(local_path, exist_ok=True)

xml_file_name = f'{model_name}.xml'
git_download_from_file(xml_url, local_path, xml_file_name)
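`git_download_from_file` comes from JMS and its internals are not shown. If it is unavailable, a standard-library alternative is sketched below. A GitHub `blob` URL returns an HTML page rather than the file itself, so the sketch first rewrites it to the `raw.githubusercontent.com` form. The function names are hypothetical, and the JMS helper may already do this rewriting internally.

```python
import os
import urllib.request
from urllib.parse import urlparse

def github_blob_to_raw(url):
    """Rewrite https://github.com/OWNER/REPO/blob/BRANCH/PATH into the
    raw.githubusercontent.com URL that serves the file content itself."""
    parsed = urlparse(url)
    parts = parsed.path.strip('/').split('/')
    if parsed.netloc != 'github.com' or len(parts) < 5 or parts[2] != 'blob':
        raise ValueError(f'not a GitHub blob URL: {url}')
    owner, repo, _, branch, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{'/'.join(path)}"

def download_model(url, local_dir, file_name):
    """Download the file behind a GitHub blob URL into local_dir/file_name."""
    os.makedirs(local_dir, exist_ok=True)
    dest = os.path.join(local_dir, file_name)
    urllib.request.urlretrieve(github_blob_to_raw(url), dest)
    return dest
```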

In [4]:
# Read the model via cobra
xmlFile = os.path.join(local_path,xml_file_name)
model = cobra.io.read_sbml_model(xmlFile)

https://identifiers.org/taxonomy/ does not conform to 'http(s)://identifiers.org/collection/id' or'http(s)://identifiers.org/COLLECTION:id


Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled


In [5]:
model

| Property | Value |
|---|---|
| Name | MouseGEM |
| Memory address | 0x07f952ebc1820 |
| Number of metabolites | 8370 |
| Number of reactions | 13063 |
| Number of groups | 146 |
| Objective expression | `1.0*MAR00021 - 1.0*MAR00021_reverse_97974` |
| Compartments | Cytosol, Extracellular, Lysosome, Endoplasmic reticulum, Mitochondria, Peroxisome, Golgi apparatus, Nucleus, Inner mitochondria |


In [6]:
# metabolite entries; readily converted to a list of metabolites
model.metabolites[990] 

| Property | Value |
|---|---|
| Metabolite identifier | MAM00600c |
| Name | 20-oxo-LTB4 |
| Memory address | 0x07f950b8a4580 |
| Formula | C20H29O5 |
| Compartment | c |
| In 2 reaction(s) | MAR01129, MAR01132 |


In [7]:
# reaction entries; readily converted to a list of reactions
model.reactions[33]

| Property | Value |
|---|---|
| Reaction identifier | MAR08360 |
| Name | |
| Memory address | 0x07f950d231b80 |
| Stoichiometry (IDs) | MAM01796x + MAM02041x --> MAM01249x + 2.0 MAM02040x |
| Stoichiometry (names) | ethanol + H2O2 --> acetaldehyde + 2.0 H2O |
| GPR | Cat |
| Lower bound | 0.0 |
| Upper bound | 1000.0 |


In [8]:
# groups correspond roughly to pathways; readily converted to a list of pathways
model.groups[11].__dict__

{'_id': 'group12',
 'name': 'Ascorbate and aldarate metabolism',
 'notes': {},
 '_annotation': {'sbo': 'SBO:0000633'},
 '_members': [<Reaction MAR06393 at 0x7f94ede868e0>,
  <Reaction MAR06394 at 0x7f94ede86b80>,
  <Reaction MAR06396 at 0x7f94ede86f70>,
  <Reaction MAR06405 at 0x7f94ede72400>,
  <Reaction MAR08345 at 0x7f94ede86d60>,
  <Reaction MAR08346 at 0x7f94ede86f10>,
  <Reaction MAR08348 at 0x7f94ede864c0>,
  <Reaction MAR08349 at 0x7f94ede726a0>,
  <Reaction MAR08619 at 0x7f94ede86b50>,
  <Reaction MAR08620 at 0x7f94ede86ca0>,
  <Reaction MAR08621 at 0x7f94ede9bee0>,
  <Reaction MAR08622 at 0x7f94ede9ba00>,
  <Reaction MAR08623 at 0x7f94ede9bd90>,
  <Reaction MAR08624 at 0x7f94ede9b9d0>,
  <Reaction MAR08625 at 0x7f94ede9bd60>,
  <Reaction MAR20008 at 0x7f94f1009be0>,
  <Reaction MAR20009 at 0x7f94f1009bb0>],
 '_kind': 'partonomy',
 '_model': <Model MouseGEM at 0x7f952ebc1820>}

## Port metabolite

In [9]:
def port_metabolite(M):
    # convert a cobra Metabolite to a metDataModel Compound
    Cpd = Compound()
    Cpd.src_id = remove_compartment_by_substr(M.id, 1)   # strip the one-character compartment suffix
    Cpd.id = Cpd.src_id                                  # temporarily the same as the source id
    Cpd.name = M.name
    Cpd.charge = M.charge
    Cpd.neutral_formula = adjust_charge_in_formula(M.formula, M.charge)
    Cpd.neutral_mono_mass = neutral_formula2mass(Cpd.neutral_formula)
    Cpd.charged_formula = M.formula
    Cpd.db_ids = [[model_name, Cpd.src_id]]   # also reference the Mouse-GEM ID in the db_ids field
    for k, v in M.annotation.items():
        if k == 'sbo':
            continue
        # an annotation value may be a single string or a list of strings;
        # extend db_ids with flat [db, id] pairs in both cases
        for x in (v if isinstance(v, list) else [v]):
            # strip prefixes such as 'CHEBI:' so only the identifier remains
            Cpd.db_ids.append([k, x.split(":", 1)[1] if ":" in x else x])

    # InChI annotations look like 'InChI=1S/...'; keep the part after the first '='
    inchi_list = [x[1].split('=', 1)[1] for x in Cpd.db_ids if x[0] == 'inchi']
    if len(inchi_list) == 1:
        Cpd.inchi = inchi_list[0]
    elif len(inchi_list) > 1:
        Cpd.inchi = inchi_list

    return Cpd

In [10]:
# port every metabolite; copies in different compartments now share an id
myCpds = [port_metabolite(M) for M in model.metabolites]

In [11]:
len(myCpds)

8370

In [12]:
# remove duplicated compounds
myCpds = remove_duplicate_cpd(myCpds)
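The drop from 8370 metabolites to 4113 compounds is consistent with compartment copies collapsing onto one ID once the suffix is stripped. A minimal sketch of such deduplication, assuming the first occurrence of each ID wins; the function is a hypothetical stand-in operating on plain dicts, and `remove_duplicate_cpd` in JMS may use different rules (for example, merging annotations across copies).

```python
def remove_duplicate_compounds(compounds):
    """Keep the first compound seen for each id. After stripping the
    compartment suffix, copies of one metabolite in different
    compartments share an id."""
    seen, unique = set(), []
    for cpd in compounds:
        if cpd['id'] not in seen:
            seen.add(cpd['id'])
            unique.append(cpd)
    return unique

toy = [{'id': 'MAM00051', 'name': '(2E)-hexadecenoyl-CoA'},
       {'id': 'MAM00051', 'name': '(2E)-hexadecenoyl-CoA'},
       {'id': 'MAM00600', 'name': '20-oxo-LTB4'}]
print([c['id'] for c in remove_duplicate_compounds(toy)])   # ['MAM00051', 'MAM00600']
```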

In [13]:
myCpds[50].__dict__

{'internal_id': '',
 'id': 'MAM00051',
 'name': '(2E)-hexadecenoyl-CoA',
 'db_ids': [['Mouse-GEM', 'MAM00051'],
  ['bigg.metabolite', 'hdd2coa'],
  ['chebi', '28935'],
  ['hmdb', 'HMDB06533'],
  ['kegg.compound', 'C05272'],
  ['lipidmaps', 'LMFA07050020'],
  ['metanetx.chemical', 'MNXM581'],
  ['pubchem.compound', '46173176'],
  ['vmhmetabolite', 'hdd2coa']],
 'neutral_formula': 'C37H64N7O17P3S',
 'neutral_mono_mass': 1003.329226,
 'charge': -4,
 'charged_formula': 'C37H60N7O17P3S',
 'SMILES': '',
 'inchi': '',
 'src_id': 'MAM00051'}

In [14]:
len(myCpds)

4113

In [15]:
fetch_MetabAtlas_GEM_identifiers(compound_list = myCpds,
                                 modelName = model_name,
                                 local_path = local_path,
                                 metab_file_name = 'metabolites.tsv',
                                 overwrite = True)
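Comparing the `db_ids` printed before and after this call, the identifiers appear to be replaced by those listed in the Metabolic Atlas `metabolites.tsv`. One way to build such a mapping from a tab-separated table with the standard `csv` module is sketched below. The column names (`id`, `BiGG`, `ChEBI`, `HMDB`) and the function name are assumptions for illustration; check the header of the actual file.

```python
import csv
import io

def db_ids_from_tsv(tsv_text, id_column='id'):
    """Map each compound id to a list of (database, identifier) tuples,
    skipping empty cells. Column names are hypothetical."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter='\t'):
        cpd_id = row.pop(id_column)
        mapping[cpd_id] = [(db, value) for db, value in row.items() if value]
    return mapping

toy_tsv = ("id\tBiGG\tChEBI\tHMDB\n"
           "MAM00051\thdd2coa\tCHEBI:28935\tHMDB06533\n"
           "MAM00600\t\t\t\n")
mapping = db_ids_from_tsv(toy_tsv)
print(mapping['MAM00051'])
# applying it to the ported compounds would look like:
# for cpd in myCpds: cpd.db_ids = mapping.get(cpd.id, cpd.db_ids)
```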

In [16]:
myCpds[50].__dict__

{'internal_id': '',
 'id': 'MAM00051',
 'name': '(2E)-hexadecenoyl-CoA',
 'db_ids': [('BiGG', 'hdd2coa'),
  ('ChEBI', 'CHEBI:28935'),
  ('HMDB', 'HMDB06533'),
  ('HMR2', 'm00051m'),
  ('HepatoNET1', 'HC01411'),
  ('KEGG', 'C05272'),
  ('LipidMaps', 'LMFA07050020'),
  ('MetaNetX', 'MNXM581'),
  ('PubChem', '46173176'),
  ('Recon3D', 'hdd2coa')],
 'neutral_formula': 'C37H64N7O17P3S',
 'neutral_mono_mass': 1003.329226,
 'charge': -4,
 'charged_formula': 'C37H60N7O17P3S',
 'SMILES': '',
 'inchi': '',
 'src_id': 'MAM00051'}

## Port reactions

In [17]:
# port reactions, to include genes and enzymes
def port_reaction(R):
    new = Reaction()
    new.id = R.id
    new.reactants = [remove_compartment_by_substr(m.id,1) for m in R.reactants] # decompartmentalization
    new.products = [remove_compartment_by_substr(m.id,1) for m in R.products]   # decompartmentalization
    new.genes = [g.id for g in R.genes]
    ecs = R.annotation.get('ec-code', [])
    if isinstance(ecs, list):
        new.enzymes = ecs
    else:
        new.enzymes = [ecs]       # this version of mouse-GEM may have it as string
    return new

test99 = port_reaction(model.reactions[199])
[test99.id,
 test99.reactants,
 test99.products,
 test99.genes,
 test99.enzymes
]

['MAR04565',
 ['MAM01939', 'MAM02884'],
 ['MAM01785', 'MAM01845'],
 ['Taldo1'],
 ['2.2.1.2']]

In [18]:
## Reactions to port
myRxns = []
for R in model.reactions:
    myRxns.append( port_reaction(R) )
    
print(len(myRxns))

13063


In [19]:
# remove duplicated reactions after decompartmentalization
myRxns = remove_duplicate_rxn(myRxns)

In [20]:
len(myRxns)

8867

In [21]:
myRxns[0].__dict__

{'azimuth_id': '',
 'id': 'MAR03905',
 'source': [],
 'version': '',
 'status': '',
 'reactants': ['MAM01796', 'MAM02552'],
 'products': ['MAM01249', 'MAM02039', 'MAM02553'],
 'enzymes': ['1.1.1.1', '1.1.1.71'],
 'genes': ['Adh7', 'Adhfe1', 'Adh1', 'Adh4', 'Adh5', 'Adh6b'],
 'pathways': [],
 'ontologies': [],
 'species': '',
 'compartments': [],
 'cell_types': [],
 'tissues': []}

## Port pathway

In [22]:
# pathways, using group as pathway. Other models may use subsystem etc.

def port_pathway(P):
    new = Pathway()
    new.id = P.id
    new.source = ['mouse-GEM v1.10.0',]   # hard-coded release label; update it to match the downloaded version
    new.name = P.name
    new.list_of_reactions = [x.id for x in P.members]
    return new

p = port_pathway(model.groups[12])

[p.id, p.name, p.list_of_reactions[:5]]

['group13',
 'Beta oxidation of branched-chain fatty acids (mitochondrial)',
 ['MAR03522', 'MAR03523', 'MAR03524', 'MAR03525', 'MAR03526']]

In [23]:
## Pathways to port
myPathways = []
for P in model.groups:
    myPathways.append(port_pathway(P))

len(myPathways)

146

In [24]:
# retain the valid reactions in list of pathway
myPathways = retain_valid_Rxns_in_Pathways(myPathways,myRxns)
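After deduplication, pathways may still list reaction IDs that were dropped. A hypothetical sketch of the filtering step on plain dicts: it optionally maps a dropped duplicate onto the ID kept in its place, so that a pathway does not lose a reaction that was merged rather than removed. Whether `retain_valid_Rxns_in_Pathways` performs such remapping is not shown in this notebook.

```python
def retain_valid_reactions(pathway_reactions, kept_ids, replaced_by=None):
    """pathway_reactions: dict of pathway id -> list of reaction ids.
    kept_ids: reaction ids that survived deduplication.
    replaced_by: optional dict mapping a dropped duplicate to the id kept in its place."""
    kept = set(kept_ids)
    replaced_by = replaced_by or {}
    result = {}
    for pid, rxn_ids in pathway_reactions.items():
        new_ids = []
        for rid in rxn_ids:
            rid = replaced_by.get(rid, rid)
            if rid in kept and rid not in new_ids:   # keep order, avoid repeats
                new_ids.append(rid)
        result[pid] = new_ids
    return result

# toy data with hypothetical ids: R1_copy was merged into R1, T1 was a transport
pathways = {'group1': ['R1', 'R1_copy', 'T1'], 'group2': ['R1_copy']}
print(retain_valid_reactions(pathways, ['R1'], {'R1_copy': 'R1'}))
# {'group1': ['R1'], 'group2': ['R1']}
```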

In [25]:
# check that the number of unique reactions referenced by pathways matches the length of the decompartmentalized reaction list
test_list_Rxns = {rxn for pathway in myPathways for rxn in pathway.list_of_reactions}

len(test_list_Rxns)

8867

## Collected data; now output

In [26]:
from datetime import datetime
today = datetime.today().strftime('%Y-%m-%d')

In [27]:
today

'2022-04-26'

In [28]:
note = """Mouse-GEM compartmentalized, with genes and ECs."""

## metabolicModel to export
MM = MetabolicModel()
MM.id = f'az_{model_name}_{today}'
MM.meta_data = {
            'species': model_name.split('-')[0],
            'version': '',
            'sources': [f'https://github.com/SysBioChalmers/{model_name}, retrieved {today}'],
            'status': '',
            'last_update': today,
            'note': note,
        }
MM.list_of_pathways = [P.serialize() for P in myPathways]
MM.list_of_reactions = [R.serialize() for R in myRxns]
MM.list_of_compounds = [C.serialize() for C in myCpds]

In [29]:
# check output
[
MM.list_of_pathways[2],
MM.list_of_reactions[:2],
MM.list_of_compounds[100:102],
]

[{'id': 'group3',
  'name': 'Alanine, aspartate and glutamate metabolism',
  'list_of_reactions': ['MAR03802',
   'MAR03804',
   'MAR03811',
   'MAR03813',
   'MAR03822',
   'MAR03827',
   'MAR03829',
   'MAR03831',
   'MAR03862',
   'MAR03865',
   'MAR03870',
   'MAR03873',
   'MAR08654',
   'MAR03890',
   'MAR03892',
   'MAR09802',
   'MAR03899',
   'MAR03903',
   'MAR04109',
   'MAR04114',
   'MAR04115',
   'MAR04118',
   'MAR04172',
   'MAR04196',
   'MAR04197',
   'MAR04287',
   'MAR04690',
   'MAR04693',
   'MAR06780',
   'MAR06968',
   'MAR06969',
   'MAR06970',
   'MAR06971',
   'MAR06972',
   'MAR07641',
   'MAR07642',
   'MAR08626',
   'MAR08628',
   'MAR04285',
   'MAR11565']},
 [{'id': 'MAR03905',
   'reactants': ['MAM01796', 'MAM02552'],
   'products': ['MAM01249', 'MAM02039', 'MAM02553'],
   'genes': ['Adh7', 'Adhfe1', 'Adh1', 'Adh4', 'Adh5', 'Adh6b'],
   'enzymes': ['1.1.1.1', '1.1.1.71']},
  {'id': 'MAR03907',
   'reactants': ['MAM01796', 'MAM02554'],
   'products': ['M

In [30]:
import pickle
import os

# Write pickle file
export_pickle(os.path.join(output_fdr,f'{MM.id}.pickle'), MM)

In [31]:
# Write json file
export_json(os.path.join(output_fdr,f'{MM.id}.json'), MM)

In [32]:
# Write dataframe 
import pandas as pd
export_table(os.path.join(output_fdr,f'{MM.id}_list_of_compounds.csv'),MM, 'list_of_compounds')
export_table(os.path.join(output_fdr,f'{MM.id}_list_of_reactions.csv'),MM, 'list_of_reactions')
export_table(os.path.join(output_fdr,f'{MM.id}_list_of_pathways.csv'),MM, 'list_of_pathways')
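`export_table` is a JMS helper whose output format is not shown here. If a plain CSV is needed without it, list-valued fields (genes, enzymes, `db_ids`) have to be flattened into a single cell. A sketch with the standard `csv` module; the `;` and `:` separators and the function names are assumptions.

```python
import csv
import io

def to_cell(value):
    """Flatten a field for CSV: ['a', 'b'] -> 'a;b', [('HMDB', 'HMDB06533')] -> 'HMDB:HMDB06533'."""
    if isinstance(value, (list, tuple)):
        return ';'.join(':'.join(map(str, v)) if isinstance(v, (list, tuple)) else str(v)
                        for v in value)
    return value

def records_to_csv(records):
    """Return CSV text for a list of dicts; columns follow the first appearance of each key."""
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)   # missing keys are written as ''
    writer.writeheader()
    for rec in records:
        writer.writerow({k: to_cell(v) for k, v in rec.items()})
    return buf.getvalue()

toy = [{'id': 'MAR03905', 'genes': ['Adh1', 'Adh4']},
       {'id': 'MAM00051', 'db_ids': [('HMDB', 'HMDB06533')]}]
print(records_to_csv(toy))
# writing to disk, e.g.:
# with open('compounds.csv', 'w', newline='') as fh:
#     fh.write(records_to_csv(MM.list_of_compounds))
```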

## Summary

This ports reactions, pathways and compounds. Gene and enzyme information is now included. 

The exported pickle can easily be re-imported and uploaded to a database.

This notebook, the pickle file and the JSON file go to GitHub repo (https://github.com/shuzhao-li/Azimuth).
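A sketch of re-importing the exports, assuming `export_pickle` and `export_json` write standard pickle and JSON files (the helper name `load_model` is hypothetical). One practical difference between the two formats: JSON has no tuple type, so tuple-valued fields such as the `db_ids` pairs come back as lists, whereas pickle restores them exactly. Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.

```python
import json
import pickle

def load_model(path):
    """Reload an exported model: pickle restores the original Python objects
    (metDataModel must be importable), JSON returns plain dicts and lists."""
    if path.endswith('.pickle'):
        with open(path, 'rb') as fh:
            return pickle.load(fh)
    with open(path) as fh:
        return json.load(fh)

# JSON turns tuples into lists; pickle keeps them as tuples
record = {'id': 'MAM00051', 'db_ids': [('HMDB', 'HMDB06533')]}
via_json = json.loads(json.dumps(record))
via_pickle = pickle.loads(pickle.dumps(record))
print(via_json == record, via_pickle == record)   # False True
```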