# Porting genome scale metabolic models for metabolomics

**yeast-GEM as default yeast model, for better compatibility**
https://github.com/SysBioChalmers/yeast-GEM

**Use cobra to parse SBML models where applicable**

Not all models comply with the formats in cobra. Models from the UCSD and Thiele labs should comply.

**Base our code on metDataModel**

Each model needs a list of Reactions, a list of Pathways, and a list of Compounds.
It's important to include Compounds with all linked identifiers to other DBs (HMDB, PubChem, etc.), and with formulae (usually the charged form in these models) when available.
We can always update the data later. E.g., the neutral formulae can be inferred from the charged formulae or retrieved from a public metabolite database (e.g., HMDB) if linked.
Save in Python pickle and in JSON.
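To illustrate inferring a neutral formula from a charged one, here is a minimal sketch. It assumes the charge comes only from gained or lost protons, so the neutral form differs from the charged form by `-charge` hydrogens (e.g., oleate `C18H33O2` with charge -1 becomes `C18H34O2`). `neutralize_formula` is a hypothetical helper and is not the `adjust_charge_in_formula` function used later in this notebook.

```python
import re

def neutralize_formula(charged_formula: str, charge: int) -> str:
    """Hypothetical helper: add or remove hydrogens to cancel the charge.

    Assumes the charge is due to protonation/deprotonation only.
    """
    # parse element symbols and counts, keeping the original element order
    counts = {}
    for element, n in re.findall(r'([A-Z][a-z]?)(\d*)', charged_formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    # a -1 ion has lost one proton, so add one H back; a +1 ion has gained one, so remove it
    counts['H'] = counts.get('H', 0) - charge
    if counts['H'] < 0:
        raise ValueError(f'cannot neutralize {charged_formula} with charge {charge}')
    # rebuild the formula, omitting a count of 1 and dropping elements with count 0
    return ''.join(el + (str(n) if n != 1 else '') for el, n in counts.items() if n > 0)

print(neutralize_formula('C18H33O2', -1))  # C18H34O2
print(neutralize_formula('C5H12N', 1))     # C5H11N
```

This proton-only rule does not cover charges carried by metal ions or other non-proton species; those cases need a lookup in a metabolite database instead.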

**No compartmentalization**
- After decompartmentalization,
  - transport reactions can be removed; they are identified by reactants and products being the same.
  - redundant reactions can be merged; the same reaction in different compartments becomes one.
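The two clean-up steps above can be sketched on toy data. This minimal sketch assumes metabolite IDs carry the compartment as a bracketed suffix, like `s_1260[c]`. It also represents each reaction as a hypothetical `(id, reactants, products)` tuple rather than a cobra or metDataModel object.

```python
def strip_compartment(met_id: str) -> str:
    # 's_1260[c]' -> 's_1260'
    return met_id.split('[')[0]

def decompartmentalize(reactions):
    """Drop transport reactions and merge reactions that become identical.

    reactions: list of (reaction_id, reactants, products) tuples (hypothetical format).
    Returns (reaction_id, sorted reactants, sorted products) with compartment-free IDs.
    """
    seen = set()
    kept = []
    for rxn_id, reactants, products in reactions:
        r = frozenset(strip_compartment(m) for m in reactants)
        p = frozenset(strip_compartment(m) for m in products)
        if r == p:
            # transport: the same metabolites on both sides once compartments are removed
            continue
        signature = (r, p)
        if signature in seen:
            # the same reaction in another compartment; keep only the first copy
            continue
        seen.add(signature)
        kept.append((rxn_id, sorted(r), sorted(p)))
    return kept

toy = [
    ('r1', ['A[c]', 'B[c]'], ['C[c]']),
    ('r2', ['A[m]', 'B[m]'], ['C[m]']),   # same as r1, in the mitochondrion
    ('t1', ['A[c]'], ['A[m]']),           # transport of A
]
print(decompartmentalize(toy))  # [('r1', ['A', 'B'], ['C'])]
```

Using sets ignores stoichiometric coefficients. It also treats a reaction written in the reverse direction as a different reaction, which a fuller implementation may need to handle.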

Shuzhao Li, 2021-10-21
Minghao Gong, 2022-04-25

In [87]:
# !pip install cobra --user --ignore-installed ruamel.yaml
# !pip install --upgrade metDataModel # https://github.com/shuzhao-li/metDataModel/ 
# !pip install --upgrade numpy pandas

In [88]:
import os   # used for os.path.join below
import sys

import cobra # https://cobrapy.readthedocs.io/en/latest/io.html#SBML
from metDataModel.core import Compound, Reaction, Pathway, MetabolicModel
import requests

# local copies of mass2chem and JMS; adjust these paths to your environment
sys.path.append("/Users/gongm/Documents/projects/mass2chem/")
sys.path.append("/Users/gongm/Documents/projects/JMS/JMS/JMS")
from mass2chem.formula import *
from jms.formula import *
from jms.utils.gems import *

In [89]:
# download the latest yeast-GEM.xml
model_name = 'yeast-GEM'
xml_url = f'https://github.com/SysBioChalmers/{model_name}/blob/main/model/{model_name}.xml'
local_path = output_fdr = f'../testdata/{model_name}/'
xml_file_name = f'{model_name}.xml'
git_download_from_file(xml_url,local_path,xml_file_name)
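The URL above points to GitHub's `blob` page, which serves an HTML view rather than the raw XML. This notebook does not show whether `git_download_from_file` (from `jms.utils.gems`) rewrites the URL internally. As a hedged alternative, the following sketch converts a blob URL into its `raw.githubusercontent.com` form and downloads it with `requests`. `blob_to_raw_url` and `download_file` are hypothetical names.

```python
import os

def blob_to_raw_url(url: str) -> str:
    """Convert https://github.com/<owner>/<repo>/blob/<branch>/<path> to the raw-file URL."""
    prefix = 'https://github.com/'
    if not url.startswith(prefix) or '/blob/' not in url:
        return url  # not a GitHub blob URL; leave unchanged
    owner_repo, branch_and_path = url[len(prefix):].split('/blob/', 1)
    return f'https://raw.githubusercontent.com/{owner_repo}/{branch_and_path}'

def download_file(url: str, local_dir: str, file_name: str) -> str:
    """Download url into local_dir/file_name and return the local path."""
    import requests  # imported here so blob_to_raw_url works without requests installed
    os.makedirs(local_dir, exist_ok=True)
    response = requests.get(blob_to_raw_url(url), timeout=60)
    response.raise_for_status()  # fail loudly on 404 and similar errors
    path = os.path.join(local_dir, file_name)
    with open(path, 'wb') as fh:
        fh.write(response.content)
    return path

print(blob_to_raw_url('https://github.com/SysBioChalmers/yeast-GEM/blob/main/model/yeast-GEM.xml'))
# https://raw.githubusercontent.com/SysBioChalmers/yeast-GEM/main/model/yeast-GEM.xml
```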

In [90]:
# Read the model via cobra
xmlFile = os.path.join(local_path,xml_file_name)
model = cobra.io.read_sbml_model(xmlFile)

In [91]:
model

| Field | Value |
|---|---|
| Name | M_yeastGEM_v8__46__5__46__0 |
| Memory address | 0x07f8171d66d00 |
| Number of metabolites | 2742 |
| Number of reactions | 4058 |
| Number of groups | 0 |
| Objective expression | `1.0*r_2111 - 1.0*r_2111_reverse_58b69` |
| Compartments | cell envelope, cytoplasm, extracellular, mitochondrion, nucleus, peroxisome, endoplasmic reticulum, Golgi, lipid particle, vacuole, endoplasmic reticulum membrane, vacuolar membrane, Golgi membrane, mitochondrial membrane |


In [92]:
# metabolite entries, readily convert to list of metabolites
model.metabolites[990] 

| Field | Value |
|---|---|
| Metabolite identifier | s_1260[c] |
| Name | oleate [cytoplasm] |
| Memory address | 0x07f816e1fda90 |
| Formula | C18H33O2 |
| Compartment | c |
| In 8 reaction(s) | r_4309, r_3611, r_3513, r_3978, r_2231, r_2192, r_3574, r_4613 |


In [93]:
# reaction entries, Readily convert to list of reactions
model.reactions[33]

| Field | Value |
|---|---|
| Reaction identifier | r_0039 |
| Name | 3-dehydroquinate dehydratase |
| Memory address | 0x07f8174728c40 |
| Stoichiometry | `s_0210[c] --> s_0211[c] + s_0803[c]`<br>3-dehydroquinate [cytoplasm] --> 3-dehydroshikimate [cytoplasm] + H2O [cytoplasm] |
| GPR | YDR127W |
| Lower bound | 0.0 |
| Upper bound | 1000.0 |


There is no group/pathway information in this model.

In [94]:
model.metabolites[990].annotation

{'sbo': ['SBO:0000247'],
 'bigg.metabolite': 'ocdcea',
 'chebi': 'CHEBI:30823',
 'kegg.compound': 'C00712',
 'metanetx.chemical': 'MNXM306'}
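The annotation is a dict whose values are either strings or lists of strings. The following minimal sketch flattens it into `[database, identifier]` pairs. It skips `sbo`, emits one pair per list item, and strips prefixes such as `CHEBI:`, mirroring the prefix stripping that `port_metabolite` below applies to string values. `annotation_to_db_ids` is a hypothetical helper; the sample dict is copied from the output above.

```python
def annotation_to_db_ids(annotation: dict, skip=('sbo',)) -> list:
    """Flatten a cobra-style annotation dict into [db, id] pairs (hypothetical helper)."""
    pairs = []
    for db, value in annotation.items():
        if db in skip:
            continue
        # values can be a single string or a list of strings
        values = value if isinstance(value, list) else [value]
        for v in values:
            # keep only the accession, e.g. 'CHEBI:30823' -> '30823'
            pairs.append([db, v.split(':', 1)[1] if ':' in v else v])
    return pairs

annotation = {'sbo': ['SBO:0000247'],
              'bigg.metabolite': 'ocdcea',
              'chebi': 'CHEBI:30823',
              'kegg.compound': 'C00712',
              'metanetx.chemical': 'MNXM306'}
print(annotation_to_db_ids(annotation))
# [['bigg.metabolite', 'ocdcea'], ['chebi', '30823'], ['kegg.compound', 'C00712'], ['metanetx.chemical', 'MNXM306']]
```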

## Port metabolites

In [95]:
def port_metabolite(M):
    # convert cobra Metabolite to metDataModel Compound
    Cpd = Compound()
    Cpd.src_id = remove_compartment_by_split(M.id,'[')
    Cpd.id = remove_compartment_by_split(M.id,'[')              # temporarily the same with the source id
    Cpd.name = M.name
    Cpd.charge = M.charge
    Cpd.neutral_formula = adjust_charge_in_formula(M.formula,M.charge)
    Cpd.neutral_mono_mass = neutral_formula2mass(Cpd.neutral_formula)
    Cpd.charged_formula = M.formula
    Cpd.db_ids = [[model_name,Cpd.src_id]] # using src_id to also reference yeast-GEM ID in db_ids field
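    for k,v in M.annotation.items():
        if k == 'sbo':
            continue
        # annotation values can be a single string or a list of strings
        values = v if isinstance(v,list) else [v]
        for x in values:
            # one [db, id] pair per identifier; strip a prefix such as 'CHEBI:' from the accession
            Cpd.db_ids.append([k, x.split(":", 1)[1] if ":" in x else x])
    
    # InChI strings are stored as 'InChI=1S/...'; keep the part after the first '='
    inchi_list = [x[1].split('=', 1)[-1] for x in Cpd.db_ids if x[0] == 'inchi']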
    if len(inchi_list) ==1:
        Cpd.inchi = inchi_list[0]
    elif len(inchi_list) >1:
        Cpd.inchi = inchi_list
        
    return Cpd

In [96]:
myCpds = []
for i in range(len(model.metabolites)):
    myCpds.append(port_metabolite(model.metabolites[i]))

In [97]:
len(myCpds)

2742

In [98]:
# remove duplicated compounds
myCpds = remove_duplicate_cpd(myCpds)
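`remove_duplicate_cpd` comes from `jms.utils.gems`, and its exact criteria are not shown here. As an assumption-labeled sketch, order-preserving deduplication on the compound `id` could look like the following. `dedupe_by_id` is hypothetical and works on any objects with an `id` attribute.

```python
from types import SimpleNamespace

def dedupe_by_id(items):
    """Keep the first item for each id, preserving the original order (hypothetical helper)."""
    seen = set()
    unique = []
    for item in items:
        if item.id not in seen:
            seen.add(item.id)
            unique.append(item)
    return unique

cpds = [SimpleNamespace(id='s_0154'), SimpleNamespace(id='s_0155'), SimpleNamespace(id='s_0154')]
print([c.id for c in dedupe_by_id(cpds)])  # ['s_0154', 's_0155']
```

Keeping the first occurrence discards any identifiers attached only to later duplicates. Merging their `db_ids` into the kept compound would be an alternative.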

In [99]:
myCpds[100].__dict__

{'internal_id': '',
 'id': 's_0154',
 'name': '2-hexaprenyl-5-hydroxy-6-methoxy-3-methyl-1,4-benzoquinone [mitochondrion]',
 'db_ids': [['yeast-GEM', 's_0154'],
  ['bigg.metabolite', '2hpmhmbq'],
  ['chebi', '28753'],
  ['kegg.compound', 'C05805'],
  ['metanetx.chemical', 'MNXM5466']],
 'neutral_formula': 'C38H56O4',
 'neutral_mono_mass': 576.41786,
 'charge': 0,
 'charged_formula': 'C38H56O4',
 'SMILES': '',
 'inchi': '',
 'src_id': 's_0154'}
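`neutral_mono_mass` is the monoisotopic mass of the neutral formula, i.e., the sum of the masses of the most abundant isotope of each atom. The following minimal sketch reproduces the value shown above for `C38H56O4`. Its small element table covers only C, H, N, O, P and S; this table is an assumption of this example, not the one used by `neutral_formula2mass`.

```python
import re

# monoisotopic masses of the most abundant isotopes (u); only a few elements for illustration
MONOISOTOPIC = {
    'C': 12.0,
    'H': 1.00782503207,
    'N': 14.0030740048,
    'O': 15.99491461956,
    'P': 30.97376163,
    'S': 31.97207100,
}

def mono_mass(formula: str) -> float:
    """Sum monoisotopic masses over a formula like 'C38H56O4' (hypothetical helper)."""
    mass = 0.0
    for element, n in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        # raises KeyError for elements missing from the small table above
        mass += MONOISOTOPIC[element] * (int(n) if n else 1)
    return mass

print(round(mono_mass('C38H56O4'), 5))  # 576.41786
```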

In [100]:
len(myCpds)

2742
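The count is still 2742 after `remove_duplicate_cpd`, so splitting IDs on `[` did not merge any compartment copies here. In this yeast-GEM release, the same chemical appears to get a different `s_` number per compartment, with the compartment surviving in the name (e.g., `oleate [cytoplasm]`). The following hedged sketch finds candidate merges by grouping IDs on the name with its trailing `[compartment]` removed. `group_by_base_name` is hypothetical, and the IDs in the example are toy values. Name matching is only a heuristic; checking shared database identifiers would be safer.

```python
import re
from collections import defaultdict

# trailing ' [compartment]' suffix, e.g. 'oleate [cytoplasm]' -> 'oleate'
COMPARTMENT_SUFFIX = re.compile(r'\s*\[[^\]]*\]$')

def group_by_base_name(id_name_pairs):
    """Map compartment-free name -> list of IDs that share it (hypothetical helper)."""
    groups = defaultdict(list)
    for met_id, name in id_name_pairs:
        groups[COMPARTMENT_SUFFIX.sub('', name)].append(met_id)
    return dict(groups)

# toy IDs for illustration only
pairs = [('x1', 'oleate [cytoplasm]'),
         ('x2', 'oleate [endoplasmic reticulum]'),
         ('x3', 'H2O [cytoplasm]')]
print(group_by_base_name(pairs))  # {'oleate': ['x1', 'x2'], 'H2O': ['x3']}
```

The regex strips only the last bracketed group, so a name that genuinely ends in brackets (not a compartment) would be truncated too.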

There is no metabolite.tsv in yeast-GEM.

## Port reactions

In [101]:
# port reactions, to include genes and enzymes
def port_reaction(R):
    new = Reaction()
    new.id = R.id
    new.reactants = [remove_compartment_by_split(m.id,'[') for m in R.reactants] # decompartmentalization
    new.products = [remove_compartment_by_split(m.id,'[') for m in R.products]   # decompartmentalization
    new.genes = [g.id for g in R.genes]
    ecs = R.annotation.get('ec-code', [])
    if isinstance(ecs, list):
        new.enzymes = ecs
    else:
        new.enzymes = [ecs]       # this version of yeast-GEM may have it as string
    return new

test99 = port_reaction(model.reactions[199])
[test99.id,
 test99.reactants,
 test99.products,
 test99.genes,
 test99.enzymes
]

['r_0223',
 ['s_0394', 's_0785', 's_0794'],
 ['s_1284', 's_1322'],
 ['YCL050C'],
 ['2.7.7.5', '2.7.7.53']]

In [102]:
## Reactions to port
myRxns = []
for R in model.reactions:
    myRxns.append( port_reaction(R) )
    
print(len(myRxns))

4058


In [103]:
# remove duplicated reactions after decompartmentalization
myRxns = remove_duplicate_rxn(myRxns)

In [104]:
len(myRxns)

4058

In [105]:
myRxns[0].__dict__

{'azimuth_id': '',
 'id': 'r_0001',
 'source': [],
 'version': '',
 'status': '',
 'reactants': ['s_0025', 's_0709'],
 'products': ['s_0710', 's_1399'],
 'enzymes': ['1.1.2.4', '1.1.99.-'],
 'genes': ['YEL071W', 'YEL039C', 'YDL174C', 'YJR048W'],
 'pathways': [],
 'ontologies': [],
 'species': '',
 'compartments': [],
 'cell_types': [],
 'tissues': []}

## No pathway information for yeast

## Collected data; now output

In [106]:
from datetime import datetime
today =  str(datetime.today()).split(" ")[0]

In [107]:
today

'2022-04-26'

In [108]:
note = f"""{model_name} compartmentalized, with genes and ECs."""

## metabolicModel to export
MM = MetabolicModel()
MM.id = f'az_{model_name}_{today}'
MM.meta_data = {
            'species': model_name.split('-')[0],
            'version': '',
            'sources': [f'https://github.com/SysBioChalmers/{model_name}, retrieved {today}'],
            'status': '',
            'last_update': today,
            'note': note,
        }
MM.list_of_reactions = [R.serialize() for R in  myRxns]
MM.list_of_compounds = [C.serialize() for C in myCpds]

In [109]:
# check output
[
MM.list_of_reactions[:2],
MM.list_of_compounds[100:102],
]

[[{'id': 'r_0001',
   'reactants': ['s_0025', 's_0709'],
   'products': ['s_0710', 's_1399'],
   'genes': ['YEL071W', 'YEL039C', 'YDL174C', 'YJR048W'],
   'enzymes': ['1.1.2.4', '1.1.99.-']},
  {'id': 'r_0002',
   'reactants': ['s_0027', 's_0709'],
   'products': ['s_0710', 's_1401'],
   'genes': ['YDL178W', 'YEL039C', 'YJR048W'],
   'enzymes': ['1.1.2.4', '1.1.99.-']}],
 [{'id': 's_0154',
   'name': '2-hexaprenyl-5-hydroxy-6-methoxy-3-methyl-1,4-benzoquinone [mitochondrion]',
   'identifiers': [['yeast-GEM', 's_0154'],
    ['bigg.metabolite', '2hpmhmbq'],
    ['chebi', '28753'],
    ['kegg.compound', 'C05805'],
    ['metanetx.chemical', 'MNXM5466']],
   'neutral_formula': 'C38H56O4',
   'charge': 0,
   'charged_formula': 'C38H56O4',
   'neutral_mono_mass': 576.41786,
   'SMILES': '',
   'inchi': ''},
  {'id': 's_0155',
   'name': '2-hexaprenyl-6-methoxy-1,4-benzoquinone [mitochondrion]',
   'identifiers': [['yeast-GEM', 's_0155'],
    ['bigg.metabolite', '2hp6mbq'],
    ['chebi', '277

In [110]:
import pickle
import os

# Write pickle file
export_pickle(os.path.join(output_fdr,f'{MM.id}.pickle'), MM)

In [111]:
# Write json file
export_json(os.path.join(output_fdr,f'{MM.id}.json'), MM)

In [112]:
# Write dataframe 
import pandas as pd
export_table(os.path.join(output_fdr,f'{MM.id}_list_of_compounds.csv'),MM, 'list_of_compounds')
export_table(os.path.join(output_fdr,f'{MM.id}_list_of_reactions.csv'),MM, 'list_of_reactions')
#export_table(os.path.join(output_fdr,f'{MM.id}_list_of_pathways.csv'),MM, 'list_of_pathways')
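`export_table` comes from `jms.utils.gems`, and its column layout is not shown here. As a sketch under that caveat, the serialized compounds (plain dicts, as printed above) can be written to CSV with the standard library. The sketch joins the nested `identifiers` pairs into one `db:id;db:id` string. `write_compounds_csv` and that joined format are assumptions of this example.

```python
import csv
import io

def write_compounds_csv(compounds, fh):
    """Write serialized compound dicts to CSV (hypothetical helper).

    Nested 'identifiers' ([[db, id], ...]) are joined as 'db:id;db:id'.
    Keys not listed in `columns` (e.g. 'SMILES', 'inchi') are ignored.
    """
    columns = ['id', 'name', 'neutral_formula', 'charge',
               'charged_formula', 'neutral_mono_mass', 'identifiers']
    writer = csv.DictWriter(fh, fieldnames=columns, extrasaction='ignore')
    writer.writeheader()
    for cpd in compounds:
        row = dict(cpd)
        row['identifiers'] = ';'.join(f'{db}:{acc}' for db, acc in cpd.get('identifiers', []))
        writer.writerow(row)

example = [{'id': 's_0154',
            'name': '2-hexaprenyl-5-hydroxy-6-methoxy-3-methyl-1,4-benzoquinone [mitochondrion]',
            'identifiers': [['yeast-GEM', 's_0154'], ['chebi', '28753']],
            'neutral_formula': 'C38H56O4', 'charge': 0,
            'charged_formula': 'C38H56O4', 'neutral_mono_mass': 576.41786}]
buffer = io.StringIO()
write_compounds_csv(example, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=''`, as the csv module documentation recommends. Names containing commas are quoted automatically.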

## Summary

This ports reactions and compounds; yeast-GEM provides no pathway information, so no pathways are exported. Gene and enzyme information is now included.

The exported pickle can be re-imported and uploaded to a database easily.
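The following minimal sketch reads the exports back using only `pickle` and `json` from the standard library. It assumes the JSON file holds a plain JSON document; unpickling the `MetabolicModel` object also requires `metDataModel` to be importable. The helper names are hypothetical. The round trip below uses a small stand-in dict and a temporary directory rather than the real files.

```python
import json
import os
import pickle
import tempfile

def load_pickle(path):
    # only unpickle files you trust; pickle can execute arbitrary code on load
    with open(path, 'rb') as fh:
        return pickle.load(fh)

def load_json(path):
    with open(path, encoding='utf-8') as fh:
        return json.load(fh)

# round-trip a small stand-in object through both formats
model_like = {'id': 'az_yeast-GEM_example', 'list_of_compounds': [{'id': 's_0154'}]}
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, 'model.pickle')
    json_path = os.path.join(tmp, 'model.json')
    with open(pkl_path, 'wb') as fh:
        pickle.dump(model_like, fh)
    with open(json_path, 'w', encoding='utf-8') as fh:
        json.dump(model_like, fh)
    restored_pickle = load_pickle(pkl_path)
    restored_json = load_json(json_path)
print(restored_pickle == model_like, restored_json == model_like)  # True True
```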

This notebook, the pickle file and the JSON file go to GitHub repo (https://github.com/shuzhao-li/Azimuth).