# Demo: annotating a user feature table with JMS

Any feature table from XCMS or other tools can be annotated using JMS.


Json's Metabolite Services (JMS):
https://github.com/shuzhao-li/JMS


SL 2022-02-27

In [1]:
# install packages
!pip install --upgrade mass2chem jms-metabolite-services

Requirement already up-to-date: mass2chem in /opt/conda/lib/python3.7/site-packages (0.3.0)
Requirement already up-to-date: jms-metabolite-services in /opt/conda/lib/python3.7/site-packages (0.3.2)


In [2]:
# To run from local source instead of a pip install:
# clone https://github.com/shuzhao-li/JMS.git,
# move the jms folder into this working directory,
# then uncomment the lines below.

# import sys
# sys.path.append('jms')
# sys.path.append('mass2chem')

In [3]:
import json

from mass2chem.epdsConstructor import epdsConstructor

from jms.dbStructures import ExperimentalEcpdDatabase, knownCompoundDatabase
from jms.io import read_table_to_peaks

In [4]:
# the same test file used in mummichog: https://github.com/shuzhao-li/mummichog/tree/master/mummichog/tests
myTable = 'testdata0710.txt'

'''
m/z	retention_time	p-value	t-score	custom_id
85.0278	59	0.00265721703609	-3.55	AE_pos_85.0278_59
85.0472	124	0.730810186297	-0.35	AE_pos_85.0472_124
85.0653	68	0.0865089499162	1.83	AE_pos_85.0653_68
85.1007	16	0.0579161231675	-2.04	AE_pos_85.1007_16
86.0595	67	0.0767887441614	-1.89	AE_pos_86.0595_67
'''

'\nm/z\tretention_time\tp-value\tt-score\tcustom_id\n85.0278\t59\t0.00265721703609\t-3.55\tAE_pos_85.0278_59\n85.0472\t124\t0.730810186297\t-0.35\tAE_pos_85.0472_124\n85.0653\t68\t0.0865089499162\t1.83\tAE_pos_85.0653_68\n85.1007\t16\t0.0579161231675\t-2.04\tAE_pos_85.1007_16\n86.0595\t67\t0.0767887441614\t-1.89\tAE_pos_86.0595_67\n'

In [5]:
list_peaks = read_table_to_peaks(myTable, 
                                has_header=True, mz_col=0, rtime_col=1, feature_id=4,
                                )
print(len(list_peaks), list_peaks[555])

7995 {'id_number': 'AE_pos_144.0564_110\n', 'mz': 144.0564, 'rtime': 110.0, 'apex': 110.0}
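
The `id_number` printed above ends with `\n`, presumably because the ID is the last column on each line of the table. A minimal sketch of trimming it before further use, assuming the peaks are plain dicts shaped like the one printed above; `strip_peak_ids` is a hypothetical helper, not part of JMS.

```python
def strip_peak_ids(peaks):
    """Return copies of the peak dicts with whitespace trimmed from 'id_number'."""
    cleaned = []
    for peak in peaks:
        peak = dict(peak)  # shallow copy so the input list is left untouched
        if isinstance(peak.get('id_number'), str):
            peak['id_number'] = peak['id_number'].strip()
        cleaned.append(peak)
    return cleaned


# Example peak copied from the output above
example = [{'id_number': 'AE_pos_144.0564_110\n', 'mz': 144.0564,
            'rtime': 110.0, 'apex': 110.0}]
cleaned_example = strip_peak_ids(example)
print(repr(cleaned_example[0]['id_number']))
```

In the notebook this would be applied as `list_peaks = strip_peak_ids(list_peaks)` before constructing the empCpds.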


In [6]:
# Group peaks into empirical compounds (empCpds) using mass2chem's epdsConstructor

ECCON = epdsConstructor(list_peaks, mode='pos')
dict_empCpds = ECCON.peaks_to_epdDict(
    seed_search_patterns=ECCON.seed_search_patterns,
    ext_search_patterns=ECCON.ext_search_patterns,
    mz_tolerance_ppm=5,
    coelution_function='distance',
    check_isotope_ratio=False,
)



Annotating empirical compounds on 7995 features/peaks, ...
epdsConstructor - numbers of seeded epds and included peaks:  (926, 1965)


In [7]:
dict_empCpds[55]

{'interim_id': 55,
 'neutral_formula_mass': None,
 'neutral_formula': None,
 'Database_referred': [],
 'identity': [],
 'MS1_pseudo_Spectra': [{'id_number': 'AE_pos_167.9628_28\n',
   'mz': 167.9628,
   'rtime': 28.0,
   'apex': 28.0,
   'ion_relation': 'anchor'},
  {'id_number': 'AE_pos_168.9658_26\n',
   'mz': 168.9658,
   'rtime': 26.0,
   'apex': 26.0,
   'ion_relation': '13C/12C'},
  {'id_number': 'AE_pos_208.9894_29\n',
   'mz': 208.9894,
   'rtime': 29.0,
   'apex': 29.0,
   'ion_relation': 'anchor,+H2O'}],
 'MS2_Spectra': []}
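
The `13C/12C` relation in this empCpd can be checked by hand. The sketch below compares the observed isotope peak with the anchor m/z plus the 13C/12C mass difference and reports the error in ppm. The constant and the `ppm_error` helper are written out here for illustration and are not taken from mass2chem.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C, in Da


def ppm_error(observed_mz, expected_mz):
    """Signed mass error of observed vs. expected m/z, in parts per million."""
    return (observed_mz - expected_mz) / expected_mz * 1e6


# m/z values copied from empCpd 55 above
anchor_mz = 167.9628
isotope_mz = 168.9658

expected_isotope_mz = anchor_mz + C13_C12_DELTA
isotope_err = ppm_error(isotope_mz, expected_isotope_mz)
print(f"expected {expected_isotope_mz:.4f}, observed {isotope_mz:.4f}, "
      f"error {isotope_err:.1f} ppm")
```

The error falls inside the `mz_tolerance_ppm=5` window passed to `peaks_to_epdDict`.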

In [8]:
EED = ExperimentalEcpdDatabase(mode='pos')
EED.dict_empCpds = dict_empCpds
EED.index_empCpds()

## Load knownCompoundDatabase

In [9]:
KCD = knownCompoundDatabase()

# The file list_compounds_HMDB4.json may need to be decompressed first:
# xz -d list_compounds_HMDB4.json.xz

with open('jms/data/compounds/list_compounds_HMDB4.json') as f:
    list_compounds = json.load(f)
KCD.mass_index_list_compounds(list_compounds)
KCD.build_emp_cpds_index()

In [10]:
list(KCD.mass_indexed_compounds.values())[2222]

{'interim_id': 'C8H13N2O2_169.097154',
 'neutral_formula': 'C8H13N2O2',
 'neutral_formula_mass': 169.097154087,
 'compounds': [{'primary_id': 'HMDB0062696',
   'primary_db': 'HMDB',
   'name': 'Pyridoxaminium(1+)',
   'neutral_formula': 'C8H13N2O2',
   'neutral_formula_mass': 169.097154087,
   'SMILES': 'CC1=C(O)C(C[NH3+])=C(CO)C=N1',
   'inchikey': 'NHZMQXZHNVQTQA-UHFFFAOYSA-O',
   'other_ids': {'PubChem': '25245492', 'KEGG': '', 'ChEBI': '57761'}}]}

## Annotate

In [11]:
EED.extend_empCpd_annotation(KCD)
EED.annotate_singletons(KCD)

In [12]:
list(EED.dict_empCpds.items())[200]

(200,
 {'interim_id': 200,
  'neutral_formula_mass': 441.139681375,
  'neutral_formula': 'C19H19N7O6',
  'Database_referred': [],
  'identity': [],
  'MS1_pseudo_Spectra': [{'id_number': 'AE_pos_464.127_161\n',
    'mz': 464.127,
    'rtime': 161.0,
    'apex': 161.0,
    'ion_relation': 'anchor',
    'parent_epd_id': 200},
   {'id_number': 'AE_pos_465.1294_161\n',
    'mz': 465.1294,
    'rtime': 161.0,
    'apex': 161.0,
    'ion_relation': '13C/12C',
    'parent_epd_id': 200},
   {'id_number': 'AE_pos_486.1088_161\n',
    'mz': 486.1088,
    'rtime': 161.0,
    'apex': 161.0,
    'ion_relation': 'Na/H',
    'parent_epd_id': 200}],
  'MS2_Spectra': [],
  'list_matches': [('C19H19N7O6_441.139681', 'M+Na[1+]', 1),
   ('C19H27N3O5S2_441.139212', 'M+Na[1+]', 1)]})
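
The two entries in `list_matches` can be sanity-checked with a little arithmetic. The sketch below computes the expected m/z of the `M+Na[1+]` ion for each candidate neutral mass and compares it with the observed anchor peak at 464.127. The constants and helper functions are written out for illustration; they are not JMS functions, and the matching tolerance JMS applies is not shown in this notebook.

```python
NA_MASS = 22.98976928        # monoisotopic mass of sodium, in Da
ELECTRON_MASS = 0.00054858   # mass of an electron, in Da


def mz_m_plus_na(neutral_mass):
    """Expected m/z of the singly charged sodium adduct M+Na[1+]."""
    return neutral_mass + NA_MASS - ELECTRON_MASS


def ppm_error(observed_mz, expected_mz):
    """Signed mass error of observed vs. expected m/z, in parts per million."""
    return (observed_mz - expected_mz) / expected_mz * 1e6


observed_mz = 464.127  # anchor peak of empCpd 200 above

# Neutral masses taken from the interim_id strings in list_matches
candidates = {
    'C19H19N7O6': 441.139681,
    'C19H27N3O5S2': 441.139212,
}

errors = {}
for formula, neutral_mass in candidates.items():
    errors[formula] = ppm_error(observed_mz, mz_m_plus_na(neutral_mass))
    print(f"{formula}: expected {mz_m_plus_na(neutral_mass):.4f}, "
          f"error {errors[formula]:.1f} ppm")
```

Both candidates land within a few ppm of the observed peak, so accurate mass alone does not separate them here.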

In [15]:
# Export a JSON file
# cls=NpEncoder,
outfile = '_my_JSON_empCpds.json'
with open(outfile, 'w', encoding='utf-8') as f:
    json.dump(EED.dict_empCpds, f, ensure_ascii=False, indent=2)
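
When the exported file is read back, note that JSON changes two things: integer dictionary keys become strings, and tuples (such as the entries in `list_matches`) become lists. A minimal round-trip sketch on a small stand-in dictionary, assuming all top-level keys are integer IDs as in the examples above; the file name is a placeholder.

```python
import json
import os
import tempfile

# Small stand-in for EED.dict_empCpds, using values from the example above
dict_example = {
    200: {
        'interim_id': 200,
        'neutral_formula': 'C19H19N7O6',
        'list_matches': [('C19H19N7O6_441.139681', 'M+Na[1+]', 1)],
    }
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example_empCpds.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(dict_example, f, ensure_ascii=False, indent=2)
    with open(path, encoding='utf-8') as f:
        reloaded = json.load(f)

# JSON object keys are always strings, so convert them back to int
restored = {int(key): value for key, value in reloaded.items()}

print(list(reloaded.keys()), list(restored.keys()))
print(restored[200]['list_matches'])
```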

## Summary

The JMS package handles empCpd (empirical compound) grouping for both database-derived and experiment-derived data.

By default, JMS uses peak parameters from asari, but user-supplied feature tables can also be annotated, as this demo shows.
