# Demo using custom patterns for isotopes or adducts in khipu

- Goal: demonstrate how to use custom ion patterns for isotopes or adducts in khipu
- Citation: Li, S. and Zheng, S., 2023. Generalized tree structure to annotate untargeted metabolomics and stable isotope tracing data. Analytical Chemistry, 95(15), pp.6212-6217. (https://pubs.acs.org/doi/10.1021/acs.analchem.2c05810)
- Original repo: https://github.com/shuzhao-li-lab/khipu


SL 2023-03-02

In [1]:
!pip install -q --upgrade khipu-metabolomics

In [2]:
from khipu.utils import *
from khipu.epdsConstructor import epdsConstructor

In [3]:
# inspect variables in current scope

print([x for x in dir() if x[:2] != '__'])

['In', 'Out', 'PROTON', '_', '_dh', '_exit_code', '_i', '_i1', '_i2', '_i3', '_ih', '_ii', '_iii', '_oh', 'add_data_to_tag', 'adduct_search_patterns', 'adduct_search_patterns_neg', 'assign_masstrack_ids_in_khipu', 'build_centurion_tree', 'electron', 'epdsConstructor', 'exit', 'export_json_trees', 'export_tsv_trees', 'extended_adducts', 'find_all_matches_centurion_indexed_list', 'find_trees_by_datatag', 'find_trees_by_datatag_list', 'get_adduct_edge_pairs', 'get_ipython', 'get_isotope_pattern_name', 'get_isotopic_edge_pairs', 'is_datatag_in_tree', 'isotope_search_patterns', 'json', 'make_edge_tag', 'make_expected_adduct_index', 'make_peak_dict', 'make_peak_tag', 'np', 'nx', 'peaks_to_networks', 'quit', 'read_features_from_text', 'realign_isotopes', 'realign_isotopes_reverse', 'relabel_dict', 'rt_compared_by_values', 'rt_matched_by_tolerance']


In [4]:
help(read_features_from_text)

Help on function read_features_from_text in module khipu.utils:

read_features_from_text(text_table, id_col=0, mz_col=1, rtime_col=2, intensity_cols=(3, 6), delimiter='\t')
    Read a text feature table into a list of features.
    Input
    -----
    text_table: Tab delimited feature table read as text. First line as header.
                    Recommended col 0 for ID, col 1 for m/z, col 2 for rtime.
    id_col: column for id. If feature ID is not given, row_number is used as ID.
    mz_col: column for m/z.
    rtime_col: column for retention time.
    intensity_cols: range of columns for intensity values. E.g. (3,5) includes only col 3 and 4.
    Return
    ------
    List of features: [{'id': '', 'mz': 0, 'rtime': 0, 
                        intensities: [], 'representative_intensity': 0, ...}, 
                        ...], 
                        where representative_intensity is mean value.



In [5]:
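# The input file, yeast_pos_full.tsv, is from the khipu GitHub repo.
# intensity_cols=(6, 9) selects columns 6, 7 and 8, i.e. the three sample columns.

with open('yeast_pos_full.tsv') as infile:
    features = read_features_from_text(infile.read(),
                                       id_col=0, mz_col=1, rtime_col=2, intensity_cols=(6, 9), delimiter='\t')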

table header looks like:  ['id_number', 'mz', 'rtime', 'cSelectivity', 'goodness_fitting', 'snr', 'posi-Yeast-12C14N-a', 'posi-Yeast-12C14N-b', 'posi-Yeast-12C14N-c']
Read 14051 feature lines
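
The help text describes `intensity_cols` as a range whose end is excluded, so `(6, 9)` should pick up columns 6, 7 and 8. A minimal sketch of that convention using the header printed above; this is plain Python slicing for illustration, not khipu's own parsing code:

```python
# Header copied from the output above.
header = ['id_number', 'mz', 'rtime', 'cSelectivity', 'goodness_fitting', 'snr',
          'posi-Yeast-12C14N-a', 'posi-Yeast-12C14N-b', 'posi-Yeast-12C14N-c']

intensity_cols = (6, 9)  # start inclusive, end exclusive, as in the help text
start, end = intensity_cols
sample_columns = header[start:end]
print(sample_columns)
```

If the feature table has a different number of quality columns before the samples, the start index needs to move accordingly.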


In [6]:
features[99]

{'id': 'F100',
 'mz': 260.1688,
 'rtime': 185.63,
 'intensities': [943328.0, 666713.0, 671035.0],
 'representative_intensity': 760358.6666666666}
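
According to the help text, `representative_intensity` is the mean of the per-sample intensities. A quick check on the feature shown above, as a sketch that only uses the Python standard library:

```python
from statistics import mean

# Feature F100, copied from the output above (without the derived field).
feature = {
    'id': 'F100',
    'mz': 260.1688,
    'rtime': 185.63,
    'intensities': [943328.0, 666713.0, 671035.0],
}

# Mean over the three sample columns.
representative = mean(feature['intensities'])
print(representative)
```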

### Current patterns

In [7]:
for x in (adduct_search_patterns, isotope_search_patterns, extended_adducts):
    print(x, '\n')

[(21.982, 'Na/H'), (41.026549, 'ACN'), (35.9767, 'HCl'), (37.955882, 'K/H')] 

[(1.003355, '13C/12C', (0, 0.8)), (2.00671, '13C/12C*2', (0, 0.8)), (3.010065, '13C/12C*3', (0, 0.8)), (4.01342, '13C/12C*4', (0, 0.8)), (5.016775, '13C/12C*5', (0, 0.8)), (6.02013, '13C/12C*6', (0, 0.8)), (7.023485, '13C/12C*7', (0, 0.8)), (8.02684, '13C/12C*8', (0, 0.8)), (9.030195, '13C/12C*9', (0, 0.8)), (10.03355, '13C/12C*10', (0, 0.8)), (11.036905, '13C/12C*11', (0, 0.8)), (12.04026, '13C/12C*12', (0, 0.8))] 

[(1.0078, 'H'), (-1.0078, '-H'), (10.991, 'Na/H, double charged'), (0.5017, '13C/12C, double charged'), (117.02655, '-NH3'), (17.02655, 'NH3'), (-18.0106, '-H2O'), (18.0106, 'H2O'), (18.033823, 'NH4'), (27.01089904, 'HCN'), (37.94694, 'Ca/H2'), (32.026215, 'MeOH'), (43.96389, 'Na2/H2'), (67.987424, 'NaCOOH'), (83.961361, 'KCOOH'), (97.96737927, 'H2SO4'), (97.97689507, 'H3PO4')] 
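
The isotope list above is regular: each entry is a multiple of the 13C/12C mass difference, a name, and a third element `(0, 0.8)`. The sketch below regenerates such a list. `make_isotope_patterns` is a hypothetical helper, not part of khipu, and reading the third element as an allowed intensity-ratio range is an assumption based on its shape.

```python
def make_isotope_patterns(mass_diff, label, max_count, ratio_range=(0, 0.8)):
    """Build [(mass shift, name, ratio_range), ...] for 1..max_count labelled atoms.

    Hypothetical helper, not part of khipu. The tuple layout mirrors the
    isotope_search_patterns printed above; ratio_range is passed through
    unchanged (its interpretation is an assumption).
    """
    patterns = []
    for n in range(1, max_count + 1):
        name = label if n == 1 else f'{label}*{n}'
        patterns.append((round(mass_diff * n, 6), name, ratio_range))
    return patterns


patterns_13c = make_isotope_patterns(1.003355, '13C/12C', 12)
print(patterns_13c[:2])
```

Passing a smaller `max_count` gives a shorter list of the same form.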



### Redefine custom patterns

In [8]:
# Limit the isotope search to one and two 13C atoms.
isotope_search_patterns = [(1.003355, '13C/12C', (0, 0.8)), (2.00671, '13C/12C*2', (0, 0.8))]

# Custom extended adducts. The -NH3 loss is entered as -17.02655 here;
# the default list printed above shows 117.02655, which looks like a typo.
extended_adducts = [(1.0078, 'H'), (-1.0078, '-H'), (10.991, 'Na/H, double charged'), (0.5017, '13C/12C, double charged'),
                    (-17.02655, '-NH3'), (17.02655, 'NH3'), (-18.0106, '-H2O'), (18.0106, 'H2O'),
                    (55.96644655, 'KOH'), (60.02112937, 'C2H4O2'),
                    (62.00039, 'H2CO3'), (62.9956429, 'HNO3'), (75.91176374, '2K-2H')]
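
Hand-edited pattern lists are easy to get wrong, so a small sanity check before running the annotation can help. The sketch below is a hypothetical helper, not part of khipu. It assumes the naming convention seen in the lists above, where a leading `-` in the name marks a loss and should therefore come with a negative mass shift.

```python
def check_patterns(patterns):
    """Return a list of human-readable problems found in (mass shift, name, ...) tuples."""
    problems = []
    seen = set()
    for entry in patterns:
        mass_shift, name = entry[0], entry[1]
        if not isinstance(mass_shift, (int, float)):
            problems.append(f'{name}: mass shift is not a number')
        elif name.startswith('-') and mass_shift > 0:
            problems.append(f'{name}: loss label but positive mass shift {mass_shift}')
        if name in seen:
            problems.append(f'{name}: duplicate name')
        seen.add(name)
    return problems


# A shortened example list in the same (mass shift, name) layout.
custom_adducts = [(1.0078, 'H'), (-1.0078, '-H'), (-18.0106, '-H2O'), (18.0106, 'H2O'),
                  (55.96644655, 'KOH'), (60.02112937, 'C2H4O2')]

print(check_patterns(custom_adducts))        # consistent list
print(check_patterns([(18.0106, '-H2O')]))   # sign does not match the label
```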

### Annotate features using custom patterns

In [9]:
help(epdsConstructor)

Help on class epdsConstructor in module khipu.epdsConstructor:

class epdsConstructor(builtins.object)
 |  epdsConstructor(peak_list, mode='pos')
 |  
 |  Wrapper class to organize a list of peaks/features into a list of empirical compounds.
 |  
 |  To-dos:
 |      add support of user input formats where rtime isn't precise or unavailable.
 |      add options of coelution_function (see mass2chem.epdsConstructor )
 |  
 |  Methods defined here:
 |  
 |  __init__(self, peak_list, mode='pos')
 |      Parameters
 |      ----------
 |      peak_list : [{'parent_masstrace_id': 1670, 'mz': 133.09702315984987, 'rtime': 654, 'height': 14388.0, 'id': 555}, ...]
 |      mz_tolerance_ppm: ppm tolerance in examining m/z patterns.
 |  
 |  peaks_to_epdDict(self, isotope_search_patterns, adduct_search_patterns, extended_adducts, mz_tolerance_ppm, rt_tolerance=2)
 |      Parameters
 |      ----------
 |      isotope_search_patterns : exact list used to retrieve the subnetworks. E.g. 
 |          [ (1

In [13]:
ECON = epdsConstructor(features, mode='pos')

khipu_dict = ECON.peaks_to_epdDict(
    isotope_search_patterns=isotope_search_patterns,
    adduct_search_patterns=adduct_search_patterns,
    extended_adducts=extended_adducts,
    mz_tolerance_ppm=5,
    rt_tolerance=2,
)



Initial khipu search grid: 
               M+H+       Na/H        HCl        K/H        ACN
M0         1.007276  22.989276  36.983976  38.963158  42.033825
13C/12C    2.010631  23.992631  37.987331  39.966513  43.037180
13C/12C*2  3.013986  24.995986  38.990686  40.969868  44.040535
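
The numbers in this grid can be reproduced by adding each isotope shift and each adduct shift to the proton mass shown in the M0 / M+H+ cell. The sketch below does that in plain Python to show where the values come from. It is not khipu's implementation, and ordering the columns by mass shift is inferred from the printed table.

```python
proton = 1.007276  # value of the M0 / M+H+ cell in the printed grid

adducts = [(21.982, 'Na/H'), (41.026549, 'ACN'), (35.9767, 'HCl'), (37.955882, 'K/H')]
isotopes = [(1.003355, '13C/12C', (0, 0.8)), (2.00671, '13C/12C*2', (0, 0.8))]

# Columns: the protonated ion plus each adduct, ordered by mass shift.
columns = [(0.0, 'M+H+')] + sorted(adducts)
# Rows: the unlabelled M0 plus each isotope pattern.
rows = [(0.0, 'M0')] + [(shift, name) for shift, name, _ in isotopes]

grid = {
    row_name: {col_name: round(proton + row_shift + col_shift, 6)
               for col_shift, col_name in columns}
    for row_shift, row_name in rows
}

for row_name, cells in grid.items():
    print(f'{row_name:<10}', '  '.join(f'{v:>9.6f}' for v in cells.values()))
```

Because the custom isotope list has only two entries, the grid has three rows; the default list of twelve patterns would give thirteen.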


Downsized input network with 64 features, highest peak at F6410 
Unknown isotope match ~  (268.2636, 'F469')
Downsized input network with 297 features, highest peak at F8433 
Unknown isotope match ~  (376.2592, 'F6829')
Unknown isotope match ~  (392.1913, 'F8433')
Unknown isotope match ~  (393.1946, 'F8570')
Unknown isotope match ~  (393.2855, 'F8615')
Unknown isotope match ~  (394.1964, 'F8703')
Unknown isotope match ~  (394.2887, 'F3701')
Unknown isotope match ~  (397.237, 'F3971')
Unknown isotope match ~  (401.268, 'F4387')
Unknown isotope match ~  (402.2706, 'F4490')
Unknown isotope match ~  (420.2227, 'F6247')
Unknown isotope match ~  (421.317, 'F6424')
Unknown isotope match ~  (432.2739, 'F7583')




Unknown isotope match ~  (356.3162, 'F4242')
Unknown isotope match ~  (357.3195, 'F4416')
Unknown isotope match ~  (398.363, 'F4120')
Unknown isotope match ~  (399.3664, 'F4211')
Downsized input network with 25 features, highest peak at F4327 
Downsized input network with 17 features, highest peak at F6770 
Downsized input network with 18 features, highest peak at F4422 
Downsized input network with 17 features, highest peak at F4461 
Unknown isotope match ~  (399.2345, 'F4157')
Unknown isotope match ~  (446.2533, 'F4949')
Unknown isotope match ~  (447.2571, 'F5051')
Downsized input network with 40 features, highest peak at F8756 
Downsized input network with 24 features, highest peak at F4664 
Downsized input network with 34 features, highest peak at F6243 
Downsized input network with 35 features, highest peak at F9075 
Unknown isotope match ~  (363.1726, 'F5136')
Unknown isotope match ~  (452.1574, 'F5507')
Downsized input network with 24 features, highest peak at F6117 
Unknown iso

In [14]:
list(khipu_dict.items())[99]

('kp100_313.2617',
 {'interim_id': 'kp100_313.2617',
  'neutral_formula_mass': 313.26169603322995,
  'neutral_formula': None,
  'Database_referred': [],
  'identity': [],
  'MS1_pseudo_Spectra': [{'id': 'F683',
    'mz': 315.2723,
    'rtime': 159.79,
    'intensities': [25028245.0, 25883849.0, 24629612.0],
    'representative_intensity': 25180568.666666668,
    'parent_masstrack_id': '315.2723',
    'isotope': '13C/12C',
    'modification': 'M+H+',
    'ion_relation': '13C/12C,M+H+'},
   {'id': 'F578',
    'mz': 314.269,
    'rtime': 159.79,
    'intensities': [133991630.0, 135907142.0, 137575102.0],
    'representative_intensity': 135824624.66666666,
    'parent_masstrack_id': '314.269',
    'isotope': 'M0',
    'modification': 'M+H+',
    'ion_relation': 'M0,M+H+'}],
  'MS2_Spectra': []})
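
The two features in this khipu can be checked by hand against the search parameters. The sketch below computes the ppm deviation of their m/z spacing from the 13C/12C shift, then estimates the neutral mass from each ion. Averaging the two estimates is my reading of the printed `neutral_formula_mass`, not a statement of khipu's algorithm, and the proton mass is taken from the search grid at six decimals.

```python
proton = 1.007276      # M0 / M+H+ cell of the search grid
c13_shift = 1.003355   # 13C/12C pattern

mz_m0 = 314.269        # F578, 'M0,M+H+'
mz_13c = 315.2723      # F683, '13C/12C,M+H+'

# Deviation of the observed spacing from the expected 13C shift, in ppm of m/z.
observed_shift = mz_13c - mz_m0
ppm_error = abs(observed_shift - c13_shift) / mz_13c * 1e6
print(f'ppm error: {ppm_error:.2f}')

# Neutral mass implied by each ion, and their average.
estimates = [mz_m0 - proton, mz_13c - c13_shift - proton]
neutral_mass = sum(estimates) / len(estimates)
print(f'neutral mass estimate: {neutral_mass:.4f}')
```

The deviation is well inside the `mz_tolerance_ppm=5` used in the call, and the averaged estimate agrees with the printed value to about four decimals.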

### Conclusion

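This notebook shows how to use khipu.epdsConstructor.epdsConstructor to build a dictionary of khipus (empirical compounds, empCpds) from a feature table.
The peaks_to_epdDict() method takes the isotope, adduct and extended adduct patterns as arguments, so custom lists can be passed in place of the defaults from khipu.utils.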
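
To get an overview of the result, the ion annotations can be tallied across all khipus. The sketch below runs on a small stand-in dictionary trimmed from the entry printed earlier. `tally_ions` is a hypothetical helper, and it assumes every entry has an 'MS1_pseudo_Spectra' list with 'isotope' and 'modification' keys, as in that entry.

```python
from collections import Counter

# Stand-in for khipu_dict, trimmed from the entry printed earlier.
khipu_dict_example = {
    'kp100_313.2617': {
        'interim_id': 'kp100_313.2617',
        'neutral_formula_mass': 313.26169603322995,
        'MS1_pseudo_Spectra': [
            {'id': 'F683', 'mz': 315.2723, 'isotope': '13C/12C', 'modification': 'M+H+'},
            {'id': 'F578', 'mz': 314.269, 'isotope': 'M0', 'modification': 'M+H+'},
        ],
    },
}


def tally_ions(khipus):
    """Count (isotope, modification) pairs over all features in all khipus."""
    counts = Counter()
    for khipu in khipus.values():
        for peak in khipu['MS1_pseudo_Spectra']:
            counts[(peak['isotope'], peak['modification'])] += 1
    return counts


ion_counts = tally_ions(khipu_dict_example)
for (isotope, modification), n in ion_counts.items():
    print(isotope, modification, n)
```

Under the same assumption about the entry layout, calling `tally_ions(khipu_dict)` on the full result would show how often each custom pattern was matched.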