# Pre-Annotation and Khipu

Pre-Annotation assigns features to ion relations and isotopologues. The result is a set of EmpiricalCompounds: collections of features likely to represent the same chemical entity. Pre-Annotation is not unique to Khipu, but the empirical compound data structure is. These computable data structures are what annotation in the PCPFM is built on.

First, we will install Khipu and demonstrate standalone usage. After that, we will use it within the pipeline and apply it to isotope-labelled data.

In [22]:
!pip3 install --upgrade khipu-metabolomics

Defaulting to user installation because normal site-packages is not writeable


In [24]:
from khipu.extended import *

# Let's take a look at the mass patterns that we will use for pre-annotation.

for x in (adduct_search_patterns, isotope_search_patterns, extended_adducts):
    print(x, '\n')

# Each is a list of tuples of the form (m/z delta, name, ...).

[(21.982, 'Na/H'), (41.026549, 'ACN'), (35.9767, 'HCl'), (37.955882, 'K/H')] 

[(1.003355, '13C/12C', (0, 0.8)), (2.00671, '13C/12C*2', (0, 0.8)), (3.010065, '13C/12C*3', (0, 0.8)), (4.01342, '13C/12C*4', (0, 0.8)), (5.016775, '13C/12C*5', (0, 0.8)), (6.02013, '13C/12C*6', (0, 0.8)), (7.023485, '13C/12C*7', (0, 0.8)), (8.02684, '13C/12C*8', (0, 0.8)), (9.030195, '13C/12C*9', (0, 0.8)), (10.03355, '13C/12C*10', (0, 0.8)), (11.036905, '13C/12C*11', (0, 0.8)), (12.04026, '13C/12C*12', (0, 0.8))] 

[(1.0078, 'H'), (-1.0078, '-H'), (1.9972, '37/35Cl'), (-17.02655, '-NH3'), (17.02655, 'NH3'), (-18.0106, '-H2O'), (18.0106, 'H2O'), (18.033823, 'NH4'), (27.01089904, 'HCN'), (27.99492, 'CO'), (32.026215, 'MeOH'), (-35.037114, '-NH3-H2O'), (37.94694, 'Ca/H2'), (43.96389, 'Na2/H2'), (46.00548, 'CO2H2'), (-46.00548, '-CO2H2'), (67.987424, 'NaCOOH'), (83.961361, 'KCOOH'), (97.96737927, 'H2SO4'), (97.97689507, 'H3PO4')] 
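To make the role of these patterns concrete, here is a minimal sketch of how a single m/z delta can link two features. It is not Khipu's actual search routine: the helper `match_delta` and the 5 ppm tolerance are assumptions made for illustration, and the two m/z values are taken from the E. coli demo table shown later in this notebook.

```python
# Adduct deltas copied from the adduct_search_patterns output above.
ADDUCT_PATTERNS = [(21.982, 'Na/H'), (41.026549, 'ACN'),
                   (35.9767, 'HCl'), (37.955882, 'K/H')]

def match_delta(mz_low, mz_high, patterns, ppm=5.0):
    """Return the names of patterns whose delta explains mz_high - mz_low.

    Hypothetical helper: the tolerance is ppm of the heavier ion, which is
    an assumption for this sketch, not a documented Khipu setting.
    """
    tolerance = mz_high * ppm * 1e-6
    observed = mz_high - mz_low
    # p[0] is the m/z delta and p[1] the name, for 2- and 3-element tuples alike.
    return [p[1] for p in patterns if abs(observed - p[0]) <= tolerance]

# Features F250 (74.9614) and F11 (115.988) from the demo table.
print(match_delta(74.9614, 115.988, ADDUCT_PATTERNS))
```

The observed difference of 41.0266 falls within tolerance of the ACN delta only, so the pair is a candidate M+H+ / ACN adduct relation. A real search would also need the two features to co-elute.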



In [28]:
!khipu -i ../Datasets/ecoli_pos.tsv -o khipu_demo



~~~~~~~ Hello from Khipu (2.0.2) ~~~~~~~~~

Working on file  ../Datasets/ecoli_pos.tsv
table header looks like: 
   ['id_number', 'mz', 'rtime', '12C_Ecoli_20220321_004', '12C_Ecoli_20220321_004_20220322095030', '12C_Ecoli_20220321_004_20220322130235']
Read 3602 feature lines


Multiple charges considered: [1, 2, 3]


Khipu search grid: 
                 M+H+       Na/H        HCl        K/H        ACN
M0           1.007276  22.989276  36.983976  38.963158  42.033825
13C/12C      2.010631  23.992631  37.987331  39.966513  43.037180
13C/12C*2    3.013986  24.995986  38.990686  40.969868  44.040535
13C/12C*3    4.017341  25.999341  39.994041  41.973223  45.043890
13C/12C*4    5.020696  27.002696  40.997396  42.976578  46.047245
13C/12C*5    6.024051  28.006051  42.000751  43.979933  47.050600
13C/12C*6    7.027406  29.009406  43.004106  44.983288  48.053955
13C/12C*7    8.030761  30.012761  44.007461  45.986643  49.057310
13C/12C*8    9.034116  31.016116  45.010816  46.989998  50.06066
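The numbers in the search grid can be reproduced from the patterns printed earlier: each cell appears to be the proton mass plus an isotope shift plus an adduct shift. The sketch below rebuilds the first three rows under that assumption, which is inferred from the values shown rather than from Khipu's source; `build_grid` is a hypothetical helper, not a Khipu function.

```python
PROTON = 1.007276  # value of the M0 / M+H+ cell in the printed grid

# Deltas copied from the pattern lists printed earlier in the notebook.
ADDUCTS = [(21.982, 'Na/H'), (35.9767, 'HCl'),
           (37.955882, 'K/H'), (41.026549, 'ACN')]
ISOTOPES = [(1.003355, '13C/12C'), (2.00671, '13C/12C*2')]

def build_grid(isotopes, adducts, proton=PROTON):
    """Map (isotope name, adduct name) to the m/z offset from the neutral mass."""
    rows = [(0.0, 'M0')] + [(d, name) for d, name in isotopes]
    cols = [(0.0, 'M+H+')] + list(adducts)
    return {(iso, add): proton + iso_delta + add_delta
            for iso_delta, iso in rows
            for add_delta, add in cols}

grid = build_grid(ISOTOPES, ADDUCTS)
for iso in ('M0', '13C/12C', '13C/12C*2'):
    cells = [grid[(iso, add)] for add in ('M+H+', 'Na/H', 'HCl', 'K/H', 'ACN')]
    print(f"{iso:<12}" + "".join(f"{c:>11.6f}" for c in cells))
```

Read this way, the grid lists the offsets added to a neutral mass to predict where each singly charged ion of an empirical compound should appear.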

In [29]:
# Khipu generates a TSV file and a JSON file.

!ls -alhR

total 4536
drwxr-xr-x@ 11 mitchjo  JAX\Domain Users   352B Oct 22 01:05 .
drwxr-xr-x@ 14 mitchjo  JAX\Domain Users   448B Oct 22 00:53 ..
-rw-r--r--@  1 mitchjo  JAX\Domain Users   4.3K Oct 22 00:27 2.1.Preannotation with Khipu.ipynb
-rw-r--r--@  1 mitchjo  JAX\Domain Users     0B Oct 17 13:32 2.2.Stable Isotope Tracing.ipynb
-rw-r--r--@  1 mitchjo  JAX\Domain Users    12K Oct 22 00:54 2.3.Advanced PCPFM Usage.ipynb
-rw-r--r--@  1 mitchjo  JAX\Domain Users   4.7K Oct 17 16:05 README.md
-rw-r--r--@  1 mitchjo  JAX\Domain Users   8.7K Oct 22 01:05 khipu.log
-rw-r--r--@  1 mitchjo  JAX\Domain Users   885K Oct 22 01:05 khipu_demo.json
-rw-r--r--@  1 mitchjo  JAX\Domain Users   214K Oct 22 01:05 khipu_demo.tsv
-rw-r--r--@  1 mitchjo  JAX\Domain Users   885K Oct 22 01:04 khipu_test_empricalCompounds.json
-rw-r--r--@  1 mitchjo  JAX\Domain Users   214K Oct 22 01:04 khipu_test_empricalCompounds.tsv


In [30]:
import pandas as pd

# Load the tab-separated Khipu output.
kt = pd.read_csv("./khipu_demo.tsv", sep="\t")
kt.head()

# Note that this table maps each feature to an isotope and a modification.

Unnamed: 0,id,mz,rtime,empCpd,neutral_formula_mass,isotope,modification,ion_relation
0,F661,126.022,18.68,kp1_73.9544,73.954374,13C/12C*10,ACN,"13C/12C*10,ACN"
1,F688,84.9954,18.68,kp1_73.9544,73.954374,13C/12C*10,M+H+,"13C/12C*10,M+H+"
2,F250,74.9614,17.98,kp1_73.9544,73.954374,M0,M+H+,"M0,M+H+"
3,F11,115.988,19.63,kp1_73.9544,73.954374,M0,ACN,"M0,ACN"
4,F1122,113.0346,30.32,kp2_71.0008,71.000847,M0,ACN,"M0,ACN"
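As a sanity check on the table, each feature's neutral mass can be estimated by subtracting the proton, isotope, and adduct offsets from its m/z. The sketch below does this for the four features of `kp1_73.9544`, assuming singly charged positive ions and the offsets printed earlier; `neutral_mass` is a hypothetical helper, and how Khipu itself derives `neutral_formula_mass` is not stated in this notebook.

```python
PROTON = 1.007276
ADDUCT = {'M+H+': 0.0, 'ACN': 41.026549}         # deltas from adduct_search_patterns
ISOTOPE = {'M0': 0.0, '13C/12C*10': 10.03355}    # deltas from isotope_search_patterns

# (id, mz, isotope, modification) for empCpd kp1_73.9544, copied from the table.
rows = [
    ('F661', 126.022, '13C/12C*10', 'ACN'),
    ('F688', 84.9954, '13C/12C*10', 'M+H+'),
    ('F250', 74.9614, 'M0', 'M+H+'),
    ('F11', 115.988, 'M0', 'ACN'),
]

def neutral_mass(mz, isotope, modification):
    """Estimate the neutral mass of a singly charged ion (assumed charge 1)."""
    return mz - PROTON - ISOTOPE[isotope] - ADDUCT[modification]

estimates = {fid: neutral_mass(mz, iso, mod) for fid, mz, iso, mod in rows}
mean_estimate = sum(estimates.values()) / len(estimates)

for fid, mass in estimates.items():
    print(fid, round(mass, 6))
print('mean', round(mean_estimate, 6))
```

All four estimates land within about 0.0003 of the tabulated 73.954374, which is what we would expect if the features have been assigned to the same empirical compound consistently.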


In [31]:
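import json

# Open the file in a context manager so the handle is closed after reading.
with open("khipu_demo.json") as f:
    empirical_compounds = json.load(f)
empirical_compounds

# The JSON holds the same information as the TSV, plus per-sample intensities, in a structure that is easier for programs to consume.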

[{'interim_id': 'kp1_73.9544',
  'neutral_formula_mass': 73.95437403323,
  'neutral_formula': None,
  'Database_referred': [],
  'identity': [],
  'MS1_pseudo_Spectra': [{'id': 'F661',
    'mz': 126.022,
    'rtime': 18.68,
    'intensities': {'12C_Ecoli_20220321_004': 1648307.0,
     '12C_Ecoli_20220321_004_20220322095030': 2292119.0,
     '12C_Ecoli_20220321_004_20220322130235': 2899781.0},
    'representative_intensity': 2280069.0,
    'parent_masstrack_id': '126.022',
    'isotope': '13C/12C*10',
    'modification': 'ACN',
    'ion_relation': '13C/12C*10,ACN'},
   {'id': 'F688',
    'mz': 84.9954,
    'rtime': 18.68,
    'intensities': {'12C_Ecoli_20220321_004': 143904.0,
     '12C_Ecoli_20220321_004_20220322095030': 283507.0,
     '12C_Ecoli_20220321_004_20220322130235': 304366.0},
    'representative_intensity': 243925.66666666666,
    'parent_masstrack_id': '84.9954',
    'isotope': '13C/12C*10',
    'modification': 'M+H+',
    'ion_relation': '13C/12C*10,M+H+'},
   {'id': 'F250