# Pre-Annotation for LC-MS data (khipu)

Pre-annotation groups isotopologues, adducts, and fragments that likely originate from the same molecule into tentative (empirical) compounds.

Tools for pre-annotation include CAMERA, Binner, mz.unity, xMSannotator, and more.
We use Khipu here. Empirical compounds, its key data structure, are unique to Khipu. Khipu enforces a tree structure that makes chained computing possible (Li, S. and Zheng, S., 2023. Analytical Chemistry, 95(15), pp. 6212-6217; https://pubs.acs.org/doi/10.1021/acs.analchem.2c05810).
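To make the idea of grouping by mass difference concrete, here is a minimal sketch that pairs features whose m/z values differ by one 13C/12C mass shift. It is an illustration only, not Khipu's algorithm: the function name `find_isotope_pairs`, the 5 ppm tolerance, and the m/z values are all assumptions, and a real tool would also take retention time into account.

```python
# Illustrative only: this is not how Khipu is implemented.
C13_DELTA = 1.003355  # mass difference between 13C and 12C, in Da


def find_isotope_pairs(mzs, delta=C13_DELTA, ppm=5.0):
    """Return (i, j) index pairs where mzs[j] - mzs[i] matches delta within ppm."""
    pairs = []
    for i, low in enumerate(mzs):
        for j, high in enumerate(mzs):
            if i == j:
                continue
            tolerance = high * ppm * 1e-6  # absolute tolerance in Da
            if abs((high - low) - delta) <= tolerance:
                pairs.append((i, j))
    return pairs


# Hypothetical m/z values: the first two form an M0 / 13C pair,
# the third is far from either by any 13C spacing.
example_mzs = [181.0707, 182.0741, 203.0526]
pairs = find_isotope_pairs(example_mzs)
print(pairs)
```

Each returned pair points from the lighter feature to its candidate 13C isotopologue; chaining such pairs is what lets features be organized into a tree.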

First, we will install Khipu and demonstrate standalone usage before working with the pipeline or applying the software to isotope-labeled data.

In [None]:
!pip3 install --upgrade khipu-metabolomics

import requests, zipfile, io, os

os.makedirs("./Datasets", exist_ok=True)

datasets = [
    "https://raw.githubusercontent.com/shuzhao-li-lab/khipu/refs/heads/main/testdata/ecoli_pos.tsv",
]

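for dataset in datasets:
    r = requests.get(dataset)
    r.raise_for_status()  # stop early if the download failed
    if dataset.endswith(".zip"):
        # unpack zip archives directly from memory
        z = zipfile.ZipFile(io.BytesIO(r.content))
        z.extractall("./Datasets/")
    else:
        # write any other file as-is, in binary mode
        with open("./Datasets/" + os.path.basename(dataset), "wb") as out_fh:
            out_fh.write(r.content)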

In [None]:
from khipu.extended import *

# Let's take a look at the mass patterns that we will use for pre-annotation.

for x in (adduct_search_patterns, isotope_search_patterns, extended_adducts):
    for z in x: 
        print(z)
    print("\n")

# each is a tuple of the form (mz delta, name, ...)
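Because each pattern starts with an m/z delta followed by a name, applying a pattern to a feature is a matter of adding the delta to its m/z. The sketch below shows this with hypothetical patterns written in the same `(mz delta, name)` form. The list `example_patterns`, the helper `expected_mzs`, and the base m/z are assumptions for illustration and are not taken from `khipu.extended`.

```python
# Hypothetical patterns in the (mz delta, name) form described above;
# the entries are illustrative and not copied from khipu.
example_patterns = [
    (1.003355, "13C/12C"),
    (2.006710, "13C/12C*2"),
    (21.981940, "Na/H"),
]


def expected_mzs(base_mz, patterns):
    """Map each pattern name to the m/z expected when its delta is added to base_mz."""
    # *_ ignores any extra tuple fields after (delta, name)
    return {name: round(base_mz + delta, 4) for delta, name, *_ in patterns}


ladder = expected_mzs(181.0707, example_patterns)
for name, mz in ladder.items():
    print(f"{name}: {mz}")
```

Unpacking with `*_` keeps the helper usable when a tuple carries additional fields after the name.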

Note that Khipu searches for isotopologues up to m+13C12. For unlabeled studies this is excessive, and subsetting the isotopologue patterns may give better results.

In [None]:
# example of limiting the isotopologues to m+13C3

m13c3_patterns = isotope_search_patterns[:3]
print(m13c3_patterns)

# As our example dataset is labeled, we can keep the default list.
# For unlabeled data, you could instead do isotope_search_patterns = isotope_search_patterns[:3]

In [None]:
!khipu -i ./Datasets/ecoli_pos.tsv -o khipu_demo

In [None]:
# Khipu writes a TSV file and a JSON file

!ls -alhR

In [None]:
import pandas as pd  # imported explicitly rather than relying on the star import above

kt = pd.read_csv("./khipu_demo.tsv", sep="\t")
kt.head()

# Note that this table maps each feature to an isotope and a modification.

In [None]:
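import json

# use a context manager so the file handle is closed after loading
with open("khipu_demo.json") as json_fh:
    empcpds = json.load(json_fh)
empcpds

# The JSON has all the information of the TSV, but in a format better suited to programs.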
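As a sketch of why the JSON form is convenient for programs, the example below summarizes a list of empirical compounds with plain Python. It runs on a small hypothetical record, not on real Khipu output: the field names (`interim_id`, `neutral_formula_mass`, `MS1_pseudo_Spectra`, `isotope`, `modification`) and all values are assumptions, so check them against your own `khipu_demo.json` before reusing the code.

```python
import json

# Hypothetical record mimicking one empirical compound. The field names
# and values are assumptions for illustration, not real Khipu output.
example_json = """
[
  {
    "interim_id": "kp1",
    "neutral_formula_mass": 180.0634,
    "MS1_pseudo_Spectra": [
      {"id": "F1", "mz": 181.0707, "isotope": "M0", "modification": "M+H+"},
      {"id": "F2", "mz": 182.0741, "isotope": "13C/12C", "modification": "M+H+"},
      {"id": "F3", "mz": 203.0526, "isotope": "M0", "modification": "Na/H"}
    ]
  }
]
"""


def summarize(compounds):
    """Return one (compound id, feature count, sorted isotope labels) tuple per compound."""
    rows = []
    for cpd in compounds:
        features = cpd.get("MS1_pseudo_Spectra", [])
        isotopes = sorted({f["isotope"] for f in features})
        rows.append((cpd["interim_id"], len(features), isotopes))
    return rows


summary = summarize(json.loads(example_json))
for row in summary:
    print(row)
```

If your file uses the same layout, `summarize(empcpds)` would give a quick overview of how many features and which isotope labels each empirical compound contains.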

In [None]:
from khipu.plot import plot_json_khipu

# This empCpd may be labeled, but the pattern could also reflect natural abundance.
plot_json_khipu(empcpds[28])

## Notebook Summary

Now you can use Khipu to pre-annotate your datasets and assign adducts and isotopologues to features in an untargeted manner. 