This repository contains Python code to process the output data from Compound Discoverer to indentify compounds that has incorporated an isotopically labelled substrate.
For example, mammalian cell lines grown in media with 50 % unlabelled Cysteine and 50 % 13C3, 15N1 Cysteine (m+4) will incorporate the same labelling fraction into downstream metabolites like glutathione and lactoylglutathione.
Three examples are provided:
- Cysteine and glutamine tracing in three cancer cell lines. Input and output in the folder
projects/three-cell-linesand notebook to process the data:peak-pair-analysis_three-cell-lines.ipynb - Cysteine tracing in two cancer cell lines treated with six alkylating agents. Input and output in the folder
projects/alkylation-agentsand notebook to process the data:peak-pair-analysis_alkylation-agents.ipynb - Cysteine tracing in a set of bile duct cancer cell lines. Input and output in the folder
projects/bile-duct-cells_cys-tracingand notebook to process the data:peak-pair-analysis_bile-duct-cells_cys-tracing.ipynb
- Clone this repository.
- Install Python packages that are dependencies (see below).
- Run one or several of the example notebooks.
- If you want to use this code with your own data start by copying one of the example notebooks and use this as a boilerplate.
- Prepare your input data according to the requirements described below. We recommend copying the folder structure of one of the examples e.g.
projects/bile-duct-cells_cys-tracingand use these input files as templates. - Adjust the parameters for peak filtering and peak pair finding. These are input dependent e.g. the mass shift and retention time tolerances must reflect the mass accuracy and chromatographic setup.
- Adjust the use of tracer (see below) if different from those used in the examples.
- Adjust the peak pair area ratio to reflect the mixing ratio of labelled and unlabelled tracer. This can also be performed empirically by inputting expected peak pairs to the
pick_ratiomethod shown in the example notebooks. - Generate peak pair using the
find_pairsmethod as shown in the example notebooks. - Inspect peak pairs and adjust filtering parameters. Repeat filter adjustments until satisfied.
We provide examples using two tracers: m+4 cysteine (13C3, 15N) and m+5 glutamine (13C5).
Other tracers can be used by defining their isotope composition (isotopomer) using a space separated list of [<nominal mass>]<element name><element quantity> e.g. the m+3 serine isotopologue with 18O on the hydroxyl and 15N on the amino would be indentified as: [15]N1 [18]O1 (if the element quantity is not supplied it is assumed 1).
Or m+3 serine with uniformly labelled 13C would be indentified as: [13]C3.
These can be specified as 'labels' in the params dictionary.
For example:
'labels': {
'ser_m3_1': '[15]N [18]O',
'ser_m3_2': '[13]C3',
},
These label names must be consistent with those used in the input JSON file that describes the samples.
They are also used to refer back to the peak area ratio: parent/(parent+heavy).
For example using a ratio of 0.4 to 0.6 for ser_m3_1 and 0.5 to 0.7 for ser_m3_2:
'area_ratio_cutoff': {
'ser_m3_1': ((0.4, 0.6),),
'ser_m3_2': ((0.5, 0.7),),
},
Finally, the mass shift for each label must be specified. This can be input manually like this:
'MW_shift': {
'ser_m3_1': 3.001279886900008,
'ser_m3_2': 3.010064506020001,
},
But can also be calculated in place using the isotope composition:
params['MW_shift'] = dict()
for label in params['labels']:
params['MW_shift'][label] = isotope_obj.isotopes2mass_shift(params['labels'][label])
We do not apply natural abundance correction for the simple reason that we do not know the chemical formula of the unknown RMA traced compounds. For some RMA traced compounds we of course have their chemical formula and could perform natural abundance correction; however, since the purpose of this method is to identify unknown metabolites, we chose to treat known and unknown compounds consistently and thus skip natural abundance correction all together. This choice is not a problem for tracers that have no substantial bleed over from natural abundance e.g. 13C3, 13C4, {13C3, 15N} etc. However, tracers with single labels like 15N and 13C, and to a lesser extent tracers with double labels like 15N2, 13C2 and {15N, 13C}, are not as robust. Prospective users must consider this when designing experiments.
- Excel file with the output of Compound Discoverer. This file should contain one row per compound and three columns for: "Name", "Molecular Weight", and "RT [min]". And then followed by the columns for the peak area per sample e.g.: "Area: KD031720_031720_BD_C01.raw (F1)", "Area: KD031720_031720_BD_C02.raw (F2)", etc. The peak area columns must contain the tag "Area", which it will automatically if the Compound Discoverer output is not altered.
- A JSON file describing each input sample, its type, label etc. See the examples provided in the
projectsfolder.
The following Python packages are required and can be install with pip or conda commands:
- jupyterlab
- pandas
- HTMLParser
- numpy
- seaborn
- matplotlib
We recommend using an Anaconda Python install which already has many of the above packages installed by default.