### Import pyOpenMS (Python wrappers of the OpenMS C++ library) for peak picking and metabolite deconvolution

In [1]:
from pyopenms import *

Determination of memory status is not supported on this 
 platform, measuring for memoryleaks will never fail


In [2]:
input_mzML = "data Thermo Orbitrap ID-X/FileFiltered Std/20210129_DR_UMETAB179_ISP2S_collinus3.mzML"
exp = MSExperiment()
MzMLFile().load(input_mzML, exp)

In [3]:
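# Alternatively, when running this notebook as a script, load the mzML file given on the command line:
# import sys
# MzMLFile().load(sys.argv[1], exp)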

In [4]:
print(exp.getSourceFiles()[0].getNativeIDTypeAccession())
print(exp.getSourceFiles()[0].getNativeIDType())

MS:1000768
Thermo nativeID format


#### Sort spectra by retention time (passing True also sorts the peaks within each spectrum by m/z)

In [5]:
exp.sortSpectra(True)

### Run mass trace detection

A mass trace extraction method that gathers peaks similar in m/z and moving along retention time.

The peaks of an MSExperiment are sorted by intensity and stored in a list of potential chromatographic apex positions. Only peaks above the user-defined noise threshold are analyzed, and only peaks at least n times above this threshold are considered as apices. This saves computational resources and reduces the noise in the resulting output.

Starting from these apices, mass traces are extended towards both lower and higher retention times. During this extension phase, the centroid m/z is updated on the fly as the intensity-weighted mean of the gathered peaks.

The extension phase ends when either the frequency of gathered peaks drops below a threshold (min_sample_rate, see MassTraceDetection parameters) or when the number of missed scans exceeds a threshold (trace_termination_outliers, see MassTraceDetection parameters).

Finally, only mass traces that pass a filter (a certain minimal and maximal length as well as having the minimal sample rate criterion fulfilled) get added to the result.
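
To make the seeding and extension steps above concrete, here is a toy, pure-Python sketch; it is not the OpenMS implementation. It assumes centroided scans given as (rt, [(mz, intensity), ...]) tuples, and all function and argument names are hypothetical: ppm and noise play the roles of mass_error_ppm and noise_threshold_int, apex_factor is the "n times above the noise threshold" rule, and extension stops after too many consecutive missed scans. The min_sample_rate criterion and the maximal-length filter are left out for brevity.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]       # (mz, intensity)
Scan = Tuple[float, List[Peak]]  # (retention time in s, centroided peaks)


def within_ppm(ref_mz: float, mz: float, ppm: float) -> bool:
    """True if mz lies within +/- ppm of ref_mz (10 ppm at m/z 500 is +/- 0.005)."""
    return abs(mz - ref_mz) <= ref_mz * ppm * 1e-6


def detect_mass_traces(scans: List[Scan], ppm: float = 10.0, noise: float = 1.0e4,
                       apex_factor: float = 3.0, max_missed: int = 2,
                       min_peaks: int = 3) -> List[Dict[str, float]]:
    """Toy apex-seeded mass trace extraction (hypothetical, not OpenMS code)."""
    used = set()  # (scan index, peak index) pairs already assigned to a trace
    # Every peak above the noise threshold, most intense first.
    seeds = sorted(((it, s, p) for s, (_, peaks) in enumerate(scans)
                    for p, (_, it) in enumerate(peaks) if it >= noise), reverse=True)
    traces = []
    for apex_int, s0, p0 in seeds:
        # Only peaks n times above the noise threshold may start a trace.
        if (s0, p0) in used or apex_int < apex_factor * noise:
            continue
        members = [(s0, p0)]
        mz_sum = scans[s0][1][p0][0] * apex_int
        int_sum = apex_int
        for step in (-1, 1):  # extend towards lower, then higher RT
            s, missed = s0 + step, 0
            while 0 <= s < len(scans) and missed <= max_missed:
                centroid = mz_sum / int_sum  # running intensity-weighted m/z
                hits = [(it, p) for p, (mz, it) in enumerate(scans[s][1])
                        if (s, p) not in used and it >= noise
                        and within_ppm(centroid, mz, ppm)]
                if hits:
                    it, p = max(hits)  # most intense matching peak in this scan
                    members.append((s, p))
                    mz_sum += scans[s][1][p][0] * it
                    int_sum += it
                    missed = 0
                else:
                    missed += 1
                s += step
        if len(members) >= min_peaks:  # simple minimal-length filter
            used.update(members)
            rts = sorted(scans[i][0] for i, _ in members)
            traces.append({"mz": mz_sum / int_sum, "rt_start": rts[0],
                           "rt_end": rts[-1], "n_peaks": len(members)})
    return traces
```

Seeding from the most intense peaks first means that strong apices claim their peaks before weaker, possibly noisy seeds can.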

In [6]:
mass_traces = []
mtd = MassTraceDetection()

#### parameters

In [7]:
mtd_par = mtd.getDefaults()
mtd_par.setValue("mass_error_ppm", 10.0) 
mtd_par.setValue("noise_threshold_int", 1.0e04)
mtd.setParameters(mtd_par)

In [8]:
mtd.run(exp, mass_traces, 0)  # 0 is default and does not restrict found mass traces

### Run elution peak detection

Extracts chromatographic peaks from a mass trace.

Mass traces may consist of several consecutively eluting (partly overlapping) peaks, e.g., stemming from (almost) isobaric compounds that are separated by retention time. Especially in metabolomics, isomeric compounds with exactly the same mass but different retention behaviour may still be contained in the same mass trace.

This method first smooths the mass trace's intensities, then detects local minima and maxima in order to separate the chromatographic peaks from each other. Maxima are detected on the smoothed intensities using a fixed peak width (parameter chrom_fwhm) within which only a single maximum is expected. Currently, smoothing uses a Savitzky-Golay filter with a second-order polynomial and a frame length equal to the fixed peak width.
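
The smoothing and splitting idea can be sketched in pure Python as follows. This is an illustrative toy, not ElutionPeakDetection itself: it uses the closed-form Savitzky-Golay weights for a quadratic fit over a window of 2m+1 points (m is the hypothetical half_window argument, standing in for a frame derived from chrom_fwhm), leaves the edges unsmoothed, and cuts the trace at the lowest smoothed point between neighbouring maxima.

```python
from typing import List


def savgol_quadratic(y: List[float], half_window: int) -> List[float]:
    """Savitzky-Golay smoothing (2nd-order polynomial, window 2*half_window + 1).

    Uses the closed-form convolution weights of a quadratic least-squares fit;
    points closer than half_window to either edge are returned unsmoothed.
    """
    m = half_window
    norm = (2 * m + 1) * (4 * m * m + 4 * m - 3)
    weights = [(3 * (3 * m * m + 3 * m - 1) - 15 * i * i) / norm for i in range(-m, m + 1)]
    out = list(y)
    for k in range(m, len(y) - m):
        out[k] = sum(w * y[k + i] for w, i in zip(weights, range(-m, m + 1)))
    return out


def split_at_minima(y: List[float], half_window: int = 2) -> List[range]:
    """Split a mass trace into elution peaks at minima of the smoothed signal."""
    s = savgol_quadratic(y, half_window)
    # Local maxima of the smoothed intensities are treated as elution peak apices.
    apices = [k for k in range(1, len(s) - 1) if s[k - 1] < s[k] >= s[k + 1]]
    bounds = [0]
    for a, b in zip(apices, apices[1:]):
        # Cut at the lowest smoothed point between two neighbouring apices.
        bounds.append(min(range(a, b + 1), key=lambda k: s[k]))
    bounds.append(len(y))
    return [range(lo, hi) for lo, hi in zip(bounds, bounds[1:])]


y = [0, 2, 6, 10, 6, 2, 1, 2, 6, 10, 6, 2, 0]
print([(r.start, r.stop) for r in split_at_minima(y)])  # [(0, 6), (6, 13)]
```

For this example trace, the smoothed maxima sit at indices 3 and 9, and the cut falls at index 6. For a 5-point window the weights reduce to the familiar (-3, 12, 17, 12, -3) / 35.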

Depending on the "width_filtering" parameter, mass traces are filtered by their length in seconds ("fixed") or by quantile ("auto"); "off" disables this filter.
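
The two width filters can be sketched like this. The names min_fwhm and max_fwhm follow the ElutionPeakDetection parameters that bound the "fixed" mode; for "auto" the sketch assumes the 5% and 95% quantiles of the peak-width distribution and uses one simple quantile convention among several. Widths are assumed to be in seconds, and the helper functions are hypothetical.

```python
from typing import List, Sequence


def filter_fixed(widths: Sequence[float], min_fwhm: float, max_fwhm: float) -> List[int]:
    """Indices of elution peaks whose width (seconds) lies in [min_fwhm, max_fwhm]."""
    return [i for i, w in enumerate(widths) if min_fwhm <= w <= max_fwhm]


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Simple quantile: the value at the rounded position q * (n - 1)."""
    pos = int(round(q * (len(sorted_vals) - 1)))
    return sorted_vals[min(max(pos, 0), len(sorted_vals) - 1)]


def filter_by_quantile(widths: Sequence[float], lower: float = 0.05,
                       upper: float = 0.95) -> List[int]:
    """Indices of elution peaks whose width lies between two width quantiles."""
    ordered = sorted(widths)
    lo, hi = quantile(ordered, lower), quantile(ordered, upper)
    return [i for i, w in enumerate(widths) if lo <= w <= hi]
```

The fixed filter needs sensible bounds for your chromatography, whereas the quantile filter adapts to the data but always discards the extreme tails, even when they are genuine.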

The output of the algorithm is a set of chromatographic peaks for each mass trace, i.e. a vector of split mass traces (see ElutionPeakDetection parameters).

In general, a user would want to call the "detectPeaks" functions, potentially followed by the "filterByPeakWidth" function.



In [9]:
mass_traces_split = []
mass_traces_final = []
epd = ElutionPeakDetection()

#### parameters

In [10]:
epd_par = epd.getDefaults()
epd_par.setValue("width_filtering", "fixed")
epd.setParameters(epd_par)

In [11]:
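epd.detectPeaks(mass_traces, mass_traces_split)
# Depending on the pyOpenMS version, string parameters may be returned as bytes instead of str
width_filtering = epd.getParameters().getValue("width_filtering")
if isinstance(width_filtering, bytes):
    width_filtering = width_filtering.decode()
if width_filtering == "auto":
    epd.filterByPeakWidth(mass_traces_split, mass_traces_final)  # quantile-based width filter
else:
    mass_traces_final = mass_traces_split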

### Run feature detection

FeatureFinderMetabo assembles metabolite features from singleton mass traces.

Mass traces alone would allow for further analysis such as metabolite identification or statistical evaluation. However, monoisotopic mass traces are generally accompanied by satellite 13C isotope peaks, which can complicate the analysis. FeatureFinderMetabo performs a further data reduction step by assembling compatible mass traces into metabolite features (that is, all mass traces originating from one metabolite). To this end, multiple metabolite hypotheses are formulated and scored according to how well differences in RT (optional), m/z, or intensity ratios match those of theoretical isotope patterns.

If the raw data scans contain the scan polarity information, it is stored as meta value "scan_polarity" in the output file.

Mass trace clustering can be done using either 13C distances or a linear model (Kenar et al.); see the parameter 'ffm:mz_scoring_13C' (named 'mz_scoring_13C', without the 'ffm:' prefix, when set through FeatureFindingMetabo in pyOpenMS). Generally, for lipidomics, use 13C distances, since lipids contain many carbon atoms and therefore show pronounced 13C isotope peaks. For general metabolites, the linear model is usually more appropriate. To decide which works better, the total number of features can be used as an indirect measure: the lower(!) the better, since more mass traces are then assembled into single features. Detailed information is stored in the featureXML output: it contains meta values for each feature describing the mass trace differences (inspectable via TOPPView). If you want this in tabular format, use TextExporter, i.e.,
TextExporter.exe -feature:add_metavalues 1 -in <ff_metabo.featureXML> -out <ff_metabo.csv>
By default, the linear model is used.
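
As an illustration of the 13C-distance option only (the linear model of Kenar et al. is not reproduced here), the sketch below checks whether other mass traces sit on the isotope ladder mono_mz + k * 1.0033548 / z of a putative monoisotopic trace, where 1.0033548 Da is the 13C-12C mass difference. It ignores the RT and intensity-ratio scoring that FeatureFinderMetabo also performs, and all names are hypothetical.

```python
from typing import Dict, List, Optional

C13_C12_DIFF = 1.0033548  # mass difference 13C - 12C in Da


def isotope_index(mono_mz: float, candidate_mz: float, charge: int = 1,
                  ppm: float = 10.0, max_iso: int = 5) -> Optional[int]:
    """Return k if candidate_mz matches mono_mz + k * 1.0033548 / charge, else None."""
    for k in range(1, max_iso + 1):
        expected = mono_mz + k * C13_C12_DIFF / charge
        if abs(candidate_mz - expected) <= expected * ppm * 1e-6:
            return k
    return None


def assemble_feature(mono_mz: float, trace_mzs: List[float], charge: int = 1,
                     ppm: float = 10.0) -> Dict[int, float]:
    """Greedy toy assembly: keep traces that extend a gap-free 13C ladder."""
    feature = {0: mono_mz}
    for mz in sorted(trace_mzs):
        k = isotope_index(mono_mz, mz, charge, ppm)
        if k is not None and k not in feature and k - 1 in feature:
            feature[k] = mz
    return feature
```

Fewer, larger features (more traces per monoisotopic trace) is exactly the effect the feature count above is meant to reflect.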

In [12]:
feature_map_FFM = FeatureMap()
feat_chrom = []
ffm = FeatureFindingMetabo()

#### parameters

In [13]:
ffm_par = ffm.getDefaults() 
ffm_par.setValue("isotope_filtering_model", "none")
#ffm_par.setValue("remove_single_traces", "true")
ffm_par.setValue("mz_scoring_by_elements", "true")
ffm.setParameters(ffm_par)

In [14]:
ffm.run(mass_traces_final, feature_map_FFM, feat_chrom)

In [15]:
print('# Mass traces filtered:', len(mass_traces_final))

# Mass traces filtered: 5726


#### Save the FeatureFinderMetabo result as a featureXML file

In [16]:
feature_map_FFM.setUniqueIds()
fh = FeatureXMLFile()
print("Found", feature_map_FFM.size(), "features")
fh.store('./wf_testing/FeatureFindingMetabo.featureXML', feature_map_FFM)

Found 5187 features


#### Export to CSV (TextExporter is a command-line tool, so this step runs outside Python)

In [17]:
# /Applications/OpenMS-2.6.0-pre-nightly-2021-04-04/bin/TextExporter -feature:add_metavalues 1 -in /Users/eeko/Desktop/py4e/wf_testing/FeatureFindingMetabo.featureXML -out /Users/eeko/Desktop/py4e/wf_testing/FeatureFindingMetabo.csv

### Run metabolite adduct deconvolution (decharging)
#### SIRIUS only supports singly charged adducts, so charge_min and charge_max are set to 1 below

In [18]:
mfd = MetaboliteFeatureDeconvolution()

#### parameters

In [19]:
mfd_par = mfd.getDefaults()
mfd_par.setValue("potential_adducts", [b"H:+:0.6", b"Na:+:0.2", b"NH4:+:0.1", b"H2O:-:0.1"])
mfd_par.setValue("charge_min", 1, "Minimal possible charge")
mfd_par.setValue("charge_max", 1, "Maximal possible charge")
mfd_par.setValue("charge_span_max", 1)
mfd_par.setValue("max_neutrals", 1)
print(mfd_par.getValue("potential_adducts"))  # check that the adducts have been set correctly
mfd.setParameters(mfd_par)

[b'H:+:0.6', b'Na:+:0.1', b'NH4:+:0.1', b'K:+:0.1', b'H2O:-:0.1']
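
The adduct strings use the format Elements:Charge:Probability, where the number of '+' or '-' signs gives the charge and '0' marks a neutral adduct. The toy sketch below parses such strings and shows the mass arithmetic behind adduct grouping: two co-eluting features are linked when different adduct hypotheses lead to the same neutral mass. It is not MetaboliteFeatureDeconvolution, which also weighs adduct probabilities and resolves competing explanations; the helper names, the small mass table and the tolerances are assumptions, and only positively charged adducts are handled.

```python
import re
from typing import List, Tuple

# Monoisotopic masses (Da) for the elements needed by this sketch.
MONO = {"H": 1.00782503207, "C": 12.0, "N": 14.0030740048,
        "O": 15.99491461956, "Na": 22.9897692809, "K": 38.96370668}
ELECTRON = 0.00054857990946


def formula_mass(formula: str) -> float:
    """Mass of a simple formula such as 'NH4' or 'H2O' (no brackets, no losses)."""
    return sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def parse_adduct(spec: str) -> Tuple[str, int, float]:
    """'Na:+:0.2' -> ('Na', 1, 0.2); the charge is the count of '+' minus '-'."""
    formula, charge_str, prob = spec.split(":")[:3]
    charge = 0 if charge_str == "0" else charge_str.count("+") - charge_str.count("-")
    return formula, charge, float(prob)


def adduct_mz(neutral_mass: float, spec: str) -> float:
    """m/z of [M + adduct] for a charged adduct: (M + adduct - z * e) / |z|."""
    formula, z, _ = parse_adduct(spec)
    return (neutral_mass + formula_mass(formula) - z * ELECTRON) / abs(z)


def pair_features(features: List[Tuple[float, float]], specs: List[str],
                  ppm: float = 10.0, rt_tol: float = 5.0):
    """Link co-eluting features whose adduct hypotheses give the same neutral mass."""
    hyps = []  # (feature index, adduct spec, neutral mass, rt)
    for idx, (mz, rt) in enumerate(features):
        for spec in specs:
            formula, z, _ = parse_adduct(spec)
            if z <= 0:
                continue  # only positively charged adducts in this sketch
            hyps.append((idx, spec, mz * z - formula_mass(formula) + z * ELECTRON, rt))
    pairs = []
    for i, (fa, sa, ma, ra) in enumerate(hyps):
        for fb, sb, mb, rb in hyps[i + 1:]:
            if (fa != fb and sa != sb and abs(ra - rb) <= rt_tol
                    and abs(ma - mb) <= ma * ppm * 1e-6):
                pairs.append((fa, sa, fb, sb, round(ma, 4)))
    return pairs
```

With the adduct list used above, a feature at m/z 181.0707 read as [M+H]+ and a co-eluting one at m/z 203.0526 read as [M+Na]+ both give a neutral mass of about 180.0634, so they are linked.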


In [20]:
feature_map_DEC = FeatureMap()
cons_map0 = ConsensusMap()
cons_map1 = ConsensusMap()
mfd.compute(feature_map_FFM, feature_map_DEC, cons_map0, cons_map1)

### Save the deconvoluted featureXML file

In [21]:
fxml = FeatureXMLFile()
fxml.store("./wf_testing/devoncoluted.featureXML", feature_map_DEC)

## SIRIUS Adapter Algorithm

In [22]:
sirius_algo = SiriusAdapterAlgorithm()

#### parameters

In [23]:
sirius_algo_par = sirius_algo.getDefaults()

sirius_algo_par.setValue("preprocessing:filter_by_num_masstraces", 2) 
sirius_algo_par.setValue("preprocessing:precursor_mz_tolerance", 10.0)
sirius_algo_par.setValue("preprocessing:precursor_mz_tolerance_unit", "ppm")
#sirius_algo_par.setValue("preprocessing:precursor_rt_tolerance", 5.0)
sirius_algo_par.setValue("preprocessing:feature_only", "true")
sirius_algo_par.setValue("sirius:profile", "orbitrap")
#sirius_algo_par.setValue("sirius:db", "all")
#sirius_algo_par.setValue("sirius:ions_considered", "[M+H]+, [M-H2O+H]+, [M+Na]+, [M+NH4]+")
sirius_algo_par.setValue("sirius:candidates", 5)
sirius_algo_par.setValue("sirius:elements_enforced", "CHNOP") 
sirius_algo_par.setValue("project:processors", 2)
sirius_algo.setParameters(sirius_algo_par)

In [24]:
featureinfo = "./wf_testing/devoncoluted.featureXML"
fm_info = FeatureMapping_FeatureMappingInfo()
feature_mapping = FeatureMapping_FeatureToMs2Indices() 
sirius_algo.preprocessingSirius(featureinfo,
                                exp,
                                fm_info,
                                feature_mapping)
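
Conceptually, this preprocessing step links MS2 spectra to features by precursor m/z and retention time. The toy sketch below shows that idea under our reading of the parameters set earlier: mz_tol_ppm mirrors preprocessing:precursor_mz_tolerance, rt_tol the (commented-out) preprocessing:precursor_rt_tolerance, and min_mass_traces preprocessing:filter_by_num_masstraces. It is hypothetical and not the FeatureMapping code used by SiriusAdapterAlgorithm.

```python
from typing import Dict, List, Sequence, Tuple


def map_ms2_to_features(features: Sequence[Tuple[float, float, int]],
                        ms2_spectra: Sequence[Tuple[float, float]],
                        mz_tol_ppm: float = 10.0, rt_tol: float = 5.0,
                        min_mass_traces: int = 2) -> Dict[int, List[int]]:
    """Toy assignment of MS2 spectra to features by precursor m/z and RT.

    features:    (feature m/z, feature RT in s, number of mass traces)
    ms2_spectra: (precursor m/z, RT in s)
    Spectra that match no feature are dropped (cf. preprocessing:feature_only).
    """
    mapping: Dict[int, List[int]] = {}
    for s_idx, (prec_mz, rt) in enumerate(ms2_spectra):
        best, best_err = None, float("inf")
        for f_idx, (f_mz, f_rt, n_traces) in enumerate(features):
            if n_traces < min_mass_traces:
                continue  # skip features built from too few mass traces
            err_ppm = abs(prec_mz - f_mz) / f_mz * 1e6
            if err_ppm <= mz_tol_ppm and abs(rt - f_rt) <= rt_tol and err_ppm < best_err:
                best, best_err = f_idx, err_ppm  # keep the closest feature in m/z
        if best is not None:
            mapping.setdefault(best, []).append(s_idx)
    return mapping
```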

In [25]:
sirius_algo.logFeatureSpectraNumber(featureinfo, 
                                    feature_mapping,
                                    exp)

In [26]:
msfile = SiriusMSFile()
debug_level = 10
sirius_tmp = SiriusTemporaryFileSystemObjects(debug_level)
siriusstring= String(sirius_tmp.getTmpMsFile())

In [27]:
feature_only = sirius_algo.isFeatureOnly()
isotope_pattern_iterations = sirius_algo.getIsotopePatternIterations()
no_mt_info = sirius_algo.isNoMasstraceInfoIsotopePattern()
compound_info = []

msfile.store(exp, 
             String(sirius_tmp.getTmpMsFile()),
             feature_mapping, 
             feature_only,
             isotope_pattern_iterations, 
             no_mt_info, 
             compound_info)
print(String(sirius_tmp.getTmpMsFile())) #temp location of this file in case you want to check it

b'/private/var/folders/c_/ysz9v_bd1yb7h3ymmkn6m199jbv7x7/T/20210416_115148_nnfcb-l0682.clients.net.dtu.dk_17687_2.ms'
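
The temporary file is a SIRIUS .ms input file. As a rough sketch of that text format, assuming the layout described in the SIRIUS documentation (header lines starting with '>' such as >compound, >parentmass, >charge, >ms1 and >ms2, each followed by "m/z intensity" lines), a minimal writer could look like this. The file produced by SiriusMSFile.store contains additional fields, so open the path printed above to see its exact content; the function name is hypothetical.

```python
from typing import List, Tuple

Peaks = List[Tuple[float, float]]  # (m/z, intensity)


def to_sirius_ms(name: str, parent_mz: float, charge: int,
                 ms1: Peaks, ms2_spectra: List[Peaks]) -> str:
    """Render one compound as a minimal .ms-style text block."""
    lines = [f">compound {name}", f">parentmass {parent_mz:.6f}",
             f">charge {charge}", "", ">ms1"]
    lines += [f"{mz:.6f} {inten:.1f}" for mz, inten in ms1]
    for spectrum in ms2_spectra:  # one >ms2 block per fragment spectrum
        lines += ["", ">ms2"]
        lines += [f"{mz:.6f} {inten:.1f}" for mz, inten in spectrum]
    return "\n".join(lines) + "\n"
```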


In [28]:
out_csifingerid = "./wf_testing/csifingerID.mzTab"  # use an empty string ("") to skip CSI:FingerID; without an output file, no CSI:FingerID output is generated
executable = "/Users/eeko/Desktop/software/THIRDPARTY/MacOS/64bit/Sirius/sirius"
subdirs = sirius_algo.callSiriusQProcess(String(sirius_tmp.getTmpMsFile()),
                                         String(sirius_tmp.getTmpOutDir()),
                                         String(executable),
                                         String(out_csifingerid),
                                         False)

In [29]:
candidates = sirius_algo.getNumberOfSiriusCandidates()
sirius_result = MzTab()
siriusfile = MzTabFile()
SiriusMzTabWriter.read(subdirs,
                        input_mzML,
                        candidates,
                        sirius_result)
siriusfile.store("./wf_testing/out_sirius_test.mzTab", sirius_result)

In [30]:
top_hits = 5
csi_result = MzTab()
csi_file = MzTabFile()
CsiFingerIdMzTabWriter.read(subdirs,
                            input_mzML,
                            top_hits,
                            csi_result)

csi_file.store("./wf_testing/csifingerID.mzTab", csi_result)
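
Both mzTab files can be inspected without OpenMS. mzTab is tab-separated and marks every line with a prefix; in the small molecule section, the header row starts with SMH and each result row with SML. The hypothetical helper below turns the SML rows into dictionaries keyed by the SMH column names. Which columns are filled depends on the writer, so check rows[0].keys() on your own output.

```python
from typing import Dict, List


def read_mztab_sml(text: str) -> List[Dict[str, str]]:
    """Return the SML rows of an mzTab document as dicts keyed by the SMH header."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "SMH":
            header = fields[1:]  # column names of the small molecule section
        elif fields[0] == "SML" and header:
            rows.append(dict(zip(header, fields[1:])))
    return rows


# Example usage on the file written above:
# with open("./wf_testing/out_sirius_test.mzTab") as fh:
#     rows = read_mztab_sml(fh.read())
```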