nbollis edited this page May 31, 2023 · 3 revisions

The identification of proteoforms by top-down proteomics requires both high-quality fragmentation spectra and the neutral mass of the proteoform from which the fragments derive. Intact proteoform spectra can be highly complex and may include multiple overlapping proteoforms, many isotopic peaks, and many charge states. The resulting lower signal-to-noise ratios for intact proteins complicate downstream analyses such as deconvolution and database searching. Averaging multiple scans is a common way to improve signal-to-noise, but mass spectrometry data contain artifacts that can degrade the quality of an averaged spectrum. To overcome these limitations and increase signal-to-noise, we have implemented outlier rejection algorithms that efficiently and robustly remove outlier measurements from a set of scans prior to averaging. Spectral averaging with outlier rejection improves top-down feature detection by refining the isotopic envelope. The number of features increases slightly and their quality improves, leading to an increase in proteoform identifications. The outlier rejection algorithms were adopted from the astronomical image processing software PixInsight (https://pixinsight.com/) and are described below.

Creating a New Averaging Task

  • Load mass spec data files
  • Select "New Averaging Task" tab
  • Averaging Parameters: Select one of the three preset methods. For DDA, it is best to try averaging with both DDA methods, then use a subsequent search task to assess which works best.
  • Select "Add the AveragingTask"
  • Run all tasks!

How does the Averaging Task work?

The averaging task can take several minutes. Averaged spectra are generated by

  1. Grouping the scans to be averaged

Three distinct modes determine which spectra are averaged together:

  1. Average All  

    All spectra in the data file are averaged

  2. Average DDA Scans  

    The user provides a number of scans to be averaged together (default = 5 scans), and each averaged scan in the output data file is a moving average of the original MS1 scans

  3. Average every N scans  

    The user provides a number of scans to be averaged together (default = 5 scans), and each averaged scan in the output data file is a moving average of the original scans regardless of MSn order

  2. Normalizing each spectrum to the average total ion current of the scans to be averaged
  3. Creating a binned m/z axis with a user-defined bin size (default = 0.01 Th)
  4. Rejecting outliers from each bin

    Outliers are rejected by one of the outlier rejection methods adopted from PixInsight (https://pixinsight.com/), which are detailed below

  5. Calculating a weighted average value for each bin to produce the final m/z and intensity of the averaged spectrum
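
The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MetaMorpheus implementation: the function names `moving_windows` and `average_scans` are hypothetical, the moving average is assumed to advance one scan at a time, uniform weights stand in for the weighted average, and a scan with no peak in a bin is assumed to contribute zero intensity. Outlier rejection is left out here.

```python
from collections import defaultdict
from statistics import fmean


def moving_windows(n_scans, window=5):
    """Return the scan indices of each sliding window (assumed step of one scan)."""
    if n_scans < window:
        raise ValueError("fewer scans than the number of scans to be averaged")
    return [list(range(i, i + window)) for i in range(n_scans - window + 1)]


def average_scans(scans, bin_size=0.01):
    """Average a group of spectra; each spectrum is a list of (mz, intensity) pairs."""
    # Normalize each spectrum to the mean total ion current (TIC) of the group.
    tics = [sum(inten for _, inten in scan) for scan in scans]
    mean_tic = fmean(tics)

    # Assign every normalized peak to a bin on a fixed-width m/z axis.
    bins = defaultdict(list)
    for scan, tic in zip(scans, tics):
        scale = mean_tic / tic if tic > 0 else 0.0
        for mz, inten in scan:
            bins[round(mz / bin_size)].append((mz, inten * scale))

    # Outlier rejection would act on each bin at this point (omitted in this sketch).

    # Average each bin. Uniform weights are an assumption of this sketch.
    averaged = []
    for key in sorted(bins):
        peaks = bins[key]
        mean_mz = fmean([mz for mz, _ in peaks])
        mean_intensity = sum(inten for _, inten in peaks) / len(scans)
        averaged.append((mean_mz, mean_intensity))
    return averaged
```

For example, `moving_windows(7, 5)` yields three windows of five scans each, and each window would be passed to `average_scans` to produce one averaged scan.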

Output

An averaged .mzML file is produced for each input data file, with the text "-averaged" appended to the original filename. Each .mzML file is written in format version 1.1.0.

A .toml file is produced for each averaged .mzML file and shares the same filename. The .toml file contains any file-specific parameters for subsequent in-line G-PTM-D or search tasks. As long as the .toml file is located in the same directory as the .mzML file, the file-specific parameters in the .toml will be used.

Outlier Rejection Methods

Iterative Rejection Algorithms

For a given set of intensity values corresponding to an m/z measurement in a set of N consecutive mass spectra (a peak bin at a given m/z), iterative rejection algorithms remove values outside the range from median(intensity) − n_min·σ to median(intensity) + n_max·σ, where n_min and n_max are defined by user input and σ is an estimate of the standard deviation of the intensity values in the peak bin. During each iteration, σ and the median are calculated from the non-rejected values of the previous iteration. The iterations continue until no more values are removed, that is, until no intensity values within the peak bin fall outside the defined range.

Sigma Clipping is the simplest iterative rejection algorithm. For each iteration, the median and the standard deviation are calculated. Each intensity value in the peak bin is checked to see whether it falls between median − n_min·σ and median + n_max·σ. If the intensity value falls outside these bounds, it is rejected. Iteration continues until no more outliers are rejected or only two values remain in the peak bin.
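
A minimal sketch of sigma clipping on a single peak bin is shown below. It is not the MetaMorpheus code: the function name `sigma_clip`, the parameter names `n_min` and `n_max` (the user-defined multipliers for the lower and upper bounds), their default values, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import median, pstdev


def sigma_clip(values, n_min=1.5, n_max=1.5):
    """Iteratively reject values outside [median - n_min*sigma, median + n_max*sigma]."""
    kept = list(values)
    # Stop once only two values remain in the peak bin.
    while len(kept) > 2:
        med = median(kept)
        sigma = pstdev(kept)  # population standard deviation (an assumption)
        survivors = [
            v for v in kept
            if med - n_min * sigma <= v <= med + n_max * sigma
        ]
        # Stop when nothing was rejected; the second check is a safety guard
        # (an assumption) against emptying the bin with very small multipliers.
        if len(survivors) == len(kept) or len(survivors) < 2:
            break
        kept = survivors
    return kept
```

With the bin `[100, 102, 98, 101, 99, 1000]` and the defaults above, the first iteration rejects 1000 and the second iteration rejects nothing, so the remaining five values are returned.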

Winsorized Sigma Clipping adds winsorization of the values in the intensity value stack before applying sigma clipping rejection. Winsorization enables a more robust calculation of the median and the standard deviation of the peak bin by replacing outliers with the most extreme allowed values. In our implementation of winsorized sigma clipping, we replace all outliers outside the range of median ± 1.5σ with the value from the peak bin closest to the threshold values. After the determination of the winsorized σ and the winsorized median, sigma clipping proceeds with the winsorized values.
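
The sketch below illustrates one possible reading of this description; it is not the MetaMorpheus code. The names `winsorize` and `winsorized_sigma_clip` are hypothetical, a single winsorization pass per iteration is assumed, and the original values are tested against bounds computed from the winsorized median and σ.

```python
from statistics import median, pstdev


def winsorize(values, med, sigma, k=1.5):
    """Replace values outside med ± k*sigma with the nearest in-range value."""
    lo, hi = med - k * sigma, med + k * sigma
    inside = [v for v in values if lo <= v <= hi]
    if not inside:
        return list(values)
    low_rep, high_rep = min(inside), max(inside)
    return [low_rep if v < lo else high_rep if v > hi else v for v in values]


def winsorized_sigma_clip(values, n_min=3.0, n_max=3.0):
    """Sigma clipping with the median and sigma taken from winsorized values."""
    kept = list(values)
    while len(kept) > 2:
        wins = winsorize(kept, median(kept), pstdev(kept))
        med, sigma = median(wins), pstdev(wins)
        survivors = [
            v for v in kept
            if med - n_min * sigma <= v <= med + n_max * sigma
        ]
        # Stop when nothing was rejected; the second check is a safety guard
        # (an assumption) against emptying the bin.
        if len(survivors) == len(kept) or len(survivors) < 2:
            break
        kept = survivors
    return kept
```

Because the extreme value is pulled in before σ is computed, the bounds are much tighter than in plain sigma clipping, which is the robustness benefit the paragraph describes.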

Averaged Sigma Clipping is like winsorized sigma clipping in that it implements a modified calculation of the standard deviation. In contrast to the previous sigma clipping algorithms, the calculation of σ is based on the assumption that the noise in the spectra is primarily Poisson-based shot noise.

Single Step Rejection Algorithms

Three single-step rejection algorithms are implemented, each of which applies rejection criteria based on a single threshold value.

MinMax Clipping removes the highest and lowest values in the intensity stack.

Percentile Clipping removes values from the peak bin if they are above or below the upper or lower intensity percentile cutoffs set by the user.

Below Threshold Clipping removes an entire intensity stack if the stack contains fewer values than the threshold defined by the user, which by default is 70% of the number of scans being averaged.
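
The three single-step methods can be sketched as follows. The function names, the default percentile cutoffs, and the linear-interpolation percentile definition are assumptions made for illustration and may differ from the MetaMorpheus implementation.

```python
def min_max_clip(values):
    """Remove the single highest and single lowest value from the stack."""
    kept = list(values)
    if len(kept) > 2:  # leaving short stacks untouched is an assumption
        kept.remove(max(kept))
        kept.remove(min(kept))
    return kept


def _percentile(sorted_vals, p):
    """Percentile by linear interpolation between closest ranks (assumed definition)."""
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def percentile_clip(values, lower=10, upper=90):
    """Keep only values between the lower and upper percentile cutoffs."""
    if not values:
        return []
    ordered = sorted(values)
    lo, hi = _percentile(ordered, lower), _percentile(ordered, upper)
    return [v for v in values if lo <= v <= hi]


def below_threshold_clip(values, n_scans, fraction=0.7):
    """Drop the whole stack if it holds fewer values than fraction * n_scans."""
    return list(values) if len(values) >= fraction * n_scans else []
```

For instance, when five scans are averaged, a bin populated in only three of them falls below the 70% default (3 < 3.5) and is discarded entirely.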

Troubleshooting

Input files that contain no spectra, or fewer spectra than the number to be averaged, will cause the averaging task to fail.
