
Spectral Similarity Metrics

Sadjad F Baygi edited this page Dec 11, 2022 · 4 revisions

Noise removal

Li et al. suggested using a noise removal ratio (default = 1%) to eliminate low-abundance noise peaks. The IDSL.FSA workflow also incorporates this data clean-up technique.
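A minimal sketch of this kind of noise filter is given here. It assumes the ratio is applied relative to the most intense (base) peak, which this page does not state explicitly. The function name `remove_noise` and the peak list are hypothetical and are not part of the IDSL.FSA API.

```python
def remove_noise(peaks, noise_ratio=0.01):
    """Drop peaks below noise_ratio times the base peak intensity.

    peaks: list of (mz, intensity) tuples.
    Assumption: the ratio is relative to the most intense peak.
    """
    if not peaks:
        return []
    base_intensity = max(intensity for _, intensity in peaks)
    threshold = noise_ratio * base_intensity
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


# Hypothetical spectrum: the base peak is 1000.0, so the 1% threshold is 10.0
spectrum = [(74.02, 5.0), (118.07, 120.0), (146.06, 900.0),
            (188.07, 1000.0), (205.10, 8.0)]
cleaned = remove_noise(spectrum)
```

With the default ratio, the two peaks at intensities 5.0 and 8.0 fall below the threshold and are dropped.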

Spectra Markers

To accelerate the fragment matching workflow, IDSL.FSA attempts to match a number of characteristic fragmentation peaks from the library and sample fragmentation spectra before any other peaks. These characteristic fragmentation peaks are called spectra markers in the IDSL.FSA workflow. The SPEC0009 and SPEC0010 parameters in the SpectraSimilarity tab of the FSA parameter spreadsheet define spectra markers for library and experimental fragmentation spectra. SPEC0009 sets the minimum cutoff (%) for $RelativeIntensity = PeakIntensity/BasepeakIntensity$; peaks above this cutoff are selected as spectra markers. Likewise, SPEC0010 sets the minimum percentage (%) of the spectra markers selected by SPEC0009 that must match between the library and sample spectra. To speed up this step, IDSL.FSA matches spectra markers by comparing mass values rounded to the number of digits given by SPEC0011. For instance, spectra markers for tryptophan spectra are shown below.

[Figure: spectra markers for tryptophan spectra]
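The marker pre-filter described above can be sketched roughly as follows, using hypothetical peak lists rather than the tryptophan spectra. The function names, the default parameter values, and the choice to compute the matched percentage relative to the number of library markers are all assumptions made for illustration. They are not taken from the IDSL.FSA source code.

```python
def spectra_markers(peaks, cutoff_percent, digits):
    """Return rounded m/z values of peaks at or above the relative intensity cutoff.

    cutoff_percent plays the role of SPEC0009, digits the role of SPEC0011.
    """
    base_intensity = max(intensity for _, intensity in peaks)
    return {
        round(mz, digits)
        for mz, intensity in peaks
        if 100.0 * intensity / base_intensity >= cutoff_percent
    }


def markers_pass(lib_peaks, sample_peaks, cutoff_percent=50.0,
                 min_match_percent=50.0, digits=1):
    """Check whether enough spectra markers match (min_match_percent ~ SPEC0010).

    Assumption: the percentage is relative to the number of library markers.
    """
    lib_markers = spectra_markers(lib_peaks, cutoff_percent, digits)
    sample_markers = spectra_markers(sample_peaks, cutoff_percent, digits)
    if not lib_markers:
        return False
    matched = 100.0 * len(lib_markers & sample_markers) / len(lib_markers)
    return matched >= min_match_percent


# Hypothetical (m/z, intensity) peak lists
library = [(118.065, 300.0), (146.060, 900.0), (188.071, 1000.0)]
sample = [(118.066, 250.0), (146.061, 1000.0), (188.069, 400.0), (205.100, 30.0)]

lib_markers = spectra_markers(library, 50.0, 1)
passes = markers_pass(library, sample)
```

Comparing sets of rounded masses keeps this step cheap, which fits the stated goal of rejecting poor candidates before the full peak-by-peak comparison.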

Li et al. demonstrated that spectral entropy can outperform the dot product (also known as cosine similarity) in spectral similarity measurement. We incorporated spectral entropy as well as dot product and normalized Euclidean mass error (NEME) in the IDSL.FSA package to provide a multi-dimensional comparison between two fragmentation spectra.

Spectral Entropy Similarity

Spectral entropy measurement includes all matched and unmatched peaks.

$$ S = - \sum_{p} I_p \ln(I_p) $$

where $I_p$ values represent normalized intensities $(\sum I_p = 1)$.

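$$ EntropySimilarity = 1 - {2S^{lib:exptl} - S^{lib} - S^{exptl} \over \ln 4} $$

where $S^{lib}$ and $S^{exptl}$ are the spectral entropies of the library and experimental spectra, and $S^{lib:exptl}$ is the spectral entropy of their 1:1 merged spectrum, following Li et al.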
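The two formulas above can be sketched in a few lines. This example assumes that peaks have already been aligned to shared m/z keys. It also assumes that $S^{lib:exptl}$ is the entropy of the 1:1 mixture of the two normalized spectra, as in Li et al. The function names and peak values are hypothetical.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of intensities normalized to sum to 1."""
    total = sum(intensities)
    probs = [i / total for i in intensities if i > 0]
    return -sum(p * math.log(p) for p in probs)


def entropy_similarity(lib, exptl):
    """Entropy similarity of two spectra given as {aligned_mz: intensity} dicts.

    Assumption: S^{lib:exptl} is the entropy of the 1:1 merged spectrum.
    """
    lib_total = sum(lib.values())
    exptl_total = sum(exptl.values())
    merged = {}
    for mz, intensity in lib.items():
        merged[mz] = merged.get(mz, 0.0) + 0.5 * intensity / lib_total
    for mz, intensity in exptl.items():
        merged[mz] = merged.get(mz, 0.0) + 0.5 * intensity / exptl_total
    s_mix = spectral_entropy(merged.values())
    s_lib = spectral_entropy(lib.values())
    s_exptl = spectral_entropy(exptl.values())
    return 1.0 - (2.0 * s_mix - s_lib - s_exptl) / math.log(4)


# Hypothetical aligned spectra; 188.1 and 205.1 are unmatched peaks
lib_spectrum = {118.1: 30.0, 146.1: 90.0, 188.1: 100.0}
exptl_spectrum = {118.1: 25.0, 146.1: 100.0, 205.1: 10.0}
score = entropy_similarity(lib_spectrum, exptl_spectrum)
```

Identical spectra give a similarity of 1 and spectra with no shared peaks give 0. Unmatched peaks still contribute through the merged spectrum.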

Cosine Similarity (Dot Product)

The cosine similarity measurement takes into account only peaks from the reference (library) spectrum.

$$ CosineSimilarity = \frac{\sum_{i=1}^{NP} I_i^{lib} I_i^{exptl}}{\sqrt{\sum_{i=1}^{NP} \left(I_i^{lib}\right)^2} \sqrt{\sum_{i=1}^{NP} \left(I_i^{exptl}\right)^2}} $$

where $I_i$ and $NP$ represent the peak intensities and the number of peaks, and the superscripts $lib$ and $exptl$ denote the library and experimental fragmentation patterns, respectively.
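As a small illustration, the sketch below computes a cosine score with the standard cosine normalization, that is, the square root of the sum of squared intensities in the denominator. It assumes the two intensity lists are aligned on the library peaks, with a zero experimental intensity wherever a library peak has no match. The function name and values are hypothetical.

```python
import math


def cosine_similarity(lib_intensities, exptl_intensities):
    """Cosine (dot product) score of two equally long, aligned intensity lists."""
    if len(lib_intensities) != len(exptl_intensities):
        raise ValueError("intensity lists must be aligned and equally long")
    numerator = sum(a * b for a, b in zip(lib_intensities, exptl_intensities))
    denominator = (math.sqrt(sum(a * a for a in lib_intensities))
                   * math.sqrt(sum(b * b for b in exptl_intensities)))
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical intensities; the third library peak has no experimental match
lib_int = [30.0, 90.0, 100.0]
exptl_int = [25.0, 100.0, 0.0]
cos_score = cosine_similarity(lib_int, exptl_int)
```

Because only library peaks index the lists, extra peaks in the experimental spectrum do not affect this score.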

Normalized Euclidean Mass Error (NEME)

Normalized Euclidean mass error (NEME) is a new metric that uses the resolving power of high-resolution mass spectrometry (HRMS) instruments to evaluate the quality of spectra matching. NEME covers only matched peaks.

$$ NEME = \sqrt {{\sum_{i = 1}^{NP}\left(M_i^{lib} - M_i^{exptl}\right)^2}/{NP}} $$

where $M_i$ and $NP$ represent the peak masses and the number of matched peaks, and the superscripts $lib$ and $exptl$ denote the library and experimental fragmentation patterns, respectively.
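The NEME formula is a root-mean-square mass difference over the matched peak pairs, which the following sketch makes explicit. The function name and the mass values are hypothetical, and the result is in whatever mass unit the inputs use.

```python
import math


def neme(lib_masses, exptl_masses):
    """Root-mean-square mass difference over matched peak pairs."""
    if len(lib_masses) != len(exptl_masses) or not lib_masses:
        raise ValueError("need equally long, non-empty lists of matched masses")
    n_peaks = len(lib_masses)
    squared_error = sum((a - b) ** 2 for a, b in zip(lib_masses, exptl_masses))
    return math.sqrt(squared_error / n_peaks)


# Hypothetical matched peak masses (library vs. experimental)
lib_m = [100.0, 200.0]
exptl_m = [100.003, 199.996]
neme_value = neme(lib_m, exptl_m)
```

A smaller NEME indicates closer agreement between library and experimental masses for the matched peaks.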

Weight transformation

Li et al. presented a weight transformation formula to boost the intensity of low-abundance peaks. Intensities of spectra with lower spectral entropies $(S < 3)$ are weight-transformed via $I_{new} = I^{w}$, where the weight is calculated using $w = 0.25 + 0.25 \times S$.
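A minimal sketch of the weight transformation follows. Renormalizing the transformed intensities so that they sum to 1 is an assumption added here so the result can feed directly into an entropy calculation. The function names and the example spectrum are hypothetical.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of intensities normalized to sum to 1."""
    total = sum(intensities)
    probs = [i / total for i in intensities if i > 0]
    return -sum(p * math.log(p) for p in probs)


def entropy_weight(entropy):
    """w = 0.25 + 0.25 * S for S < 3; otherwise 1.0 (no transformation)."""
    return 0.25 + 0.25 * entropy if entropy < 3 else 1.0


def weight_transform(intensities):
    """Apply I_new = I ** w, then renormalize to sum to 1 (assumed step)."""
    weight = entropy_weight(spectral_entropy(intensities))
    transformed = [i ** weight for i in intensities]
    total = sum(transformed)
    return [i / total for i in transformed]


# Hypothetical two-peak spectrum dominated by one peak
boosted = weight_transform([0.9, 0.1])
```

For this low-entropy spectrum the weight is below 1, so the share of the minor peak increases after the transformation.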

Citation

Li, Y., Kind, T., Folz, J., Vaniya, A., Mehta, S.S., Fiehn, O. Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nature Methods, 2021, 18(12), 1524–1531.

Fakouri Baygi, S., Banerjee, S.K., Chakraborty, P., Kumar, Y., Barupal, D.K. IDSL.UFA assigns high confidence molecular formula annotations for untargeted LC/HRMS datasets in metabolomics and exposomics. Analytical Chemistry, 2022, 94(39), 13315–13322.