Formation of spectral libraries by representative spectra #9

Open
percolator opened this issue Oct 13, 2019 · 2 comments

@percolator

Abstract

Methods to represent multiple spectra
Spectral library searching offers a sensitive yet fast method to match spectra from mass spectrometry-based proteomics experiments. The technique was first introduced for searching spectra from data-dependent acquisition (DDA) but has proven essential for the analysis of data-independent acquisition (DIA) spectra.
As input, the technique requires spectral libraries. Such libraries can be assembled from previously acquired DDA MS2 spectra. One critical step of this assembly process is the integration of the potentially large number of spectra that stem from an individual peptide species into a single representative spectrum. Here, we will implement and benchmark a couple of such strategies to form representative spectra for use in spectral libraries.

Work plan

Different strategies have been suggested for forming representative spectra. Frank et al. (JPR 2008) list five strategies, where one selects the representative spectrum to be:

  1. The “best spectrum”: the spectrum that maximizes a certain score, e.g., percent of explained intensity or percent of explained b/y ions.
  2. The “consensus spectrum”: a virtual spectrum constructed by averaging all spectra in the cluster (Tabb et al. JASMS 2005).
  3. The “most similar spectrum”: the spectrum that has the highest average similarity to the other cluster members (Tabb et al. Anal Chem 2003).
  4. The “de novo spectrum”: the spectrum that has the highest score when submitted to de novo sequencing.
  5. The “random spectrum”: a spectrum chosen from the cluster at random.
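
To make a few of these strategies concrete, here is a minimal sketch in plain Python. It is not the hackathon's implementation: the peak-list representation, the fixed-width binning, the cosine similarity measure, the tolerance, and all function names are assumptions chosen for illustration. The cited papers may define scoring and similarity differently.

```python
import math
import random
from collections import defaultdict

# Assumption: a spectrum is a list of (m/z, intensity) peaks,
# and a cluster is a list of spectra assigned to the same peptide species.

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin_width is illustrative)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)

def cosine(a, b):
    """Cosine similarity between two binned spectra (assumed similarity measure)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def explained_intensity(peaks, fragment_mzs, tol=0.5):
    """One possible 'best spectrum' score: fraction of the total intensity
    lying within tol of any theoretical fragment m/z (supplied by the caller)."""
    total = sum(intensity for _, intensity in peaks)
    if total == 0.0:
        return 0.0
    explained = sum(
        intensity for mz, intensity in peaks
        if any(abs(mz - f) <= tol for f in fragment_mzs)
    )
    return explained / total

def best_index(cluster, fragment_mzs, tol=0.5):
    """Strategy 1: index of the spectrum maximizing the chosen score."""
    return max(range(len(cluster)),
               key=lambda i: explained_intensity(cluster[i], fragment_mzs, tol))

def consensus_spectrum(cluster, bin_width=1.0):
    """Strategy 2: average the binned intensities over all cluster members."""
    total = defaultdict(float)
    for spectrum in cluster:
        for k, v in bin_spectrum(spectrum, bin_width).items():
            total[k] += v
    return {k: v / len(cluster) for k, v in sorted(total.items())}

def most_similar_index(cluster, bin_width=1.0):
    """Strategy 3: index of the spectrum with the highest average
    similarity to the other cluster members."""
    binned = [bin_spectrum(s, bin_width) for s in cluster]
    if len(binned) == 1:
        return 0
    def average_similarity(i):
        others = (cosine(binned[i], binned[j])
                  for j in range(len(binned)) if j != i)
        return sum(others) / (len(binned) - 1)
    return max(range(len(binned)), key=average_similarity)

def random_index(cluster, seed=None):
    """Strategy 5: index of a spectrum chosen at random."""
    return random.Random(seed).randrange(len(cluster))
```

Returning an index for the selection strategies keeps the chosen spectrum traceable to its source scan, while the consensus strategy necessarily returns a new, virtual spectrum. The de novo strategy is left out because it depends on an external sequencing tool.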

In this workshop, we will first establish datasets and code to benchmark different methods for forming representative spectra. We will then implement a couple of the methods mentioned above, as well as improvements on them, benchmark the methods, and examine their properties. Ideally, we will form separate teams, each implementing a different method.

Technical details

We will mainly use Python 3.7.

Contact information

Lukas Käll
KTH - Royal Institute of Technology
Stockholm, Sweden
lukas.kall@scilifelab.se

@ypriverol

I'm in!!!

@percolator
Author

A repository for the hackathon is available through this link:
https://github.com/statisticalbiotechnology/specpride
