Down-sampling spectra #234

Open
cutleraging opened this issue Jan 29, 2022 · 3 comments

@cutleraging

I would like to down-sample spectra to find out whether I am close to saturation of the proteins identified in a sample. First, I would like to know whether this seems reasonable and, if so, which properties of the data need to remain balanced when down-sampling. Second, I am thinking of doing this by simply using the sample() function to select different percentages of the spectra. Any thoughts on this?
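A minimal sketch of such a saturation curve in base R, assuming you already have identifications from a search of all spectra, reduced to two vectors: the index of each identified spectrum (psm_spectrum) and the protein it was assigned to (psm_protein). The function name saturation_curve, its arguments and the default fractions are hypothetical. For each fraction it draws several random subsets of spectrum indices and counts the distinct proteins among the identifications that fall into the subset; a curve that flattens out towards fraction 1 would suggest you are close to saturation. Note that this reuses the identifications (and FDR filtering) of the full search, so it only approximates re-searching each subset.

```r
# Sketch: estimate a protein "saturation curve" by down-sampling spectra.
# Assumes identifications from a search of ALL spectra are available as:
#   psm_spectrum: index (1..n_spectra) of each identified spectrum
#   psm_protein:  protein assigned to that spectrum
saturation_curve <- function(psm_spectrum, psm_protein, n_spectra,
                             fractions = c(0.1, 0.25, 0.5, 0.75, 1),
                             n_rep = 10, seed = 42) {
  set.seed(seed)  # reproducible random subsets
  res <- lapply(fractions, function(f) {
    n_keep <- round(f * n_spectra)
    # Repeat the random draw to see how much the protein count varies
    counts <- replicate(n_rep, {
      keep <- sample.int(n_spectra, n_keep)  # random subset of spectrum indices
      # Distinct proteins among the identifications of the kept spectra
      length(unique(psm_protein[psm_spectrum %in% keep]))
    })
    data.frame(fraction = f, n_spectra = n_keep,
               mean_proteins = mean(counts), sd_proteins = sd(counts))
  })
  do.call(rbind, res)
}

# Toy example: 1000 spectra, 300 of them identified, 120 possible proteins
set.seed(1)
toy_spectrum <- sample.int(1000, 300)
toy_protein  <- paste0("P", sample.int(120, 300, replace = TRUE))
saturation_curve(toy_spectrum, toy_protein, n_spectra = 1000)
```

Repeating each draw (n_rep) gives a rough idea of the variability at each depth, which helps judge whether an apparent plateau is real.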

@jorainer
Member

I guess down-sampling should be reasonable, but I'm not into proteomics data analysis, and it will probably also depend on what type of data you have.

Technically, you could indeed simply use sps_sub <- sps[sample(length(sps), 100)] to randomly select 100 spectra from a Spectra object sps, but you might then need to re-order them (sample() returns the indices in random order) or to randomly select within subsets of spectra in sps.
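The indexing above could be wrapped in two small helpers, sketched here in base R (the names sample_sorted and sample_within are hypothetical). Sorting the drawn indices keeps the spectra in their original order, and sample_within draws the same fraction within each group, for example per file, so that no group ends up over- or under-represented in the subset.

```r
# Draw n random indices out of 1..n_total, sorted to keep the original order.
sample_sorted <- function(n_total, n) {
  sort(sample.int(n_total, n))
}

# Draw the same fraction of indices within each group (e.g. per file or per
# MS level) so that every group is represented proportionally.
sample_within <- function(group, fraction) {
  idx <- split(seq_along(group), group)  # indices belonging to each group
  keep <- lapply(idx, function(i) {
    # i[sample.int(...)] is safe even when a group has a single element
    i[sample.int(length(i), round(fraction * length(i)))]
  })
  sort(unlist(keep, use.names = FALSE))
}
```

With a Spectra object this might be used as sps[sample_within(dataOrigin(sps), 0.5)] to keep half of the spectra from each original file, or with msLevel(sps) as the grouping to preserve the MS level proportions. Indexing with i[sample.int(length(i), k)] rather than sample(i, k) avoids R's surprising behaviour when a group holds a single index (sample(5, 1) draws from 1:5, not from the value 5).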

@cutleraging
Author

cutleraging commented Jan 31, 2022 via email

@jorainer
Member

Hm, unfortunately there is no backend that avoids reading at least some of the data from the original files. The MsBackendMzR should already be relatively fast, because it reads only general information from the original files and not the peaks data, so the subsetting operation should also be quite fast. The only thing that might help further is to parallelize this operation on a per-file basis.
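A minimal sketch of per-file parallelization with the parallel package that ships with R. load_fun is a hypothetical placeholder for whatever loads the spectra of one file (with Spectra this could be something like function(f) Spectra(f, source = MsBackendMzR())); anything supporting length() and [ works. Each file gets its own seed, so the subset is reproducible regardless of how files are scheduled across workers. mclapply relies on forking, so on Windows the sketch falls back to a single core.

```r
library(parallel)

# Sketch: sub-sample each file separately and in parallel (one file per task).
# `load_fun` is a hypothetical placeholder that loads one file's spectra and
# returns something supporting length() and `[`.
subsample_per_file <- function(files, load_fun, fraction, seed = 1,
                               cores = if (.Platform$OS.type == "windows") 1L else 2L) {
  mclapply(seq_along(files), function(i) {
    set.seed(seed + i)  # per-file seed: reproducible regardless of scheduling
    x <- load_fun(files[[i]])
    n_keep <- round(fraction * length(x))
    x[sort(sample.int(length(x), n_keep))]  # keep original order within file
  }, mc.cores = cores)
}

# Toy example with a stand-in loader that returns plain index vectors
toy_loader <- function(f) seq_len(if (f == "a.mzML") 10 else 20)
subsample_per_file(c("a.mzML", "b.mzML"), toy_loader, fraction = 0.5)
```

The per-file results come back as a list in the order of files; for Spectra objects they could then be combined again, e.g. with concatenateSpectra(). BiocParallel's bplapply() would be the more portable choice if Windows support matters.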
