compareSpectra limits using joinPeaksGnps #239

Open
LiesaSalzer opened this issue Feb 28, 2022 · 4 comments

@LiesaSalzer

Hi,
I was testing the GNPS functionality from MsCoreUtils with compareSpectra.
However, when I compare a lot of MS2 spectra (1603), computational limits seem to be reached (and the computation takes very long), and I get the following error message:

> GNPS_score <- compareSpectra(ms2_spectra_comb,
+                              MAPFUN = joinPeaksGnps,
+                              FUN = gnps,
+                              tolerance = tolerance,
+                              ppm = ppm, 
+                              type = "inner")
Error in solve_LSAP(score_mat, maximum = TRUE) : 
  long vectors (argument 1) are not supported in .C

I have no idea if this can be solved somehow, but I just wanted to let you know.

@LiesaSalzer
Author

Plus, maybe it would be useful to have some kind of progress bar that shows whether the code is still running or is stuck somewhere?

@jorainer
Member

jorainer commented Mar 4, 2022

Hm, solve_LSAP is called in gnps to find the best match. From the error message, it seems to complain that the score matrix (score_mat) is too large. Can you check with max(lengths(ms2_spectra_comb)) what the largest number of peaks in a spectrum is for your dataset?
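For reference, a minimal sketch of that check on toy data. The `sps` object and its values are made up and only stand in for `ms2_spectra_comb`; the sketch assumes the Spectra package is installed.

```r
library(Spectra)

# Toy MS2 data (made-up values) standing in for ms2_spectra_comb.
spd <- DataFrame(msLevel = c(2L, 2L))
spd$mz <- list(c(100.1, 150.2, 200.3), c(100.1, 120.5))
spd$intensity <- list(c(10, 200, 50), c(300, 5))
sps <- Spectra(spd)

# Number of peaks per spectrum, then the largest count and where it occurs.
n_peaks <- lengths(sps)
max(n_peaks)
which.max(n_peaks)
```

Unusually large values here would point to spectra that may be dominated by noise peaks.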

Regarding a progress bar - yes, agree that that might be helpful - I'm just a little afraid this will slow down calculations even more ... I'll have a look into the function to see what we can do there...
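Until something like this exists in the function itself, a progress bar can be sketched on the user side. The helper name `compare_with_progress` is hypothetical and not part of Spectra; it compares one spectrum at a time against all spectra and updates a base R `txtProgressBar`. Because each pair is visited in both directions, it may do more work than a single compareSpectra call. The toy data is made up.

```r
library(Spectra)

# Hypothetical helper (not part of Spectra): row-wise comparison with a
# text progress bar. Extra arguments are passed on to compareSpectra.
compare_with_progress <- function(x, ...) {
    n <- length(x)
    res <- matrix(NA_real_, nrow = n, ncol = n)
    pb <- txtProgressBar(min = 0, max = n, style = 3)
    on.exit(close(pb))
    for (i in seq_len(n)) {
        # Compare spectrum i against all spectra; one row of scores.
        res[i, ] <- as.numeric(compareSpectra(x[i], x, ...))
        setTxtProgressBar(pb, i)
    }
    res
}

# Toy MS2 data (made-up values).
spd <- DataFrame(msLevel = c(2L, 2L))
spd$mz <- list(c(100.1, 150.2, 200.3), c(100.1, 120.5))
spd$intensity <- list(c(10, 200, 50), c(300, 5))
sps <- Spectra(spd)

res <- compare_with_progress(sps)
```

The arguments from the original call (MAPFUN, FUN, tolerance, ppm, type) could be passed through `...` in the same way.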

@LiesaSalzer
Author

max(lengths(ms2_spectra_comb)) was an excellent idea! I realized I still had a lot of noise in my MS2 spectra, because the largest number of peaks was 41494. So it's not surprising that the GNPS calculations took forever...

After that, I applied a 10% intensity filter, which reduced the number of peaks to 247, and with that the similarity calculation was successful :)
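A minimal sketch of such a filter, assuming the 10% threshold is taken relative to the most intense peak of each spectrum (the thread does not say exactly how it was applied). The function name `keep_above_10pct` and the toy data are made up for illustration; `filterIntensity` accepts a function that returns a logical per peak.

```r
library(Spectra)

# Toy MS2 data (made-up values).
spd <- DataFrame(msLevel = c(2L, 2L))
spd$mz <- list(c(100.1, 150.2, 200.3), c(100.1, 120.5))
spd$intensity <- list(c(10, 200, 50), c(300, 5))
sps <- Spectra(spd)

# Hypothetical helper: keep peaks above 10% of the spectrum's maximum intensity.
keep_above_10pct <- function(x, ...) {
    x > 0.1 * max(x, na.rm = TRUE)
}

sps_filtered <- filterIntensity(sps, intensity = keep_above_10pct)

# Peak counts before and after filtering.
lengths(sps)
lengths(sps_filtered)
```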

So maybe it would make sense to include that information in the compareSpectra function?
e.g. something like

if (max(lengths(sps)) > 1000)
    warning("Spectra contain a lot of peaks/noise. Consider 'filterIntensity' to reduce calculation time.")

@jorainer
Member

jorainer commented Mar 9, 2022

This would be a good idea. I am a little hesitant to add this additional check, though, because a lengths call would actually loop over all spectra (possibly requiring them to be loaded into memory from mzML files or retrieved from the database) to determine the number of peaks.
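One way to keep that cost opt-in is a user-side wrapper rather than a check inside compareSpectra. The following is only a sketch: `compare_with_peak_check` and its `max_peaks` argument are hypothetical names, not part of Spectra, and the toy data is made up.

```r
library(Spectra)

# Hypothetical wrapper (not part of Spectra): warn about very large spectra
# before comparing. The lengths() call is the potentially expensive step.
compare_with_peak_check <- function(x, ..., max_peaks = 1000L) {
    if (max(lengths(x)) > max_peaks)
        warning("Spectra contain a lot of peaks/noise. ",
                "Consider 'filterIntensity' to reduce calculation time.")
    compareSpectra(x, ...)
}

# Toy MS2 data (made-up values).
spd <- DataFrame(msLevel = c(2L, 2L))
spd$mz <- list(c(100.1, 150.2, 200.3), c(100.1, 120.5))
spd$intensity <- list(c(10, 200, 50), c(300, 5))
sps <- Spectra(spd)

# No warning with the default threshold; a lower threshold would trigger it.
scores <- compare_with_peak_check(sps)
```

Users who already know their data are clean can call compareSpectra directly and skip the extra pass over the peaks.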
