-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
slow performance of matchSpectra #38
Comments
Note that csp <- CompareSpectraParam(ppm = 10, requirePrecursor = FALSE, THRESHFUN = function(x) seq_along(x) )
system.time(mtches <- matchSpectra(pest_ms2[1:10], minimb[1:20], csp))
user system elapsed
0.294 0.004 0.298 Thus, I guess most of the performance loss is in calculating the reverse match. I will dig into that and see how we can improve the performance. |
You are right that CompareSpectraParam is a lot faster. But there is still also a ~8x difference between compareSpectra and CompareSpectraParam with a 20x20 comparison. |
On reason will of course be that Note that the biggest benefit of the |
I think the reason for the huge performance difference is the rather unrealistic library(Spectra)
library(msdata)
library(MSnbase)
library(MsCoreUtils)
library(MetaboAnnotation)
library(microbenchmark)
fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML", package = "msdata")
pest_ms2 <- filterMsLevel(Spectra(fl), 2L)
load(system.file("extdata", "minimb.RData", package = "MetaboAnnotation"))
csp <- MatchForwardReverseParam(ppm = 10, requirePrecursor = FALSE, THRESHFUN = function(x) seq_along(x))
res <- matchSpectra(pest_ms2[1:10], minimb[1:20], csp)
head(res@matches)
query_idx target_idx score reverse_score presence_ratio
1 1 1 0 NaN 0
2 1 2 0 NaN 0
3 1 3 0 NaN 0
4 1 4 0 NaN 0
5 1 5 0 NaN 0
6 1 6 0 NaN 0
nrow(res@matches)
[1] 200 as you see, as a result you get a match of each query spectrum (n = 10) with each target spectrum (n = 20). A more realistic (although exagerated example) would be the param below, that is already ~ 10 times faster. prm <- MatchForwardReverseParam(ppm = 10, requirePrecursor = FALSE, THRESHFUN = function(x) which(x > 0))
microbenchmark(
matchSpectra(pest_ms2[1:10], minimb[1:20], csp),
matchSpectra(pest_ms2[1:10], minimb[1:20], prm),
times = 5)
Unit: milliseconds
expr min lq mean
matchSpectra(pest_ms2[1:10], minimb[1:20], csp) 4370.2852 4513.4549 4750.8329
matchSpectra(pest_ms2[1:10], minimb[1:20], prm) 423.4785 427.4359 441.8056
median uq max neval cld
4598.5202 4921.852 5350.0527 5 b
432.3949 451.801 473.9177 5 a
Obviously, plain prm2 <- CompareSpectraParam(ppm = 10, requirePrecursor = FALSE, THRESHFUN = function(x) which(x > 0))
microbenchmark(
matchSpectra(pest_ms2[1:10], minimb[1:20], prm),
matchSpectra(pest_ms2[1:10], minimb[1:20], prm2),
compareSpectra(pest_ms2[1:10], minimb[1:20], ppm = 10),
times = 5)
Unit: milliseconds
expr min lq
matchSpectra(pest_ms2[1:10], minimb[1:20], prm) 401.91755 407.03423
matchSpectra(pest_ms2[1:10], minimb[1:20], prm2) 263.04207 265.80586
compareSpectra(pest_ms2[1:10], minimb[1:20], ppm = 10) 29.26873 30.54335
mean median uq max neval cld
418.44545 419.26842 420.16566 443.84138 5 b
338.93222 266.98529 274.53658 624.29129 5 b
31.72618 31.08381 33.29108 34.44391 5 a
Summarizing: if you can, use I will however see if I can improve the speed a little bit... so stay tuned :) |
You are right that |
Happy to announce that I could improve the speed, specifically if you're not using the precursor filter: library(Spectra)
library(msdata)
library(MetaboAnnotation)
library(microbenchmark)
fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML", package = "msdata")
pest_ms2 <- filterMsLevel(Spectra(fl), 2L)
load(system.file("extdata", "minimb.RData", package = "MetaboAnnotation"))
prm <- MatchForwardReverseParam(ppm = 10, requirePrecursor = TRUE, THRESHFUN = function(x) which(x > 0))
prm2 <- MatchForwardReverseParam(ppm = 10, requirePrecursor = FALSE, THRESHFUN = function(x) which(x > 0))
microbenchmark(
matchSpectra(pest_ms2[1:10], minimb[1:20], prm),
matchSpectra(pest_ms2[1:10], minimb[1:20], prm2),
compareSpectra(pest_ms2[1:10], minimb[1:20], ppm = 10),
times = 5)
Unit: milliseconds
expr min lq
matchSpectra(pest_ms2[1:10], minimb[1:20], prm) 54.21599 73.79888
matchSpectra(pest_ms2[1:10], minimb[1:20], prm2) 177.28701 180.51034
compareSpectra(pest_ms2[1:10], minimb[1:20], ppm = 10) 29.90533 30.33629
mean median uq max neval cld
73.29034 75.75084 78.90393 83.78208 5 b
195.55298 186.11268 198.68082 235.17402 5 c
33.86250 34.35269 35.72064 38.99757 5 a Using the precursor filter is (obviously faster), but also without it we're slightly faster now. |
@stanstrup , can you please check the updated version and eventually close the issue if you're ~ OK with it? |
With the above example I now get |
Now, that's very strange. The only |
Interesting. I checked the docs on my computer and it does not have that option. Seems like it was introduced very recently: https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2021/03/06#n2021-03-06 |
I can confirm that it works with R v 4.1.1. |
matchSpectra
seem to be very slow for some reason.matchSpectra
takes 5.8s while the 3 calls tocompareSpectra
takes 0.08s.With a larger comparison this turns into minutes vs. many hours...
Maybe related to #28?
The text was updated successfully, but these errors were encountered: