Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with calcPPS function #50

Closed
lauzikaite opened this issue Sep 1, 2017 · 11 comments
Closed

Error with calcPPS function #50

lauzikaite opened this issue Sep 1, 2017 · 11 comments

Comments

@lauzikaite
Copy link

I'm trying to optimize xcmsSet parameters for a large study by taking subsets of the data (10 - 20 mzML files) as representatives of different LC-MS modes and running them separately. optimizeXcmsSet() does work with some of the data subsets just fine. But with some datasets, function stops after a few DoE with an error:

Error in peaks(xset)[, c("mz", "rt", "sample", "into", "mzmin", "mzmax", :
subscript out of bounds
Calls: system.time ... xcmsSetExperimentsCluster -> sapply -> lapply -> FUN -> calcPPS

I tried running the code with the same mzMLs on two systems:

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 12 SP1

Code below is for running on x86_64-pc-linux-gnu cluster:

ppparam <- getDefaultXcmsSetStartingParams('centWave')
ppparam$min_peakwidth <- c(1,5) 
ppparam$max_peakwidth <- c(5,10)
ppparam$ppm <-c(20,25) 
ppparam$noise <- c(300, 600) 
ppparam$mzdiff <- c(-0.001, 0.010) 
ppparam$prefilter <- 8  
ppparam$value_of_prefilter <- 6000 
ppparam$integrate <- 2 # 
ppparam$mzCenterFun <- 'wMean' 
ppparam$snthresh <- 10 

opp <- optimizeXcmsSet(files = files,
                         params = ppparam,
                         BPPARAM = MulticoreParam(workers = 10),
                         nSlaves = 1,
                         subdir = output_dir)

Have you encountered such issue before? I don't understand why this issue occurs, since function works with the same, but fewer files (e.g., 2 files, 5 files or 8 files out of 10 files). Furthermore, these mzML files are processed succesfully with xcmsSet() and are not corrupted in any way.

Thank you.

@lauzikaite
Copy link
Author

I have just noticed that this error only occurs if files have no peaks detected with the specified parameters. Below is the output for optimizeXcmsSet() with 5 parameters to optimize for 15 files. First DoE was performed succesfully, but the second DoE crashed:

starting new DoE with:
min_peakwidth: c(3, 7)
max_peakwidth: c(7.5, 12.5)
ppm: c(22.5, 27.5)
mzdiff: c(-0.0065, 0.0045)
noise: c(150, 450)
snthresh: 10
prefilter: 8
value_of_prefilter: 6000
mzCenterFun: wMean
integrate: 2
fitgauss: FALSE
verbose.columns: FALSE

1

...

22
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 345 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 374 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 383 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 389 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 376 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 363 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 399 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 390 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 381 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 369 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 389 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 382 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 348 regions of interest ... OK: 1 found.
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 434 regions of interest ... FAIL: none found!
Detecting mass traces at 27.5 ppm ... OK
Detecting chromatographic peaks in 427 regions of interest ... FAIL: none found!
Error in peaks(xset)[, c("mz", "rt", "sample", "into", "mzmin", "mzmax", :
subscript out of bounds
Calls: system.time ... xcmsSetExperimentsCluster -> sapply -> lapply -> FUN -> calcPPS

@rietho
Copy link
Owner

rietho commented Sep 11, 2017

Hi!
Thank you for reporting your error. Sorry also for the late response, as I was out of office.
Can you also please report your session sessionInfo(), so that I know the used package versions. Then I'll look into it.
Can you also share the data with me or otherwise reproduce the error with public available data?

@lauzikaite
Copy link
Author

Hi!

Thank you for your reply.

Data is located on Dropbox. Same error has been produced with multiple datasets from my study and I uploaded just ten QC samples here.

sessionInfo()

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 12 SP1

Matrix products: default
BLAS: /apps/R/3.4.0/lib64/R/lib/libRblas.so
LAPACK: /apps/R/3.4.0/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] IPO_1.2.0 CAMERA_1.32.0 rsm_2.8
[4] xcms_1.52.0 MSnbase_2.2.0 ProtGenerics_1.8.0
[7] mzR_2.10.0 Rcpp_0.12.11 BiocParallel_1.10.1
[10] Biobase_2.36.2 BiocGenerics_0.22.0

loaded via a namespace (and not attached):
[1] vsn_3.44.0 splines_3.4.0 foreach_1.4.3
[4] Formula_1.2-1 affy_1.54.0 stats4_3.4.0
[7] latticeExtra_0.6-28 RBGL_1.52.0 impute_1.50.1
[10] backports_1.1.0 lattice_0.20-35 limma_3.32.2
[13] digest_0.6.12 RColorBrewer_1.1-2 checkmate_1.8.2
[16] colorspace_1.3-2 sandwich_2.3-4 htmltools_0.3.6
[19] preprocessCore_1.38.1 Matrix_1.2-10 plyr_1.8.4
[22] MALDIquant_1.16.2 XML_3.98-1.9 zlibbioc_1.22.0
[25] xtable_1.8-2 mvtnorm_1.0-6 scales_0.4.1
[28] RANN_2.5.1 affyio_1.46.0 lsmeans_2.26-3
[31] htmlTable_1.9 tibble_1.3.3 IRanges_2.10.2
[34] ggplot2_2.2.1 TH.data_1.0-8 nnet_7.3-12
[37] lazyeval_0.2.0 MassSpecWavelet_1.42.0 survival_2.41-3
[40] magrittr_1.5 estimability_1.2 doParallel_1.0.10
[43] nlme_3.1-131 MASS_7.3-47 foreign_0.8-69
[46] graph_1.54.0 BiocInstaller_1.26.0 tools_3.4.0
[49] data.table_1.10.4 multcomp_1.4-6 stringr_1.2.0
[52] S4Vectors_0.14.3 munsell_0.4.3 cluster_2.0.6
[55] pcaMethods_1.68.0 compiler_3.4.0 mzID_1.14.0
[58] rlang_0.1.1 grid_3.4.0 iterators_1.0.8
[61] htmlwidgets_0.8 igraph_1.0.1 base64enc_0.1-3
[64] gtable_0.2.0 codetools_0.2-15 multtest_2.32.0
[67] gridExtra_2.2.1 zoo_1.8-0 knitr_1.16
[70] Hmisc_4.0-3 stringi_1.1.5 rpart_4.1-11
[73] acepack_1.4.1 coda_0.19-1

I do use a modified plotContours() funtion to prevent jpeg generation, since the Linux cluster does not support graphical display. It shouldn't have any effect on the overall process though.

source("/.../plotContours_1.R")
tmpfun <- get("plotContours", envir = asNamespace("IPO"))
environment(plotContours_1) <- environment(tmpfun)
attributes(plotContours_1) <- attributes(tmpfun)
assignInNamespace("plotContours", plotContours_1, ns="IPO")

plotContours_1.R looks like that:

plotContours_1 <- function(model, maximum_slice, plot_name = NULL) {
  print('Modified plotContours') # To check whether v2 was re-assigned to the package
  plots <- c()
  for(i in 1:(length(maximum_slice)-1)) {
    for(j in (i+1):length(maximum_slice)) {
      plots <- c(plots, as.formula(paste("~ x", i, "* x", j, sep="")))
    } 
  }
  
  # determine number of plot rows and column on single device
  plot_rows <- round(sqrt(length(plots)))
  plot_cols <- 
    if(plot_rows==1){
      length(plots)
    } else {
      ceiling(sqrt(length(plots)))
    }
  
  # save as jpeg, if plot_name is given
  if(!is.null(plot_name)) {
    plot_name = paste(plot_name, ".jpg", sep="")
    # jpeg(plot_name, width=4*plot_cols, height=2*plot_rows+2, 
    #      units="in", res=c(200,200))
  } # otherwise plot on device
  
  # op <- par(mfrow = c(plot_rows, plot_cols), oma = c(0,0,0,0))  
  # contour.lm is called
  # contour(model, plots, image = TRUE, at = maximum_slice)
  
  if (!is.null(plot_name)) {
    # dev.off()
  }
  # par(op)
}

@rietho
Copy link
Owner

rietho commented Sep 15, 2017

Thank you for the data and the sessionInfo().

The issue seems to arise in the rare case, when there is only one (or just few) peak(s) found. In this case the xcmsSet-object does not hold a sample-argument, which is expected by IPO. This might be a bug in the xcms-package, which I'm not sure yet. I'll have to have a deeper look into it to find out.

@rietho
Copy link
Owner

rietho commented Sep 18, 2017

So the problem was in the xcms-package (see sneumann/xcms#220). An update to the newest xcms3 branch from github should resolve the problem (see also sneumann/xcms#220). However I'll add a workaround in IPO to also work with older xcms-versions.

@lauzikaite
Copy link
Author

Thank you for looking into this, I will try running IPO with newest xcms3 branch.

@rietho
Copy link
Owner

rietho commented Sep 20, 2017

According to my tests the develop-branch (see 2abc39e) should now handle this issue correctly also for older xcms-versions.

@lauzikaite
Copy link
Author

Brilliant, the develop-branch works just fine with the old xcms. xcms3 with the IPO Bioconductor-version works too.

Also, my suggestion for a new version is to provide an option to disable jpeg generation, which other users might find useful too :)

Thanks for your help!

@rietho
Copy link
Owner

rietho commented Sep 22, 2017

Thank you very much for testing. The bug-fixes are on the way to Bioconductor as well.

Regarding the jpeg generateion: Are you aware of setting subdir = NULL to plot to the graphics device? Or are you asking for an option to completely disable plotting?

@lauzikaite
Copy link
Author

Yes, an option to completely disable plotting would be useful for those that are using linux clusters, which do not support graphical display.

@rietho
Copy link
Owner

rietho commented Sep 22, 2017

Thanks for the suggestion. I guess there's no reason against providing an option to disable plotting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants