Problem running scde on linux cluster. Fortran "dqrls" not resolved #55

Closed
FloWuenne opened this issue Dec 4, 2017 · 8 comments

@FloWuenne

Hi there,

I am trying to run SCDE on a Linux computational cluster.

I am getting the following error when running on a single core:

Error in .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars,  :
  "dqrls" not resolved from current namespace (scde)
Calls: scde.error.models ... <Anonymous> -> glm.nb.fit -> glm.fitter -> .Fortran
Execution halted

Which seems to be a similar issue to this thread:
(https://github.com/hms-dbmi/scde/issues/21)

I have recompiled SCDE from the binary of the developer version, but this did not solve the issue. I also tried reverting the flexmix version and using it with the stable scde version. I tried both of these on the cluster as well as on my local machine, and both gave the same error.

When trying with multiple cores, I am getting the same problem described here:
(https://github.com/hms-dbmi/scde/issues/31)

Any suggestions for what I can try, either for single core or, preferably, to get the multicore command to work?

My sessionInfo():
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux release 6.3 (Carbon)

Matrix products: default
BLAS: /home/apps/Logiciels/R/3.4.0-gcc/lib64/R/lib/libRblas.so
LAPACK: /home/apps/Logiciels/R/3.4.0-gcc/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats4 grDevices datasets parallel stats graphics utils
[8] methods base

other attached packages:
[1] scde_2.7.0 flexmix_2.3-13
[3] lattice_0.20-35 bindrcpp_0.2
[5] qvalue_2.10.0 edgeR_3.20.1
[7] limma_3.34.1 gtools_3.5.0
[9] scater_1.6.0 SingleCellExperiment_1.0.0
[11] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
[13] matrixStats_0.52.2 GenomicRanges_1.30.0
[15] GenomeInfoDb_1.14.0 IRanges_2.12.0
[17] S4Vectors_0.16.0 ggrepel_0.7.0
[19] RColorBrewer_1.1-2 ggsci_2.8
[21] dplyr_0.7.4 tidyr_0.7.2
[23] data.table_1.10.4-3 Seurat_2.1.0
[25] Matrix_1.2-9 cowplot_0.8.0
[27] ggplot2_2.2.1 pander_0.6.1
[29] knitr_1.16 bigmemory_4.5.31
[31] bigmemory.sri_0.1.3 Biobase_2.38.0
[33] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
[1] shinydashboard_0.6.1 R.utils_2.6.0
[3] lme4_1.1-13 RSQLite_2.0
[5] AnnotationDbi_1.40.0 htmlwidgets_0.9
[7] grid_3.4.0 trimcluster_0.1-2
[9] ranger_0.8.0 BiocParallel_1.12.0
[11] Rtsne_0.13 munsell_0.4.3
[13] codetools_0.2-15 ica_1.0-1
[15] colorspace_1.3-2 ROCR_1.0-7
[17] robustbase_0.92-7 dtw_1.18-1
[19] distillery_1.0-4 NMF_0.20.6
[21] labeling_0.3 lars_1.2
[23] tximport_1.6.0 GenomeInfoDbData_0.99.1
[25] mnormt_1.5-5 bit64_0.9-7
[27] extRemes_2.0-8 rhdf5_2.22.0
[29] diptest_0.75-7 R6_2.2.2
[31] doParallel_1.0.11 ggbeeswarm_0.6.0
[33] VGAM_1.0-4 locfit_1.5-9.1
[35] RcppArmadillo_0.8.100.1.0 bitops_1.0-6
[37] assertthat_0.2.0 SDMTools_1.1-221
[39] scales_0.4.1 nnet_7.3-12
[41] ggjoy_0.3.0 beeswarm_0.2.3
[43] gtable_0.2.0 Cairo_1.5-9
[45] rlang_0.1.4 MatrixModels_0.4-1
[47] scatterplot3d_0.3-40 splines_3.4.0
[49] lazyeval_0.2.0 ModelMetrics_1.1.0
[51] acepack_1.4.1 brew_1.0-6
[53] checkmate_1.8.3 reshape2_1.4.2
[55] backports_1.1.0 httpuv_1.3.5
[57] Hmisc_4.0-3 caret_6.0-76
[59] tools_3.4.0 gridBase_0.4-7
[61] gplots_3.0.1 proxy_0.4-17
[63] Rcpp_0.12.14 plyr_1.8.4
[65] base64enc_0.1-3 progress_1.1.2
[67] zlibbioc_1.24.0 purrr_0.2.4
[69] RCurl_1.95-4.8 prettyunits_1.0.2
[71] rpart_4.1-11 pbapply_1.3-3
[73] viridis_0.4.0 cluster_2.0.6
[75] magrittr_1.5 SparseM_1.77
[77] pcaMethods_1.70.0 mvtnorm_1.0-6
[79] mime_0.5 xtable_1.8-2
[81] pbkrtest_0.4-7 XML_3.98-1.9
[83] mclust_5.3 RMTstat_0.3
[85] gridExtra_2.2.1 compiler_3.4.0
[87] biomaRt_2.34.0 tibble_1.3.4
[89] KernSmooth_2.23-15 minqa_1.2.4
[91] R.oo_1.21.0 htmltools_0.3.6
[93] segmented_0.5-2.1 mgcv_1.8-17
[95] Formula_1.2-2 tclust_1.2-7
[97] DBI_0.7 diffusionMap_1.1-0
[99] MASS_7.3-47 fpc_2.1-10
[101] car_2.1-6 R.methodsS3_1.7.1
[103] gdata_2.18.0 bindr_0.1
[105] igraph_1.1.2 pkgconfig_2.0.1
[107] sn_1.5-0 registry_0.3
[109] numDeriv_2016.8-1 foreign_0.8-67
[111] foreach_1.4.3 vipor_0.4.5
[113] rngtools_1.2.4 pkgmaker_0.22
[115] XVector_0.18.0 stringr_1.2.0
[117] digest_0.6.12 tsne_0.1-3
[119] Rook_1.1-1 htmlTable_1.9
[121] kernlab_0.9-25 Lmoments_1.2-3
[123] shiny_1.0.3 quantreg_5.34
[125] modeltools_0.2-21 rjson_0.2.15
[127] nloptr_1.0.4 nlme_3.1-131
[129] viridisLite_0.2.0 DEoptimR_1.0-8
[131] survival_2.41-3 glue_1.1.1
[133] FNN_1.1 prabclus_2.2-6
[135] iterators_1.0.8 bit_1.1-12
[137] class_7.3-14 stringi_1.1.5
[139] mixtools_1.1.0 blob_1.1.0
[141] latticeExtra_0.6-28 caTools_1.17.1
[143] memoise_1.1.0 irlba_2.2.1
[145] ape_4.1

@JEFworks
Member

JEFworks commented Dec 4, 2017

Hi Florian,

Thanks for the thorough documentation of your issue. Can you please try to see if the error persists with knn.error.models in addition to scde.error.models?

Also, can you please check if the multicore issue is being caused by mclapply or bplapply? We currently handle multicore processing using this function:

papply <- function(...,n.cores=detectCores()) {
  if(n.cores>1) {
    # bplapply implementation
    if(is.element("parallel", installed.packages()[,1])) {
      mclapply(...,mc.cores=n.cores)
    } else {
      # last resort
      bplapply(... , BPPARAM = MulticoreParam(workers = n.cores))
    }
  } else { # fall back on lapply
    lapply(...);
  }
}

You can explicitly modify the papply function to use one or the other:

papply <- function(...,n.cores=detectCores()) {
  if(n.cores>1) {
    # no longer use mclapply
    bplapply(... , BPPARAM = MulticoreParam(workers = n.cores))
  } else { # fall back on lapply
    lapply(...);
  }
}

or just add a print or cat call in each branch to see which one runs.
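A minimal sketch of how such an override could be applied, under some assumptions. Functions inside a package look up names in the package namespace first, so a `papply` defined only in the global environment is usually not the one scde calls. The sketch assumes `papply` is a binding in the scde namespace and swaps it with `utils::assignInNamespace`. The name `papply_traced` is hypothetical, and forking with more than one core is not available on Windows.

```r
# Hypothetical replacement for papply that reports which backend runs.
papply_traced <- function(..., n.cores = parallel::detectCores()) {
  if (n.cores > 1) {
    message("papply: mclapply with ", n.cores, " cores")
    parallel::mclapply(..., mc.cores = n.cores)
  } else {
    message("papply: serial lapply")
    lapply(...)
  }
}

# Swap it into the package namespace, but only if scde is installed and
# actually has a binding called papply (an assumption, checked here).
if (requireNamespace("scde", quietly = TRUE) &&
    exists("papply", envir = asNamespace("scde"), inherits = FALSE)) {
  utils::assignInNamespace("papply", papply_traced, ns = "scde")
}

# Quick check of the wrapper itself, outside scde.
res <- papply_traced(1:3, function(x) x^2, n.cores = 1)
```

The message printed on each call shows which branch was taken, which is the same idea as adding a print or cat to the original function.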

@FloWuenne
Author

I tried running knn.error.models and it seems to calculate error models correctly when using

linear.fit=TRUE

With linear.fit=FALSE, it still gives the same error:

Error in .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars,  :
  "dqrls" not resolved from current namespace (scde)

Defining the papply code right before calculating the error models seemed to fix the multicore issue, but I still get the error below after the code has run for one comparison group in my data. (I am iterating over clusters of my single cells and doing DE between two groups in each cluster.)

models:
Error: 'bplapply' receive data failed:
  error reading from connection

@FloWuenne
Author

Sorry, so the multicore module still fails for me at this point. I think I am using it wrong. How exactly do I have to implement the papply solution?

I really need multicore, since a single core is very slow when comparing hundreds of cells. Any other tips for improving speed?

@FloWuenne
Author

I have been experimenting with this for a while now and have not found a solution. Single core takes far too long on my cells. The error seems to come from bplapply for me, but none of the modes in the papply function seem to work on our cluster. Any suggestions?

@JEFworks
Member

papply is just a wrapper function to call either bplapply or mclapply. So it seems like your cluster may be having trouble with either or both of those functions. Unfortunately I have not been able to reproduce this error, which makes helping you debug more challenging.

Can you please try a simple test:

## regular non-parallelized lapply
start_time <- Sys.time() 
lapply(1:10, function(x) { Sys.sleep(1) }) 
end_time <- Sys.time() 
t1 <- end_time - start_time 

This should take about 10 seconds.

## mclapply
start_time <- Sys.time() 
require(parallel) 
mclapply(1:10, function(x) { Sys.sleep(1) }, mc.cores=10) 
end_time <- Sys.time() 
t2 <- end_time - start_time 

This should take less than 10 seconds, though there's time associated with forking so it should be more than 1 second. If you want to benchmark the time spent for forking, you can try:

start_time <- Sys.time() 
require(parallel) 
mclapply(1:10, function(x) { }, mc.cores=10) 
end_time <- Sys.time() 
t3 <- end_time - start_time 

And the difference between t3 and t2 should be about 1 second if your mclapply is working properly.

And lastly:

## bplapply
start_time <- Sys.time() 
require(BiocParallel) 
bplapply(1:10, function(x) { Sys.sleep(1) }, BPPARAM = MulticoreParam(workers = 10)) 
end_time <- Sys.time() 
t4 <- end_time - start_time 

If these are failing on your cluster, then it is an issue with the parallelization.
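The same checks can be condensed into one helper, sketched here with `system.time`. The helper name `time_backend` and the shortened task count and sleep duration are illustrative choices, not part of scde. The BiocParallel timing is only attempted when that package is installed.

```r
library(parallel)

# Time one lapply-style backend on n sleep tasks; returns elapsed seconds.
time_backend <- function(apply_fun, n = 4, sleep = 0.2, ...) {
  system.time(
    apply_fun(seq_len(n), function(x) Sys.sleep(sleep), ...)
  )[["elapsed"]]
}

# Forking is unavailable on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

timings <- c(
  lapply   = time_backend(lapply),
  mclapply = time_backend(mclapply, mc.cores = cores)
)

# Optional: time bplapply as well when BiocParallel is available.
if (cores > 1L && requireNamespace("BiocParallel", quietly = TRUE)) {
  timings[["bplapply"]] <- time_backend(
    BiocParallel::bplapply,
    BPPARAM = BiocParallel::MulticoreParam(workers = cores)
  )
}

print(round(timings, 2))
```

If the parallel backends work, their timings should come out clearly below the serial `lapply` timing, with some overhead for starting the workers.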

@FloWuenne
Author

Just ran the tests and it seemed to execute all of them without any problems...

t1
Time difference of 10.03305 secs
t2
Time difference of 1.132771 secs
t3
Time difference of 0.1887727 secs
t4
Time difference of 1.676646 secs

So it seems it's not a problem with these packages? Do I need to load BiocParallel or parallel explicitly when running scde? I assumed they are dependencies and are loaded automatically.

@JEFworks
Member

They should be automatically loaded.

Based on the error you saw previously:

models:
Error: 'bplapply' receive data failed:
  error reading from connection

it sounds like this could be a memory issue ("*** caught segfault *** address 0xf2d1, cause 'memory not mapped'" is a more common error message for really large datasets). Does the error reproduce for the example data provided with the package or just your own data?
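A sketch of how the example-data check could look, assuming the `es.mef.small` dataset and the `clean.counts` and `scde.error.models` calls as I recall them from the scde tutorial; the argument values are assumptions and should be checked against the installed version. The wrapper name `fit_example_models` is hypothetical, and it returns NULL when scde is not installed.

```r
# Fit error models on the package's example data (assumed: es.mef.small).
# Returns NULL when scde is not installed.
fit_example_models <- function(n.cores = 1) {
  if (!requireNamespace("scde", quietly = TRUE)) {
    message("scde is not installed; skipping")
    return(NULL)
  }
  env <- new.env()
  data("es.mef.small", package = "scde", envir = env)
  counts <- env$es.mef.small

  # Group labels (ESC vs MEF) derived from the column names.
  groups <- factor(gsub("(MEF|ESC).*", "\\1", colnames(counts)),
                   levels = c("ESC", "MEF"))
  names(groups) <- colnames(counts)

  # Filter low-coverage cells and genes before fitting.
  cd <- scde::clean.counts(counts, min.lib.size = 1000,
                           min.reads = 1, min.detected = 1)

  scde::scde.error.models(counts = cd, groups = groups, n.cores = n.cores,
                          threshold.segmentation = TRUE,
                          save.crossfit.plots = FALSE,
                          save.model.plots = FALSE, verbose = 1)
}

# Start with one core; report any error instead of stopping the session.
models <- tryCatch(
  fit_example_models(n.cores = 1),
  error = function(e) { message(conditionMessage(e)); NULL }
)
```

Running this first with `n.cores = 1` and then with more cores would help separate a data-size problem from a problem with the parallel setup itself.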

@hummuscience

I might be having a similar problem as I am getting the following error when running scde.error.models:

Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
> 
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal

I will try out the different apply functions and let you know.
