Parallelisation is not working in perf() #292

Open
mvacher opened this issue Apr 6, 2023 · 0 comments
Labels
bug Something isn't working

mvacher commented Apr 6, 2023


🐞 Describe the bug:

perf() does not use multiple cores, even when BPPARAM is set correctly.


🔍 reprex results from reproducible example including sessioninfo():

library(mixOmics) 
library(dplyr)
library(BiocParallel)
library(microbenchmark)

## -------------------------------------------------------------------------------------------------------------------
data(breast.TCGA) # load in the data
data = list(miRNA = breast.TCGA$data.train$mirna, # set a list of all the X dataframes
            mRNA = breast.TCGA$data.train$mrna,
            proteomics = breast.TCGA$data.train$protein)

Y = breast.TCGA$data.train$subtype # set the response variable as the Y dataframe

## -------------------------------------------------------------------------------------------------------------------
design = matrix(0.1, ncol = length(data), 
                nrow = length(data), # for square matrix filled with 0.1s
                dimnames = list(names(data), names(data)))
diag(design) = 0 # set diagonal to 0s

basic.diablo.model = block.splsda(X = data, Y = Y, ncomp = 5, design = design) # form basic DIABLO

## -------------------------------------------------------------------------------------------------------------------
# Benchmark
n_rep = 1
res <- list(
  "MulticoreParam(10)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10, progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 10)),
    times = n_rep),
  "MulticoreParam(5)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10, progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 5)),
    times = n_rep),
  "MulticoreParam(2)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10, progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 2)),
    times = n_rep),
  "SnowParam(10)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10, progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 10)),
    times = n_rep),
  "SnowParam(5)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10, progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 5)),
    times = n_rep),
  "SnowParam(2)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10, progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 2)),
    times = n_rep),
  "SerialParam(1)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10, progressBar = FALSE,
         BPPARAM = SerialParam()),
    times = n_rep))
bind_rows(res)

Results:

Unit: seconds
                                  expr      min       lq     mean   median       uq
BPPARAM = MulticoreParam(workers = 10) 25.17865 25.17865 25.17865 25.17865 25.17865
 BPPARAM = MulticoreParam(workers = 5) 25.37876 25.37876 25.37876 25.37876 25.37876
 BPPARAM = MulticoreParam(workers = 2) 25.19722 25.19722 25.19722 25.19722 25.19722
     BPPARAM = SnowParam(workers = 10) 25.45244 25.45244 25.45244 25.45244 25.45244
      BPPARAM = SnowParam(workers = 5) 25.81489 25.81489 25.81489 25.81489 25.81489
      BPPARAM = SnowParam(workers = 2) 25.91184 25.91184 25.91184 25.91184 25.91184
               BPPARAM = SerialParam() 25.55273 25.55273 25.55273 25.55273 25.55273

sessionInfo()

R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin22.3.0 (64-bit)
Running under: macOS Ventura 13.0

Matrix products: default
LAPACK: /opt/homebrew/Cellar/r/4.2.3/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4.9 BiocParallel_1.32.5  dplyr_1.1.1          mixOmics_6.22.0      ggplot2_3.4.1        lattice_0.20-45        

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.10.0 tidyr_1.3.0           jsonlite_1.8.4        ellipse_0.4.4         stats4_4.2.3          yaml_2.3.7            ggrepel_0.9.3         corrplot_0.92        
 [9] pillar_1.9.0          glue_1.6.2            reticulate_1.28       digest_0.6.31         RColorBrewer_1.1-3    colorspace_2.1-0      cowplot_1.1.1         htmltools_0.5.5      
[17] Matrix_1.5-3          plyr_1.8.8            pkgconfig_2.0.3       pheatmap_1.0.12       dir.expiry_1.6.0      purrr_1.0.1           corpcor_1.6.10        scales_1.2.1         
[25] HDF5Array_1.26.0      RSpectra_0.16-1       Rtsne_0.16            tibble_3.2.1          generics_0.1.3        IRanges_2.32.0        withr_2.5.0           BiocGenerics_0.44.0  
[33] cli_3.6.1             magrittr_2.0.3        evaluate_0.20         fansi_1.0.4           forcats_1.0.0         tools_4.2.3           lifecycle_1.0.3       matrixStats_0.63.0   
[41] basilisk.utils_1.10.0 stringr_1.5.0         Rhdf5lib_1.20.0       S4Vectors_0.36.2      munsell_0.5.0         DelayedArray_0.24.0   compiler_4.2.3        rlang_1.1.0          
[49] rhdf5_2.42.0          grid_4.2.3            rhdf5filters_1.10.0   rstudioapi_0.14       igraph_1.4.1          rmarkdown_2.21        basilisk_1.10.2       gtable_0.3.3         
[57] codetools_0.2-19      rARPACK_0.11-0        reshape2_1.4.4        R6_2.5.1              gridExtra_2.3         knitr_1.42            fastmap_1.1.1         uwot_0.1.14          
[65] utf8_1.2.3            filelock_1.0.2        stringi_1.7.12        parallel_4.2.3        Rcpp_1.0.10           vctrs_0.6.1           png_0.1-8             tidyselect_1.2.0     
[73] xfun_0.38  

馃 Expected behavior:

Running time should decrease as the number of workers increases.


💡 Possible solution:

I am not sure. The example uses a block.splsda object because that is what I was interested in, but looking at the code, BiocParallel does not seem to be used consistently across all the perf() methods.

The problem looks similar to the lack of parallelisation in the tune() function previously reported in #214.
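One way to narrow this down is to check that BiocParallel itself parallelises on the same machine, independently of mixOmics. The snippet below is a minimal sanity check, not part of mixOmics; `slow_task` is a hypothetical helper. Each task sleeps for one second and returns its process ID, so with a working MulticoreParam the elapsed time should drop to roughly one second and several distinct worker PIDs should appear. If that holds, the missing speed-up is more likely inside perf() than in the BiocParallel setup. This assumes a fork-capable platform such as macOS or Linux; MulticoreParam is not supported on Windows.

```r
library(BiocParallel)

# Hypothetical helper: sleeps for one second, then reports which process ran it.
slow_task <- function(i) {
  Sys.sleep(1)
  Sys.getpid()
}

# Four tasks run one after another in the main process: expect roughly 4 s.
serial_time <- system.time(
  serial_pids <- bplapply(1:4, slow_task, BPPARAM = SerialParam())
)[["elapsed"]]

# Four tasks spread over four forked workers: expect roughly 1 s plus overhead.
multicore_time <- system.time(
  multicore_pids <- bplapply(1:4, slow_task, BPPARAM = MulticoreParam(workers = 4))
)[["elapsed"]]

cat("SerialParam elapsed:   ", serial_time, "s\n")
cat("MulticoreParam elapsed:", multicore_time, "s\n")
cat("Distinct worker PIDs:  ", length(unique(unlist(multicore_pids))), "\n")
```

Because the tasks only sleep, this check does not depend on how many physical cores are free; it only tests whether work is actually dispatched to separate processes.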

mvacher added the bug (Something isn't working) label Apr 6, 2023