Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

No parallelization in tune.spls #214

Closed
EmileFaure opened this issue May 3, 2022 · 2 comments 路 Fixed by #218
Closed

No parallelization in tune.spls #214

EmileFaure opened this issue May 3, 2022 · 2 comments 路 Fixed by #218
Assignees
Labels
bug Something isn't working

Comments

@EmileFaure
Copy link


馃悶 Describe the bug:

tune.spls is not using multicore processing even with BPPARAM set correctly.


馃攳 reprex results from reproducible example including sessioninfo():

Monitoring of this code from the package vignette:

param = MulticoreParam(workers=5)
bpstart(param)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
set.seed(42)
tune.res = tune.spls( X, Y, ncomp = 3,
                      test.keepX = c(5, 10, 15),
                      test.keepY = c(3, 6, 8), measure = "cor",
                      folds = 5, nrepeat = 3, progressBar = TRUE, BPPARAM = param)

--> Only one CPU is used

tune.res = tune.spca( X, ncomp = 3,
                      test.keepX = c(5, 10, 15),
                      folds = 5, nrepeat = 3, BPPARAM = param)

--> Multiple CPUs are used in parallel.

Session info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mixOmics_6.18.1     ggplot2_3.3.5       lattice_0.20-45     MASS_7.3-57        
[5] BiocParallel_1.28.3

loaded via a namespace (and not attached):
 [1] ggrepel_0.9.1        Rcpp_1.0.8.3         lubridate_1.8.0      tidyr_1.2.0         
 [5] corpcor_1.6.10       listenv_0.8.0        class_7.3-20         snow_0.4-4          
 [9] assertthat_0.2.1     digest_0.6.29        ipred_0.9-12         foreach_1.5.2       
[13] utf8_1.2.2           RSpectra_0.16-1      parallelly_1.31.1    R6_2.5.1            
[17] plyr_1.8.7           hardhat_0.2.0        stats4_4.1.2         ellipse_0.4.2       
[21] pillar_1.7.0         rlang_1.0.2          caret_6.0-92         rstudioapi_0.13     
[25] data.table_1.14.2    rpart_4.1.16         Matrix_1.4-1         rARPACK_0.11-0      
[29] splines_4.1.2        gower_1.0.0          stringr_1.4.0        igraph_1.3.1        
[33] munsell_0.5.0        tinytex_0.38         compiler_4.1.2       xfun_0.30           
[37] pkgconfig_2.0.3      globals_0.14.0       nnet_7.3-17          tidyselect_1.1.2    
[41] gridExtra_2.3        tibble_3.1.6         prodlim_2019.11.13   matrixStats_0.62.0  
[45] codetools_0.2-18     fansi_1.0.3          future_1.25.0        crayon_1.5.1        
[49] dplyr_1.0.9          withr_2.5.0          recipes_0.2.0        ModelMetrics_1.2.2.2
[53] grid_4.1.2           nlme_3.1-157         gtable_0.3.0         lifecycle_1.0.1     
[57] DBI_1.1.2            magrittr_2.0.3       pROC_1.18.0          scales_1.2.0        
[61] future.apply_1.9.0   cli_3.3.0            stringi_1.7.6        reshape2_1.4.4      
[65] timeDate_3043.102    ellipsis_0.3.2       generics_0.1.2       vctrs_0.4.1         
[69] lava_1.6.10          RColorBrewer_1.1-3   iterators_1.0.14     tools_4.1.2         
[73] glue_1.6.2           purrr_0.3.4          parallel_4.1.2       survival_3.3-1      
[77] colorspace_2.0-3     BiocManager_1.30.17 

馃 Expected behavior:

tune.spls should be parallelized over the workers registered in the BiocParallel environment.


馃挕 Possible solution:

In source code of tune.spca, the call of iterations over the keeping of X is made through bplapply:
test.keepX.cors <- bplapply(X=all.keepX, FUN=iter_keepX, BPPARAM = BPPARAM)

In the one of tune.spls, there seem to be no use of any of the basic functions of BiocParallel. It looks to me as the presence of the BiocParallel backend is only playing a role on whether the progress bar is printed or not.
If I'm wrong, do you have any idea of why this could happen on my system ?
Thank you very much !

@Max-Bladen
Copy link
Collaborator

Thank you for your post @EmileFaure. What you've noted is totally correct. After I spent a bit of time exploring the source code, I've realised that this issue a bit larger than just a lack of parallelisation in tune.spls(). If you're interested, feel free to read the information in Issue #216.

As far as it should concern you, I will attempt to rectify the lack of any parallelisation in tune.spls() prior to any other fixes/enhancements.

@Max-Bladen Max-Bladen added the wip work-in-progress label May 15, 2022
Max-Bladen added a commit that referenced this issue May 16, 2022
fix: implemented BiocParallel functionality into `tune.spls()` via restructuring of the main portion of the function.

bplapply() iterates over all pairs of input keepX/Y so only reduces runtime for long lists of test.keepX/Y
Max-Bladen added a commit that referenced this issue May 16, 2022
refactor: previous PR left duplicate function used for development. this is rectified and more comments are added
Max-Bladen added a commit that referenced this issue May 16, 2022
test: added test cases for `tune.spls()`. These are just placeholders as more robust testing criteria needs to be established as well as this document will be fully updated as part of PR #206
@Max-Bladen Max-Bladen linked a pull request May 17, 2022 that will close this issue
@Max-Bladen Max-Bladen moved this from In Progress to Awaiting Review in mixOmics-development May 17, 2022
@Max-Bladen Max-Bladen removed the wip work-in-progress label May 17, 2022
@Max-Bladen
Copy link
Collaborator

G'day @EmileFaure. Thought I'd let you know that I've implemented a fix for this issue and that it is accessible on branch issue-214. Please let me know if there are any issues you run into

@Max-Bladen Max-Bladen reopened this May 17, 2022
@Max-Bladen Max-Bladen moved this from Awaiting Review to Implemented Locally in mixOmics-development Jun 8, 2022
@Max-Bladen Max-Bladen moved this from Implemented Locally to In Progress in mixOmics-development Jun 8, 2022
@Max-Bladen Max-Bladen added the wip work-in-progress label Jun 8, 2022
@Max-Bladen Max-Bladen moved this from In Progress to Awaiting Assistance in mixOmics-development Jun 27, 2022
@Max-Bladen Max-Bladen removed the wip work-in-progress label Aug 29, 2022
@Max-Bladen Max-Bladen moved this from Awaiting Assistance to Awaiting feedback in mixOmics-development Aug 30, 2022
@Max-Bladen Max-Bladen moved this from Awaiting feedback to Awaiting Assistance in mixOmics-development Aug 30, 2022
Max-Bladen added a commit that referenced this issue Dec 13, 2022
tests: updated tests to assess actual values alongside RNG control of parallel function calls.
Max-Bladen added a commit that referenced this issue Dec 13, 2022
fix: implemented BiocParallel functionality into `tune.spls()` via restructuring of the main portion of the function.

bplapply() iterates over all pairs of input keepX/Y so only reduces runtime for long lists of test.keepX/Y
Max-Bladen added a commit that referenced this issue Dec 13, 2022
refactor: previous PR left duplicate function used for development. this is rectified and more comments are added
Max-Bladen added a commit that referenced this issue Dec 13, 2022
test: added test cases for `tune.spls()`. These are just placeholders as more robust testing criteria needs to be established as well as this document will be fully updated as part of PR #206
Max-Bladen added a commit that referenced this issue Dec 13, 2022
tests: updated tests to assess actual values alongside RNG control of parallel function calls.
@Max-Bladen Max-Bladen added the ready-to-review for all PRs that are ready to be reviewed. including complex, larger commits label Dec 13, 2022
@Max-Bladen Max-Bladen moved this from Awaiting Assistance to In Progress in mixOmics-development Dec 13, 2022
Max-Bladen added a commit that referenced this issue Dec 13, 2022
docs: updated `tune.Rd` file with cpus -> BPPARAM shift
Max-Bladen added a commit that referenced this issue Dec 13, 2022
tests: bring in line with PR #282
Max-Bladen added a commit that referenced this issue Dec 13, 2022
tests: bring in line with PR #282
Max-Bladen added a commit that referenced this issue Dec 13, 2022
fix: implemented BiocParallel functionality into `tune.spls()` via restructuring of the main portion of the function.

bplapply() iterates over all pairs of input keepX/Y so only reduces runtime for long lists of test.keepX/Y
Max-Bladen added a commit that referenced this issue Dec 13, 2022
refactor: previous PR left duplicate function used for development. this is rectified and more comments are added
Max-Bladen added a commit that referenced this issue Dec 13, 2022
test: added test cases for `tune.spls()`. These are just placeholders as more robust testing criteria needs to be established as well as this document will be fully updated as part of PR #206
Max-Bladen added a commit that referenced this issue Dec 13, 2022
tests: updated tests to assess actual values alongside RNG control of parallel function calls.
Max-Bladen added a commit that referenced this issue Dec 13, 2022
docs: updated `tune.Rd` file with cpus -> BPPARAM shift
Max-Bladen added a commit that referenced this issue Dec 13, 2022
fix: implemented BiocParallel functionality into `tune.spls()` via restructuring of the main portion of the function. bplapply() iterates over all pairs of input keepX/Y so only reduces runtime for long lists of test.keepX/Y

refactor: previous PR left duplicate function used for development. this is rectified and more comments are added

test: added test cases for `tune.spls()`. These are just placeholders as more robust testing criteria needs to be established as well as this document will be fully updated as part of PR #206

tests: updated tests to assess actual values alongside RNG control of parallel function calls.

refactor: replaced all calls to `cpus` to `BPPARAM` in `tune()`

docs: updated `tune.Rd` file with cpus -> BPPARAM shift
@Max-Bladen Max-Bladen removed the ready-to-review for all PRs that are ready to be reviewed. including complex, larger commits label Dec 15, 2022
@Max-Bladen Max-Bladen moved this from In Progress to Merged in mixOmics-development Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

Successfully merging a pull request may close this issue.

3 participants