
'configureMlr' cannot work with parallel computing #504

Closed
Seager1989 opened this issue Jan 3, 2021 · 4 comments

Comments

Seager1989 commented Jan 3, 2021

I ran into this error before when using 'classif.gausspr', as mentioned in #501. Thanks to the help of @jakob-r, it was fixed by setting configureMlr(on.learner.error = "warn") so that the code keeps running.

Now I have run into a new problem. When I enable parallel computing with

parallelStartSocket(4)

the error can no longer be ignored: the code stops after evaluating the initial DOE, again with errors similar to the following:

00002: Error in if (err < tol) break : missing value where TRUE/FALSE needed

Is there a way to make configureMlr work with parallel computing? Thank you.
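For reference, here is a minimal sketch of the kind of setup I mean. The task, parameter set, and budget below are simplified placeholders, not my actual script:

```r
library(mlr)
library(mlrMBO)
library(parallelMap)

# Downgrade learner errors to warnings, as suggested in #501
configureMlr(on.learner.error = "warn")

# Placeholder task and tuning setup, for illustration only
lrn = makeLearner("classif.gausspr")
ps = makeParamSet(
  makeNumericParam("sigma", lower = -5, upper = 5, trafo = function(x) 2^x)
)
ctrl = makeTuneControlMBO(budget = 20L)
rdesc = makeResampleDesc("CV", iters = 5)

parallelStartSocket(4)  # no level specified, as in the report
res = tuneParams(lrn, task = sonar.task, resampling = rdesc,
                 par.set = ps, control = ctrl)
parallelStop()
```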

Seager1989 commented Jan 3, 2021

I found one possible solution: set the parallelization level to resampling with

parallelStartSocket(cpus = 5, level = "mlr.resample")

The default level appears to be mlrMBO.propose.points, which is what causes the problem above.

I also tried the other levels; they do occupy multiple cores, but they cannot assign the most computationally expensive part, the CV training, to those cores. So resample-level parallelization may be the best choice for CV-based hyperparameter tuning.

This is only a quick analysis. Any suggestions for making mlrMBO.propose.points work together with configureMlr(on.learner.error = "warn") would be appreciated. Thanks.
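For clarity, a sketch of this workaround, reusing the placeholder setup from my first comment ("mlr.resample" is one of the parallelization levels mlr registers with parallelMap):

```r
# Workaround sketch: restrict parallelization to mlr's resampling level
# so that only the CV folds run in parallel and mlrMBO's point proposal
# stays sequential. Reuses lrn/ps/ctrl/rdesc from the sketch above.
library(parallelMap)

parallelStartSocket(cpus = 5, level = "mlr.resample")
res = tuneParams(lrn, task = sonar.task, resampling = rdesc,
                 par.set = ps, control = ctrl)
parallelStop()
```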

jakob-r commented Jan 4, 2021

The following is not a solution to your problem, but a general hint:

If you want to speed up the optimization through parallelization, it is advisable to parallelize the evaluation of the black box (i.e. the train/test resampling) rather than proposing multiple points, because two sequential proposals are generally worth more than two parallel proposals. Why? Because the second sequential point is proposed with knowledge of the first point's result, whereas the second parallel point is generated from the same knowledge as the first.
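To make this concrete, a small sketch of the two control settings (propose.points is the relevant argument of makeMBOControl; the values are illustrative):

```r
library(mlrMBO)

# One proposal per iteration (the default): each new point is chosen
# with knowledge of all previously evaluated points.
ctrl.seq = makeMBOControl(propose.points = 1L)

# Several proposals per iteration: all points in a batch are derived
# from the same surrogate state, so each adds less new information.
ctrl.par = makeMBOControl(propose.points = 2L)
```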

Seager1989 commented

Yes, parallelizing at the resample level seems good. However, does this mean that with 5-fold CV I can only use 5 cores simultaneously? Is there a way to make use of more?

jakob-r closed this as completed Jan 29, 2021

jakob-r commented Jan 29, 2021

Closing, because this is not a bug in mlrMBO, but rather a problem in parallelMap and mlr.
