
Parallel GridSearch (both Cartesian and RandomDiscrete) hangs #7731

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 7 comments

Comments

@exalate-issue-sync

Running grid search with parallelism set to 0, or to any value other than 1, always hangs: the progress bar can reach 100%, but the grid never finishes. This happens with both the Cartesian and RandomDiscrete strategies. Grid search finishes when run with the default parallelism value. Tried only with the GBM model.

@exalate-issue-sync

Jan Sterba commented: I see that you are reporting that the affected version is [3.30.1.3|https://h2oai.atlassian.net/issues/?jql=project%20%3D%20%22PUBDEV%22%20AND%20affectedVersion%20%3D%20%223.30.1.3%22]. Did you try the latest 3.32.0.x?

@exalate-issue-sync

Igor Trpovski commented: I tried it right now and the same thing happens.

@exalate-issue-sync

Jan Sterba commented: Thanks for the info, will look into it.

@exalate-issue-sync

Jan Sterba commented: I have reproduced the issue. The workaround is to use lower values of min_rows; the CV models are failing because the min_rows values are too high:

_min_rows: The dataset size is too small to split for min_rows=18.0: must have at least 36.0 (weighted) rows, but have only 30.0

As a workaround, use smaller values for min_rows. I am working on a fix.
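The error message implies a simple rule: a model can only split its data if it has at least twice `min_rows` (weighted) rows, so with the 30 rows reported above the largest usable `min_rows` is 15. Below is a minimal sketch, assuming that rule (read off the error text, not taken from H2O's source) and using hypothetical helper names, for pruning a `min_rows` hyperparameter list before launching the grid:

```python
def max_splittable_min_rows(weighted_rows: float) -> float:
    """Largest min_rows for which `weighted_rows` rows can still be split.

    Assumption: splitting needs at least 2 * min_rows (weighted) rows,
    as stated in the error message above.
    """
    return weighted_rows / 2.0


def safe_min_rows(candidates, weighted_rows):
    """Hypothetical helper: keep only min_rows candidates that satisfy the 2 * min_rows rule."""
    limit = max_splittable_min_rows(weighted_rows)
    return [m for m in candidates if m <= limit]


# Values from the reported error: 30 weighted rows; min_rows=18 fails because 2 * 18 = 36 > 30.
print(safe_min_rows([1, 5, 10, 15, 18, 25], weighted_rows=30))  # [1, 5, 10, 15]
```

The row count that matters is the one seen by each CV model, which is smaller than the full training frame, so the limit should be computed from the smallest fold rather than from the whole dataset.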

@exalate-issue-sync

Igor Trpovski commented: Thanks for the info. So basically the problem is that when some CV models fail, parallel GridSearch can't handle it and hangs, whereas sequential GridSearch prints the errors for the failed models and finishes?

@exalate-issue-sync

Jan Sterba commented: Yes, exactly.
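The failure mode confirmed above can be illustrated generically. This is not H2O's implementation; it is a sketch using Python's standard `concurrent.futures` of a parallel driver that records failed jobs as outcomes instead of waiting only for successful models, so the loop terminates even when some models fail. `train_model` and `run_grid` are hypothetical stand-ins, and `train_model` raises for a too-large `min_rows` in the same way as the CV models in this report.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def train_model(min_rows: float, weighted_rows: float = 30.0) -> str:
    """Hypothetical stand-in for training one grid model; fails like the reported CV models."""
    if weighted_rows < 2 * min_rows:
        raise ValueError(
            f"min_rows={min_rows}: must have at least {2 * min_rows} rows, "
            f"but have only {weighted_rows}"
        )
    return f"model(min_rows={min_rows})"


def run_grid(min_rows_values, parallelism: int = 4):
    """Run all grid points in parallel; every job ends up in either `models` or `failures`."""
    models, failures = [], {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {pool.submit(train_model, m): m for m in min_rows_values}
        # as_completed yields each future exactly once, whether it succeeded or raised,
        # so the loop ends after len(futures) iterations instead of waiting for successes only.
        for fut in as_completed(futures):
            m = futures[fut]
            try:
                models.append(fut.result())
            except ValueError as err:
                failures[m] = str(err)
    return models, failures


models, failures = run_grid([1, 10, 18, 25])
print(len(models), sorted(failures))  # 2 [18, 25]
```

The key design point is that completion is counted over all submitted jobs, not over successful ones; a driver that waits for N successful models never finishes once any of them fails.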

@h2o-ops

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7914
Assignee: Jan Sterba
Reporter: Igor Trpovski
State: Closed
Fix Version: 3.32.0.3
Attachments: Available (Count: 1)
Development PRs: Available

Linked PRs from JIRA

#5183

Attachments From Jira

Attachment Name: parallel_grid_search_hangs_attachment.zip
Attached By: Igor Trpovski
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7914/parallel_grid_search_hangs_attachment.zip
