Parallel GridSearch (both Cartesian and RandomDiscrete) hangs #7731
Comments
Jan Sterba commented: I see that you are reporting that the affected version is 3.30.1.3. Have you tried the latest 3.32.0.x?
Igor Trpovski commented: I tried it right now and the same thing happens.
Jan Sterba commented: thanks for the info, will look into it
Jan Sterba commented: I have reproduced the issue. The CV models are failing because the value of min_rows is too high: `min_rows: The dataset size is too small to split for min_rows=18.0: must have at least 36.0 (weighted) rows, but have only 30.0`. As a work-around, use smaller values for min_rows. I am working on a fix.
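The arithmetic behind the quoted error can be made explicit. Assuming (as the message suggests) that a split requires at least twice min_rows weighted rows so that each child node gets min_rows, a value that works on the full dataset can still fail on a smaller cross-validation fold. A minimal sketch using the numbers from the error message:

```python
# Numbers taken from the error message quoted in the comment above.
min_rows = 18.0     # requested minimum (weighted) rows per leaf
fold_rows = 30.0    # (weighted) rows available in the failing CV fold

# Assumption: a split needs min_rows in each of the two children,
# hence 2 * min_rows rows overall -- matching "must have at least 36.0".
required = 2 * min_rows
print(required)                 # 36.0
print(fold_rows >= required)    # False -> this CV model cannot be built
```

Lowering min_rows below fold_rows / 2 (here, to 15 or less) avoids the failure.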
Igor Trpovski commented: Thanks for the info, so basically the problem is that when some CV models fail, parallel GridSearch can’t handle that and hangs, as opposed to sequential GridSearch that prints out the errors for failed models and finishes?
Jan Sterba commented: yes exactly
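The behavior confirmed above, where the sequential search reports failed models and finishes while the parallel search waits forever, is a classic failure mode: a driver that waits for N "model built" notifications hangs if a failing model never produces one. This is not H2O's actual implementation, just a minimal Python sketch of the pattern and its fix; `train_model` and its failure rule are hypothetical stand-ins:

```python
import concurrent.futures

def train_model(params):
    # Hypothetical stand-in for building one CV model: a too-large
    # min_rows fails, mimicking "dataset too small to split" above.
    if params["min_rows"] > 15:
        raise ValueError(f"too small to split for min_rows={params['min_rows']}")
    return f"model(min_rows={params['min_rows']})"

def run_grid(grid):
    finished, errors = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(train_model, p) for p in grid]
        # as_completed yields every future, failed or not, so this loop
        # always terminates. Counting only *successful* models against
        # the grid size would reproduce the hang described in this issue.
        for fut in concurrent.futures.as_completed(futures):
            try:
                finished.append(fut.result())
            except ValueError as err:
                errors.append(str(err))  # record the failure and move on
    return finished, errors

models, errs = run_grid([{"min_rows": r} for r in (5, 10, 18)])
print(len(models), len(errs))  # 2 1
```

The key design point is that a failed model must still count toward grid completion, exactly as the sequential path does when it prints the error and continues.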
JIRA Issue Migration Info — Jira Issue: PUBDEV-7914. Linked PRs from JIRA. Attachments from Jira — Attachment Name: parallel_grid_search_hangs_attachment.zip
Running grid search with parallelism set to 0 or any value other than 1 always hangs: after some time the progress bar may reach 100%, but the grid never finishes. It happens with both the Cartesian and RandomDiscrete strategies. Grid search finishes when run with the default value for parallelism. Tried only with a GBM model.