Improve Ludwig resilience to Ray Tune issues #1660

Merged: 2 commits, Jan 8, 2022

@amholler (Collaborator) commented Jan 7, 2022

*) A trial can intermittently fail in Ray Tune, e.g., ray-project/ray#21458.
In general, it seems more resilient to set the Ray Tune option that retries a failed trial once.

*) Ray Tune can intermittently fail to upload checkpoints, e.g., ray-project/ray#21469.
In general, it seems more resilient to have the Ludwig post-search evaluation process log a warning
for trials with missing checkpoints and skip them, so that the completed search still provides value.
Note that missing checkpoints are critical only when they belong to the best trial.
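
The two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `run_search`, `evaluate_trials`, `trial_checkpoints`, and `evaluate_fn` are hypothetical names, and the `tune.run(..., max_failures=1)` call assumes the Ray 1.x-era `tune.run` API that was current when this PR was opened.

```python
import logging
import os

logger = logging.getLogger(__name__)


def run_search(trainable, search_space, num_samples=4):
    """Run a Ray Tune search in which each failed trial is retried once."""
    # Imported lazily so the rest of this sketch runs without Ray installed.
    from ray import tune

    return tune.run(
        trainable,
        config=search_space,
        num_samples=num_samples,
        max_failures=1,  # retry a failed trial once instead of giving up on it
    )


def evaluate_trials(trial_checkpoints, evaluate_fn):
    """Evaluate each trial's checkpoint, warning on and skipping missing ones.

    trial_checkpoints: hypothetical mapping of trial id -> checkpoint dir (or None).
    evaluate_fn: hypothetical callable that takes a checkpoint dir and returns stats.
    """
    results = {}
    for trial_id, checkpoint_dir in trial_checkpoints.items():
        if checkpoint_dir is None or not os.path.isdir(checkpoint_dir):
            # A missing checkpoint should not discard the rest of the search.
            logger.warning(
                "Skipping trial %s: checkpoint not found at %r",
                trial_id,
                checkpoint_dir,
            )
            continue
        results[trial_id] = evaluate_fn(checkpoint_dir)
    return results
```

With this shape, a search in which one trial failed to upload its checkpoint still yields evaluation results for every other trial; only a missing checkpoint for the best trial leaves nothing useful to evaluate.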

github-actions bot commented Jan 7, 2022

Unit Test Results

       6 files  ±0         6 suites  ±0   2h 44m 33s ⏱️ + 8m 30s
1 216 tests ±0  1 192 ✔️ ±0  24 💤 ±0  0 ±0 
3 648 runs  ±0  3 576 ✔️ ±0  72 💤 ±0  0 ±0 

Results for commit dfba7c6. ± Comparison against base commit 577887f.

♻️ This comment has been updated with latest results.

collect_overall_stats=True,
return_type='dict',
debug=debug
if best_model_path is not None:
Collaborator

Not totally necessary for this PR, but the level of nesting here is a bit intimidating. It might be worth abstracting some of this logic out into helper methods, or adding comments to describe the stages of the overall flow.

Collaborator Author

Good call out, thanks! PR updated.
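
As a rough illustration of the refactoring suggested in this thread, deeply nested post-search logic can be flattened into one helper per stage, with early returns in place of nested `if` blocks. This sketch does not reproduce the updated PR code: the helper names, the dict-based trial records, and the `checkpoint_path` key are all assumptions made for the example.

```python
import logging

logger = logging.getLogger(__name__)


def select_best_trial(trials, metric, mode="min"):
    """Stage 1: pick the best trial among those that reported the metric."""
    scored = [t for t in trials if t.get(metric) is not None]
    if not scored:
        return None
    pick = min if mode == "min" else max
    return pick(scored, key=lambda t: t[metric])


def evaluate_best_trial(best_trial, evaluate_fn):
    """Stage 2: evaluate the best trial, returning early if that is impossible."""
    if best_trial is None:
        logger.warning("No completed trial found; skipping evaluation")
        return None
    best_model_path = best_trial.get("checkpoint_path")  # hypothetical key
    if best_model_path is None:
        logger.warning("Best trial has no checkpoint; skipping evaluation")
        return None
    return evaluate_fn(best_model_path)


def post_search(trials, metric, evaluate_fn, mode="min"):
    """Overall flow: select the best trial, then evaluate it."""
    best_trial = select_best_trial(trials, metric, mode)
    return evaluate_best_trial(best_trial, evaluate_fn)
```

Each guard clause handles one failure case and exits, so the main path stays at a single indentation level and each stage can be read and tested on its own.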

@justinxzhao justinxzhao merged commit 46a3ead into ludwig-ai:tf-legacy Jan 8, 2022

3 participants