[Release] Convert Horovod and SGD release tests #15999
Conversation
@richardliaw do you know how to get the symlink working for the horovod smoke test?
@amogkam I took a pass, should work now hopefully
ping when tests pass?
Looks good to me. Have you run this via e2e.py locally?
…n-release-conversion
# Conflicts:
# release/horovod_tests/app_config.yaml
# release/horovod_tests/horovod_tests.yaml
# release/horovod_tests/workloads/horovod_test.py
# release/long_running_distributed_tests/app_config.yaml
# release/long_running_distributed_tests/cluster.yaml
# release/long_running_distributed_tests/workloads/pytorch_pbt_failure.py
Hey @amogkam I resolved the conflicts and I think this should be good to go. Can you take one more look?
Thanks a lot @krfricke! I'll look into the failing test. But we shouldn't merge this until Horovod can be built on App configs. Once that works, I'll confirm that the release test passes with e2e.py locally, and then we can merge this.
Test seems to pass fine on buildkite: https://buildkite.com/ray-project/periodic-ci/builds/198#_ Feel free to merge after fixing the unit test. Please ping me once you have done so, so I can enable it in nightly release testing.
…o train-release-conversion
…n-release-conversion
Is this working, or ready to merge?
I think we need to merge #16581 first, and then pull master again.
…version
# Conflicts:
# release/horovod_tests/horovod_tests.yaml
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.