[Release] Convert Horovod and SGD release tests #15999

Merged Jun 24, 2021 (31 commits)

amogkam (Contributor) commented May 22, 2021

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

amogkam changed the title from "[Release] Convert long running distributed release tests" to "[Release] Convert Horovod and SGD release tests" on May 22, 2021
Review thread on python/ray/tune/BUILD (outdated, resolved)
amogkam (Contributor, Author) commented May 24, 2021

@richardliaw do you know how to get the symlink working for the horovod smoke test?

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
richardliaw (Contributor) commented

@amogkam I took a pass; it should work now, hopefully.

richardliaw (Contributor) commented

ping when tests pass?

krfricke (Contributor) left a comment

Looks good to me. Have you run this via e2e.py locally?

richardliaw (Contributor) commented

Hmm @amogkam @krfricke, I wasn't quite sure how to resolve these conflicts, given that #15913 just merged and seems to overlap a bit.

richardliaw added the @author-action-required label (The PR author is responsible for the next step. Remove tag to send back to the reviewer.) on Jun 1, 2021
Kai Fricke added 3 commits June 9, 2021 14:07
# Conflicts:
#	release/horovod_tests/app_config.yaml
#	release/horovod_tests/horovod_tests.yaml
#	release/horovod_tests/workloads/horovod_test.py
#	release/long_running_distributed_tests/app_config.yaml
#	release/long_running_distributed_tests/cluster.yaml
#	release/long_running_distributed_tests/workloads/pytorch_pbt_failure.py
krfricke (Contributor) commented Jun 9, 2021

Hey @amogkam I resolved the conflicts and I think this should be good to go. Can you take one more look?

amogkam added the do-not-merge label (Do not merge this PR!) and removed the @author-action-required label (The PR author is responsible for the next step. Remove tag to send back to the reviewer.) on Jun 9, 2021
amogkam (Contributor, Author) commented Jun 9, 2021

Thanks a lot @krfricke! I'll look into the failing test. But we shouldn't merge this until Horovod can be built on app configs. Once that's done, I'll confirm that the release test works with e2e.py locally, and then we can merge this.

krfricke (Contributor) commented Jun 18, 2021

The test seems to pass fine on Buildkite: https://buildkite.com/ray-project/periodic-ci/builds/198#_

Feel free to merge after fixing the unit test. Please ping me once you have, so I can enable it in nightly release testing.

richardliaw (Contributor) commented Jun 23, 2021

Is this working, or ready to merge?

amogkam (Contributor, Author) commented Jun 23, 2021

I think we need to merge #16581 first, and then pull master again.

…version

# Conflicts:
#	release/horovod_tests/horovod_tests.yaml
krfricke merged commit 53d1636 into ray-project:master on Jun 24, 2021
Labels
do-not-merge Do not merge this PR!