[Ray SGD] LightningModule integration + MNIST Example #11042

Merged: 55 commits into ray-project:master on Oct 2, 2020

Conversation

amogkam
Contributor

@amogkam amogkam commented Sep 26, 2020

Docs to come later

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Contributor

@richardliaw richardliaw left a comment

looks pretty good; good work!

can you add some unit tests and address comments?

@sumanthratna
Member

amazing!! I haven't had a chance to pull and try this out but @amogkam do you know if the following work with PTL x RaySGD?

  • mixed precision
  • batch size and lr finding (trainer.tune(model))

we might also want to add a sentence describing whether any functionality is lost to the docs PR

@amogkam
Contributor Author

amogkam commented Sep 28, 2020

@sumanthratna Thanks for the comment. Let me know how it goes if you plan to try it out!

This will have all the same functionality as standard Ray SGD, so mixed precision is supported through NVIDIA Apex. However, PyTorch native mixed precision, automatic batch size tuning, and automatic learning rate tuning are not supported yet. These are probably good next steps to work on.

And yep, I have a list of all features that we currently don't support and will add it to the docs :)

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
@richardliaw richardliaw added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 30, 2020
Contributor

@richardliaw richardliaw left a comment

looks good; ping when tests pass?

@amogkam
Contributor Author

amogkam commented Oct 2, 2020

@richardliaw travis passing, should be ready to merge

@amogkam amogkam added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Oct 2, 2020
@richardliaw richardliaw merged commit 874da9a into ray-project:master Oct 2, 2020
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

3 participants