[train][2.7][1/n] cherry-picks for documentations, tests, examples #39105

matthewdeng · 2023-08-30T04:01:22Z

…rain Integration GPU Tests and Examples (ray-project#38910) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: Justin Yu <justinvyu@anyscale.com>

@justinvyu

The new storage path no longer creates "empty" checkpoints by default. Previously, when no checkpoint was saved, pausing a trial (PAUSE) would create a dummy checkpoint containing only trial metadata (such as the iteration number). This is no longer the case. Examples now have to implement checkpointing to properly restore previous state. This was also true previously, but some of our simple examples (e.g. the one in this PR) didn't implement it and still "worked". I think it's fine to keep the functionality as is and require our examples to show checkpointing implementations. This will ensure that users don't shoot themselves in the foot when trying to use e.g. BOHB.

Separately, BOHB was malfunctioning: trials were repeatedly PAUSED and restarted because they were never removed from `bracket.trials_to_unpause`. @justinvyu mentioned this in the review where it was introduced, and I believed at the time it wasn't necessary. It turns out it is, as we can end up in a situation where a bracket never finishes because trials are constantly running. This was not caught by any tests. We should add one in a follow-up; for now we can proceed with this PR to pick onto Ray 2.7.

Signed-off-by: Kai Fricke <kai@anyscale.com>
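To illustrate the `trials_to_unpause` problem described in this commit message, here is a minimal toy simulation. It is not Ray's scheduler code: the `simulate` function, the status strings, and the `remove_on_unpause` flag are hypothetical and exist only for illustration. The only name taken from the commit message is `trials_to_unpause`. The sketch assumes that a bracket can finish only once no trial is still marked for unpausing.

```python
def simulate(remove_on_unpause: bool, max_steps: int = 10):
    """Toy model of a bracket; returns (restarts, bracket_finished).

    Hypothetical sketch, not Ray's implementation.
    """
    status = {"a": "PAUSED", "b": "PAUSED"}
    trials_to_unpause = {"a"}  # trial "a" was selected to continue
    restarts = 0
    finished = False

    for _ in range(max_steps):
        # Scheduler step: restart every paused trial marked for unpausing.
        for trial in sorted(trials_to_unpause):
            if status[trial] == "PAUSED":
                status[trial] = "RUNNING"
                restarts += 1
                if remove_on_unpause:
                    # The fix: forget the trial once it has been restarted.
                    trials_to_unpause.discard(trial)

        # Running trials reach their next milestone and pause again.
        for trial in status:
            if status[trial] == "RUNNING":
                status[trial] = "PAUSED"

        # Assumption: the bracket finishes once nothing is left to unpause.
        if not trials_to_unpause:
            finished = True
            break

    return restarts, finished


print(simulate(remove_on_unpause=True))   # (1, True)
print(simulate(remove_on_unpause=False))  # (10, False)
```

With removal, trial "a" is restarted once and the bracket completes. Without it, the same trial is restarted on every scheduler step until the step limit is hit, which mirrors the "bracket is never finished" symptom described above.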

This PR fixes rllib-related tests that did not pass after the changes related to the new storage context. Signed-off-by: Kai Fricke <kai@anyscale.com> Signed-off-by: matthewdeng <matt@anyscale.com> Co-authored-by: matthewdeng <matt@anyscale.com>

GeneDer · 2023-08-30T04:06:19Z

@matthewdeng The DCO checks are failing :)

zhe-thoughts

This qualifies for picking

Leaving to @GeneDer to make sure tests pass before picking. Thanks!

matthewdeng and others added 30 commits August 29, 2023 20:48

[train] enable new persistence mode for core and serve tests (ray-project#38938)

8794357

Signed-off-by: Matthew Deng <matt@anyscale.com>

[train] New persistence mode: Update 🐠 `ML Libraries w/ Ray Client Examples (Python 3.7)` (ray-project#38923)

7fbecff

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

[train] remove non-URI assertion (ray-project#38944)

a0b00c1

Signed-off-by: Matthew Deng <matt@anyscale.com>

[train] New persistence mode: Update 📖 `Doc tests and examples (excluding Ray AIR examples)` (ray-project#38940)

4ff2081

Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Matthew Deng <matt@anyscale.com> Co-authored-by: Matthew Deng <matt@anyscale.com>

disable legacy sync config logic in trainable (ray-project#38952)

d1d8639

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

[2.7 CI][New Persistent Mode][6/n] 📖 ✈️ Ray AIR examples (ray-project#38918)

b0d4331

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

[2.7 CI][New Persistent Mode][2/n] 📺 📖 Doc GPU tests and examples (ray-project#38905)

7b31ebd

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

[2.7 CI][New Persistent Mode][1/n] 📺 ✈️ AIR GPU tests (ray/air) & ⚡ :python: Lightning 2.0 Train GPU tests (ray-project#38903)

0e1d7f6

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>

[train] Fix broken tune tests and support ray storage (ray-project#38950)

ab5eb6c

This PR re-introduces support for ray storage ray.init(storage="s3://...") and fixes a broken tune controller test. Signed-off-by: Justin Yu <justinvyu@anyscale.com>

[train] New persistence mode: Finish migrating xgb, lgbm and `sklearn` trainers, checkpoints + tests (ray-project#38959)

aee365f

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

[2.7 CI][New Persistent Mode][5/n] 📖 Doc examples for external code (ray-project#38915)

60025fd

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

[train][rllib] temporarily disable new persistence mode for rllib tests (ray-project#38965)

efc4cb8

Signed-off-by: Matthew Deng <matt@anyscale.com>

[2.7 CI][New Persistent Mode][8/n] ✈️ AIR tests (ray/air) (ray-project#38932)

ae4e47e

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

[tune] Storage: 🐙 🧠 Tune tests and examples {using RLlib} migration (ray-project#38895)

ba0c946

Signed-off-by: Kai Fricke <kai@anyscale.com> Co-authored-by: matthewdeng <matt@anyscale.com>

[train] Fix MosaicTrainer example and unit test (ray-project#38970)

d8a7f0e

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

[air/release] Fix dreambooth example image preprocessing logic (ray-project#39020)

45e8ee1

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

[train] clean up ray.train._checkpoint imports (ray-project#38951)

05e78e9

Signed-off-by: Matthew Deng <matt@anyscale.com>

[train] high level cleanup of Ray Train docs (ray-project#38971)

c025962

Signed-off-by: Matthew Deng <matt@anyscale.com>

[wip][docs] update FrameworkPredictor examples (ray-project#38634)

e0225a7

Signed-off-by: Matthew Deng <matt@anyscale.com> Signed-off-by: matthewdeng <matt@anyscale.com>

[train] Add documentation for using metadata argument to save preprocessors (ray-project#38701)

24fd3d5

[Train] Restructure Ray Train Example Page (ray-project#38814)

43e8db1

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

[air] Deprecate some fields/classes that are supposed to be gone in 2.6. (ray-project#38794)

d0ac3df

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

[tune/storage] Fix Tune multinode tests (ray-project#39050)

d29266b

Fixes multinode tests by using the new train.report() API. Signed-off-by: Kai Fricke <kai@anyscale.com>

[Release Test] Fix long_running_horovod_tune_test. (ray-project#39012)

dcf2e2b

Signed-off-by: Yunxuan Xiao <yunxuanx@anyscale.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>

[train] New persistence mode: StorageContext unit tests (ray-project#39023)

a0ec2ad

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

[train] enable train + tune tests and examples (ray-project#39021)

840986d

Signed-off-by: Matthew Deng <matt@anyscale.com>

[train] New persistence mode: Migrate 🐙 `Tune tests and examples (medium)` (ray-project#39081)

926b8b1

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

GeneDer added the release-blocker (P0 Issue that blocks the release) and v2.7.0-pick labels Aug 30, 2023

zhe-thoughts approved these changes Aug 30, 2023

GeneDer marked this pull request as ready for review August 30, 2023 06:28

GeneDer requested review from richardliaw, gjoliver, krfricke, xwjiang2010, amogkam, Yard1, maxpumperla, a team, sven1977, avnishn, ArturNiederfahrenhorst, smorad, kouroshHakha, ericl, scv119, c21, scottjlee, bveeramani, raulchen, justinvyu and sofianhnaide as code owners August 30, 2023 06:28

GeneDer merged commit 9e71973 into ray-project:releases/2.7.0 Aug 30, 2023
111 of 122 checks passed

matthewdeng mentioned this pull request Aug 30, 2023

[pick] [train] Add documentation for using metadata argument to save preprocessors #38701 #39037

Closed

matthewdeng deleted the train-2.7 branch September 1, 2023 04:31

matthewdeng changed the title ~~[train][2.7] cherry-picks for documentations, tests, examples~~ [train][2.7][1/n] cherry-picks for documentations, tests, examples Sep 1, 2023
