[2/n] Lightweight Ray AIR API refactor #37123

pcmoritz · 2023-07-05T22:04:06Z

Why are these changes needed?

This PR migrates all the Train and Tune examples and docstrings to the new API convention (see https://github.com/ray-project/enhancements/).

Continuation of #36706 and #37906
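
Much of a migration like this is a mechanical rename of dotted API paths in docstrings and messages, so it can be sketched as a small text rewrite. This is a minimal, hypothetical helper and not part of this PR or of Ray: the name `migrate_api_references` and the rename table are assumptions for illustration. The two `report` entries mirror the rename visible in the review thread on python/ray/tune/experiment/experiment.py; the `get_dataset_shard` entry assumes the old function lived under `ray.air.session`.

```python
import re

# Old dotted path -> new dotted path.
# The two `report` entries mirror the rename shown in the review thread;
# the `get_dataset_shard` entry is an assumption made for illustration.
API_RENAMES = {
    "ray.air.session.report": "ray.train.report",
    "ray.train.session.report": "ray.train.report",
    "ray.air.session.get_dataset_shard": "ray.train.get_dataset_shard",
}

# Try longer names first. The lookarounds stop a match from starting or
# ending in the middle of a longer dotted identifier.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in sorted(API_RENAMES, key=len, reverse=True))
    + r")(?!\w)"
)


def migrate_api_references(text: str) -> str:
    """Rewrite old-style API references in a docstring or message (hypothetical helper)."""
    return _PATTERN.sub(lambda match: API_RENAMES[match.group(1)], text)


message = "`ray.air.session.report(metrics=..., checkpoint=...)` at the end "
print(migrate_api_references(message))
```

A text rewrite like this only covers fully qualified references. Call sites that import `session` under a local name still need to be updated by hand or with a syntax-aware tool.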

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

doc/source/train/dl_guide.rst

matthewdeng · 2023-08-02T01:50:56Z

python/ray/train/base_trainer.py

@@ -701,7 +700,7 @@ def train_func(config):
            # Instantiate new Trainer in Trainable.
            trainer = trainer_cls(**config)

-            # Get the checkpoint from the Tune session, and use it to initialize
+            # Get the checkpoint from the train context, and use it to initialize


Should we be updating the code (under this comment) to use the new API as well? Or is that for a separate PR?

python/ray/train/mosaic/_mosaic_utils.py

matthewdeng · 2023-08-02T01:56:59Z

python/ray/tune/experiment/experiment.py

@@ -222,15 +222,15 @@ def __init__(
                raise ValueError(
                    "'checkpoint_at_end' cannot be used with a function trainable. "
                    "You should include one last call to "
-                    "`ray.air.session.report(metrics=..., checkpoint=...)` at the end "
+                    "`ray.train.session.report(metrics=..., checkpoint=...)` at the end "


Suggested change

-                    "`ray.train.session.report(metrics=..., checkpoint=...)` at the end "
+                    "`ray.train.report(metrics=..., checkpoint=...)` at the end "

python/ray/tune/experiment/experiment.py

python/ray/tune/impl/tuner_internal.py


This PR removes some circularities in the Ray AIR import system so we can put the training-related functions into `ray.train`. It introduces a training context and makes report, get_dataset_shard, Checkpoint, Result, and the following configs available in `ray.train`:

- CheckpointConfig
- DataConfig
- FailureConfig
- RunConfig
- ScalingConfig

No user-facing changes yet; the old APIs still work.

Going forward, it will be most consistent / symmetrical if these things are included in the following way:

```python
from ray import train, tune, serve  # Pick the subset that is needed

# Include what you need from the following:
from ray.train import CheckpointConfig, DataConfig, FailureConfig, RunConfig, ScalingConfig

# ...

def train_func():
    dataset_shard = train.get_dataset_shard("train")
    world_size = train.get_context().get_world_size()
    # ...
    train.report(...)

trainer = train.torch.TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=2),
)
result = trainer.fit()
```

We have many examples in ray-project#37123 of how this looks in actual code.

Signed-off-by: harborn <gangsheng.wu@intel.com>
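
The `train.get_context().get_world_size()` call in the snippet above follows a common accessor pattern: worker-local state is set once by the framework and read from user code through a module-level function. The sketch below is a toy stand-in for that pattern in plain Python, not Ray's implementation. `ToyTrainContext`, `set_context`, and the error message are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyTrainContext:
    """Toy stand-in for a per-worker training context (not Ray's class)."""

    world_size: int

    def get_world_size(self) -> int:
        return self.world_size


# Module-level slot that the (hypothetical) framework fills in before it
# calls the user's training function.
_context: Optional[ToyTrainContext] = None


def set_context(context: ToyTrainContext) -> None:
    """Install the context for the current worker (framework side)."""
    global _context
    _context = context


def get_context() -> ToyTrainContext:
    """Return the current context (user side); fail loudly if none is set."""
    if _context is None:
        raise RuntimeError("get_context() was called outside of a training worker")
    return _context


def train_func() -> int:
    # User code reads worker state through the accessor instead of taking
    # it as an argument.
    return get_context().get_world_size()


set_context(ToyTrainContext(world_size=2))
print(train_func())
```

The benefit of this shape is that the training function's signature stays fixed while the set of values exposed through the context can grow.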


pcmoritz added 24 commits June 22, 2023 18:45

Lightweight Ray AIR API refactor

8d36b16

shuffle

6cfdd91

Merge branch 'master' into lightweight-ray-air-api-refactor

6f2f095

update API

6bdbe14

compat

9d74cc2

update

aa73331

move to internal

3009ae9

update

c26abb1

lint

74be132

update

6685d1f

update

9709afd

Merge branch 'master' into lightweight-ray-air-api-refactor

c525695

update

1e32b82

add context

2c21498

Merge branch 'master' into lightweight-ray-air-api-refactor

0b00002

fix

90dc9e4

fix errors in docstrings

561f74c

merge

dcf0332

update

995ac6a

fix

7e2dfe3

lint

4ba674e

update

d399bfa

lint

2b2c683

save

41173d5

pcmoritz requested review from richardliaw, gjoliver, krfricke, xwjiang2010, amogkam and matthewdeng as code owners July 5, 2023 22:04

fix

7fe6d8f

matthewdeng approved these changes Aug 2, 2023


pcmoritz and others added 11 commits August 2, 2023 03:01

Update doc/source/train/dl_guide.rst

f952741

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

Update doc/source/train/dl_guide.rst

497f559

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

Update python/ray/train/mosaic/_mosaic_utils.py

c41be34

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

Update python/ray/tune/experiment/experiment.py

329715c

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

Update python/ray/tune/impl/tuner_internal.py

9121156

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

Update python/ray/tune/impl/tuner_internal.py

b0040e2

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

update

8eee519

Merge branch 'master' into lightweight-ray-air-api-refactor-examples

6c821e2

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

lint

99757d4

Merge branch 'master' into lightweight-ray-air-api-refactor-examples

4386a8a

update

8ae82af

pcmoritz merged commit 7548902 into ray-project:master Aug 2, 2023
101 of 118 checks passed

pcmoritz deleted the lightweight-ray-air-api-refactor-examples branch August 2, 2023 22:04

pcmoritz mentioned this pull request Aug 12, 2023

[3/n] Lightweight Ray AIR API refactor #38379

Merged

8 tasks

pcmoritz added a commit that referenced this pull request Aug 14, 2023

[3/n] Lightweight Ray AIR API refactor (#38379)

d9dcc3f

Continuation of #37123

NripeshN pushed a commit to NripeshN/ray that referenced this pull request Aug 15, 2023

[3/n] Lightweight Ray AIR API refactor (ray-project#38379)

4602264

Continuation of ray-project#37123 Signed-off-by: NripeshN <nn2012@hw.ac.uk>

harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023

[3/n] Lightweight Ray AIR API refactor (ray-project#38379)

c386fd6

Continuation of ray-project#37123 Signed-off-by: harborn <gangsheng.wu@intel.com>

harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023

[3/n] Lightweight Ray AIR API refactor (ray-project#38379)

876382e

Continuation of ray-project#37123

arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023

[3/n] Lightweight Ray AIR API refactor (ray-project#38379)

898ca4a

Continuation of ray-project#37123 Signed-off-by: e428265 <arvind.chandramouli@lmco.com>

vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023

[3/n] Lightweight Ray AIR API refactor (ray-project#38379)

b0be324

Continuation of ray-project#37123 Signed-off-by: Victor <vctr.y.m@example.com>
