Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[train] Hard-deprecate MosaicTrainer and remove SklearnTrainer #42814

Merged
merged 14 commits into from Jan 31, 2024

Conversation

justinvyu
Copy link
Contributor

Why are these changes needed?

MosaicTrainer is not needed, since it can be folded under TorchTrainer, similar to LightningTrainer and TransformersTrainer

SklearnTrainer is also not needed, since it does not provide any extra utility over just calling Tuner(sklearn_train_fn). The Ray Train version added an extra layer of abstraction that is not needed over calling the underlying sklearn methods. There was no multi-worker training happening.

TODO

  • Finish writing the migration issue.

Notes

  • Mosaic Composer does not actually work with Ray Train as of 2.9.1 due to a signal handler it tries to register that must run on the main thread. Ray Train currently creates a worker thread to execute the training logic. This is because Ray Train implements pause-able training execution for each training worker by communicating between a main thread and a worker thread through ray.train.report.

Related issue number

Closes #32732

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
"`ray.train.mosaic.MosaicTrainer` is deprecated. "
"Use `ray.train.torch.TorchTrainer` instead. "
"See this issue for a migration example: "
"https://github.com/ray-project/ray/issues/42257"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't be able to have a migration example because of the threading issue though, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was planning to just keep it as "not possible for now", then update the issue later. Or should I just remove the migration link for Mosaic trainer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah we can have a GH Issue that says it's not supported right now (the current one points to sklearn)

}
estimator.set_params(**cpu_params)


@PublicAPI(stability="alpha")
class SklearnPredictor(Predictor):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need SklearnPredictor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have all the predictors around but not shown in docs. Maybe we can remove them all at some point at once?

@woshiyyya
Copy link
Member

Does the main thread refer to the driver and all Ray actors methods are child threads? Seems that the signal handling issue is unresolvable in current Ray Train design.

@justinvyu
Copy link
Contributor Author

justinvyu commented Jan 30, 2024

@woshiyyya It refers to the main thread of a single worker actor that is processing the result queue from the training thread:

Seems that the signal handling issue is unresolvable in current Ray Train design.

Yep this is unresolvable unless we get rid of this threading logic (which is an implementation detail for implementing "yielding" behavior, that can be done with Ray Generators in the future).

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
@woshiyyya
Copy link
Member

woshiyyya commented Jan 30, 2024

Got it. Signal handling is a missing feature that we need to support it in the future. User can save checkpoint on exception before the actors get killed.

"`ray.train.mosaic.MosaicTrainer` is deprecated. "
"Use `ray.train.torch.TorchTrainer` instead. "
"See this issue for a migration example: "
"https://github.com/ray-project/ray/issues/42257"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah we can have a GH Issue that says it's not supported right now (the current one points to sklearn)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
@justinvyu justinvyu merged commit c13f233 into ray-project:master Jan 31, 2024
9 checks passed
@justinvyu justinvyu deleted the deprecate_sklearn_trainer branch January 31, 2024 23:51
shrekris-anyscale pushed a commit to shrekris-anyscale/ray that referenced this pull request Feb 1, 2024
…ay-project#42814)

This PR removes some already deprecated APIs to reduce the library surface area and remove unused/unnecessary components. (`MosaicTrainer` can be folded into `TorchTrainer`, and `SklearnTrainer` doesn't provide any value over using Tune with your own training loop.)

---------

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
tterrysun pushed a commit to tterrysun/ray that referenced this pull request Feb 14, 2024
…ay-project#42814)

This PR removes some already deprecated APIs to reduce the library surface area and remove unused/unnecessary components. (`MosaicTrainer` can be folded into `TorchTrainer`, and `SklearnTrainer` doesn't provide any value over using Tune with your own training loop.)

---------

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: tterrysun <terry@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AIR][Train] Multilabel classification with SklearnTrainer
3 participants