
[REP] Refining the Ray AIR Surface API #36

Merged: 12 commits merged into main on Aug 7, 2023
Conversation

@pcmoritz (Collaborator) commented Jul 9, 2023

This REP proposes to remove the ray.air namespace and put the functionality into the respective libraries Ray Data, Ray Serve and Ray Train.

reps/2023-07-08-air-surface-syntax.md

## Open Questions

We are likely going to remove `PredictorWrapper` and `PredictorDeployment` and migrate the examples to use Ray Serve deployments
Contributor:

Could we flesh out an example of this migration in the REP? It may make the implications of this change clearer.

Collaborator Author:

Yes, I will do that, great point.

Collaborator Author:

Added now.

@krfricke (Contributor) left a comment:

I'm fine with this proposal and only have detail questions left.

Just for completeness (as it's not mentioned in the REP): an alternative resolution is to move everything into AIR instead. We would then have `ray.air.training`, `ray.air.tuning`, etc. In such a world we would soften the boundaries between the libraries and keep shared modules in the AIR namespace. By moving the libraries "one level down" we would enforce a separation from Ray Core and double down on AIR as an umbrella for our downstream libraries.

It also avoids the question of where to put which modules. Integrations and callbacks can quite naturally remain in `ray.air`, and both the training and tuning modules can access them.

reps/2023-07-08-air-surface-syntax.md

## Open Questions

We are likely going to remove `PredictorWrapper` and `PredictorDeployment` and migrate the examples to use Ray Serve deployments directly, and we are also likely going to move `air.integrations` to `train.integrations` and tentatively the predictors to `ray.train`.
Contributor:

There are a few more:

  • `air.callbacks` --> also train?
  • `air.examples` for cross-library examples --> docs? (related question: how do we restructure the docs? I guess out of scope for this REP...)
  • `air.execution` --> tentatively train?
  • `air.util` -- looks like mostly data?

Collaborator Author:

Thanks for bringing these up :)

`air.callbacks` has already been moved to `air.integrations`, which will move to `train.integrations`, right?

`air.examples`: these should not be living in the source tree, so yes, docs would probably be the right place. I don't consider `air.examples` to be part of the AIR API. On the docs: @richardliaw is owning the docs restructuring :)

`air.execution`: these are internal APIs, so they are not a concern for this REP. We should feel free to put them into some utility namespace of either `train` or `tune`, depending on what makes the most sense going forward.

`air.util`: the tensor extension stuff should go into `ray.data` and the torch stuff into `ray.train.torch` (I'm a bit confused about whether the latter is a public API or not -- it seems to be used in the OPT example at the moment -- we should clarify that and handle it appropriately). Is there anything else here we need to decide about?
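The module moves discussed in this thread could be summarized as a mapping. A minimal sketch (the targets for `air.execution` and `air.util` are tentative per the discussion above, and the `migrate_import` helper is purely illustrative, not a real Ray utility):

```python
from typing import Optional

# Tentative mapping of ray.air submodules to their proposed new homes, as
# discussed in this thread. Entries marked tentative are assumptions made
# for illustration, not final decisions; None means "leaves the source tree".
AIR_MIGRATION_MAP = {
    "ray.air.integrations": "ray.train.integrations",  # agreed above
    "ray.air.callbacks": "ray.train.integrations",     # already folded into integrations
    "ray.air.examples": None,                          # move to docs, not the source tree
    "ray.air.execution": "ray.train",                  # tentative (internal APIs)
    "ray.air.util.tensor_extensions": "ray.data",      # tentative
    "ray.air.util.torch": "ray.train.torch",           # tentative
}

def migrate_import(module_path: str) -> Optional[str]:
    """Return the proposed new module path, or None if the module leaves
    the source tree entirely (e.g. examples moving to the docs)."""
    for old, new in AIR_MIGRATION_MAP.items():
        if module_path == old or module_path.startswith(old + "."):
            return None if new is None else new + module_path[len(old):]
    return module_path  # not under ray.air: unchanged
```

For example, `migrate_import("ray.air.integrations.wandb")` would yield `"ray.train.integrations.wandb"` under this tentative map.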

Contributor:

Would it make sense to capture all these in the REP to make it more comprehensive, i.e. to describe what it takes to fully "Disband the ray.air namespace"?

@pcmoritz (Collaborator Author):
On your higher-level comment: not only would introducing `ray.air.training` and `ray.air.tuning` (and we would also need `ray.air.serving`) require far more API changes, it would also lead to a worse outcome -- everything actually fits neatly into one of the libraries, and there is really no need for a `ray.air` namespace. The libraries `ray.train`, `ray.tune`, `ray.serve`, `ray.data`, and `ray.rllib` are already well separated from Ray Core, so I don't think there is danger of confusion here :)


@amogkam (Contributor) commented Jul 12, 2023

The only concern: are we OK with this type of usage for the Ray Tune-only use case?

```python
from ray import tune
from ray import train

def objective(config):
    score = config["a"] ** 2 + config["b"]
    train.report({"score": score})

search_space = {
    "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
    "b": tune.choice([1, 2, 3]),
}

tuner = tune.Tuner(objective, param_space=search_space)

results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)
```

This exposes the `ray.train` namespace to Tune-only users, which may lead to confusion (why do I need to import Ray Train if I am just using Ray Tune?). This was one of the motivations for consolidating under `ray.air`, so I just want to make sure we are considering this case.

```python
@serve.deployment
class XGBoostService:
    def __init__(self, checkpoint):
        self.predictor = XGBoostPredictor.from_checkpoint(checkpoint)
```
Contributor:

Suggested change:

```python
# Before:
self.predictor = XGBoostPredictor.from_checkpoint(checkpoint)

# After:
local_dir = checkpoint.to_directory()
with open(local_dir + "/saved_booster.pkl", "rb") as f:
    self.model: xgboost.Booster = pickle.load(f)
```

Since we are de-emphasizing predictors, shall we also change the example to show something like this instead?
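The suggestion above relies on the checkpoint-as-directory pattern: the training side pickles the raw model into the checkpoint directory, and the deployment side unpickles it without going through a Predictor. A minimal stdlib-only sketch of that round trip (the file name `saved_booster.pkl` is taken from the suggestion; the stand-in model class is hypothetical, since running real XGBoost is out of scope here):

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for an xgboost.Booster; any picklable
# model object works the same way.
class FakeBooster:
    def predict(self, xs):
        return [x * 2 for x in xs]

def save_to_checkpoint_dir(model, directory: str) -> str:
    """Mimics the training side: serialize the raw model into the checkpoint dir."""
    path = os.path.join(directory, "saved_booster.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path

def load_from_checkpoint_dir(directory: str):
    """Mimics the deployment side (Checkpoint.to_directory() yields such a dir)."""
    with open(os.path.join(directory, "saved_booster.pkl"), "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as d:
    save_to_checkpoint_dir(FakeBooster(), d)
    model = load_from_checkpoint_dir(d)
    preds = model.predict([1, 2, 3])  # [2, 4, 6]
```

In a real deployment the `load_from_checkpoint_dir` step would live in the Serve deployment's `__init__`, as in the suggested change above.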

@krfricke (Contributor) left a comment:

Thanks for the updates!

pcmoritz added a commit to ray-project/ray that referenced this pull request Jul 29, 2023
@pcmoritz (Collaborator Author):

@amogkam We decided that it is OK for Tune to depend on Train, since tuning training runs is going to be the most important use case of Ray Tune, and that is also how most machine learning engineers and practitioners think about the relationship between training and tuning :)

@matthewdeng (Contributor) left a comment:

Looks great and feels a lot cleaner!

pcmoritz and others added 12 commits July 31, 2023 16:47

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
@ericl (Contributor) commented Aug 4, 2023

@zhe-thoughts this can be merged

@zhe-thoughts merged commit abb378e into main on Aug 7, 2023
1 check passed
@zhe-thoughts (Collaborator):

Thanks for the work @pcmoritz @ericl @krfricke @matthewdeng @amogkam

NripeshN pushed a commit to NripeshN/ray that referenced this pull request Aug 15, 2023
…ect#37906)

This is bringing the API up-to-date with ray-project/enhancements#36

Signed-off-by: NripeshN <nn2012@hw.ac.uk>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
…ect#37906)

This is bringing the API up-to-date with ray-project/enhancements#36

Signed-off-by: harborn <gangsheng.wu@intel.com>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…ect#37906)

This is bringing the API up-to-date with ray-project/enhancements#36

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
…ect#37906)

This is bringing the API up-to-date with ray-project/enhancements#36

Signed-off-by: Victor <vctr.y.m@example.com>