[air] Add batch predictor class #23808

Merged: 11 commits, Apr 13, 2022

@krfricke (Contributor) commented Apr 8, 2022

Why are these changes needed?

What: This PR adds a generic BatchPredictor class that offers an interface for running batch inference on Ray datasets. It takes a Predictor class and a checkpoint as input and provides a predict(dataset) method for scalable inference.

Why: Currently, users have to implement scorers themselves. This is mostly boilerplate and error-prone, so we should provide a simple solution instead.

Note that this predictor also implements the Predictor interface.
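
To make the description concrete, here is a minimal pure-Python sketch of the pattern: a wrapper that takes a predictor class and a checkpoint, splits the input into batches, and also satisfies the predictor interface itself. It is an illustration under stated assumptions, not the code from this PR. The names SimpleBatchPredictor and the stand-in Predictor base class are hypothetical, the checkpoint is a plain dict, and batches are processed sequentially in one process rather than on a Ray dataset.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type


class Predictor(ABC):
    """Stand-in predictor interface (assumed shape, not Ray's actual class)."""

    @classmethod
    @abstractmethod
    def from_checkpoint(cls, checkpoint: Dict[str, Any], **kwargs) -> "Predictor":
        ...

    @abstractmethod
    def predict(self, data: List[float], **kwargs) -> List[float]:
        ...


class DummyPredictor(Predictor):
    """Multiplies every input by a factor stored in the checkpoint."""

    def __init__(self, factor: float = 1.0):
        self.factor = factor

    @classmethod
    def from_checkpoint(cls, checkpoint, **kwargs):
        return cls(factor=checkpoint["factor"])

    def predict(self, data, **kwargs):
        return [x * self.factor for x in data]


class SimpleBatchPredictor(Predictor):
    """Hypothetical wrapper: runs a wrapped predictor batch by batch."""

    def __init__(self, checkpoint: Dict[str, Any], predictor_cls: Type[Predictor]):
        self.checkpoint = checkpoint
        self.predictor_cls = predictor_cls

    @classmethod
    def from_checkpoint(cls, checkpoint, predictor_cls=DummyPredictor, **kwargs):
        return cls(checkpoint, predictor_cls)

    def predict(self, data, batch_size: int = 2, **kwargs):
        # Restore the wrapped predictor once, then feed it fixed-size batches.
        predictor = self.predictor_cls.from_checkpoint(self.checkpoint)
        results: List[float] = []
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            results.extend(predictor.predict(batch))
        return results


batch_predictor = SimpleBatchPredictor.from_checkpoint(
    {"factor": 2.0}, predictor_cls=DummyPredictor
)
predictions = batch_predictor.predict([1.0, 2.0, 3.0, 4.0], batch_size=3)
```

Because this sketch processes batches in order within a single process, the output order matches the input order. A distributed implementation may not give that guarantee.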

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@Yard1 (Member) commented Apr 9, 2022

I don't really like the score nomenclature here. Scoring implies that you will receive a score - a single value. Can we do BatchPredictor, or something similar?

batch_size: Split dataset into batches of this size for prediction.
max_scoring_actors: If set, specify the maximum number of scoring actors.
ray_remote_args: Additional resource requirements to request from
ray (e.g., num_gpus=1 to request GPUs for the map tasks).
Contributor

I'm wondering if num_gpus should be a top-level arg, since GPU-based batch inference is pretty fundamental. 🤔

Contributor Author

Agreed, this should be easily accessible. I've added it to the top-level API.
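
One way to expose num_gpus at the top level while still accepting ray_remote_args is to merge the two before they are passed on. The helper below is a hypothetical sketch, not the code from this PR: the name build_remote_args and the conflict check are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def build_remote_args(
    num_gpus: int = 0,
    ray_remote_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: fold a top-level num_gpus into ray_remote_args."""
    merged = dict(ray_remote_args or {})  # copy so the caller's dict is untouched
    if num_gpus:
        existing = merged.get("num_gpus")
        if existing is not None and existing != num_gpus:
            # Refuse to silently pick one of two contradictory settings.
            raise ValueError("num_gpus conflicts with ray_remote_args['num_gpus']")
        merged["num_gpus"] = num_gpus
    return merged
```

Raising on a conflict is only one possible design. Letting the top-level argument win silently would also work, at the cost of hiding a likely user mistake.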

scorer = BatchScorer(DummyPredictor, Checkpoint.from_dict({"factor": 2.0}))

test_dataset = ray.data.from_items([1.0, 2.0, 3.0, 4.0])
assert scorer.score(test_dataset).to_pandas().to_numpy().squeeze().tolist() == [
Contributor

If you're looking for a more concise conversion and comparison, the content should be assertable without this chain, something like:

assert scorer.score(test_dataset).take() == [2.0, 4.0, 6.0, 8.0]

Contributor Author

This leads to

E       AssertionError: assert [{'value': 8.0}, {'value': 6.0}, {'value': 4.0}, {'value': 2.0}] == [2.0, 4.0, 6.0, 8.0]

How do I get them as a series rather than dicts?

Also, the ordering problem seems to remain (the rows come back reversed). Any insights on that?
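
For reference, both issues can be handled in plain Python on the list of row dicts shown in the assertion error above. This is a sketch of one possible workaround, not the resolution adopted in this PR. The helper name column_values is hypothetical, and the "value" key is taken from the error output.

```python
from typing import Any, Dict, List


def column_values(rows: List[Dict[str, Any]], column: str = "value") -> List[Any]:
    """Pull one column out of a list of row dicts."""
    return [row[column] for row in rows]


# Rows exactly as they appear in the assertion error.
rows = [{"value": 8.0}, {"value": 6.0}, {"value": 4.0}, {"value": 2.0}]

# Sorting makes the comparison independent of the order rows come back in.
assert sorted(column_values(rows)) == [2.0, 4.0, 6.0, 8.0]
```

Sorting is only appropriate when the test does not care about row order. If ordering is part of the contract, the comparison should stay order-sensitive.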

@krfricke (Contributor Author) commented

Regarding the naming, I don't think "scoring" implies only one measure, but yeah, it's not ideal, because we're not actually scoring a dataset but performing inference on it.

I wouldn't want to go with BatchPredictor or anything *Predictor, as we do have a top-level Predictor interface and this wrapper does not implement it.

Maybe BatchInference?

@Yard1 (Member) commented Apr 11, 2022

BatchInference is better, yeah.

@krfricke (Contributor Author) commented

Actually, we can go with BatchPredictor if we implement the Predictor interface - which works quite well here tbh. I've changed this in the last commit, let me know what you think

@krfricke krfricke marked this pull request as ready for review April 11, 2022 16:59
@krfricke krfricke changed the title from "[air/wip] Add batch scorer class" to "[air/wip] Add batch predictor class" Apr 11, 2022
@Yard1 (Member) commented Apr 11, 2022

BatchPredictor is even better :)

python/ray/ml/predictor.py: 4 review threads (outdated, resolved)
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 11, 2022
@krfricke krfricke removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 12, 2022
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 12, 2022
@@ -89,6 +89,9 @@ Predictors
.. autoclass:: ray.ml.predictor.Predictor
    :members:

.. autoclass:: ray.ml.batch_predictor.BatchPredictor
Contributor

👍

@krfricke krfricke merged commit 40d3a62 into ray-project:master Apr 13, 2022
@krfricke krfricke deleted the air/scorer branch April 13, 2022 07:58
@amogkam amogkam changed the title from "[air/wip] Add batch predictor class" to "[air] Add batch predictor class" Apr 20, 2022
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
6 participants