[Data] Support class constructor args for map() and flat_map() #38606

Merged
merged 1 commit into from
Aug 18, 2023

@c21 c21 commented Aug 18, 2023

Why are these changes needed?

We received a user request to support class constructor args for Dataset.map() and Dataset.flat_map(), similar to Dataset.map_batches(). There is no technical reason not to support this, given that all three are converted to the same map function during execution. This PR adds that support.
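
As a rough illustration of what constructor args mean for a callable-class UDF, here is a minimal pure-Python sketch. It does not use Ray and is not the code in this PR: `toy_map`, `toy_flat_map`, `_instantiate`, `AddOffset`, and `Repeat` are hypothetical names that only model the idea that the class is constructed once with `fn_constructor_args` and the resulting instance is then applied to each row.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Union

Row = Dict[str, Any]


class AddOffset:
    """Callable class: setup args go to __init__, each row goes to __call__."""

    def __init__(self, offset: int):
        self.offset = offset

    def __call__(self, row: Row) -> Row:
        return {"id": row["id"] + self.offset}


class Repeat:
    """Callable class for flat_map: returns several rows per input row."""

    def __init__(self, times: int):
        self.times = times

    def __call__(self, row: Row) -> List[Row]:
        return [dict(row) for _ in range(self.times)]


def _instantiate(
    fn: Union[type, Callable[..., Any]],
    fn_constructor_args: Optional[Iterable[Any]],
) -> Callable[..., Any]:
    # Hypothetical helper: build the UDF instance once, before any row is seen.
    if isinstance(fn, type):
        return fn(*(fn_constructor_args or ()))
    if fn_constructor_args is not None:
        # Message text taken from the check quoted in the review thread.
        raise ValueError(
            "fn_constructor_args can only be specified if providing a "
            f"CallableClass instance for fn, but got: {fn}"
        )
    return fn


def toy_map(rows: List[Row], fn, *, fn_constructor_args=None) -> List[Row]:
    udf = _instantiate(fn, fn_constructor_args)
    return [udf(row) for row in rows]


def toy_flat_map(rows: List[Row], fn, *, fn_constructor_args=None) -> List[Row]:
    udf = _instantiate(fn, fn_constructor_args)
    # Both helpers share the same per-row application; flat_map only flattens.
    return [out for row in rows for out in udf(row)]


rows = [{"id": i} for i in range(3)]
mapped = toy_map(rows, AddOffset, fn_constructor_args=(10,))
flat = toy_flat_map(rows, Repeat, fn_constructor_args=(2,))
```

In this toy model, passing `fn_constructor_args` together with a plain function raises the `ValueError`, mirroring the validation discussed in the review.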

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Cheng Su <scnju13@gmail.com>
Comment on lines +181 to +189
  result = ds.map_batches(
      StatefulFnWithArgs,
      compute=ray.data.ActorPoolStrategy(),
      fn_args=(1,),
      fn_kwargs={"kwarg": 2},
      fn_constructor_args=(1,),
      fn_constructor_kwargs={"kwarg": 2},
- ).take() == list(range(10))
+ ).take()
+ assert sorted(extract_values("id", result)) == list(range(10)), result
Contributor Author

This is a bug fix for the previous test.

@@ -286,6 +286,7 @@ def map(
      fn: UserDefinedFunction[Dict[str, Any], Dict[str, Any]],
      *,
      compute: Optional[ComputeStrategy] = None,
+     fn_constructor_args: Optional[Iterable[Any]] = None,
Contributor

Should we also add fn_constructor_kwargs?

Contributor Author

I'd prefer to do the minimum we need here. We can add it later if there's a request.

raise ValueError(
    "fn_constructor_args can only be specified if providing a "
    f"CallableClass instance for fn, but got: {fn}"
)
Contributor

The type of the compute arg is too loose, which forces us to duplicate null checks.

It'd be better to split this into 2 steps.

  1. a parse_compute util function that converts Optional[Union[str, "ComputeStrategy"]] to just ComputeStrategy.
  2. Then check the fn and args.

With this change, compute can be None or a str only at the Dataset API level. Under the hood, it is always the concrete type.
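
The two-step split suggested above could look roughly like the sketch below. It is a standalone model, not Ray's code: the strategy classes are stand-ins defined here, and `parse_compute` and `check_fn_and_args` are hypothetical names. It also assumes that the string forms are `"tasks"` and `"actors"`, that `None` defaults to a task pool, and that a callable class requires the actor strategy.

```python
from typing import Any, Callable, Iterable, Optional, Union


# Stand-in classes for illustration only; they are not imported from Ray.
class ComputeStrategy:
    pass


class TaskPoolStrategy(ComputeStrategy):
    pass


class ActorPoolStrategy(ComputeStrategy):
    pass


def parse_compute(
    compute: Optional[Union[str, ComputeStrategy]],
) -> ComputeStrategy:
    """Step 1: normalize the loose API-level type to a concrete strategy."""
    if compute is None or compute == "tasks":  # assumed default and string form
        return TaskPoolStrategy()
    if compute == "actors":  # assumed string form
        return ActorPoolStrategy()
    if isinstance(compute, ComputeStrategy):
        return compute
    raise ValueError(f"Unsupported compute value: {compute!r}")


def check_fn_and_args(
    fn: Union[type, Callable[..., Any]],
    compute: ComputeStrategy,
    fn_constructor_args: Optional[Iterable[Any]] = None,
) -> None:
    """Step 2: validate fn and its args; compute is never None here."""
    if isinstance(fn, type):
        # Assumption for this sketch: callable classes need the actor strategy.
        if not isinstance(compute, ActorPoolStrategy):
            raise ValueError("A callable class requires an actor pool strategy.")
    elif fn_constructor_args is not None:
        raise ValueError(
            "fn_constructor_args can only be specified if providing a "
            f"CallableClass instance for fn, but got: {fn}"
        )


class Stateful:
    def __init__(self, x: int):
        self.x = x

    def __call__(self, row):
        return row


# Normalize once at the API boundary, then validate against the concrete type.
strategy = parse_compute("actors")
check_fn_and_args(Stateful, strategy, fn_constructor_args=(1,))
```

Because normalization happens first, the validation step needs no `None` or `str` branches of its own, which is the duplication the comment points at.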

Contributor Author

Makes sense; we already have a get_compute() method.

Let me do the refactoring in a separate PR, since it would involve changing AbstractUDFMap.compute and _plan_udf_map_op.

@c21 c21 merged commit 915df19 into ray-project:master Aug 18, 2023
54 of 56 checks passed
@c21 c21 deleted the flat-map branch August 18, 2023 23:50
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…roject#38606)

We received a user request to support class constructor args for `Dataset.map()` and `Dataset.flat_map()`, similar to `Dataset.map_batches()`. There is no technical reason not to support this, given that all three are converted to the same map function during execution. This PR adds that support.

Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
…roject#38606)

We received a user request to support class constructor args for `Dataset.map()` and `Dataset.flat_map()`, similar to `Dataset.map_batches()`. There is no technical reason not to support this, given that all three are converted to the same map function during execution. This PR adds that support.

Signed-off-by: Cheng Su <scnju13@gmail.com>
Signed-off-by: Victor <vctr.y.m@example.com>