-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Support class constructor args for map() and flat_map() #38606
Conversation
Signed-off-by: Cheng Su <scnju13@gmail.com>
result = ds.map_batches( | ||
StatefulFnWithArgs, | ||
compute=ray.data.ActorPoolStrategy(), | ||
fn_args=(1,), | ||
fn_kwargs={"kwarg": 2}, | ||
fn_constructor_args=(1,), | ||
fn_constructor_kwargs={"kwarg": 2}, | ||
).take() == list(range(10)) | ||
).take() | ||
assert sorted(extract_values("id", result)) == list(range(10)), result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bugfix for previous test.
@@ -286,6 +286,7 @@ def map( | |||
fn: UserDefinedFunction[Dict[str, Any], Dict[str, Any]], | |||
*, | |||
compute: Optional[ComputeStrategy] = None, | |||
fn_constructor_args: Optional[Iterable[Any]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should also add fn_constructor_kwargs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer us to do the minimal we need here. We can add later if there's ask.
raise ValueError( | ||
"fn_constructor_args can only be specified if providing a " | ||
f"CallableClass instance for fn, but got: {fn}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of the compute arg is too loose. This makes us need duplicated null checks.
It'd be better to split this into 2 steps.
- a
parse_compute
util function that convertsOptional[Union[str, "ComputeStrategy"]]
to justComputeStrategy
. - Then check the fn and args.
And by doing this, compute
can only be None or str in the Dataset API level. under the hood, it's always the concrete type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, we already have a get_compute()
method.
Let me make a separate PR to do the refactoring, it would involve change AbstractUDFMap.compute
and _plan_udf_map_op
.
…roject#38606) We get user request to support class constructor args for `Dataset.map()` and `Dataset.flat_map()`, similar to `Dataset.map_batches()`. No technical reason why we cannot support it, given they are got converted to same map function during execution. This PR is to add them. Signed-off-by: Cheng Su <scnju13@gmail.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
…roject#38606) We get user request to support class constructor args for `Dataset.map()` and `Dataset.flat_map()`, similar to `Dataset.map_batches()`. No technical reason why we cannot support it, given they are got converted to same map function during execution. This PR is to add them. Signed-off-by: Cheng Su <scnju13@gmail.com> Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?
We get user request to support class constructor args for
Dataset.map()
andDataset.flat_map()
, similar toDataset.map_batches()
. No technical reason why we cannot support it, given they are got converted to same map function during execution. This PR is to add them.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.