[data] Add iterator batch_format=None support, which will yield batches in the current batch format with zero copies#33562
Conversation
| batch_size: Optional[Union[int, Literal["default"]]] = "default", | ||
| compute: Optional[Union[str, ComputeStrategy]] = None, | ||
| batch_format: Literal["default", "pandas", "pyarrow", "numpy"] = "default", | ||
| batch_format: Optional[str] = "default", |
There was a problem hiding this comment.
keep it as Optional[Literal] for full explicitness of supported batch formats?
There was a problem hiding this comment.
I feel like that is hard to maintain (per the inconsistencies in the code already), so opted to go unify on the shorter signature.
|
lets also keep the documentation changes from https://github.com/ray-project/ray/pull/33536/files#diff-988f3832ac94d085daf61260175e2580920ebd1521dc760f58b426b94379d5b7L235? |
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Done |
| ``"numpy"`` to select ``numpy.ndarray`` for tensor datasets and | ||
| ``Dict[str, numpy.ndarray]`` for tabular datasets. Default is "default". | ||
| ``Dict[str, numpy.ndarray]`` for tabular datasets, or None to return | ||
| the underlying block exactly as is with no additional formatting. |
There was a problem hiding this comment.
Nice, I like batch_size=None a good bit more than adding another literal string!
…es in the current batch format with zero copies (ray-project#33562) This PR is a cleanup of ray-project#33536 It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format. Signed-off-by: elliottower <elliot@elliottower.com>
…project#324… (ray-project#33601) The failure in rllib should have been fixed by ray-project#33562 Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`. Signed-off-by: elliottower <elliot@elliottower.com>
…es in the current batch format with zero copies (ray-project#33562) This PR is a cleanup of ray-project#33536 It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format. Signed-off-by: Jack He <jackhe2345@gmail.com>
…project#324… (ray-project#33601) The failure in rllib should have been fixed by ray-project#33562 Verified with `python -m pytest rllib/core/learner/torch/tests/test_torch_learner.py::TestLearner::test_end_to_end_update`. Signed-off-by: Jack He <jackhe2345@gmail.com>
Why are these changes needed?
This PR is a cleanup of #33536
It uses "None" instead of "zero-copy" as a batch format, since None has a similar meaning for batch_size, where it means a system-chosen batch size. Here "None" also means the system chosen optimal batch format.