Closed
Labels
community-backlog, data (Ray Data-related issues), question (Just a question :)
Description
Is there a reason why ray.data.Dataset.iter_batches executes the dataset's lazy transformations in full?
We have an application in which an extremely large Ray Dataset is consumed by a generator, so iter_batches is used to yield the data for that use case.
However, calling iter_batches triggers execution of the entire transformation pipeline, which exceeds the object store capacity and results in an OOM error (a minimal sketch of the pattern is shown below).
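For context, here is a minimal sketch of the pattern we use; the Parquet path, batch size, and the identity transform are placeholders rather than our actual pipeline:

```python
import ray

# Build a lazily transformed Ray Dataset (path and transform are placeholders).
ds = ray.data.read_parquet("s3://bucket/very-large-dataset/")
ds = ds.map_batches(lambda batch: batch)  # stand-in for our real transformation

def batch_generator():
    # In our experience, the first call to iter_batches triggers execution of
    # the whole transformation pipeline, which fills the object store and
    # causes an OOM for very large datasets.
    for batch in ds.iter_batches(batch_size=4096):
        yield batch

for batch in batch_generator():
    ...  # consume batches downstream
```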
- Why doesn't the object store spill to disk to prevent the OOM error?
- Why does iter_batches need to execute the lazy transformations in full?
- Is a streaming version of iter_batches planned?
Thanks
Use case
Another user also reports needing a streaming iter_batches; we have similar applications in mind:
https://discuss.ray.io/t/ray-dataset-from-iterabledataset-no-lazy-implementation/20630