[Data] Streaming executor backpressure #40754
Labels: data (Ray Data-related issues), data-stability, P1 (Issue that should be fixed within a few weeks)
ray 2.10
Ray Data has now switched to the streaming execution backend. For Datasets that don't have aggregation operators, all data should be streamed through all the operators. However, if any operator is slow, data will pile up in the buffer and may cause OOM, disk spilling, or even out-of-disk errors.
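To make the pile-up concrete, here is a minimal pure-Python sketch (not Ray code) of two operators connected by a buffer. The function name `simulate_peak_buffer` and its parameters are hypothetical, chosen only for illustration. It assumes a fast upstream operator and a slow downstream one, and compares the peak buffer size with and without a cap on buffered blocks.

```python
from collections import deque


def simulate_peak_buffer(steps, produce_per_step, consume_per_step, max_buffered=None):
    """Return the peak number of blocks buffered between two operators.

    Hypothetical toy model: max_buffered=None means no backpressure,
    otherwise the upstream operator pauses once the buffer is full.
    """
    buffer = deque()
    peak = 0
    next_block = 0
    for _ in range(steps):
        # Upstream operator emits blocks for this step.
        for _ in range(produce_per_step):
            if max_buffered is not None and len(buffer) >= max_buffered:
                break  # backpressure: stop producing until space frees up
            buffer.append(next_block)
            next_block += 1
        peak = max(peak, len(buffer))
        # Slow downstream operator drains what it can.
        for _ in range(min(consume_per_step, len(buffer))):
            buffer.popleft()
    return peak


# Upstream produces 4 blocks per step, downstream consumes only 1.
unbounded_peak = simulate_peak_buffer(100, 4, 1)
bounded_peak = simulate_peak_buffer(100, 4, 1, max_buffered=8)
print(unbounded_peak, bounded_peak)
```

Without a cap, the buffer grows linearly with the number of steps; with a cap, it stays bounded at the cost of pausing the upstream operator.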
As of Ray 2.7, we have implemented the following backpressure mechanisms:
Despite the above mechanisms, there are still some scenarios where backpressure doesn't work properly.
... parallelism in your read op (e.g., ray.data.read_images(..., parallelism=N)), so that each task is more fine-grained. This can usually solve most cases, unless one single file is too big.
... OpRuntimeMetrics to make the calculation more accurate.
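The effect of a higher read parallelism, and its limit when one file is very large, can be sketched with simple arithmetic. The helpers below are hypothetical and do not call Ray. They assume that the file list is split into contiguous, near-equal groups (one group per read task) and that a single file is never split across tasks; Ray's actual assignment may differ.

```python
def split_into_tasks(file_sizes, parallelism):
    """Split file sizes into contiguous, near-equal groups, one per read task.

    Assumption for illustration: a file is never split across tasks.
    """
    if not file_sizes or parallelism < 1:
        raise ValueError("need at least one file and parallelism >= 1")
    num_tasks = min(parallelism, len(file_sizes))
    base, extra = divmod(len(file_sizes), num_tasks)
    tasks, start = [], 0
    for i in range(num_tasks):
        end = start + base + (1 if i < extra else 0)
        tasks.append(file_sizes[start:end])
        start = end
    return tasks


def max_task_bytes(file_sizes, parallelism):
    """Largest amount of input a single read task has to load."""
    return max(sum(task) for task in split_into_tasks(file_sizes, parallelism))


# Sizes in MB: seven small files and one large file.
sizes_mb = [10] * 7 + [500]
for n in (2, 8, 100):
    print(n, max_task_bytes(sizes_mb, n))
```

In this model, raising the parallelism shrinks the largest task until it equals the largest single file; beyond that point, more parallelism does not help.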