[data] implement streaming output backpressure #40387
Conversation
Signed-off-by: Hao Chen <chenh1024@gmail.com>
This PR is ready for a preliminary review. It depends on #40285. I've tested it locally by patching that PR. Unit tests will be added once that PR is merged.
I think maybe I am missing something about how the streaming generator backpressure works, or maybe the code just needs more documentation (I left some comments on places that could be explained more).
Does the streaming generator's backpressure limit apply to ObjectRefs that haven't been yielded to the caller yet, or is it based on how many output ObjectRefs are still in scope? What exactly is the guarantee of the max output that may get queued for a single op (is it op_output_buffer_size_bytes, streaming_gen_backpressure_size, or the sum)? It'd be helpful if these things could be explained in the code.
read_bytes = 0
# If max_bytes_to_read is None, we will read all available blocks.
# Otherwise, we will read until we reach max_bytes_to_read.
while max_bytes_to_read is None or max_bytes_to_read > 0:
Does this guarantee that we will read at least one block if the block's size is greater than max_bytes_to_read?
Good question. It doesn't guarantee that now. I think I can make it read at least one block when the op buffer is empty.
Oh, I misunderstood. Actually, it already guarantees that. I'll add a note here.
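For context, a minimal sketch of the loop shape under discussion, with a hypothetical `outqueue` of blocks carrying a `size_bytes` attribute standing in for the executor's actual state: because the remaining budget is checked *before* each pop rather than after, at least one block is always read when the budget starts positive, even if that block alone exceeds `max_bytes_to_read`.

```python
from typing import List, Optional


def read_blocks(outqueue: List["Block"], max_bytes_to_read: Optional[int]) -> int:
    """Sketch: read blocks from an op's output queue under a byte budget."""
    read_bytes = 0
    # The budget check happens before each read, so the first block is
    # consumed even when it is larger than the budget; the budget then goes
    # negative and the loop exits.
    while max_bytes_to_read is None or max_bytes_to_read > 0:
        if not outqueue:
            break
        block = outqueue.pop(0)
        read_bytes += block.size_bytes
        if max_bytes_to_read is not None:
            max_bytes_to_read -= block.size_bytes
    return read_bytes
```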
active_tasks: Dict[Waitable, Tuple[OpState, OpTask]] = {}
# Current output buffer sizes for each operator.
# Used for streaming output backpressure.
output_bufer_sizes: Dict[OpState, int] = {}
Suggested change:
-output_bufer_sizes: Dict[OpState, int] = {}
+output_buffer_sizes_bytes: Dict[OpState, int] = {}
python/ray/data/context.py (outdated)
# `_streaming_generator_backpressure_size_bytes` parameter).
streaming_gen_backpressure_size: int = 1 * 1024 * 1024 * 1024
# Max size of the output buffer (`OpState.outqueue`) for each operator.
op_output_buffer_size_bytes: int = 1 * 1024 * 1024 * 1024
Suggested change:
-op_output_buffer_size_bytes: int = 1 * 1024 * 1024 * 1024
+max_op_output_buffer_size_bytes: int = 1 * 1024 * 1024 * 1024
python/ray/data/context.py (outdated)
# The streaming generator-level backpressure size (i.e., the
# `_streaming_generator_backpressure_size_bytes` parameter).
streaming_gen_backpressure_size: int = 1 * 1024 * 1024 * 1024
Suggested change:
-streaming_gen_backpressure_size: int = 1 * 1024 * 1024 * 1024
+max_streaming_gen_output_size_bytes: int = 1 * 1024 * 1024 * 1024
# All active tasks, keyed by their waitable.
active_tasks: Dict[Waitable, Tuple[OpState, OpTask]] = {}
# Current output buffer sizes for each operator.
# Used for streaming output backpressure.
It would be helpful to expand this comment.
python/ray/data/context.py (outdated)
@@ -144,6 +145,24 @@
DEFAULT_ENABLE_GET_OBJECT_LOCATIONS_FOR_METRICS = False

@dataclass
class StreamingOutputBackpressureConfig:
    """Configuration for task-level streaming output backpressure."""
It would be helpful to expand this comment.
python/ray/data/context.py (outdated)
# the app-level buffer size.

# The streaming generator-level backpressure size (i.e., the
# `_streaming_generator_backpressure_size_bytes` parameter).
It would be helpful to expand this comment. It would also help to explain the relationship between this and the other output cap.
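As a hedged illustration of what the expanded comments might look like, here is a sketch relating the two caps. The field names are taken from this PR; the docstring wording, and in particular the "sum" claim, reflect the open question raised above rather than confirmed behavior.

```python
from dataclasses import dataclass

GiB = 1024 * 1024 * 1024


@dataclass
class StreamingOutputBackpressureConfig:
    """Configuration for task-level streaming output backpressure.

    Two caps bound how much un-consumed output a single op can accumulate:
    - `op_output_buffer_size_bytes` caps the executor-side output buffer
      (`OpState.outqueue`) that holds blocks already yielded by tasks.
    - `streaming_gen_backpressure_size` caps the bytes buffered inside each
      running task's streaming generator; when hit, the task blocks at
      `yield`.

    If both apply independently, the worst-case queued output per op is
    roughly `op_output_buffer_size_bytes` plus
    `streaming_gen_backpressure_size` per concurrent task.
    """

    op_output_buffer_size_bytes: int = 1 * GiB
    streaming_gen_backpressure_size: int = 1 * GiB
```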
So we will disable this by default, right? Long-term, I don't see a way around this except by implementing some sort of task preemption.
I'll add a comment in the code on how streaming generator backpressure works.
I see, I didn't realize we had a way to disable backpressure. That seems fine then. Can we add that to this PR? Even if the feature is disabled, it's not really useful to merge into 2.8 unless we have something close to the final version that we can test. If we add the deadlock fix, we can test for unknown unknowns instead of just running into the known deadlock issue.
@stephanie-wang that fix doesn't seem super straightforward. I slightly lean towards merging this PR first and adding that fix later (hopefully before the 2.8 release). Without that fix, this should still be useful, because: 1) if one op's tasks won't use all resources, it's fine; 2) the concurrency cap ramp-up will make the resource allocation more balanced and mitigate the issue; 3) for internal tests, we can always tune configs to avoid the issue. Update: per offline discussion, we can check the sum of downstreams'
@stephanie-wang @c21 This PR is ready for another review.
This reverts commit 259983c.
Changed configs to be based on number of blocks.
- At the Ray Core level, we use
`MAX_BLOCKS_IN_GENERATOR_BUFFER` to limit the number of blocks buffered in
the streaming generator of each OpDataTask. When it's reached, the task will
be blocked at `yield`.
Suggested change:
-be blocked at `yield`.
+be blocked at `yield` until the caller reads another `ObjectRef`.
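To make the blocking behavior concrete, here is a hedged sketch of the Ray Core knob this limit maps onto. The `_streaming_generator_backpressure_size_bytes` option comes from the linked #40285; treat the exact call shape and values as assumptions, not the merged API.

```python
import ray


@ray.remote(
    num_returns="streaming",
    # Assumed option from #40285: once this many bytes of yielded-but-
    # unconsumed output accumulate, `yield` blocks in the producer task.
    _streaming_generator_backpressure_size_bytes=1 * 1024**3,
)
def produce_blocks(n: int):
    for _ in range(n):
        # Blocks here if the caller falls too far behind, i.e. until the
        # caller reads another ObjectRef from the generator.
        yield b"x" * (128 * 1024 * 1024)


gen = produce_blocks.remote(8)
for ref in gen:
    data = ray.get(ref)  # each consumed ref frees producer-side budget
```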
# If all downstream operators are idle, it could be because no resources
# are available. In this case, we'll make sure to read at least one
# block to avoid deadlock.
I am still unsure about this condition.
"it could be" -> meaning that there cases where the downstream operators are idle but it is not a deadlock? Can we guarantee that it is a deadlock?
Actually, isn't exponential ramp-up one of the cases where it's not actually a deadlock?
I guess it is okay to merge for now, but I think we should revisit this condition after 2.8.
Yes, this is not 100% accurate, so the current strategy is conservative: it only unblocks one block at a time. I'll add a TODO here.
On second thought, it seems that the only false-positive case is when the executor has just started and the first op hasn't yet produced blocks for the second op to consume.
For the exponential ramp-up case, if an op is backpressured due to the concurrency cap, `num_active_tasks` will be greater than 0.
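A minimal sketch of the conservative rule described in this thread, with hypothetical names (`budget_blocks`, `downstream_num_active_tasks`) standing in for the policy's actual state:

```python
from typing import List


def max_blocks_to_read(
    budget_blocks: int, downstream_num_active_tasks: List[int]
) -> int:
    """Sketch: how many blocks an op may read from its tasks this round."""
    # If every downstream op has zero active tasks, nothing downstream can
    # free resources, so letting one block through is the only way to make
    # progress. This can fire spuriously right after startup (before the
    # first op has produced anything), hence the conservative single block.
    all_downstream_idle = all(n == 0 for n in downstream_num_active_tasks)
    if budget_blocks <= 0 and all_downstream_idle:
        return 1
    return max(budget_blocks, 0)
```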
def can_run(self, op: "PhysicalOperator") -> bool:
    """Called when StreamingExecutor selects an operator to run in
    `streaming_executor_state.select_operator_to_run()`.
def calcuate_max_blocks_to_read_per_op(
Suggested change:
-def calcuate_max_blocks_to_read_per_op(
+def calculate_max_blocks_to_read_per_op(
assert start1 < start2 < end1 < end2, (start1, start2, end1, end2)


class TestStreamOutputBackpressurePolicy(unittest.TestCase):
Nice tests!
Signed-off-by: Hao Chen <chenh1024@gmail.com>
-    total_unconsumed >= backpressure_threshold
-    // We can only backpressure the last generated item.
-    && item_index >= total_generated - 1) {
+    (item_index - stream_it->second.LastConsumedIndex()) >= backpressure_threshold) {
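A hedged reading of this diff: the old condition only ever applied backpressure to the most recently generated item, so earlier items could keep streaming past the threshold; the new condition throttles whenever the gap between the item being generated and the last consumed index reaches `backpressure_threshold`.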
Why are these changes needed?
This PR integrates Ray Core's streaming generator backpressure to Data. This enables throttling the streaming outputs of tasks. That is, when the downstream ops are too slow to consume data, the upstream ops should stop yielding streaming outputs.
Also fixes a Ray Core bug; see `task_manager.cc`.
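As a usage sketch, tuning (or effectively disabling) the new caps might look like the following. The field names come from this PR and `DataContext.get_current()` is the standard accessor, but the exact final API is an assumption.

```python
from ray.data.context import DataContext

ctx = DataContext.get_current()
# Loosen the executor-side per-op output buffer cap (assumed field name).
ctx.op_output_buffer_size_bytes = 4 * 1024**3
# Set the generator-level cap very high to effectively disable
# task-level streaming backpressure (assumed field name).
ctx.streaming_gen_backpressure_size = 2**63 - 1
```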