[Data] Allow tasks to control concurrency in map-like APIs #42637

c21 · 2024-01-24T01:18:37Z

Why are these changes needed?

This PR lets tasks control concurrency in map-like APIs when the user calls map_batches(fn, concurrency=...). Each TaskPoolMapOperator gets a concurrency cap that limits how many of its tasks can run at once.
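
To illustrate the intended behavior without depending on Ray, here is a minimal sketch of what a per-operator concurrency cap means. `run_with_cap`, `slow_double`, and their parameters are hypothetical names used only for illustration; they are not part of Ray Data, and a thread pool stands in for Ray tasks.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_cap(fn, items, concurrency):
    """Apply fn to every item, never running more than `concurrency` calls at once.

    Returns (results, peak), where peak is the highest number of calls
    observed in flight at the same time.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(item):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return fn(item)
        finally:
            with lock:
                running -= 1

    # The pool size plays the role of the concurrency cap (an assumption of
    # this sketch, not how Ray schedules tasks).
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(tracked, items))
    return results, peak


def slow_double(x):
    time.sleep(0.01)
    return 2 * x


results, peak = run_with_cap(slow_double, range(8), concurrency=2)
```

Here `peak` can never exceed the requested cap of 2, which is the property the cap on TaskPoolMapOperator is meant to provide for tasks.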

Related issue number

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

bveeramani

LGTM

bveeramani · 2024-01-29T19:05:21Z

python/ray/data/_internal/execution/operators/task_pool_map_operator.py

            ray_remote_args: Customize the ray remote args for this op's tasks.
        """
+        self._concurrency = concurrency


Nit: Don't think it really matters here, but IIRC calling super().__init__ first is a best practice
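
One reason the convention exists can be shown with a small sketch. The classes below are hypothetical and are not the Ray operators; in particular, the assumption that the base class touches `_concurrency` is made up for illustration and may not hold in the real code.

```python
class BaseOperator:
    def __init__(self, name):
        self._name = name
        # Assumption for this sketch: the base class resets shared state.
        self._concurrency = None


class AssignThenInit(BaseOperator):
    def __init__(self, name, concurrency):
        # Set before the base initializer runs, so it is silently overwritten.
        self._concurrency = concurrency
        super().__init__(name)


class InitThenAssign(BaseOperator):
    def __init__(self, name, concurrency):
        super().__init__(name)
        # Set after the base initializer has run, so the value survives.
        self._concurrency = concurrency


lost = AssignThenInit("map", 4)._concurrency
kept = InitThenAssign("map", 4)._concurrency
```

If the base class never assigns the same attribute, both orders behave identically, which matches the "don't think it really matters here" caveat.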

bveeramani · 2024-01-29T19:06:32Z

python/ray/data/_internal/util.py

+                    raise ValueError(
+                        "``concurrency`` is set as a tuple of integers, but ``fn`` "
+                        f"is not a callable class: {fn}. Use ``concurrency=n`` to "
+                        "control maximal number of workers to use."


Suggested change

"control maximal number of workers to use."

"control maximum number of workers to use."

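The check quoted in this hunk can be sketched as a standalone validator. `check_concurrency` and the "tasks"/"actors" return values are hypothetical and exist only for this illustration; the source confirms only that a tuple combined with a non-class `fn` raises `ValueError`.

```python
import inspect


def check_concurrency(fn, concurrency):
    """Validate `concurrency` against `fn` and report which pool would be used.

    The returned labels are an assumption of this sketch.
    """
    if isinstance(concurrency, tuple):
        if not inspect.isclass(fn):
            raise ValueError(
                "``concurrency`` is set as a tuple of integers, but ``fn`` "
                f"is not a callable class: {fn}. Use ``concurrency=n`` to "
                "control maximum number of workers to use."
            )
        return "actors"
    if isinstance(concurrency, int) and concurrency >= 1:
        # Assumption: plain functions run as tasks, callable classes as actors.
        return "actors" if inspect.isclass(fn) else "tasks"
    raise ValueError(f"unsupported ``concurrency`` value: {concurrency!r}")


def double(batch):
    return [2 * x for x in batch]


class Doubler:
    def __call__(self, batch):
        return [2 * x for x in batch]


task_kind = check_concurrency(double, 4)
actor_kind = check_concurrency(Doubler, (1, 4))
```

Calling `check_concurrency(double, (1, 4))` raises the `ValueError` shown in the hunk, because a tuple range only makes sense for a pool of callable-class workers.
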
bveeramani · 2024-01-29T19:11:54Z

python/ray/data/_internal/execution/backpressure_policy/concurrency_cap_backpressure_policy.py

-    # The multiplier to multiply the concurrency cap by.
-    CAP_MULTIPLIER = 2.0
-    CAP_MULTIPLIER_CONFIG_KEY = "backpressure_policies.concurrency_cap.cap_multiplier"
+    NOTE: Only support setting concurrency cap for `TaskPoolMapOperator` for now.


What's our motivation for not adding it to ActorPoolMapOperator? Is it because there's a separate code path for controlling actor concurrency?

Is it because there's a separate code path for controlling actor concurrency?

Yes, the concurrency cap backpressure policy here is not useful for actors, given that ActorPoolMapOperator already scales up and down internally.

Also add this to the comment? And maybe also leave a TODO to consolidate this policy with the actor pool concurrency.

raulchen · 2024-01-29T21:09:09Z

python/ray/data/_internal/compute.py

+
+        if size is not None and size < 1:
+            raise ValueError("`size` must be >= 1", size)
+        self.size = size


nit, maybe rename this to max_size to make the meaning clearer.

I think it's okay, let's be consistent with naming of ActorPoolStrategy(size=...).
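
For reference, the validation in this hunk behaves roughly like the sketch below. `TaskPoolStrategySketch` is a hypothetical stand-in, not the class in compute.py; only the `size` check and assignment are taken from the diff, and the `None` default is an assumption.

```python
from typing import Optional


class TaskPoolStrategySketch:
    """Stand-in mirroring the `size` check shown in the diff."""

    def __init__(self, size: Optional[int] = None):
        # Assumption: None means "no explicit cap on concurrent tasks".
        if size is not None and size < 1:
            raise ValueError("`size` must be >= 1", size)
        self.size = size


unbounded = TaskPoolStrategySketch()
capped = TaskPoolStrategySketch(size=4)

try:
    TaskPoolStrategySketch(size=0)
    rejected = False
except ValueError:
    rejected = True
```

Keeping the parameter named `size` means the task and actor strategies take the same keyword, which is the consistency argument made in the reply above.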

raulchen · 2024-01-29T21:14:27Z

python/ray/data/_internal/execution/backpressure_policy/concurrency_cap_backpressure_policy.py


    The concrete stategy is as follows:
-    - Each PhysicalOperator is assigned an initial concurrency cap.
+    - Each PhysicalOperator is assigned a concurrency cap.
    - An PhysicalOperator can run new tasks if the number of running tasks is less
      than the cap.


The comment looks a bit too verbose since we've removed the exponential ramp-up. We can simplify it to something like "The policy will limit the number of concurrently running tasks based on its concurrency parameter."
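
With the ramp-up removed, the rule in the docstring reduces to a single comparison. The sketch below assumes hypothetical names (`ConcurrencyCapSketch`, `can_add_input`, operator names as strings) and is not the class in concurrency_cap_backpressure_policy.py; treating a missing cap as "unlimited" is also an assumption.

```python
class ConcurrencyCapSketch:
    """Allow a new task only while the operator is below its cap."""

    def __init__(self, caps):
        # Maps operator name -> maximum number of concurrently running tasks.
        # Assumption: operators without an entry are not limited.
        self._caps = dict(caps)

    def can_add_input(self, op_name, num_running):
        cap = self._caps.get(op_name)
        return cap is None or num_running < cap


policy = ConcurrencyCapSketch({"map_batches": 2})
below_cap = policy.can_add_input("map_batches", 1)
at_cap = policy.can_add_input("map_batches", 2)
uncapped = policy.can_add_input("read", 100)
```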

raulchen · 2024-01-29T21:17:27Z

python/ray/data/tests/test_backpressure_policies.py

@@ -185,24 +138,6 @@ def test_e2e_normal(self):
        start2, end2 = ray.get(actor.get_start_and_end_time_for_op.remote(2))
        assert start1 < start2 < end1 < end2, (start1, start2, end1, end2)

-    def test_e2e_no_ramping_up(self):


we also need to update the above test_e2e_normal to use the concurrency parameter.

Thanks, updated.

Signed-off-by: Cheng Su <scnju13@gmail.com>

c21 requested review from ericl, scv119, amogkam, scottjlee, bveeramani, raulchen, stephanie-wang and Zandew as code owners January 24, 2024 01:18

c21 changed the title ~~[WIP][Data] Allow tasks to control concurrency in map-like APIs~~ [Data] Allow tasks to control concurrency in map-like APIs Jan 26, 2024

c21 assigned raulchen Jan 26, 2024

c21 force-pushed the cap branch from f0b0fdc to f5de03e Compare January 26, 2024 22:12

c21 assigned bveeramani Jan 29, 2024

bveeramani approved these changes Jan 29, 2024

raulchen approved these changes Jan 29, 2024

c21 added 6 commits January 29, 2024 13:59

Allow tasks to control concurrency in map-like APIs

4e17b31

Signed-off-by: Cheng Su <scnju13@gmail.com>

Change to remove exponential rampup

398105b

Signed-off-by: Cheng Su <scnju13@gmail.com>

Fix unit test

2ca833d

Signed-off-by: Cheng Su <scnju13@gmail.com>

Update test

20f185b

Signed-off-by: Cheng Su <scnju13@gmail.com>

Address comments

6e70338

Signed-off-by: Cheng Su <scnju13@gmail.com>

Address comments

06f5d4a

Signed-off-by: Cheng Su <scnju13@gmail.com>

c21 force-pushed the cap branch from c480cb3 to 06f5d4a Compare January 29, 2024 23:14

c21 merged commit 46a9efe into ray-project:master Jan 30, 2024
9 checks passed

c21 deleted the cap branch January 30, 2024 02:04
