Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] ray.util.ActorPool supports batch submission of remote actor tasks. #39196

Open
larrylian opened this issue Sep 1, 2023 · 4 comments
Open
Assignees
Labels
core Issues that should be addressed in Ray Core core-util enhancement Request for new feature and/or capability P2 Important issue, but not time-critical

Comments

@larrylian
Copy link
Contributor

larrylian commented Sep 1, 2023

1. Description

Recently, I submitted a Ray Enhancement proposal :
[Core] Add a new interface for submitting actor tasks in batches (Batch Remote) #39048 .

The implementation of batch remote api for batch submit actor task:
[WIP][Core]Add batch remote api for batch submit actor task #35597

actors = [Counter.remote(i) for i in range(num_actors)]

object_refs = BatchRemoteHandle(actors).get_value.remote()
print(ray.get(object_refs))

I believe that the batch remote interface is a great addition to be used with the ActorPool.

The current usage of ActorPool mainly involves coordinating the semantics of map, submit, and get_next.
For simple scenarios involving batch submission of ActorTasks, the current API usage of ActorPool becomes overly redundant.
I would like to propose adding a new batch_submit_handle()method to address this. The main purpose are:
1. Simpler syntax for invocation
2. Improved performance.

2. User Usage

Scenario:

  1. All actors are of the same type.
  2. The tasks to be batch-submitted are the same.
  3. The parameters of actors tasks to bet submited are the same.
@ray.remote
class Actor:
    def double(self, v):
        return 2 * v

a1, a2 = Actor.remote(), Actor.remote()
pool = ActorPool([a1, a2])

object_refs = pool.batch_submit_handle().double.remote(v);
print(ray.get(object_refs))

3. Advantages

1. Significant performance improvement

We have already implemented and conducted extensive performance testing internally.
The following are the performance comparison results.

Table 1: Comparison of remote call time with varying parameter sizes and 400 Actors

Parameter Size (byte) Time taken for foreach_remote(ms) Time taken for batch_remote(ms) The ratio of time reduction
10 40.532 9.226 77.2%
409846 584.345 24.106 95.9%
819446 606.725 16.019 97.4%
1229047 976.974 19.403 98.0%
1638647 993.454 23.184 97.7%
2048247 972.438 19.028 98.0%
2457850 987.04 17.642 98.2%
2867450 976.165 15.07 98.5%
3277050 1108.331 18.272 98.4%
3686650 1186.371 16.011 98.7%
4096250 1335.575 15.951 98.8%
4505850 1490.914 20.928 98.6%
4915450 1511.744 23.041 98.5%
5325050 1716.752 17.515 99.0%
5734650 2009.711 22.891 98.9%
6144250 2424.166 23.129 99.0%
6553850 2354.033 20.271 99.1%
6963450 2599.015 24.347 99.1%
7373050 2610.843 17.91 99.3%
7782650 2751.179 17.258 99.4%

Comparison of remote call time with varying parameter sizes and 400 Actors

Conclusion:

  1. The larger the parameter size, the greater the performance gain of batch_remote().

Table 2: Comparison of remote call time with varying numbers of Actors and a fixed parameter size (1MB)

actor counts Time taken for foreach_remote(ms) Time taken for batch_remote(ms) The ratio of time reduction
50 95.889 4.657 95.1%
100 196.184 8.447 95.7%
150 291.879 15.228 94.8%
200 373.806 21.161 94.3%
250 475.482 20.768 95.6%
300 577.406 24.323 95.8%
350 646.415 24.59 96.2%
400 763.94 34.806 95.4%
450 874.601 30.723 96.5%
500 955.915 45.888 95.2%
550 1041.728 39.736 96.2%
600 1124.786 45.122 96.0%
650 1262.363 46.687 96.3%
700 1331.427 60.543 95.5%
750 1407.485 47.386 96.6%
800 1555.571 55.297 96.4%
850 1549.03 60.493 96.1%
900 1675.685 58.268 96.5%
950 2314.186 48.785 97.9%

Comparison of remote call time with varying numbers of Actors and 1MB parameter size

Conclusion:
The more actors, the greater the performance gain.

Table 3: Comparison of remote call time with varying numbers of Actors and no parameters in remote calls

This test is to confirm the degree of performance optimization after reducing the frequency of switching between the Python and C++ execution layers.

actor counts Time taken for foreach_remote(ms) Time taken for batch_remote(ms) The ratio of time reduction
50 2.083 1.257 39.7%
100 4.005 2.314 42.2%
150 5.582 3.467 37.9%
200 8.104 3.574 55.9%
250 10.104 4.64 54.1%
300 11.858 6.224 47.5%
350 13.826 8.017 42.0%
400 15.862 8.145 48.7%
450 18.368 9.261 49.6%
500 18.881 10.722 43.2%
550 21.129 11.944 43.5%
600 23.413 12.925 44.8%
650 26.485 13.328 49.7%
700 27.855 14.303 48.7%
750 29.432 14.922 49.3%
800 31.03 16.329 47.4%
850 32.405 17.582 45.7%
900 34.388 18.521 46.1%
950 36.499 19.658 46.1%

Comparison of remote call time with varying numbers of Actors and no parameters in remote calls

Conclusion:
After comparison, in the scenario of remote calls without parameters, the performance is optimized by 2+ times.

Table 4: Comparison of remote call time with varying numbers of Actors and object ref parameters in remote calls

actor counts The time taken for foreach_remote(ms) The time taken for batch_remote(ms) The ratio of time reduction
50 3.878 1.488 61.6%
100 8.383 2.405 71.3%
150 12.16 3.255 73.2%
200 16.835 4.913 70.8%
250 21.09 6.424 69.5%
300 24.674 8.272 66.5%
350 28.639 8.862 69.1%
400 33.42 10.352 69.0%
450 37.39 12.02 67.9%
500 39.944 13.288 66.7%
550 45.019 15.005 66.7%
600 48.237 15.349 68.2%
650 53.304 17.149 67.8%
700 56.961 18.124 68.2%
750 61.672 19.079 69.1%
800 66.185 20.485 69.0%
850 69.524 21.584 69.0%
900 74.754 22.304 70.2%
950 79.493 25.932 67.4%

Comparison of remote call time with varying numbers of Actors and object ref parameters in remote calls

Conclusion:
After comparison, in the scenario of remote calls with object ref paramter, the performance is optimized by 3~4 times.

Summary:
The newly added Batch Remote API can improve performance in the case of batch calling Actor task. It can reduce performance costs such as parameter serialization, object store consumption, and Python and C++ execution layer switching, thereby improving the performance of the entire distributed computing system.
Especially in the following scenario:

  1. large parameters
  2. a large number of Actors

4. Implementation

@DeveloperAPI
class ActorPool:
     def batch_submit_handle(self): BatchRemoteHandle
           return BatchRemoteHandle(self._idle_actors)
@larrylian larrylian added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) core Issues that should be addressed in Ray Core labels Sep 1, 2023
@larrylian
Copy link
Contributor Author

larrylian commented Sep 12, 2023

@PRESIDENT810 Can you give some suggestions?

@rkooo567
Copy link
Contributor

Q: can we hide batch from the API and just do internally? Maybe we can set a parameter to actor pool or submit API instead?

@rkooo567 rkooo567 added P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Sep 25, 2023
@rkooo567
Copy link
Contributor

Note: It will be contributed from @larrylian

@larrylian
Copy link
Contributor Author

Q: can we hide batch from the API and just do internally? Maybe we can set a parameter to actor pool or submit API instead?

Good idea. Let me take some time to design it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Issues that should be addressed in Ray Core core-util enhancement Request for new feature and/or capability P2 Important issue, but not time-critical
Projects
None yet
Development

No branches or pull requests

3 participants