[core] Fix performance regression in single_client_tasks_and_get_batch #39362

vitsai · 2023-09-07T00:30:49Z

The single_client_tasks_and_get_batch benchmark regressed by roughly 0.5k-1k tasks/s on average (about 2k tasks/s on a local machine) after #38323, which changed the tracing logic so that _inject_tracing_into_function unconditionally changed the signature of every remote function to accommodate tracing.
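Changing a function's signature to accommodate tracing can be illustrated generically with the standard library's inspect module: the wrapper accepts one extra keyword-only argument and advertises it through __signature__. This is a minimal sketch, not Ray's _inject_tracing_into_function; the helper add_trace_ctx_param and the parameter name _trace_ctx are hypothetical.

```python
import functools
import inspect


def add_trace_ctx_param(func):
    """Return a wrapper of func whose signature gains a keyword-only `_trace_ctx`.

    Hypothetical helper for illustration only; not Ray's actual tracing code.
    """
    sig = inspect.signature(func)
    extra = inspect.Parameter(
        "_trace_ctx", inspect.Parameter.KEYWORD_ONLY, default=None
    )
    params = list(sig.parameters.values())
    # A keyword-only parameter must come before any **kwargs parameter.
    if params and params[-1].kind is inspect.Parameter.VAR_KEYWORD:
        params.insert(len(params) - 1, extra)
    else:
        params.append(extra)

    @functools.wraps(func)
    def wrapper(*args, _trace_ctx=None, **kwargs):
        # A real tracer would open a span from _trace_ctx here; this sketch
        # simply drops it before calling the original function.
        return func(*args, **kwargs)

    # Advertise the extended signature to inspect-based callers.
    wrapper.__signature__ = sig.replace(parameters=params)
    return wrapper
```

In this sketch every call goes through one extra wrapper frame whether or not a trace context is passed, which illustrates why applying such a change to every function, rather than only when tracing is in use, is not free.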

Make the signature change conditional again, but move it from the definition of a RemoteFunction to its execution path. Also ensure that the injection happens only once, even when the remote function is executed multiple times.
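The approach described above (decide at execution time, inject at most once) can be sketched as follows. LazyTracedFunction, inject_tracing, and the is_tracing_enabled callback are hypothetical stand-ins rather than Ray's RemoteFunction API; the callback models runtime state, such as whether tracing is enabled, that may not be known when the function is defined. This sketch is single-threaded.

```python
import functools


def inject_tracing(func):
    """Hypothetical tracing wrapper; counts traced calls instead of emitting spans."""

    @functools.wraps(func)
    def traced(*args, **kwargs):
        traced.span_count += 1  # stand-in for recording a span
        return func(*args, **kwargs)

    traced.span_count = 0
    return traced


class LazyTracedFunction:
    """Sketch: decide on tracing at the first call, not at definition time.

    Single-threaded sketch: two threads making the first call at the same
    time could both run the injection step.
    """

    def __init__(self, func, is_tracing_enabled):
        self._function = func
        # Callback queried lazily; models runtime state (hypothetical) that
        # may only be known after initialization.
        self._is_tracing_enabled = is_tracing_enabled
        self._injected_tracing = False
        self.injection_count = 0  # for demonstration only

    def invoke(self, *args, **kwargs):
        if not self._injected_tracing:
            # Only wrap (and thus change the function) if tracing is on.
            if self._is_tracing_enabled():
                self._function = inject_tracing(self._function)
                self.injection_count += 1
            # Mark as done either way so later calls skip this block.
            self._injected_tracing = True
        return self._function(*args, **kwargs)
```

With this design, a function defined while tracing is off but first invoked after tracing is enabled still gets wrapped, later invocations never wrap it again, and a function whose first call happens with tracing off is left untouched.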

Why are these changes needed?

Related issue number

#39259

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

rkooo567 · 2023-09-07T08:53:02Z

Running microbenchmark here https://buildkite.com/ray-project/release-tests-pr/builds/52417.

@vitsai once it completes, can you let me know the number this test reports with this PR?

rkooo567

Looks good to me. Nit comments

python/ray/util/tracing/tracing_helper.py

python/ray/remote_function.py

rkooo567 · 2023-09-07T08:57:21Z

python/ray/remote_function.py

@@ -254,6 +252,15 @@ def _remote(self, args=None, kwargs=None, **task_options):
        worker = ray._private.worker.global_worker
        worker.check_connected()

+        # We cannot do this when the function is first defined, because we need


Does this mean it was already broken before this change? I think Ray should automatically call ray.init when the remote function is first created.

Ray calls ray.init when the function is first invoked, not when it is defined.

edoakes · 2023-09-07T14:39:28Z

python/ray/remote_function.py

+        if not self._injected_tracing:
+            self._function = _inject_tracing_into_function(self._function)
+            self._function_signature = ray._private.signature.extract_signature(
+                self._function
+            )
+            self._injected_tracing = True


The Ray API needs to be thread-safe. Is there any concern about a race condition here if multiple threads invoke the function concurrently?

Added a mutex, plus custom __getstate__/__setstate__ methods for pickling (the lock is not picklable).
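Because threading.Lock objects cannot be pickled, a common pattern for an object that holds a lock but must still be pickled is to leave the lock out of the pickled state and create a fresh one on unpickling. The sketch below assumes a hypothetical class, RemoteFunctionState, with the lock stored in an attribute named _inject_lock; it is not Ray's RemoteFunction.

```python
import pickle
import threading


class RemoteFunctionState:
    """Hypothetical object holding a lock that must still be picklable."""

    def __init__(self, name):
        self.name = name
        self._injected_tracing = False
        self._inject_lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the lock, which pickle rejects.
        state = self.__dict__.copy()
        del state["_inject_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then create a fresh, unlocked lock.
        self.__dict__.update(state)
        self._inject_lock = threading.Lock()
```

Recreating the lock rather than trying to transfer it is the natural choice here: a lock's held or released state is only meaningful within the process that created it.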

Since simple variable assignments are atomic in CPython and these two functions are idempotent, I believe it would have been mostly fine, except for the part of _inject_tracing_into_function that mutates a dictionary. I also learned that Python doesn't provide any kind of compare-and-swap (CAS). ChatGPT suggested a tuple swap (x, y = y, x), but that does not seem to be atomic either.

Actually, no: _inject_tracing_into_function is not idempotent, so the lock is needed.
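The race discussed in this thread can be sketched under one assumption: a wrapper that is not idempotent, so applying it twice nests two layers. One common guard is double-checked locking (check without the lock, then re-check while holding it), which applies the wrapper exactly once even when several threads make the first call together. OnceInjector and wrap_with_counter are hypothetical names, and the time.sleep call only widens the race window for the demonstration; this is not Ray's code.

```python
import functools
import threading
import time


def wrap_with_counter(func):
    """Hypothetical non-idempotent wrapper: wrapping twice nests two layers."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.layer_calls += 1
        return func(*args, **kwargs)

    wrapper.layer_calls = 0
    return wrapper


class OnceInjector:
    """Sketch: double-checked locking so the wrapper is applied exactly once."""

    def __init__(self, func):
        self._function = func
        self._injected = False
        self._lock = threading.Lock()
        self.wrap_count = 0  # for demonstration only

    def invoke(self, *args, **kwargs):
        if not self._injected:  # cheap check without taking the lock
            with self._lock:
                if not self._injected:  # re-check: another thread may have won
                    time.sleep(0.01)  # widen the race window for the demo
                    self._function = wrap_with_counter(self._function)
                    self.wrap_count += 1
                    # Set the flag last so the fast path only skips the lock
                    # once the wrapped function is in place.
                    self._injected = True
        return self._function(*args, **kwargs)
```

Without the re-check under the lock, a thread that passed the first check could wrap the already-wrapped function, producing the nested layers that make a non-idempotent injection unsafe.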

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

vitsai · 2023-09-07T22:13:16Z

Microbenchmark release-test result after addressing the comments: single_client_tasks_and_get_batch = [12.225312658308761, 0.38310197760588016]. This run does not include the latest commit, which changes only one line of test logic.

edoakes

One small comment; otherwise LGTM.

python/ray/remote_function.py

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

python/ray/remote_function.py

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

rkooo567 · 2023-09-08T00:11:34Z

Lmk when tests pass! Seems good to merge

vitsai · 2023-09-08T01:07:51Z

Tests look good. The microbenchmark seems to have hit an infra failure, so I'm running it again now. On my machine, the latest commit brings throughput back up to ~13k tasks/s from ~11k tasks/s, so personally I don't think we have to wait.

rkooo567 · 2023-09-08T01:19:33Z

Your machine has a different environment from the microbenchmark's, so we shouldn't rely on that result. Let me rerun the tests.

rkooo567 · 2023-09-08T01:20:22Z

Trying again here: https://buildkite.com/ray-project/release-tests-pr/builds/52670

rkooo567 · 2023-09-08T01:20:56Z

Wait, actually I realized it succeeded? https://buildkite.com/ray-project/release-tests-pr/builds/52417#018a6f01-015c-4d80-8f8c-f37dc1a0a686

rkooo567 · 2023-09-08T01:21:47Z

single_client_tasks_and_get_batch = [11.222801689347772, 0.4976784102185062]

vitsai · 2023-09-08T01:49:27Z

(probably because I retried at the time of failure in the last comment)

lazily add tracing to remote functions

c61465d

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

vitsai assigned edoakes and rkooo567 Sep 7, 2023

rkooo567 approved these changes Sep 7, 2023

rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 7, 2023

edoakes reviewed Sep 7, 2023

comments

6953ab5

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

vitsai removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 7, 2023

fix test

84d319d

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

edoakes reviewed Sep 7, 2023

python/ray/remote_function.py (outdated, resolved)

python/ray/remote_function.py (outdated, resolved)

comments

b5b499b

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

rkooo567 reviewed Sep 7, 2023

python/ray/remote_function.py (outdated, resolved)

vitsai added 2 commits September 7, 2023 22:38

comment

b57deeb

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

remove comment

229a022

Signed-off-by: vitsai <vitsai@cs.stanford.edu>

edoakes approved these changes Sep 7, 2023

rkooo567 merged commit b6edccf into ray-project:master Sep 8, 2023
88 of 96 checks passed

vitsai mentioned this pull request Sep 8, 2023

[core] Fix performance regression in single_client_tasks_and_get_batc… #39429

Merged

8 tasks

rickyyx mentioned this pull request Oct 24, 2023

[core] microbenchmark regression #40606

Closed
