Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][enable_task_events] Options to disable task tracing on task/actor #42431

Merged
merged 20 commits into from Feb 29, 2024

Conversation

rickyyx
Copy link
Contributor

@rickyyx rickyyx commented Jan 16, 2024

Why are these changes needed?

There are requests to disable task events reporting at the actor/task level to avoid excessive data/communication:

This PR will allow users to disable task events reporting (both task status events + profile events) for a task/actor through a enable_task_events=False flag on the remote for actor and task:

@ray.remote
def f():
    pass

# For task
f.options(enable_task_events=False).remote()

@ray.remote
class Actor():
    def g(self):
       pass


# For entire actor
a = Actor.options(enable_task_events=False).remote()

# Or for the specific actor task 
a.g.options(enable_task_events=False).remote()

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: rickyyx <rickyx@anyscale.com>
Signed-off-by: rickyyx <rickyx@anyscale.com>
@rickyyx rickyyx changed the title [core][task-events] Options to disable task events reporting on task/actor [WIP][core][task-events] Options to disable task events reporting on task/actor Jan 16, 2024
@rkooo567
Copy link
Contributor

Please update the related issue! Also why is this draft? (is this ready to be reviewd? )

@rickyyx
Copy link
Contributor Author

rickyyx commented Jan 18, 2024

Please update the related issue! Also why is this draft? (is this ready to be reviewd? )

Yeah, i would love to get a quick round of review for the high-level approach first.

@rickyyx rickyyx marked this pull request as ready for review January 18, 2024 18:28
@rickyyx rickyyx requested review from ericl, pcmoritz, raulchen and a team as code owners January 18, 2024 18:28
@MissiontoMars
Copy link

Will the "_report_task_events" set for the actor task override the "_report_task_events" set during the creation of the actor?
For example, in the following code:

@ray.remote
class Actor():
    def g(self):
       pass


a = Actor.options(_report_task_events=False).remote()

a.g.options(_report_task_events=True).remote()

Does the actor task g report task events or not?
@rickyyx

@rickyyx rickyyx marked this pull request as draft January 29, 2024 17:22
@rickyyx
Copy link
Contributor Author

rickyyx commented Jan 29, 2024

Will the "_report_task_events" set for the actor task override the "_report_task_events" set during the creation of the actor? For example, in the following code:

@ray.remote
class Actor():
    def g(self):
       pass


a = Actor.options(_report_task_events=False).remote()

a.g.options(_report_task_events=True).remote()

Does the actor task g report task events or not? @rickyyx

Yes, an actor task level config would override the actor level config.

Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach lgtm

python/ray/tests/test_task_events.py Outdated Show resolved Hide resolved
@rkooo567
Copy link
Contributor

rkooo567 commented Feb 6, 2024

is this PR ready?

@rickyyx
Copy link
Contributor Author

rickyyx commented Feb 7, 2024

is this PR ready?

not yet - iterating on this.

Signed-off-by: rickyyx <rickyx@anyscale.com>
@rickyyx rickyyx changed the title [WIP][core][task-events] Options to disable task events reporting on task/actor [core][task-events] Options to disable task tracing on task/actor Feb 8, 2024
@rickyyx rickyyx marked this pull request as ready for review February 8, 2024 02:21
@rickyyx rickyyx requested a review from a team as a code owner February 8, 2024 02:21
@rickyyx rickyyx added the core-interface-change-approval-required This changes the Ray core behavior / API and requires broader approvals. label Feb 8, 2024
@@ -1449,6 +1494,8 @@ def remote(self, *args, **kwargs):
False, # retry_exceptions
False, # is_generator
self._ray_method_generator_backpressure_num_objects.get(item, -1),
# TODO(rickyx): how to determinte the defautls here?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not too sure about the default here.

Copy link
Contributor Author

@rickyyx rickyyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of changes are just plumbing to add the option.

Major changes are in /core_worker for the task event buffer refactoring and changes.

Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM. NIT questions

python/ray/actor.py Outdated Show resolved Hide resolved
doc/source/ray-core/actors.rst Outdated Show resolved Hide resolved
doc/source/ray-core/actors.rst Outdated Show resolved Hide resolved
doc/source/ray-core/actors.rst Outdated Show resolved Hide resolved
doc/source/ray-core/tasks.rst Outdated Show resolved Hide resolved
src/ray/core_worker/common.h Outdated Show resolved Hide resolved
src/ray/core_worker/task_manager.h Outdated Show resolved Hide resolved
src/ray/core_worker/task_manager.h Outdated Show resolved Hide resolved
@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 15, 2024
Signed-off-by: rickyyx <rickyx@anyscale.com>
Signed-off-by: rickyyx <rickyx@anyscale.com>
Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved given we choose the arguments discussed in api review

@rkooo567
Copy link
Contributor

@rickyyx let's make sure this is merged asap. It is required for 2.10. (cc @jjyao)

I think the blocker is the doc approval.

Signed-off-by: rickyyx <rickyx@anyscale.com>
Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits.

doc/source/ray-core/actors.rst Outdated Show resolved Hide resolved
doc/source/ray-core/actors.rst Outdated Show resolved Hide resolved
doc/source/ray-core/actors.rst Outdated Show resolved Hide resolved
doc/source/ray-core/actors.rst Outdated Show resolved Hide resolved
doc/source/ray-core/actors.rst Outdated Show resolved Hide resolved
python/ray/actor.py Outdated Show resolved Hide resolved
python/ray/actor.py Outdated Show resolved Hide resolved
python/ray/remote_function.py Outdated Show resolved Hide resolved
src/ray/common/ray_config_def.h Outdated Show resolved Hide resolved
src/ray/core_worker/task_manager.h Outdated Show resolved Hide resolved
Signed-off-by: rickyyx <rickyx@anyscale.com>
Signed-off-by: rickyyx <rickyx@anyscale.com>
Signed-off-by: rickyyx <rickyx@anyscale.com>
@rickyyx
Copy link
Contributor Author

rickyyx commented Feb 28, 2024

@rkooo567 can you take a quick look at the delta in the commits?

I made a few changes to make tests passed

  • ActorHandle has to have this enforce_task_events flag as well otherwise remote actors would not work
  • Add a global RAY_enable_task_events to enforce if task events would be reported from the entire ray cluster (rather than relying on RAY_task_events_report_interval) -> we could maybe refactor all task events related things into a config struct in the future.

rickyyx and others added 4 commits February 28, 2024 01:54
Signed-off-by: rickyyx <rickyx@anyscale.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Ricky Xu  <xuchen727@hotmail.com>
Signed-off-by: rickyyx <rickyx@anyscale.com>
@rickyyx rickyyx removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 28, 2024
@rkooo567
Copy link
Contributor

ActorHandle has to have this enforce_task_events flag as well otherwise remote actors would not work

You mean enable_task_events right? This makes sense

Add a global RAY_enable_task_events to enforce if task events would be reported from the entire ray cluster (rather than relying on RAY_task_events_report_interval) -> we could maybe refactor all task events related things into a config struct in the future.

Hmm I don't understand why we need this?

@rickyyx
Copy link
Contributor Author

rickyyx commented Feb 29, 2024

Hmm I don't understand why we need this?

It's just a better name than the RAY_task_events_report_interval, and it provides a way to workers get default behaviour of if task events should be reported.

@rkooo567
Copy link
Contributor

rkooo567 commented Feb 29, 2024

It's just a better name than the RAY_task_events_report_interval, and it provides a way to workers get default behaviour of if task events should be reported.

Hmm I feel like we should make it just orthogonal. It makes things confusing if we start coupling user-level config and system level config?

  • True by default.
  • False if you don't want
  • although it is True, if user sets systems to not report task, it is disabled. (just use existing flag)

Btw, it is not a hard blocker.

@rickyyx
Copy link
Contributor Author

rickyyx commented Feb 29, 2024

It's just a better name than the RAY_task_events_report_interval, and it provides a way to workers get default behaviour of if task events should be reported.

Hmm I feel like we should make it just orthogonal. It makes things confusing if we start coupling user-level config and system level config?

  • True by default.
  • False if you don't want
  • although it is True, if user sets systems to not report task, it is disabled. (just use existing flag)

Btw, it is not a hard blocker.

This is actually exactly how it's behaved in the PR, with the exception of not using existing flag (i.e. RAY_task_events_report_interval):
If users to set system config for not reporting, they would:

  • In PR: set RAY_enable_tasks_events=False
  • In master: set RAY_task_events_report_interval=0

I am also open to keeping with existing flags.

@rickyyx rickyyx added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 29, 2024
@rickyyx rickyyx removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 29, 2024
@rickyyx
Copy link
Contributor Author

rickyyx commented Feb 29, 2024

Updated with using existing flags.

@rickyyx rickyyx changed the title [core][task-events] Options to disable task tracing on task/actor [core][enable_task_events] Options to disable task tracing on task/actor Feb 29, 2024
@rickyyx rickyyx merged commit 0162f35 into ray-project:master Feb 29, 2024
9 checks passed
hebiao064 pushed a commit to hebiao064/ray that referenced this pull request Mar 12, 2024
…tor (ray-project#42431)



---------

Signed-off-by: rickyyx <rickyx@anyscale.com>
Signed-off-by: Ricky Xu  <xuchen727@hotmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-interface-change-approval-required This changes the Ray core behavior / API and requires broader approvals.
Projects
None yet
5 participants