Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][enable_task_events] Options to disable task tracing on task/actor #42431

Merged
merged 20 commits into from
Feb 29, 2024
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
17 changes: 17 additions & 0 deletions doc/source/ray-core/actors.rst
Original file line number Diff line number Diff line change
Expand Up @@ -466,6 +466,23 @@ define can run. This also implies that tasks are scheduled more flexibly,
and that if you don't need the stateful part of an actor, you're mostly
better off using tasks.

Task Events
-----------

By default, Ray will trace the execution of actor tasks, reporting task status events and profiling events
rickyyx marked this conversation as resolved.
Show resolved Hide resolved
that's used by Ray Dashboard and :ref:`State API <state-api-overview-ref>`.
rickyyx marked this conversation as resolved.
Show resolved Hide resolved

You could disable task events for the actor by setting the `enable_task_events` option to `False` in :func:`ray.remote() <ray.remote>` and :meth:`.options() <ray.actor.ActorClass.options>`, which would reduce the overhead of task execution, and also reduce the amount of data sent to the Ray Dashboard
rickyyx marked this conversation as resolved.
Show resolved Hide resolved

You could also disable task events for some actor methods by setting the `enable_task_events` option to `False` in :func:`ray.remote() <ray.remote>` and :meth:`.options() <ray.remote_function.RemoteFunction.options>` on the actor method.
rickyyx marked this conversation as resolved.
Show resolved Hide resolved
The per method setting will override the actor setting:
rickyyx marked this conversation as resolved.
Show resolved Hide resolved

.. literalinclude:: doc_code/actors.py
:language: python
:start-after: __enable_task_events_start__
:end-before: __enable_task_events_end__


More about Ray Actors
---------------------

Expand Down
17 changes: 17 additions & 0 deletions doc/source/ray-core/doc_code/actors.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,20 @@ async def f(self):
except ray.exceptions.RayTaskError:
print("Object reference was cancelled.")
# __cancel_end__


# __enable_task_events_start__
@ray.remote
class FooActor:

# Disable task events reporting for this method.
@ray.method(enable_task_events=False)
def foo(self):
pass


foo_actor = FooActor.remote()
ray.get(foo_actor.foo.remote())


# __enable_task_events_end__
12 changes: 12 additions & 0 deletions doc/source/ray-core/tasks.rst
Original file line number Diff line number Diff line change
Expand Up @@ -287,6 +287,18 @@ You can change this behavior by setting
in :func:`ray.remote() <ray.remote>` and :meth:`.options() <ray.remote_function.RemoteFunction.options>`.
See :ref:`Ray fault tolerance <fault-tolerance>` for more details.

Task Events
-----------


By default, Ray will trace the execution of tasks, reporting task status events and profiling events
rickyyx marked this conversation as resolved.
Show resolved Hide resolved
that's used by Ray Dashboard and :ref:`State API <state-api-overview-ref>`.
rickyyx marked this conversation as resolved.
Show resolved Hide resolved

You could change this behavior by setting ``enable_task_events`` options in :func:`ray.remote() <ray.remote>` and :meth:`.options() <ray.remote_function.RemoteFunction.options>`
rickyyx marked this conversation as resolved.
Show resolved Hide resolved
to disable task events, which would not only reduce the overhead of task execution, but also reduce the amount of data sent to the Ray Dashboard.
rickyyx marked this conversation as resolved.
Show resolved Hide resolved
Nested tasks don't inherit the task events settings from the parent task, you need to set the task events settings for each task separately.
rickyyx marked this conversation as resolved.
Show resolved Hide resolved



More about Ray Tasks
--------------------
Expand Down
1 change: 1 addition & 0 deletions python/ray/_private/ray_option_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ def _validate_resources(resources: Optional[Dict[str, float]]) -> Optional[str]:
)
),
"_metadata": Option((dict, type(None))),
"enable_task_events": Option(bool),
}


Expand Down
5 changes: 2 additions & 3 deletions python/ray/_private/state.py
Original file line number Diff line number Diff line change
Expand Up @@ -508,7 +508,7 @@ def chrome_tracing_dump(self, filename=None):
logger.warning(
"No profiling events found. Ray profiling must be enabled "
"by setting RAY_PROFILING=1, and make sure "
"RAY_task_events_report_interval_ms is a positive value (default 1000)."
"RAY_enable_task_events=True."
)

if filename is not None:
Expand Down Expand Up @@ -918,8 +918,7 @@ def timeline(filename=None):
"""Return a list of profiling events that can viewed as a timeline.

Ray profiling must be enabled by setting the RAY_PROFILING=1 environment
variable prior to starting Ray, and RAY_task_events_report_interval_ms set
to be positive (default 1000)
variable prior to starting Ray, and RAY_enable_task_events set to True.

To view this information as a timeline, simply dump it as a json file by
passing in "filename" or using using json.dump, and then load go to
Expand Down
23 changes: 18 additions & 5 deletions python/ray/_raylet.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -3822,7 +3822,8 @@ cdef class CoreWorker:
scheduling_strategy,
c_string debugger_breakpoint,
c_string serialized_runtime_env_info,
int64_t generator_backpressure_num_objects
int64_t generator_backpressure_num_objects,
c_bool enable_task_events
):
cdef:
unordered_map[c_string, double] c_resources
Expand Down Expand Up @@ -3855,7 +3856,8 @@ cdef class CoreWorker:
name, num_returns, c_resources,
b"",
generator_backpressure_num_objects,
serialized_runtime_env_info)
serialized_runtime_env_info,
enable_task_events)

current_c_task_id = current_task.native()

Expand Down Expand Up @@ -3900,6 +3902,7 @@ cdef class CoreWorker:
concurrency_groups_dict,
int32_t max_pending_calls,
scheduling_strategy,
c_bool enable_task_events,
):
cdef:
CRayFunction ray_function
Expand Down Expand Up @@ -3946,7 +3949,8 @@ cdef class CoreWorker:
# execute out of order for
# async or threaded actors.
is_asyncio or max_concurrency > 1,
max_pending_calls),
max_pending_calls,
enable_task_events),
extension_data,
&c_actor_id)

Expand Down Expand Up @@ -4037,7 +4041,8 @@ cdef class CoreWorker:
retry_exception_allowlist,
double num_method_cpus,
c_string concurrency_group_name,
int64_t generator_backpressure_num_objects):
int64_t generator_backpressure_num_objects,
c_bool enable_task_events):

cdef:
CActorID c_actor_id = actor_id.native()
Expand All @@ -4049,6 +4054,7 @@ cdef class CoreWorker:
CTaskID current_c_task_id = CTaskID.Nil()
TaskID current_task = self.get_current_task_id()
c_string serialized_retry_exception_allowlist
c_string serialized_runtime_env = b"{}"

serialized_retry_exception_allowlist = serialize_retry_exception_allowlist(
retry_exception_allowlist,
Expand All @@ -4075,7 +4081,9 @@ cdef class CoreWorker:
num_returns,
c_resources,
concurrency_group_name,
generator_backpressure_num_objects),
generator_backpressure_num_objects,
serialized_runtime_env,
enable_task_events),
max_retries,
retry_exceptions,
serialized_retry_exception_allowlist,
Expand Down Expand Up @@ -4182,6 +4190,7 @@ cdef class CoreWorker:
actor_creation_function_descriptor = CFunctionDescriptorToPython(
dereference(c_actor_handle).ActorCreationTaskFunctionDescriptor())
max_task_retries = dereference(c_actor_handle).MaxTaskRetries()
enable_task_events = dereference(c_actor_handle).EnableTaskEvents()
if language == Language.PYTHON:
assert isinstance(actor_creation_function_descriptor,
PythonFunctionDescriptor)
Expand All @@ -4196,26 +4205,30 @@ cdef class CoreWorker:
method_meta = ray.actor._ActorClassMethodMetadata.create(
actor_class, actor_creation_function_descriptor)
return ray.actor.ActorHandle(language, actor_id, max_task_retries,
enable_task_events,
method_meta.method_is_generator,
method_meta.decorators,
method_meta.signatures,
method_meta.num_returns,
method_meta.max_task_retries,
method_meta.retry_exceptions,
method_meta.generator_backpressure_num_objects, # noqa
method_meta.enable_task_events,
actor_method_cpu,
actor_creation_function_descriptor,
worker.current_session_and_job)
else:
return ray.actor.ActorHandle(language, actor_id,
0, # max_task_retries,
True, # enable_task_events
{}, # method is_generator
{}, # method decorators
{}, # method signatures
{}, # method num_returns
{}, # method max_task_retries
{}, # method retry_exceptions
{}, # generator_backpressure_num_objects
{}, # enable_task_events
0, # actor method cpu
actor_creation_function_descriptor,
worker.current_session_and_job)
Expand Down