[Core] Support publishing Submission job events using ray event recorder#61099

Open
sampan-s-nayak wants to merge 36 commits into master from job_events_missing_fields_3

@sampan-s-nayak
Contributor

Description

Use the newly added Python Ray event exporter framework to emit submission job events, and verify the emitted events with an E2E test.

sampan and others added 13 commits February 3, 2026 04:36
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
…nto job_events_missing_fields_3

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
@sampan-s-nayak sampan-s-nayak requested review from a team, edoakes and jjyao as code owners February 17, 2026 07:30
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request successfully integrates the Ray event recorder framework into the job submission system, enabling the emission of job definition and lifecycle events. The implementation follows the existing pattern for actor and node events, using a priority-based approach where the new framework takes precedence when enabled. The addition of an end-to-end test ensures the correctness of event emission and capture. I have provided a few suggestions to improve the efficiency of protobuf timestamp handling and to ensure robust initialization of the event recorder.

Comment on lines +144 to +147
now = time.time()
transition.timestamp.CopyFrom(
    Timestamp(seconds=int(now), nanos=int((now % 1) * 1e9))
)

medium

Instead of creating a new Timestamp object and using CopyFrom, you can use the FromMilliseconds method directly on the existing transition.timestamp object. This is more efficient and avoids manual nanosecond calculations which can sometimes suffer from floating-point precision issues.

Suggested change
- now = time.time()
- transition.timestamp.CopyFrom(
-     Timestamp(seconds=int(now), nanos=int((now % 1) * 1e9))
- )
+ transition.timestamp.FromMilliseconds(int(time.time() * 1000))
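
To make the precision trade-off in this suggestion concrete, here is a minimal pure-Python sketch (it is not code from the PR). The helper names `split_float_seconds`, `split_epoch_nanos`, and `split_epoch_millis` are hypothetical; they only mimic how a (seconds, nanos) pair could be derived from a float epoch time, an integer nanosecond count, and an integer millisecond count. One thing the sketch suggests: a millisecond-based setter avoids the float arithmetic but also drops sub-millisecond detail. If nanosecond resolution matters, protobuf's `Timestamp` also offers `FromNanoseconds`, which could be paired with `time.time_ns()`.

```python
import time
from typing import Tuple

NANOS_PER_SECOND = 1_000_000_000


def split_float_seconds(now: float) -> Tuple[int, int]:
    """Mirror the original diff: derive (seconds, nanos) from a float."""
    return int(now), int((now % 1) * 1e9)


def split_epoch_nanos(epoch_ns: int) -> Tuple[int, int]:
    """Derive (seconds, nanos) from an integer count, with no float math."""
    return divmod(epoch_ns, NANOS_PER_SECOND)


def split_epoch_millis(epoch_ms: int) -> Tuple[int, int]:
    """What a millisecond input can express: nanos is a multiple of 1e6."""
    seconds, millis = divmod(epoch_ms, 1000)
    return seconds, millis * 1_000_000


if __name__ == "__main__":
    # Integer nanoseconds from the clock keep full resolution.
    print(split_epoch_nanos(time.time_ns()))
    # A float near the current epoch cannot hold nanosecond detail,
    # so the derived nanos field is only an approximation.
    print(split_float_seconds(time.time()))
```

Whether the lost sub-millisecond detail matters for job state transitions is a judgment call for the PR author; the sketch only shows what each input type can represent.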

Comment on lines +207 to +223
if ray_constants.RAY_ENABLE_RAY_EVENT:
    try:
        from ray._raylet import initialize_event_recorder

        initialize_event_recorder(
            aggregator_address=self._dashboard_agent.ip,
            aggregator_port=self._dashboard_agent.grpc_port,
            node_ip=self._dashboard_agent.ip,
            node_id_hex=self._dashboard_agent.node_id,
            max_buffer_size=10000,
        )
        logger.info("Initialized ray event recorder in JobAgent.")
    except Exception:
        logger.warning(
            "Failed to initialize ray event recorder in JobAgent.",
            exc_info=True,
        )

medium

The initialize_event_recorder call is placed within the run method of JobAgent. While this works for initialization, consider if a corresponding shutdown_event_recorder call is needed when the agent or module stops, similar to the implementation in JobSupervisor. This ensures that any buffered events are flushed before the process exits.
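
The init/shutdown pairing raised here can be sketched in isolation. The following is an illustrative sketch only: `BufferedRecorder` and `run_agent` are hypothetical stand-ins written for this example and are not Ray's event recorder API. It assumes a recorder that buffers events in memory and hands them to a sink on flush, and shows how a `try`/`finally` around the agent's main loop could guarantee a final flush even when the loop exits early.

```python
from typing import Callable, Iterable, List


class BufferedRecorder:
    """Hypothetical stand-in for an event recorder; this is not Ray's API."""

    def __init__(
        self, sink: Callable[[List[str]], None], max_buffer_size: int = 10000
    ) -> None:
        self._sink = sink
        self._max_buffer_size = max_buffer_size
        self._buffer: List[str] = []
        self.closed = False

    def record(self, event: str) -> None:
        if self.closed:
            raise RuntimeError("recorder is already shut down")
        self._buffer.append(event)
        # Flush eagerly once the buffer reaches its configured size.
        if len(self._buffer) >= self._max_buffer_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._sink(list(self._buffer))
            self._buffer.clear()

    def shutdown(self) -> None:
        # Deliver whatever is still buffered before marking the recorder closed.
        self.flush()
        self.closed = True


def run_agent(recorder: BufferedRecorder, events: Iterable[str]) -> None:
    """Record events, and always shut the recorder down on the way out."""
    try:
        for event in events:
            recorder.record(event)
    finally:
        recorder.shutdown()
```

With a buffer size of 3 and only two events recorded, nothing reaches the sink until `shutdown()` runs, which is the situation the review comment is concerned about.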

@ray-gardener ray-gardener Bot added the community-contribution Contributed by the community label Feb 17, 2026
sampan-s-nayak and others added 7 commits February 18, 2026 10:59
Signed-off-by: sampan <sampan@anyscale.com>
…elds_3

Signed-off-by: sampan <sampan@anyscale.com>
…ss attribute

Cython cdef class types are immutable — setting EventRecorder._instance
at module scope raises TypeError at runtime. Use a module-level
_event_recorder_instance variable with global declarations instead.

Signed-off-by: sampan <sampan@anyscale.com>
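
The pattern described in this commit message can be illustrated in plain Python. This is a sketch under stated assumptions, not the PR's code: `EventRecorder` here is a hypothetical plain-Python stand-in, and `init_recorder`, `get_recorder`, and `can_set_type_attribute` are names invented for the example. Built-in types such as `int` reject attribute assignment with `TypeError` in the same way the commit message reports for Cython `cdef` classes, so `int` is used to show the failure mode, while a module-level variable updated through a `global` declaration holds the singleton.

```python
from typing import Optional


class EventRecorder:
    """Hypothetical plain-Python stand-in for the Cython class."""

    def __init__(self, max_buffer_size: int) -> None:
        self.max_buffer_size = max_buffer_size


# Module-level holder for the singleton, instead of a class attribute.
_event_recorder_instance: Optional[EventRecorder] = None


def init_recorder(max_buffer_size: int = 10000) -> EventRecorder:
    """Create the singleton on first call; later calls return the same object."""
    global _event_recorder_instance
    if _event_recorder_instance is None:
        _event_recorder_instance = EventRecorder(max_buffer_size)
    return _event_recorder_instance


def get_recorder() -> Optional[EventRecorder]:
    """Return the singleton, or None if init_recorder was never called."""
    return _event_recorder_instance


def can_set_type_attribute(cls: type) -> bool:
    """Report whether a class-level attribute can be assigned on `cls`."""
    try:
        cls._instance = None
    except TypeError:
        # Built-in and extension types are immutable and land here.
        return False
    del cls._instance
    return True
```

Calling `can_set_type_attribute(int)` returns `False`, whereas the same call on the pure-Python `EventRecorder` stand-in returns `True`, which is why the class-attribute approach only breaks once the class is compiled as an extension type.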
…elds_3

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
Comment thread python/ray/_common/observability/submission_job_events.py

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Comment thread python/ray/dashboard/modules/job/common.py
Comment thread python/ray/_common/observability/submission_job_events.py
@sampan-s-nayak sampan-s-nayak added go add ONLY when ready to merge, run all tests and removed community-contribution Contributed by the community labels Feb 23, 2026
@ray-gardener ray-gardener Bot added the community-contribution Contributed by the community label Feb 23, 2026
@github-actions

github-actions Bot commented Mar 9, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 9, 2026
@sampan-s-nayak
Contributor Author

not stale

sampan and others added 2 commits March 16, 2026 05:43
Existing users who set RAY_enable_ray_event for C++ events should not
automatically opt into the new Python event pipeline. Introduce a
separate RAY_enable_python_ray_event flag for Python-side ONE-event
publishing (job events via EventRecorder, autoscaler events via
DashboardHeadRayEventPublisher).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
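
The opt-in separation described in the commit message above can be sketched as two independent environment checks. The flag names `RAY_enable_ray_event` and `RAY_enable_python_ray_event` come from the commit message; everything else is an assumption made for illustration, including the helper names `env_flag` and `python_events_enabled` and the set of values treated as "true". How Ray actually parses these variables is not shown in this conversation.

```python
import os
from typing import Mapping

CPP_EVENT_FLAG = "RAY_enable_ray_event"
PYTHON_EVENT_FLAG = "RAY_enable_python_ray_event"


def env_flag(name: str, env: Mapping[str, str], default: bool = False) -> bool:
    """Read a boolean flag; the accepted truthy values are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true")


def python_events_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Python-side publishing depends only on its own flag."""
    return env_flag(PYTHON_EVENT_FLAG, env)


def cpp_events_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """The existing C++ flag is read independently of the Python one."""
    return env_flag(CPP_EVENT_FLAG, env)
```

Under this sketch, an environment that only sets `RAY_enable_ray_event=1` leaves `python_events_enabled` returning `False`, which matches the stated goal of not opting existing users into the new pipeline.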
@github-actions github-actions Bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Mar 16, 2026
@sampan-s-nayak sampan-s-nayak requested a review from dayshah as a code owner April 22, 2026 11:10
Base automatically changed from job_events_missing_fields_2 to master May 6, 2026 18:44
MengjinYan added a commit that referenced this pull request May 6, 2026
## Description
Adds Cython bindings for the C++ rayEventRecorder and sets up the
scaffolding required to emit events from Python (using the new one-event
framework). This will also be used to emit library events in the future.

Refer to #61099 for example usage of this abstraction.

---------

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
Co-authored-by: Mengjin Yan <mengjinyan3@gmail.com>
chillCode404 pushed a commit to chillCode404/ray-contrib that referenced this pull request May 9, 2026
am-kinetica pushed a commit to kineticadb/ray that referenced this pull request May 14, 2026
Signed-off-by: anindyam1969 <amukherjee@kinetica.com>
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026

Labels

community-contribution: Contributed by the community
go: add ONLY when ready to merge, run all tests
unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.
