[core] Enable aggregator mode in state API and task event tests #59784

Merged
edoakes merged 13 commits into master from aggr-to-gcs-test-enablement on Jan 7, 2026

@sampan-s-nayak (Contributor) commented Dec 31, 2025

Description

Run the state API and task event unit tests with both the default flow (task_event -> GCS) and the aggregator flow (task_event -> aggregator -> GCS) to smooth the transition from the default flow to the aggregator flow.

sampan added 2 commits December 31, 2025 07:15
- Fix parent_task_id to use SubmitterTaskId for concurrent actors
- Add missing fields: call_site, label_selector, is_debugger_paused, actor_repr_name
- Fix func_or_class_name to use CallString() for consistency with direct path
- Add helper functions for aggregator wait in test_utils.py

Signed-off-by: sampan <sampan@anyscale.com>
- Add event_routing_config fixture for dual-mode testing
- Parametrize state_api tests to run with default and aggregator routing
- Parametrize task_events tests to run with default and aggregator routing

Signed-off-by: sampan <sampan@anyscale.com>
@sampan-s-nayak changed the base branch from master to aggr-to-gcs-fixes December 31, 2025 07:30
@sampan-s-nayak added the "go" label (add ONLY when ready to merge, run all tests) Dec 31, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request enables testing the new aggregator mode for task events by parameterizing a large number of state API and task event tests. It introduces a new pytest fixture event_routing_config to switch between the default and aggregator modes. Additionally, it enhances the aggregator event path by adding several missing fields to the event protos and the corresponding C++ implementation to achieve feature parity with the existing GCS path. The changes are well-structured, and the test additions are comprehensive.

My review has two suggestions: one for improving code conciseness in a test file and another for removing a leftover debug print statement.
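
For readers unfamiliar with the pattern, here is a minimal sketch of how an indirectly parametrized fixture like event_routing_config could switch modes through environment variables. The variable name `EXAMPLE_EVENT_ROUTING_TO_AGGREGATOR` and the helper `env_for_mode` are hypothetical and exist only for illustration; the actual fixture in this PR may set different variables and do more work.

```python
import os

import pytest

# Hypothetical variable name, for illustration only (not a real Ray setting).
ROUTING_FLAG = "EXAMPLE_EVENT_ROUTING_TO_AGGREGATOR"


def env_for_mode(mode):
    """Return the environment overrides for a routing mode (hypothetical helper)."""
    if mode == "default":
        return {}
    if mode == "aggregator":
        return {ROUTING_FLAG: "1"}
    raise ValueError(f"unknown routing mode: {mode!r}")


@pytest.fixture
def event_routing_config(request, monkeypatch):
    # With indirect=True, pytest passes each parametrized value to the
    # fixture as request.param instead of passing it to the test directly.
    mode = request.param
    # Start from a clean slate so an outer environment cannot leak in.
    monkeypatch.delenv(ROUTING_FLAG, raising=False)
    for name, value in env_for_mode(mode).items():
        monkeypatch.setenv(name, value)
    yield mode
    # monkeypatch restores the original environment after the test.


@pytest.mark.parametrize(
    "event_routing_config", ["default", "aggregator"], indirect=True
)
def test_flag_matches_mode(event_routing_config):
    expected = "1" if event_routing_config == "aggregator" else None
    assert os.environ.get(ROUTING_FLAG) == expected
```

Each test decorated this way runs twice, once per mode, and the fixture value tells the test which mode is active.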

Signed-off-by: sampan <sampan@anyscale.com>
@sampan-s-nayak marked this pull request as ready for review January 2, 2026 04:28
@sampan-s-nayak requested a review from a team as a code owner January 2, 2026 04:28
@sampan-s-nayak (Contributor, Author) commented:

This PR was originally part of #56880.

@pytest.mark.parametrize(
    "event_routing_config", ["default", "aggregator"], indirect=True
)
@pytest.mark.usefixtures("event_routing_config")

Fixture scope mismatch prevents aggregator mode testing

The TestListActors class is parametrized with event_routing_config (function-scoped fixture) but uses class_ray_instance (class-scoped fixture) to start Ray. Pytest executes higher-scoped fixtures first, so class_ray_instance starts Ray BEFORE event_routing_config sets the aggregator environment variables. This means when running with event_routing_config="aggregator", the environment variables like RAY_enable_core_worker_ray_event_to_aggregator are set after Ray has already started, and aggregator mode is never actually enabled. The tests will pass but won't actually test the aggregator code path, defeating the purpose of the parametrization.
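
The ordering problem described above can be reproduced without Ray. The following is a minimal sketch, assuming a made-up flag name (`EXAMPLE_AGGREGATOR_ENABLED`) and a stand-in `fake_cluster` fixture that reads its configuration once at startup; none of these names come from the PR.

```python
import os

import pytest

# Hypothetical flag name, for illustration only.
FLAG = "EXAMPLE_AGGREGATOR_ENABLED"


def read_startup_config(environ):
    """Snapshot the flag the way a process would when it starts."""
    return {"aggregator_enabled": environ.get(FLAG) == "1"}


@pytest.fixture(scope="class")
def fake_cluster():
    # Stand-in for a class-scoped fixture that starts a cluster.
    # It is set up once per class, before any function-scoped fixture.
    yield read_startup_config(os.environ)


@pytest.fixture
def routing_config(request, monkeypatch):
    # Function-scoped: runs after fake_cluster has already been created.
    if request.param == "aggregator":
        monkeypatch.setenv(FLAG, "1")
    yield request.param


@pytest.mark.parametrize(
    "routing_config", ["default", "aggregator"], indirect=True
)
@pytest.mark.usefixtures("routing_config")
class TestScopeMismatch:
    def test_cluster_never_sees_flag(self, fake_cluster):
        # Passes in both modes (assuming FLAG is unset in the outer
        # environment): the "cluster" captured its config before the
        # flag was set, so the aggregator path is never exercised.
        assert fake_cluster["aggregator_enabled"] is False
```

One way out is to make the environment setup at least as broadly scoped as the fixture that starts the cluster, or to start the cluster inside a function-scoped fixture that depends on the routing fixture.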

@pytest.mark.parametrize(
    "event_routing_config", ["default", "aggregator"], indirect=True
)
@pytest.mark.usefixtures("event_routing_config")

Missing aggregator agent wait causes flaky aggregator mode tests

The test_actor_summary and test_object_summary tests have the event_routing_config parametrization for aggregator mode but don't call wait_for_aggregator_agent_if_enabled after ray.init(). In contrast, test_task_summary in the same file correctly calls this wait for all nodes. A TODO comment in other tests (e.g., test_fault_tolerance_chained_task_fail) states this wait is required until task event buffering is implemented internally. Without the wait, these tests may fail or be flaky in aggregator mode if the aggregator agent isn't ready when actors/tasks are created.
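
As an illustration of the kind of wait being asked for, here is a minimal polling sketch. It is not the signature or implementation of wait_for_aggregator_agent_if_enabled in test_utils.py; `wait_until`, `wait_for_agent_if_enabled`, and the `is_agent_ready` callback are hypothetical names chosen for this example.

```python
import time


def wait_until(predicate, timeout_s=10.0, interval_s=0.1):
    """Poll predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def wait_for_agent_if_enabled(mode, is_agent_ready, timeout_s=10.0):
    """Block until the agent is ready, but only in aggregator mode.

    Hypothetical stand-in: `mode` is the routing mode string and
    `is_agent_ready` is any zero-argument callable reporting readiness.
    """
    if mode != "aggregator":
        return  # Default routing has no aggregator agent to wait for.
    if not wait_until(is_agent_ready, timeout_s=timeout_s):
        raise TimeoutError(f"agent not ready after {timeout_s} seconds")
```

A test would call such a helper right after the cluster starts and before creating actors or tasks, so that no events are emitted while the agent is still coming up.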

@ray-gardener (bot) added the "core" label (Issues that should be addressed in Ray Core) Jan 2, 2026
@@ -193,6 +193,7 @@ def _get_provider(name, **kwargs):
def test_node_providers_basic(get_provider, provider_name):
    # Test launching.
    provider = get_provider(name=provider_name)
Contributor Author

Increasing the timeout because the test was flaky.

sampan and others added 5 commits January 5, 2026 04:32
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Signed-off-by: sampan <sampan@anyscale.com>
…nto aggr-to-gcs-test-enablement

Signed-off-by: sampan <sampan@anyscale.com>
Base automatically changed from aggr-to-gcs-fixes to master January 6, 2026 01:12
@edoakes requested review from edoakes and jjyao as code owners January 6, 2026 01:12
@edoakes merged commit fb13b81 into master Jan 7, 2026
6 checks passed
@edoakes deleted the aggr-to-gcs-test-enablement branch January 7, 2026 02:28
AYou0207 pushed a commit to AYou0207/ray that referenced this pull request Jan 13, 2026
…project#59784)

## Description
run state api and task event unit tests with both the default
(task_event -> gcs flow) and aggregator (task_event -> aggregator ->
gcs) to smoothen the transition from default to aggregator flow

---------

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
Signed-off-by: jasonwrwang <jasonwrwang@tencent.com>
lee1258561 pushed a commit to pinterest/ray that referenced this pull request Feb 3, 2026
…project#59784)

ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Feb 3, 2026
…project#59784)
