[core] Enable aggregator mode in state API and task event tests#59784
[core] Enable aggregator mode in state API and task event tests#59784
Conversation
- Fix parent_task_id to use SubmitterTaskId for concurrent actors - Add missing fields: call_site, label_selector, is_debugger_paused, actor_repr_name - Fix func_or_class_name to use CallString() for consistency with direct path - Add helper functions for aggregator wait in test_utils.py Signed-off-by: sampan <sampan@anyscale.com>
- Add event_routing_config fixture for dual-mode testing - Parametrize state_api tests to run with default and aggregator routing - Parametrize task_events tests to run with default and aggregator routing Signed-off-by: sampan <sampan@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request enables testing the new aggregator mode for task events by parameterizing a large number of state API and task event tests. It introduces a new pytest fixture event_routing_config to switch between the default and aggregator modes. Additionally, it enhances the aggregator event path by adding several missing fields to the event protos and the corresponding C++ implementation to achieve feature parity with the existing GCS path. The changes are well-structured, and the test additions are comprehensive.
My review has two suggestions: one for improving code conciseness in a test file and another for removing a leftover debug print statement.
Signed-off-by: sampan <sampan@anyscale.com>
|
this pr was originally part of #56880 |
| @pytest.mark.parametrize( | ||
| "event_routing_config", ["default", "aggregator"], indirect=True | ||
| ) | ||
| @pytest.mark.usefixtures("event_routing_config") |
There was a problem hiding this comment.
Fixture scope mismatch prevents aggregator mode testing
The TestListActors class is parametrized with event_routing_config (function-scoped fixture) but uses class_ray_instance (class-scoped fixture) to start Ray. Pytest executes higher-scoped fixtures first, so class_ray_instance starts Ray BEFORE event_routing_config sets the aggregator environment variables. This means when running with event_routing_config="aggregator", the environment variables like RAY_enable_core_worker_ray_event_to_aggregator are set after Ray has already started, and aggregator mode is never actually enabled. The tests will pass but won't actually test the aggregator code path, defeating the purpose of the parametrization.
| @pytest.mark.parametrize( | ||
| "event_routing_config", ["default", "aggregator"], indirect=True | ||
| ) | ||
| @pytest.mark.usefixtures("event_routing_config") |
There was a problem hiding this comment.
Missing aggregator agent wait causes flaky aggregator mode tests
The test_actor_summary and test_object_summary tests have the event_routing_config parametrization for aggregator mode but don't call wait_for_aggregator_agent_if_enabled after ray.init(). In contrast, test_task_summary in the same file correctly calls this wait for all nodes. A TODO comment in other tests (e.g., test_fault_tolerance_chained_task_fail) states this wait is required until task event buffering is implemented internally. Without the wait, these tests may fail or be flaky in aggregator mode if the aggregator agent isn't ready when actors/tasks are created.
Additional Locations (1)
| @@ -193,6 +193,7 @@ def _get_provider(name, **kwargs): | |||
| def test_node_providers_basic(get_provider, provider_name): | |||
| # Test launching. | |||
| provider = get_provider(name=provider_name) | |||
There was a problem hiding this comment.
increasing timeout as the test was flaky
Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Signed-off-by: sampan <sampan@anyscale.com>
…nto aggr-to-gcs-test-enablement Signed-off-by: sampan <sampan@anyscale.com>
…project#59784) ## Description run state api and task event unit tests with both the default (task_event -> gcs flow) and aggregator (task_event -> aggregator -> gcs) to smoothen the transition from default to aggregator flow --------- Signed-off-by: sampan <sampan@anyscale.com> Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com> Co-authored-by: sampan <sampan@anyscale.com> Signed-off-by: jasonwrwang <jasonwrwang@tencent.com>
…project#59784) ## Description run state api and task event unit tests with both the default (task_event -> gcs flow) and aggregator (task_event -> aggregator -> gcs) to smoothen the transition from default to aggregator flow --------- Signed-off-by: sampan <sampan@anyscale.com> Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com> Co-authored-by: sampan <sampan@anyscale.com>
…project#59784) ## Description run state api and task event unit tests with both the default (task_event -> gcs flow) and aggregator (task_event -> aggregator -> gcs) to smoothen the transition from default to aggregator flow --------- Signed-off-by: sampan <sampan@anyscale.com> Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com> Co-authored-by: sampan <sampan@anyscale.com>
Description
run state api and task event unit tests with both the default (task_event -> gcs flow) and aggregator (task_event -> aggregator -> gcs) to smoothen the transition from default to aggregator flow