
[CI][Serve] Add dedicated Buildkite target for HAProxy tests #60914

Merged
abrarsheikh merged 18 commits into ray-project:master from eicherseiji:serve-haproxy-port
Feb 13, 2026

eicherseiji (Contributor) commented Feb 10, 2026

Summary

  • Exclude haproxy-tagged tests from 5 general serve test steps to avoid running them without HAProxy enabled
  • Add a dedicated HAProxy CI step that runs 40 test targets (haproxy-specific + standard serve tests) with RAY_SERVE_ENABLE_HA_PROXY=1
  • Test targets are listed in ci/ray_ci/serve_hap_test_names.txt, following the same pattern as serve_di_test_names.txt

Test plan

  • HAProxy-specific tests pass: test_haproxy, test_haproxy_api, test_metrics_haproxy, test_controller_haproxy
  • Standard serve tests pass with RAY_SERVE_ENABLE_HA_PROXY=1 (40 targets in serve_hap_test_names.txt)
  • Existing serve CI steps unaffected (haproxy tag excluded)
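The split described above hinges on `--except-tags`: general steps skip `haproxy`-tagged targets, while the dedicated step runs whatever the names file lists, including standard serve tests. A minimal sketch of that partition, assuming `--except-tags` skips any target carrying at least one listed tag and that the names file holds one Bazel label per line. The target labels and tag assignments below are illustrative, not the real BUILD metadata.

```python
from typing import Dict, List, Set

# Tags excluded by one of the general serve steps after this PR (see the pipeline diff).
GENERAL_EXCEPT_TAGS = {
    "post_wheel_build",
    "gpu",
    "ha_integration",
    "serve_tracing",
    "direct_ingress",
    "haproxy",
}


def general_step_targets(targets: Dict[str, Set[str]], except_tags: Set[str]) -> List[str]:
    """Targets a general step would run, assuming --except-tags skips any
    target that carries at least one excluded tag."""
    return sorted(label for label, tags in targets.items() if not tags & except_tags)


def haproxy_step_targets(names_file_text: str) -> List[str]:
    """Targets the HAProxy step would run, assuming one Bazel label per line."""
    return [line.strip() for line in names_file_text.splitlines() if line.strip()]


# Illustrative metadata only; the real tags live in the Serve BUILD files.
targets = {
    "//python/ray/serve/tests:test_haproxy": {"haproxy"},
    "//python/ray/serve/tests:test_request_timeout": set(),
    "//python/ray/serve/tests:test_streaming_response": set(),
}
names_file = """
//python/ray/serve/tests:test_haproxy
//python/ray/serve/tests:test_request_timeout
"""

print(general_step_targets(targets, GENERAL_EXCEPT_TAGS))
print(haproxy_step_targets(names_file))
```

With this metadata, `test_request_timeout` runs in both a general step and the HAProxy step, while `test_haproxy` runs only in the HAProxy step.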

@eicherseiji eicherseiji added the go add ONLY when ready to merge, run all tests label Feb 10, 2026
Exclude haproxy-tagged tests from 5 general serve test steps and add a
dedicated HAProxy step that runs 40 test targets (haproxy-specific +
standard serve tests) with RAY_SERVE_ENABLE_HA_PROXY=1, mirroring the
rayturbo CI configuration.

Signed-off-by: Gene Su <gene@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji reopened this Feb 10, 2026
@eicherseiji eicherseiji changed the title [Serve] Add HAProxy support to OSS Ray Serve [CI][Serve] Add dedicated Buildkite target for HAProxy tests Feb 10, 2026
--test-env=RAY_SERVE_ENABLE_HA_PROXY=1
--test-env=RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S=0.01
--test-env=RAY_SERVE_DISABLE_SHUTTING_DOWN_INGRESS_REPLICAS_FORCEFULLY=0
--test-env=SERVE_SOCKET_REUSE_PORT_ENABLED=1

Contributor

I'm surprised this sets SERVE_SOCKET_REUSE_PORT_ENABLED=1; adding it to the list of things to fix later.
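The `--test-env` flags above are one `KEY=VALUE` pair per environment variable. A hypothetical sketch of assembling such an invocation from a single mapping, assuming the step passes its targets to `//ci/ray_ci:test_in_docker` followed by the trailing `serve` argument, as the general steps do. The helper `build_haproxy_command`, the argument order, and how targets are read from `serve_hap_test_names.txt` are assumptions, not the pipeline's actual definition.

```python
import shlex
from typing import Dict, List

# The env vars shown in the HAProxy step diff above, in the same order.
HAPROXY_TEST_ENV = {
    "RAY_SERVE_ENABLE_HA_PROXY": "1",
    "RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S": "0.01",
    "RAY_SERVE_DISABLE_SHUTTING_DOWN_INGRESS_REPLICAS_FORCEFULLY": "0",
    "SERVE_SOCKET_REUSE_PORT_ENABLED": "1",
}


def build_haproxy_command(targets: List[str], test_env: Dict[str, str]) -> List[str]:
    """Hypothetical helper: assemble a test_in_docker invocation for the HAProxy step."""
    cmd = ["bazel", "run", "//ci/ray_ci:test_in_docker", "--", *targets, "serve"]
    # One --test-env=KEY=VALUE flag per variable, preserving insertion order.
    cmd += [f"--test-env={key}={value}" for key, value in test_env.items()]
    return cmd


cmd = build_haproxy_command(["//python/ray/serve/tests:test_request_timeout"], HAPROXY_TEST_ENV)
print(shlex.join(cmd))
```

Keeping the variables in one mapping means a flag such as `SERVE_SOCKET_REUSE_PORT_ENABLED=1` is easy to find and remove once the follow-up fix lands.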

eicherseiji and others added 16 commits February 10, 2026 11:04
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
- Add app_name="" to TargetGroup in _get_proxy_target_groups() to match
  rayturbo (commit 85adcc0), fixing 3 parametrized variants of
  test_get_serve_instance_details_json_serializable
- Increase wait_for_condition timeout to 30s in test_num_replicas_auto_basic
  to match rayturbo, fixing timeout under HAProxy CI

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Revert proactive app_name="" addition to TargetGroup constructors in
controller.py — Pydantic exclude_unset=True already handles serialization
correctly without explicit empty string.

Add HAProxy process cleanup to the serve_ha fixture in test_gcs_failure.py.
When GCS is killed mid-test, the HAProxy manager actor dies without cleaning
up its subprocess. Orphaned HAProxy processes hold the port 8000 socket and
serve stale configs, causing 404s in subsequent tests. This matches the
cleanup pattern already used in test_haproxy.py and test_haproxy_api.py.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
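The commit above attributes the 404s to an orphaned HAProxy process still holding the port 8000 socket. A minimal stdlib sketch of a pre-test check that something already accepts connections on a port, one way such a leftover listener could be detected. `port_has_listener` is a hypothetical helper, not part of the Serve test fixtures.

```python
import socket


def port_has_listener(port: int, host: str = "127.0.0.1", timeout_s: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat as "nothing listening".
        return False
```

A fixture could, for example, assert `not port_has_listener(8000)` before starting Serve. The check does not identify which process owns the socket, so it can flag the symptom but not clean it up.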
PR ray-project#60823 throttled the serve_deployment_replica_healthy gauge with a
10s cache TTL and bumped the timeout in test_metrics.py to 40s, but
missed applying the same fix to the HAProxy variant. The default 10s
wait_for_condition timeout is too short now that the gauge is cached.

Increase the timeout to 40s to match test_metrics.py.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
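The commit above describes a health gauge throttled by a 10 s cache TTL and a test polled with the default 10 s `wait_for_condition` timeout. A toy model on a simulated millisecond clock shows why a deadline equal to the TTL leaves no headroom. Assumptions: the cached snapshot refreshes at most once per `ttl_ms`, the last refresh happened just before the replica turned healthy, and `wait_for_condition_sim` is a hypothetical stand-in, not Ray's `wait_for_condition`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CachedGauge:
    """Toy gauge: readers see a snapshot refreshed at most once per ttl_ms."""
    ttl_ms: int
    true_value: int = 0
    snapshot: int = 0
    last_refresh_ms: int = 0

    def read(self, now_ms: int) -> int:
        # Refresh the snapshot only once the TTL has elapsed.
        if now_ms - self.last_refresh_ms >= self.ttl_ms:
            self.snapshot = self.true_value
            self.last_refresh_ms = now_ms
        return self.snapshot


def wait_for_condition_sim(cond: Callable[[int], bool], timeout_ms: int, retry_ms: int = 100) -> bool:
    """Poll cond(now_ms) on a simulated clock; True if it holds before the deadline."""
    now_ms = 0
    while now_ms < timeout_ms:
        if cond(now_ms):
            return True
        now_ms += retry_ms
    return False


def replica_becomes_visible(timeout_ms: int, ttl_ms: int = 10_000) -> bool:
    # Snapshot taken at t=0 while the replica was unhealthy (value 0);
    # the replica turns healthy immediately afterwards.
    gauge = CachedGauge(ttl_ms=ttl_ms)
    gauge.true_value = 1
    return wait_for_condition_sim(lambda now: gauge.read(now) == 1, timeout_ms)


print(replica_becomes_visible(timeout_ms=10_000))  # False: fresh value arrives at the deadline
print(replica_becomes_visible(timeout_ms=40_000))  # True
print(replica_becomes_visible(timeout_ms=10_000, ttl_ms=100))  # True
```

The last line shrinks the TTL to 100 ms, roughly analogous to setting `RAY_SERVE_REPLICA_HEALTH_GAUGE_REPORT_INTERVAL_S=0.1` in a test target; either a longer deadline or a shorter refresh interval restores headroom.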
Remove the pkill haproxy cleanup from serve_ha fixture — the orphaned
process issue needs more investigation before landing a fix.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
- controller.py: Add app_name to TargetGroup in _get_proxy_target_groups
- test_direct_ingress.py: Revert if False guards to match upstream
- test_deploy_2.py: Increase wait_for_condition timeout to 30s
- test_deploy.py: Fix typo
- serve_hap_test_names.txt: Align with upstream test list

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
…e-haproxy-port

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Matches upstream HAProxy CI config. Without this, direct ingress
mode interferes with HAProxy routing during GCS failure tests.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Remove OSS-only branch that returned empty target groups when
HAProxy was enabled and no apps were visible. This caused HAProxy
to clear routes when GCS died and get_route_prefix returned None.
Match upstream behavior of returning proxy_target_groups.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
This reverts commit dc99494.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>

Contributor Author

Now the file is identical to the parity implementation.

Contributor Author

Identical

Contributor Author

Identical

Contributor Author

Identical

Contributor Author

Identical

)

-if user_exception is not None:
+if user_exception is not None and not request_metadata.is_direct_ingress:

Contributor Author

The function _handle_errors_and_metrics is identical to the parity implementation.

commands:
- bazel run //ci/ray_ci:test_in_docker -- //python/ray/serve/... //python/ray/tests/... serve
--except-tags post_wheel_build,gpu,ha_integration,serve_tracing,direct_ingress
--except-tags post_wheel_build,gpu,ha_integration,serve_tracing,direct_ingress,haproxy

Contributor Author

This exclusion pattern is also used in the parity implementation.

--python-version 3.10
depends_on: servetracingbuild

- label: ":ray-serve: serve: HAProxy tests"

Contributor Author

Parallelism, command, and env vars are identical to the parity implementation.

Contributor Author

Contents are identical, except for the expected number.

Contributor Author

With the new test files in this PR, the contents of the test lists in this file are identical to the parity implementation.

@eicherseiji eicherseiji marked this pull request as ready for review February 13, 2026 06:17
@eicherseiji eicherseiji requested review from a team as code owners February 13, 2026 06:17

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

shutdown_ref = self._actor_handle.shutdown.remote()
ray.get(shutdown_ref, timeout=5)

# Shutdown completed successfully, now kill the actor

Unhandled exception in kill() skips ray.kill() call

High Severity

The ray.get(shutdown_ref, timeout=5) call in kill() can raise GetTimeoutError (or RayActorError if the actor dies mid-shutdown), which would prevent the subsequent ray.kill() from ever executing. This leaves the proxy actor potentially alive when it was supposed to be force-killed. The exception also propagates up to ProxyStateManager.shutdown() and _stop_proxies_if_needed(), which iterate over proxies in a loop — an unhandled exception from one proxy's kill() would break the shutdown of all remaining proxies.
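The remedy the bot points toward is to treat the graceful shutdown as best-effort, make the force-kill unconditional, and keep one failing proxy from aborting the loop over the rest. A minimal sketch of that pattern, written against injected callables so it runs without a Ray cluster. In the real `kill()`, the first callable would wrap `ray.get(self._actor_handle.shutdown.remote(), timeout=5)` and the second `ray.kill(...)`; the names `graceful_then_force_kill` and `shutdown_all` are hypothetical.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def graceful_then_force_kill(
    name: str,
    request_shutdown: Callable[[float], None],
    force_kill: Callable[[], None],
    timeout_s: float = 5.0,
) -> None:
    """Attempt a graceful shutdown, then force-kill no matter how it went."""
    try:
        # In Ray this call can raise GetTimeoutError or RayActorError.
        request_shutdown(timeout_s)
    except Exception:
        logger.warning("Graceful shutdown of %s failed; force-killing.", name, exc_info=True)
    finally:
        force_kill()


Proxy = Tuple[str, Callable[[float], None], Callable[[], None]]


def shutdown_all(proxies: Iterable[Proxy]) -> List[str]:
    """Kill every proxy; return the names whose force-kill itself raised."""
    failed = []
    for name, request_shutdown, force_kill in proxies:
        try:
            graceful_then_force_kill(name, request_shutdown, force_kill)
        except Exception:
            logger.exception("Force-kill of %s failed; continuing with the rest.", name)
            failed.append(name)
    return failed
```

Catching `Exception` broadly is a design choice for the sketch; narrowing the first handler to `ray.exceptions.GetTimeoutError` and `ray.exceptions.RayActorError` would match the two failure modes the bot names.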

//python/ray/serve/tests:test_replica_sync_methods_with_run_sync_in_threadpool
//python/ray/serve/tests:test_request_timeout
//python/ray/serve/tests:test_streaming_response
//python/ray/serve/tests:test_target_capacity

test_controller_haproxy missing from HAProxy CI test list

Medium Severity

The test_controller_haproxy unit test (//python/ray/serve/tests/unit:test_controller_haproxy) has the haproxy Bazel tag and is now excluded from all general serve CI steps via --except-tags haproxy. However, it's not listed in serve_hap_test_names.txt, so the new HAProxy CI step won't run it either. This test has zero CI coverage. The PR test plan explicitly lists test_controller_haproxy as a test that should pass.
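The gap described above, a target excluded from every general step by its `haproxy` tag yet absent from `serve_hap_test_names.txt`, is mechanical to detect. A hypothetical lint sketch, assuming target tags are available as a mapping and the names file holds one label per line; neither the function nor the mapping exists in `ci/ray_ci`.

```python
from typing import Dict, Iterable, List, Set


def uncovered_haproxy_targets(
    target_tags: Dict[str, Set[str]],
    hap_test_names: Iterable[str],
    tag: str = "haproxy",
) -> List[str]:
    """Targets carrying `tag` (so skipped by the general steps) that the
    dedicated step does not list either, i.e. targets with no CI coverage."""
    listed = {name.strip() for name in hap_test_names if name.strip()}
    return sorted(t for t, tags in target_tags.items() if tag in tags and t not in listed)


tags = {
    "//python/ray/serve/tests/unit:test_controller_haproxy": {"haproxy"},
    "//python/ray/serve/tests:test_request_timeout": set(),
}
names = ["//python/ray/serve/tests:test_request_timeout"]
print(uncovered_haproxy_targets(tags, names))
```

Running a check like this in CI would turn the silent coverage loss into a failing step.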

aslonnie pushed a commit that referenced this pull request Feb 13, 2026
Needed for #60914

Add test names file for HAProxy tests and update `CODEOWNERS`
accordingly

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji removed the request for review from a team February 13, 2026 20:38
@abrarsheikh abrarsheikh merged commit 1bbaf33 into ray-project:master Feb 13, 2026
6 checks passed
abrarsheikh pushed a commit that referenced this pull request Feb 15, 2026
…#60953)

## Why are these changes needed?
`test_metrics_haproxy.py::test_replica_metrics_fields` is failing in
postmerge.

- #60823 added a 10s cache on the health gauge, with
`RAY_SERVE_REPLICA_HEALTH_GAUGE_REPORT_INTERVAL_S=0.1` in the metrics
BUILD target to keep tests passing
- #60914 added a HAProxy BUILD target that re-runs serve tests with
`RAY_SERVE_ENABLE_HA_PROXY=1`, but didn't carry over that env var
- Without it, the health gauge goes stale between scrapes and the test
misses one deployment's metric

Fix: add the missing env var to the HAProxy target.

Example failure:
https://buildkite.com/ray-project/postmerge/builds/15966#019c49dc-52b4-40dd-a67e-9f5ea5c61755

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
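The #60953 failure came from a variant target that declared its environment by hand and dropped a variable the base target relied on. One way to rule out that class of drift is to derive the variant's env from the base env plus an overlay. A hypothetical sketch; the dictionaries stand in for the targets' env settings (only the variable named in #60953 is shown) and are not read from the real BUILD files.

```python
from typing import Dict, List

# Base metrics target env, per the #60953 description above.
METRICS_TARGET_ENV = {
    "RAY_SERVE_REPLICA_HEALTH_GAUGE_REPORT_INTERVAL_S": "0.1",
}


def derive_env(base: Dict[str, str], overlay: Dict[str, str]) -> Dict[str, str]:
    """Variant env = base env with overlay values added or overriding."""
    return {**base, **overlay}


def missing_keys(base: Dict[str, str], variant: Dict[str, str]) -> List[str]:
    """Base variables that a variant failed to carry over."""
    return sorted(set(base) - set(variant))


hand_written = {"RAY_SERVE_ENABLE_HA_PROXY": "1"}
derived = derive_env(METRICS_TARGET_ENV, {"RAY_SERVE_ENABLE_HA_PROXY": "1"})

print(missing_keys(METRICS_TARGET_ENV, hand_written))  # ['RAY_SERVE_REPLICA_HEALTH_GAUGE_REPORT_INTERVAL_S']
print(missing_keys(METRICS_TARGET_ENV, derived))  # []
```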
preneond pushed a commit to preneond/ray that referenced this pull request Feb 15, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Feb 15, 2026
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Feb 17, 2026
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Feb 17, 2026
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Feb 17, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Feb 17, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Feb 17, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Feb 17, 2026
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Mar 23, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Mar 23, 2026
preneond pushed a commit to preneond/ray that referenced this pull request Mar 23, 2026

Labels

go add ONLY when ready to merge, run all tests

2 participants