[serve] hap grpc #63247
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Code Review
This pull request implements gRPC support for the HAProxy ingress in Ray Serve. Key changes include adding a protocol field to backend configurations, implementing TCP-level health checks for gRPC, and updating HAProxy templates to handle HTTP/2 and header-based routing via the application metadata. Feedback focuses on refactoring duplicated health check logic, ensuring gRPC backends are correctly represented in the /-/routes and /-/healthz management endpoints, and fixing a potential infinite loop in the serving() method when scale-to-zero applications are present. Additionally, the reviewer suggested a more robust selection for the default gRPC backend.
I am having trouble creating individual review comments, so my feedback is listed here by file and line range.
python/ray/serve/_private/haproxy.py (481-527)
The logic for resolving health check parameters (fall, rise, inter, etc.) and constructing the default-server directive is duplicated from build_health_check_config. This should be refactored into a shared helper method to improve maintainability and reduce code duplication.
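One possible shape for the shared helper, as a minimal sketch. The names `HealthCheckParams`, `resolve_health_check_params`, and `build_default_server_directive` are hypothetical, and the dictionary-based inputs are an assumption about how the per-backend and global settings are stored, not the actual structures in haproxy.py. `fall`, `rise`, `inter`, and `check` are standard HAProxy server options.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class HealthCheckParams:
    """Resolved health check settings for one backend (hypothetical type)."""

    fall: int
    rise: int
    inter: str


def resolve_health_check_params(
    backend_overrides: Mapping[str, Optional[Any]],
    global_defaults: Mapping[str, Any],
) -> HealthCheckParams:
    """Per-backend values win; unset (None or missing) values use the global default."""

    def pick(key: str) -> Any:
        value = backend_overrides.get(key)
        return global_defaults[key] if value is None else value

    return HealthCheckParams(
        fall=pick("fall"), rise=pick("rise"), inter=pick("inter")
    )


def build_default_server_directive(params: HealthCheckParams) -> str:
    """Render the HAProxy default-server line from resolved parameters."""
    return (
        f"default-server fall {params.fall} rise {params.rise} "
        f"inter {params.inter} check"
    )
```

Both the HTTP path (`build_health_check_config`) and the gRPC TCP-check path could then call the same two functions and differ only in the check-specific lines they append.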
python/ray/serve/_private/haproxy.py (940)
The health_route_info is currently built using only http_backends. This means the /-/routes endpoint will not include gRPC applications, even though they have valid route prefixes. It is better to use the full backends list to provide a complete view of all applications.
health_route_info = self.cfg.build_health_route_info(backends)
python/ray/serve/_private/haproxy.py (947)
The healthz_rules are rendered using only http_backends. In a deployment with only gRPC applications, http_backends will be empty, causing the /-/healthz endpoint to not render any health check rules and potentially return a 404 or 503 error. This would lead external load balancers to incorrectly mark the node as unhealthy. Passing the full backends list ensures that gRPC backends are also considered in the global health check.
"backends": backends,
python/ray/serve/_private/haproxy.py (1257-1260)
Removing the filter for gRPC backends in serving() introduces a potential hang for applications configured with min_replicas=0 (scale-to-zero). The current logic requires every backend in stats (which includes all configured backends) to have at least one UP server. For scale-to-zero apps, no replicas will be started until a request is received, meaning ready_backends will never match all_backends, causing this loop to run indefinitely. The logic should be updated to only wait for backends that are expected to have at least one replica.
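The readiness condition described above can be sketched as follows. This is an illustration under assumptions: `all_expected_backends_ready` is a hypothetical helper, and it assumes the caller can supply each backend's `min_replicas` alongside the count of UP servers parsed from the HAProxy stats; how that information reaches `serving()` is not specified in this PR.

```python
from typing import Mapping


def all_expected_backends_ready(
    up_servers_by_backend: Mapping[str, int],
    min_replicas_by_backend: Mapping[str, int],
) -> bool:
    """Return True once every backend that should have replicas has an UP server.

    Backends with min_replicas == 0 (scale-to-zero) are excluded from the
    wait set, since they may legitimately have no servers until traffic arrives.
    """
    expected = {
        name for name, min_replicas in min_replicas_by_backend.items()
        if min_replicas > 0
    }
    ready = {name for name, up in up_servers_by_backend.items() if up > 0}
    # Subset check: extra ready backends are fine, missing expected ones are not.
    return expected <= ready
```

With this check, a loop in `serving()` would exit as soon as the non-scale-to-zero backends are up, rather than waiting on backends that cannot become ready without a request.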
python/ray/serve/_private/haproxy_templates.py (201)
Using grpc_backends[0] as the default_backend for gRPC Healthz and ListApplications requests is fragile. If the first gRPC application in the list is unhealthy or has no replicas, these global management requests will fail even if other gRPC applications are healthy. Consider using a dedicated backend that aggregates all gRPC replicas or implementing a more robust fallback mechanism. Additionally, the or 'unknown' check is redundant as the name field is mandatory in BackendConfig.
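One way to make the fallback less fragile, sketched under assumptions: instead of a single `default_backend`, the template could emit one `use_backend` rule per gRPC backend guarded by HAProxy's `nbsrv()` fetch, so the first backend with at least one usable server handles the request. `render_grpc_fallback_rules` is a hypothetical helper, not part of haproxy_templates.py, and in the real template each rule would also need the condition that matches the Healthz and ListApplications methods.

```python
from typing import List, Sequence


def render_grpc_fallback_rules(grpc_backend_names: Sequence[str]) -> List[str]:
    """Render HAProxy rules that prefer the first gRPC backend with a live server.

    HAProxy evaluates use_backend rules in order and falls through to
    default_backend only if none of them match.
    """
    rules = [
        f"use_backend {name} if {{ nbsrv({name}) gt 0 }}"
        for name in grpc_backend_names
    ]
    if grpc_backend_names:
        # Last resort: keep the current behavior when no backend has live servers.
        rules.append(f"default_backend {grpc_backend_names[0]}")
    return rules
```

For two backends named `app1` and `app2`, this yields a rule for each followed by `default_backend app1`, so a request is only sent to an empty backend when every gRPC backend is down.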
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 137c94c.

Support for HAProxy gRPC.
Known limitations:
OK on healthy and UNAVAILABLE otherwise).