Skip to content

[serve] Raise error when multiplex on ingress deployment used with direct ingress#64045

Open
akyang-anyscale wants to merge 13 commits into
ray-project:masterfrom
akyang-anyscale:alexyang/di-multiplex-raise
Open

[serve] Raise error when multiplex on ingress deployment used with direct ingress#64045
akyang-anyscale wants to merge 13 commits into
ray-project:masterfrom
akyang-anyscale:alexyang/di-multiplex-raise

Conversation

@akyang-anyscale

@akyang-anyscale akyang-anyscale commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Serve currently does not support model multiplexing on the ingress deployment when direct ingress is enabled (also when HAProxy is enabled). Raise an error instead of silently serving the app without proper multiplexing support.

We do this 2 ways:

  1. when the multiplexing decorator is used statically, we can detect multiplexing early and raise an error when the controller is building the DeploymentInfos
  2. when multiplexing is enabled dynamically, we check in the replica's initialization process if multiplexing is used, and will raise an error if so.

Signed-off-by: akyang-anyscale <alexyang@anyscale.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces validation to reject model multiplexing on the ingress deployment when direct ingress is enabled. It adds static detection of @serve.multiplexed decorators and implements the validation check at application build time, accompanied by unit and integration tests. The review feedback points out that the validation check should be updated to also verify if HAProxy is enabled (RAY_SERVE_ENABLE_HA_PROXY), ensuring consistency with the raised error message and preventing unsupported deployments under HAProxy.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread python/ray/serve/_private/build_app.py Outdated
@akyang-anyscale akyang-anyscale added the go add ONLY when ready to merge, run all tests label Jun 11, 2026
@akyang-anyscale akyang-anyscale marked this pull request as ready for review June 11, 2026 22:59
@akyang-anyscale akyang-anyscale requested a review from a team as a code owner June 11, 2026 22:59
Comment thread python/ray/serve/_private/build_app.py Outdated
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 0c1a3a9. Configure here.

Comment thread python/ray/serve/_private/application_state.py Outdated
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
@ray-gardener ray-gardener Bot added the serve Ray Serve Related Issue label Jun 12, 2026
built_app.validate_single_fastapi_ingress()
# This task runs on the cluster, so its view of the direct-ingress flag
# mirrors the replicas' (they inherit this task's runtime_env).
built_app.validate_multiplexing_with_direct_ingress(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we added a uses_multiplexing bit to the deploy args proto? Then the controller can validate in deploy_applications just once instead of duplicating the check

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eicherseiji I'm pretty sure deploy_applications isn't covered in the declarative path, but I could add the check when creating the deployment info, which happens in both cases.

@abrarsheikh this method would not catch that. I think the only way to do that would be at replica initialization time, wdyt?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akyang-anyscale Ah DeploymentInfo makes sense then

Also I would support a check at replica initialization as well for correctness in the dynamic multiplexing case

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Comment on lines +1640 to +1642
# Imported lazily to avoid a circular import at module load time
# (multiplex -> metrics -> context -> client -> application_state).
from ray.serve.multiplex import _callable_uses_multiplexing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move _callable_uses_multiplexing into utility file to break the cir dep

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants