[RayService] Fixed issue where the custom serve port is not reflected in the serve health check for worker Pods #1816
Why are these changes needed?
As title.
`ray-service.different-port.yaml` is used to demonstrate how to set a custom serve port. However, the instructions are incomplete. When running `ray-service.different-port.yaml`, it may appear that everything is functioning correctly, but the custom serve port is not reflected in the serve health check for worker Pods. No errors surface only because the Serve app requires very little CPU, so the worker Pod might not host a Serve replica at all. To illustrate the issue, I created a RayService similar to `ray-service.different-port.yaml`, but with a modified Serve app configuration that ensures each Pod hosts one Serve replica. As shown in the experiment below, the worker Pod consistently fails the health check because it falls back to the default serve port, 8000.
This issue occurs because KubeRay determines the serve port by examining the ports of the Ray container. If the serve port is not set in the worker group's container configuration, KubeRay defaults to port 8000, even when the head group uses a custom port.
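The lookup described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not KubeRay's actual (Go) implementation: it assumes the serve port is identified by a container port named `serve`, and the helper name `find_serve_port` and the dict-shaped container specs are made up for this example. It shows why a worker container that omits the port falls back to 8000 while the head uses a custom port.

```python
from typing import Any

# Assumption: default used when no serve port is declared (8000, per the text above).
DEFAULT_SERVE_PORT = 8000
# Assumption: the serve port is identified by this container-port name.
SERVE_PORT_NAME = "serve"


def find_serve_port(container: dict[str, Any]) -> int:
    """Return the containerPort named SERVE_PORT_NAME, else the default.

    `container` is a dict shaped like a Kubernetes container spec.
    Hypothetical helper for illustration only.
    """
    for port in container.get("ports", []):
        if port.get("name") == SERVE_PORT_NAME:
            return int(port["containerPort"])
    # No serve port declared on this container: fall back to the default.
    return DEFAULT_SERVE_PORT


# Head container declares a custom serve port; worker container does not.
head = {"name": "ray-head", "ports": [{"name": "serve", "containerPort": 9000}]}
worker = {"name": "ray-worker"}

print(find_serve_port(head))    # 9000
print(find_serve_port(worker))  # 8000, so the worker's health check probes the wrong port
```

Under these assumptions, declaring the same named port on the worker group's Ray container makes both lookups agree.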
BTW, I am not sure whether it is a good idea to change every RayService sample to ensure that each Ray Pod hosts at least one replica. Otherwise, this kind of error may go unnoticed again in the future.
Related issue number
Checks