[RayService][Observability] Add more logging about networking issues #1282
Why are these changes needed?
Note that a NetworkPolicy has no effect unless your Kubernetes cluster has a CNI plugin installed that enforces it. Hence, I developed this PR on AWS EKS with the Calico CNI plugin installed.
Step 1: Follow this doc to install Calico in your EKS cluster.
Step 2: Create a NetworkPolicy to block all incoming traffic to the Ray head Pod.
Step 3: Install the KubeRay operator built from this PR.
Step 4: Create a RayService with 1 head Pod and 0 worker Pods. Because no incoming traffic can reach the Ray head Pod, the init container of each worker Pod would wait indefinitely for the GCS to become ready. For this reason, we create 0 worker Pods here.
Step 5: Check the KubeRay operator's logs.
Related issue number
Closes #1279
Checks