Describe the bug
While installing Pixie, Vizier can fail its healthcheck even when the pl-nats pod is running and the NATS monitor endpoint is healthy.
The failure appears to happen when Pixie checks the NATS monitor endpoint using the generated pod DNS name:
<pod-ip-with-dashes>.<namespace>.pod.cluster.local:8222
In our cluster, this DNS name was not resolvable from the component performing the healthcheck. As a result, Pixie marked the pl-nats-0 pod as unhealthy, the Vizier operator deleted the pod, and the installation did not complete successfully.
We temporarily worked around this by adding hostAliases so the generated pod DNS name resolves to the NATS pod IP. However, this is fragile because pod IPs can change and the workaround has to be maintained outside Pixie.
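For reference, the interim workaround looked roughly like the snippet below (the pod IP and hostname are illustrative; the real pod IP has to be kept in sync by hand, which is why this is fragile):

```yaml
# Illustrative only: pins the generated pod DNS name to a hard-coded pod IP.
# Breaks silently whenever the NATS pod is rescheduled and gets a new IP.
spec:
  hostAliases:
    - ip: "10.8.0.17"   # current pl-nats-0 pod IP (changes on reschedule)
      hostnames:
        - "10-8-0-17.pl.pod.cluster.local"
```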
Observed Behavior
- Pixie install waits for the Vizier healthcheck.
- The pl-nats pod is running.
- The NATS monitor endpoint on port 8222 is reachable by pod IP.
- The healthcheck fails because the generated pod DNS name cannot be resolved.
- Adding hostAliases for the generated pod DNS name allows the healthcheck to pass.
Example failing endpoint format:
http://<pod-ip-with-dashes>.<namespace>.pod.cluster.local:8222
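As a minimal sketch of how that endpoint string appears to be derived (format reconstructed from the failing URL above; `pod_dns_endpoint` is a hypothetical helper, not Pixie code):

```shell
# Hypothetical helper: derive the generated pod DNS endpoint from a pod IP.
# The dots in the IP become dashes, per the <pod-ip-with-dashes> form above.
pod_dns_endpoint() {
  local pod_ip="$1" namespace="$2"
  printf 'http://%s.%s.pod.cluster.local:8222\n' "${pod_ip//./-}" "$namespace"
}

pod_dns_endpoint 10.8.0.17 pl
# -> http://10-8-0-17.pl.pod.cluster.local:8222
```

This name only resolves if the cluster's DNS provider serves pod DNS records, which kube-dns (unlike CoreDNS's `pods insecure` mode) may not do.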
Expected behavior
Pixie’s NATS healthcheck should succeed when the NATS monitor endpoint is reachable, even if pod DNS names such as *.pod.cluster.local are not resolvable in the cluster.
For the NATS monitor endpoint specifically, Pixie should avoid relying on pod DNS resolution where TLS hostname validation is not required.
App information (please complete the following information):
- Pixie version: release/cloud/v0.1.9
- K8s cluster version: v1.33.9-gke.1060000
- Node Kernel version: 6.6.122+
Additional context
The GKE cluster is running kube-dns (schema "1.0.0") with NodeLocal DNSCache enabled.
Recommendation
For StatefulSets behind a headless Service, could Pixie build the address from the pod's hostname and subdomain fields in the pod status instead?
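To illustrate the suggestion (the `pl-nats` service name is an assumption, not confirmed from Pixie's manifests): a StatefulSet pod's hostname and subdomain yield a stable per-pod name under the headless Service, which does not depend on pod DNS records and survives pod rescheduling:

```shell
# Hypothetical sketch: stable per-pod endpoint of the form
# <hostname>.<subdomain>.<namespace>.svc.cluster.local, built from the
# pod's hostname/subdomain rather than its (changeable) IP.
stable_nats_endpoint() {
  local hostname="$1" subdomain="$2" namespace="$3"
  printf 'http://%s.%s.%s.svc.cluster.local:8222\n' "$hostname" "$subdomain" "$namespace"
}

stable_nats_endpoint pl-nats-0 pl-nats pl
# -> http://pl-nats-0.pl-nats.pl.svc.cluster.local:8222
```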
Existing references:
#1544
#1581