
Ingress-gateway readiness probe failed: connection refused #796


Description

@prateeksahu

Hi, I am trying to install the latest Istio release, 1.15.0, on my local Kubernetes cluster, and I am hitting a "readiness probe failed" error for both istio-ingressgateway pods.

Kubernetes version:

> kubectl version --short
Client Version: v1.17.0
Server Version: v1.23.12

I am using one control-plane node and one worker node for my cluster. Kubernetes was installed with kubeadm on a custom podCIDR, and I am using the flannel CNI. I have disabled firewalld on both systems.
Both systems are running Linux kernel 5.10+ (control plane on 5.10, worker on 5.15).
After cluster bring-up, all components are healthy and running:

kube-flannel   kube-flannel-ds-2dfzl                   1/1     Running   0          3h11m   XX.XX.90.238   worker      <none>           <none>
kube-flannel   kube-flannel-ds-7smvs                   1/1     Running   0          3h44m   XX.XX.90.23    control     <none>           <none>
kube-system    coredns-64897985d-drqs8                 1/1     Running   0          174m    10.244.0.6      control     <none>           <none>
kube-system    coredns-64897985d-pg2wh                 1/1     Running   0          174m    10.244.1.2      worker      <none>           <none>
kube-system    etcd-ubuntu2004                         1/1     Running   16         3h45m   XX.XX.90.23    control     <none>           <none>
kube-system    kube-apiserver-ubuntu2004               1/1     Running   2          3h45m   XX.XX.90.23    control     <none>           <none>
kube-system    kube-controller-manager-ubuntu2004      1/1     Running   0          3h44m   XX.XX.90.23    control     <none>           <none>
kube-system    kube-proxy-5jrdw                        1/1     Running   0          3h44m   XX.XX.90.23    control     <none>           <none>
kube-system    kube-proxy-7wt8r                        1/1     Running   0          3h11m   XX.XX.90.238   worker      <none>           <none>
kube-system    kube-scheduler-ubuntu2004               1/1     Running   2          3h45m   XX.XX.90.23    control     <none>           <none>
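
For context, the cluster was brought up roughly like this (the podCIDR and flannel manifest URL here are reconstructed from memory, so treat them as a sketch rather than the literal commands):

# On the control-plane node, init with a podCIDR matching flannel's default 10.244.0.0/16
> sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Install the flannel CNI
> kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

# On the worker node, join using the token printed by kubeadm init
> sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>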

After installing Istio with istioctl, the istiod and gateway components are in the Running state but show warnings/errors:

istio-system   istio-ingressgateway-585d645855-6mnh4   0/1     Running   0          8m58s   10.244.1.7      worker      <none>           <none>
istio-system   istiod-86cd5997bb-tqjcb                 1/1     Running   0          9m5s    10.244.1.6      worker      <none>           <none>
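
For completeness, the install itself was just istioctl with the default profile (nothing customized beyond that, as far as I can tell):

# Install Istio 1.15.0 (istiod + istio-ingressgateway) with the default profile
> istioctl install --set profile=default -y

# Then inspect the resulting pods
> kubectl get pods -n istio-system -o wide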

The ingress-gateway pod shows readiness probe failed warnings in its events:

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  15m                 default-scheduler  Successfully assigned istio-system/istio-ingressgateway-585d645855-6mnh4 to worker
  Normal   Pulling    15m                 kubelet, worker   Pulling image "docker.io/istio/proxyv2:1.15.0"
  Normal   Pulled     15m                 kubelet, worker   Successfully pulled image "docker.io/istio/proxyv2:1.15.0" in 2.83053387s
  Normal   Created    15m                 kubelet, worker   Created container istio-proxy
  Normal   Started    15m                 kubelet, worker   Started container istio-proxy
  Warning  Unhealthy  9s (x461 over 15m)  kubelet, worker   Readiness probe failed: Get "http://10.244.1.7:15021/healthz/ready": dial tcp 10.244.1.7:15021: connect: connection refused
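
As I understand it, the gateway's readiness port 15021 is only opened by Envoy once it has received its configuration from istiod, so "connection refused" is consistent with Envoy never getting configured. To double-check the probe itself I plan to query the endpoints from inside the pod (assuming curl is available in the proxyv2 image):

# Probe definition and event history
> kubectl describe pod istio-ingressgateway-585d645855-6mnh4 -n istio-system

# 15021 is served via Envoy; 15020 is the agent's own status port, so comparing the two
# should separate "Envoy has no config" from "agent is down"
> kubectl exec -n istio-system istio-ingressgateway-585d645855-6mnh4 -c istio-proxy -- curl -sS http://localhost:15021/healthz/ready
> kubectl exec -n istio-system istio-ingressgateway-585d645855-6mnh4 -c istio-proxy -- curl -sS http://localhost:15020/healthz/ready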

The istio-proxy logs from the ingress-gateway pod show that certificate initialization fails. I have not specified any certificates of my own. I have previously run various instances of Istio without issues, and I am not sure if it is picking up earlier certificates installed by previous Istio versions.

2022-09-21T22:56:09.532113Z	info	Version 1.15.0-e3364ab424b70ca8ee1ca76cb0b3afb73476aaac-Clean
2022-09-21T22:56:09.534210Z	info	Maximum file descriptors (ulimit -n): 1048576
2022-09-21T22:56:09.534377Z	info	Proxy role	ips=[10.244.1.7] type=router id=istio-ingressgateway-585d645855-6mnh4.istio-system domain=istio-system.svc.cluster.local
2022-09-21T22:56:09.534470Z	info	Apply mesh config from file defaultConfig:
  discoveryAddress: istiod.istio-system.svc:15012
  proxyMetadata: {}
  tracing:
    zipkin:
      address: zipkin.istio-system:9411
enablePrometheusMerge: true
rootNamespace: istio-system
trustDomain: cluster.local
2022-09-21T22:56:09.537836Z	info	Effective config: binaryPath: /usr/local/bin/envoy
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: istio-proxy
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
  zipkin:
    address: zipkin.istio-system:9411

2022-09-21T22:56:09.537856Z	info	JWT policy is third-party-jwt
2022-09-21T22:56:09.537861Z	info	using credential fetcher of JWT type in cluster.local trust domain
2022-09-21T22:56:11.540487Z	info	Workload SDS socket not found. Starting Istio SDS Server
2022-09-21T22:56:11.540600Z	info	CA Endpoint istiod.istio-system.svc:15012, provider Citadel
2022-09-21T22:56:11.540573Z	info	Opening status port 15020
2022-09-21T22:56:11.540692Z	info	Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2022-09-21T22:56:11.540975Z	info	citadelclient	Citadel client using custom root cert: var/run/secrets/istio/root-cert.pem
2022-09-21T22:56:11.553653Z	info	ads	All caches have been synced up in 2.024814382s, marking server ready
2022-09-21T22:56:11.553907Z	info	xdsproxy	Initializing with upstream address "istiod.istio-system.svc:15012" and cluster "Kubernetes"
2022-09-21T22:56:11.553936Z	info	sds	Starting SDS grpc server
2022-09-21T22:56:11.554169Z	info	starting Http service at 127.0.0.1:15004
2022-09-21T22:56:11.555272Z	info	Pilot SAN: [istiod.istio-system.svc]
2022-09-21T22:56:11.556634Z	info	Starting proxy agent
2022-09-21T22:56:11.556666Z	info	starting
2022-09-21T22:56:11.556689Z	info	Envoy command: [-c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --parent-shutdown-time-s 60 --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --log-format %Y-%m-%dT%T.%fZ	%l	envoy %n	%v -l warning --component-log-level misc:error]
2022-09-21T22:56:31.543660Z	warn	ca	ca request failed, starting attempt 1 in 93.165481ms
2022-09-21T22:56:31.637845Z	warn	ca	ca request failed, starting attempt 2 in 214.163056ms
2022-09-21T22:56:31.853307Z	warn	ca	ca request failed, starting attempt 3 in 434.912904ms
2022-09-21T22:56:32.289257Z	warn	ca	ca request failed, starting attempt 4 in 828.413598ms
2022-09-21T22:56:51.562040Z	warn	sds	failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.96.0.10:53: read udp 10.244.1.7:37443->10.96.0.10:53: i/o timeout"
2022-09-21T22:56:52.141385Z	warn	ca	ca request failed, starting attempt 1 in 103.170203ms
2022-09-21T22:56:52.244752Z	warn	ca	ca request failed, starting attempt 2 in 212.437031ms
2022-09-21T22:56:52.457541Z	warn	ca	ca request failed, starting attempt 3 in 390.361727ms
2022-09-21T22:56:52.848428Z	warn	ca	ca request failed, starting attempt 4 in 844.130825ms
2022-09-21T22:57:12.009358Z	warning	envoy config	StreamAggregatedResources gRPC config stream closed since 40s ago: 14, connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc: i/o timeout"
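
The repeated "i/o timeout" against 10.96.0.10:53 makes this look like cluster DNS from the worker's pod network rather than the certificates themselves, so my next step is to test DNS independently of Istio and to rule out leftover CA material. Roughly (the dnsutils manifest is the one from the Kubernetes DNS debugging docs; istio-ca-secret/cacerts are the default Istio secret names):

# Spin up the standard dnsutils debug pod and resolve the istiod service
> kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
> kubectl exec -i -t dnsutils -- nslookup istiod.istio-system.svc.cluster.local

# Check CoreDNS endpoints and logs, and kube-proxy on the worker
> kubectl -n kube-system get endpoints kube-dns
> kubectl -n kube-system logs -l k8s-app=kube-dns
> kubectl -n kube-system logs kube-proxy-7wt8r

# Look for CA secrets left over from earlier Istio installs
> kubectl -n istio-system get secret | grep -E 'istio-ca-secret|cacerts'

Ideally the dnsutils pod should land on the worker node (where the gateway runs), since DNS from pods on the control-plane node may behave differently.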

Any help in figuring out what's going on would be appreciated.
