Troubleshooting Istio
If your pods are failing to start, look into the MutatingAdmissionWebhook istio-sidecar-injector. When a pod is created, the Kubernetes api-server will call the sidecar injector service (Istiod). Errors during injection, or failure to connect to the service, can result in pods not being created.
These errors may look something like: failed calling webhook "sidecar-injector.istio.io": Post https://istiod.istio-system.svc:443/inject?timeout=30s: context deadline exceeded
The replica set will generally contain any error messages. Gather this information with: kubectl describe replicaset REPLICA_SET > replicaset.txt
To get logs from Istiod, run: kubectl logs -n istio-system -l app=istiod --tail=100000000 > istiod.log
To get the injection template: kubectl -n istio-system get configmap istio-sidecar-injector -o jsonpath='{.data.config}' > template.yaml
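The three commands above can be bundled into one helper so that all injection artifacts land in a single directory. This is only a sketch: the function name collect_injection_info, the output directory, and the explicit namespace argument for the replica set are assumptions for illustration, not part of Istio.

```shell
#!/bin/sh
# Hypothetical helper: gather the sidecar-injection artifacts listed above.
# Usage: collect_injection_info REPLICA_SET NAMESPACE [OUT_DIR]
collect_injection_info() {
  replica_set="$1"                        # replica set whose pods fail to start
  namespace="$2"                          # namespace of that replica set (assumed argument)
  out_dir="${3:-istio-injection-debug}"   # assumed default directory name
  mkdir -p "$out_dir" || return 1

  # Replica set events usually contain the webhook error message.
  kubectl describe replicaset "$replica_set" -n "$namespace" > "$out_dir/replicaset.txt"
  # Istiod logs from every pod matching the label.
  kubectl logs -n istio-system -l app=istiod --tail=100000000 > "$out_dir/istiod.log"
  # Injection template stored in the injector ConfigMap.
  kubectl -n istio-system get configmap istio-sidecar-injector \
    -o jsonpath='{.data.config}' > "$out_dir/template.yaml"
}
```

For example, collect_injection_info REPLICA_SET NAMESPACE ./debug writes replicaset.txt, istiod.log and template.yaml under ./debug.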
Injection works by the API server connecting to the webhook deployment (Istiod). This fails if something on the network path, such as a firewall, blocks the call. Depending on the Kubernetes configuration, this may require a firewall rule on port 443 or port 15017; instructions for doing so on GKE can be found here.
To check whether the API server can reach the Istiod pod, we can send a request proxied through the API server.
An example of a request that succeeds (no body found is returned from the service and indicates we do have connectivity):
$ kubectl get --raw /api/v1/namespaces/istio-system/services/https:istiod:https-webhook/proxy/inject -v4
I0618 07:39:46.663871 36880 helpers.go:216] server response object: [{
"metadata": {},
"status": "Failure",
"message": "the server rejected our request for an unknown reason",
"reason": "BadRequest",
"details": {
"causes": [
{
"reason": "UnexpectedServerResponse",
"message": "no body found"
}
]
},
"code": 400
}]
F0618 07:39:46.663940 36880 helpers.go:115] Error from server (BadRequest): the server rejected our request for an unknown reason
Similarly, we can send a request from another pod:
$ curl https://istiod.istio-system:443/inject -k
no body found
And from the istiod pod directly (note: the port here is 15017, as this is the targetPort for the Service):
$ curl https://localhost:15017/inject -k
no body found
With this information you should be able to isolate where the breakage occurs.
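To compare the three probes quickly, their output can be piped through a small classifier. This is a minimal sketch: classify_inject_probe is a hypothetical name, and apart from "no body found" and "context deadline exceeded" (both quoted earlier in this page) the matched failure strings are assumptions about typical client error text.

```shell
#!/bin/sh
# Hypothetical helper: read the output of an /inject probe on stdin and
# print a one-word verdict.
classify_inject_probe() {
  output=$(cat)
  case "$output" in
    *"no body found"*)
      # Istiod answered the empty request, so the network path works.
      echo "reachable" ;;
    *"context deadline exceeded"*|*"timed out"*|*"Connection refused"*)
      # Assumed timeout/refusal strings; adjust to what your client prints.
      echo "unreachable" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example: curl https://istiod.istio-system:443/inject -k 2>&1 | classify_inject_probe. If the probe from the istiod pod is reachable but the one through the API server is not, the breakage is likely between the API server and the webhook Service.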
To capture logs: kubectl logs -n istio-system -l app=istiod --tail=100000000 -c discovery > istiod.log
To capture mesh config: kubectl get configmap -n istio-system -o jsonpath='{.data.mesh}' istio > meshconfig.yaml
To capture a proxy config dump from Istiod's perspective: kubectl exec -n istio-system ISTIOD_POD -- curl 'localhost:8080/debug/config_dump?proxyID=POD_NAME.POD_NAMESPACE'
Capture a snapshot of the Istio Control Plane dashboard. Prefer this to a screenshot if possible, as it allows zooming, etc.
If you are experiencing performance issues with Istiod, such as excessive CPU or memory usage, memory leaks, etc, it is helpful to capture profiles. Please see this page for help.
To get configuration and stats from a proxy (gateway or sidecar):
- Stats:
kubectl exec $POD -c istio-proxy -- curl 'localhost:15000/stats' > stats
- Config Dump:
kubectl exec $POD -c istio-proxy -- curl 'localhost:15000/config_dump' > config_dump.json
OR istioctl proxy-config all $POD -ojson > config_dump.json
- Clusters Dump:
kubectl exec $POD -c istio-proxy -- curl 'localhost:15000/clusters' > clusters
- Logs:
kubectl logs $POD -c istio-proxy > proxy.log
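The four proxy commands above can be run together for one pod. The sketch below assumes a hypothetical function name, collect_proxy_info, an explicit namespace argument, and an output directory; the kubectl and curl invocations are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: collect stats, config dump, clusters and logs from the
# istio-proxy container of one pod.
# Usage: collect_proxy_info POD NAMESPACE [OUT_DIR]
collect_proxy_info() {
  pod="$1"
  namespace="$2"                  # assumed argument; the commands above rely on the current namespace
  out_dir="${3:-proxy-debug}"     # assumed default directory name
  mkdir -p "$out_dir" || return 1

  # Envoy admin endpoints, reached from inside the sidecar container.
  kubectl exec "$pod" -n "$namespace" -c istio-proxy -- \
    curl 'localhost:15000/stats' > "$out_dir/stats"
  kubectl exec "$pod" -n "$namespace" -c istio-proxy -- \
    curl 'localhost:15000/config_dump' > "$out_dir/config_dump.json"
  kubectl exec "$pod" -n "$namespace" -c istio-proxy -- \
    curl 'localhost:15000/clusters' > "$out_dir/clusters"
  # Proxy container logs.
  kubectl logs "$pod" -n "$namespace" -c istio-proxy > "$out_dir/proxy.log"
}
```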
To enable debug logging, which may be useful if the default log does not provide enough information:
- At runtime:
istioctl proxy-config log POD --level=debug
- For a pod, set annotation:
sidecar.istio.io/logLevel: "debug"
- For the whole mesh, install with
--set values.global.proxy.logLevel=debug
To enable access logging, which may be useful to debug traffic, see here. More info about access log format can be found in Envoy docs.
See Analyzing Istio Performance
Istiod pushes updates to proxies in response to Kubernetes objects changing (Services, Pods, Istio configs, etc). The size and cost of these updates correlate with the amount of configuration and the number of proxies that need updates. High Istiod CPU usage is typically caused by updates that are too frequent or too large.
Both of these can often be addressed by scoping down dependencies using the Sidecar resource.
Another common issue is constant updates to objects, typically done by controllers constantly updating some field such as an annotation. To diagnose what is causing these pushes, it is best to look at Istiod logs:
info ads Push debounce stable[21] 2 for config ServiceEntry/echo/vm.echo.svc.cluster.local: 100.614843ms since last change, 110.109891ms since last push, full=true
This log line indicates the start of the push. Included is the configuration that caused the update (in this case, the vm Service in namespace echo), as well as the type of push (full=true).
Updates with full=false are pretty cheap and are triggered by endpoint updates. Updates with full=true are generally the problem.
To diagnose these, look for what configurations are being updated and inspect the Kubernetes objects for changes. To watch for diffs, the kubectl-grep tool can be used.
Note: ServiceEntry is used in these log lines for updates to a Service as well.
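To see which configurations trigger the most full pushes, the debounce lines can be tallied from a saved Istiod log. This is a rough sketch that assumes the log line layout shown above (the triggering config follows the word "config"); count_full_pushes is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count full=true pushes per triggering config.
# Usage: count_full_pushes istiod.log
count_full_pushes() {
  awk '
    /Push debounce stable/ && /full=true/ {
      # The config key is the field right after the word "config".
      for (i = 1; i < NF; i++) {
        if ($i == "config") {
          cfg = $(i + 1)
          sub(/:$/, "", cfg)   # drop the trailing colon, if present
          counts[cfg]++
          break
        }
      }
    }
    END {
      for (cfg in counts) print counts[cfg], cfg
    }
  ' "$1" | sort -rn
}
```

Each output line is a count followed by the config key, most frequent first. A key that dominates the list is a good candidate for inspecting with kubectl-grep.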
- gRPC config stream closed: 13 or gRPC config stream closed: 0 in proxy logs, every 30 minutes. This error message is expected, as the connection to Istiod is intentionally closed every 30 minutes.
- gRPC config stream closed: 14 in proxy logs. If this occurs repeatedly, it may indicate problems connecting to Istiod. However, a single occurrence of this is typical when Envoy is starting or restarting.
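To tell an occasional close from a recurring one, the status codes can be counted in a saved proxy log (for example proxy.log from the Logs step). This sketch assumes the message text appears exactly as quoted above; the wording may differ between Envoy versions, and summarize_stream_closes is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count "gRPC config stream closed" messages per status code.
# Usage: summarize_stream_closes proxy.log
summarize_stream_closes() {
  # Extract the numeric code, then count how often each code occurs.
  sed -n 's/.*gRPC config stream closed: \([0-9][0-9]*\).*/\1/p' "$1" \
    | sort -n \
    | uniq -c \
    | awk '{ print "code=" $2 " count=" $1 }'
}
```

A high count for code 14 over a short period suggests a connectivity problem, while codes 0 and 13 roughly every 30 minutes match the expected reconnects.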