Troubleshooting Kubernetes

Troubleshooting Tips

Some tips for general troubleshooting within a Kubernetes cluster.

Access your Master

You should never need to log on to any of the worker nodes. Just log in to your master with

$ ssh <k8smaster ip>

while logged in as user ansible on your Ansible server. Then switch to user ubuntu with

$ sudo su - ubuntu
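
Once you are user ubuntu on the master, kubectl should already be configured. A quick sanity check (assuming the kubeconfig is in its default location) is to list the cluster nodes:

$ kubectl get nodes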

Get Realtime Logs from a Namespace

Within the MOADSD-NG environment you can use Stern when you want to get logs from multiple Kubernetes objects such as a Service, Deployment or Job/CronJob. Stern gives you color-coded logs from all containers inside the pods of the related Kubernetes objects of your application/microservice. With a simple command like the one below, you can tail the logs of all relevant containers:

$ stern -n <namespace> <app-name> -t --since 10m

To easily follow realtime logs from a whole namespace, e.g. smartcheck, use the following command while logged in to your master:

$ stern -n smartcheck . -t --since 10m

See: https://github.com/wercker/stern
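
Stern can also restrict the output to specific containers within the matched pods via its --container flag. The container name below is only an example:

$ stern -n smartcheck . --container proxy -t --since 10m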

General Troubleshooting Examples

$ kubectl --namespace flaskapp describe pod flaskapp-b47654947-9jrkr
Name:               flaskapp-b47654947-9jrkr
Namespace:          flaskapp
Priority:           0
PriorityClassName:  <none>
Node:               k8sworker1/192.168.1.151
Start Time:         Tue, 13 Nov 2018 07:33:50 +0000
Labels:             app.kubernetes.io/instance=flaskapp
                    app.kubernetes.io/name=flaskapp
                    pod-template-hash=603210503
Annotations:        <none>
Status:             Pending
IP:                 10.244.1.78
Controlled By:      ReplicaSet/flaskapp-b47654947
Containers:
  flaskapp:
    Container ID:
    Image:          markus/flaskapp:latest
    Image ID:
    Port:           5001/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qqjzc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-qqjzc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qqjzc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age              From                 Message
  ----     ------     ----             ----                 -------
  Normal   Scheduled  3m               default-scheduler    Successfully assigned flaskapp/flaskapp-b47654947-9jrkr to k8sworker1
  Normal   Pulling    2m (x4 over 3m)  kubelet, k8sworker1  pulling image "markus/flaskapp:latest"
  Warning  Failed     2m (x4 over 3m)  kubelet, k8sworker1  Failed to pull image "markus/flaskapp:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for markus/flaskapp, repository does not exist or may require 'docker login'
  Warning  Failed     2m (x4 over 3m)  kubelet, k8sworker1  Error: ErrImagePull
  Normal   BackOff    1m (x6 over 3m)  kubelet, k8sworker1  Back-off pulling image "markus/flaskapp:latest"
  Warning  Failed     1m (x7 over 3m)  kubelet, k8sworker1  Error: ImagePullBackOff
$ kubectl --namespace flaskapp edit pods flaskapp-7c45b98764-rf8wb

This opens the pod configuration in vi, where you can modify it directly.
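
Checking the container logs of the affected pod is usually the next step. For a container that has already restarted, --previous shows the logs of the previous instance (pod name taken from the describe output above):

$ kubectl --namespace flaskapp logs flaskapp-b47654947-9jrkr
$ kubectl --namespace flaskapp logs flaskapp-b47654947-9jrkr --previous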

To query events from a namespace you can do the following:

$ kubectl get events --namespace=flaskapp
LAST SEEN   FIRST SEEN   COUNT     NAME                                         KIND         SUBOBJECT                   TYPE      REASON              SOURCE                  MESSAGE
42m         18h          213       flaskapp-7555c4974f-lvjps.1566629fd924b547   Pod          spec.containers{flaskapp}   Normal    Pulling             kubelet, k8sworker2     pulling image "markus/flaskapp:latest"
18m         18h          4673      flaskapp-7555c4974f-lvjps.156662a05ffeac90   Pod          spec.containers{flaskapp}   Normal    BackOff             kubelet, k8sworker2     Back-off pulling image "markus/flaskapp:latest"
11m         11m          1         flaskapp-b47654947-9jrkr.15669eb6e934d05d    Pod                                      Normal    Scheduled           default-scheduler       Successfully assigned flaskapp/flaskapp-b47654947-9jrkr to k8sworker1
11m         11m          1         flaskapp.15669eb6d86fd585                    Deployment                               Normal    ScalingReplicaSet   deployment-controller   Scaled up replica set flaskapp-b47654947 to 1
11m         11m          1         flaskapp-b47654947.15669eb6e1ed6b7b          ReplicaSet                               Normal    SuccessfulCreate    replicaset-controller   Created pod: flaskapp-b47654947-9jrkr
10m         11m          4         flaskapp-b47654947-9jrkr.15669eb76ebd723b    Pod          spec.containers{flaskapp}   Normal    Pulling             kubelet, k8sworker1     pulling image "markus/flaskapp:latest"
10m         11m          4         flaskapp-b47654947-9jrkr.15669eb7c83cb3a6    Pod          spec.containers{flaskapp}   Warning   Failed              kubelet, k8sworker1     Failed to pull image "markus/flaskapp:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for markus/flaskapp, repository does not exist or may require 'docker login'
10m         11m          4         flaskapp-b47654947-9jrkr.15669eb7c83d275d    Pod          spec.containers{flaskapp}   Warning   Failed              kubelet, k8sworker1     Error: ErrImagePull
10m         11m          6         flaskapp-b47654947-9jrkr.15669eb7ea66d8f6    Pod          spec.containers{flaskapp}   Normal    BackOff             kubelet, k8sworker1     Back-off pulling image "markus/flaskapp:latest"
3m          18h          4737      flaskapp-7555c4974f-lvjps.156662a05ffede13   Pod          spec.containers{flaskapp}   Warning   Failed              kubelet, k8sworker2     Error: ImagePullBackOff
1m          11m          41        flaskapp-b47654947-9jrkr.15669eb7ea6708e1    Pod          spec.containers{flaskapp}   Warning   Failed              kubelet, k8sworker1     Error: ImagePullBackOff

That also helps to identify the problem. In this case, pulling the image from Docker Hub failed.
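
If the image actually lives in a private registry, a common fix is to create an image pull secret and reference it from the pod or deployment spec. A minimal sketch, assuming Docker Hub and the flaskapp deployment from above (secret name and credentials are illustrative):

$ kubectl --namespace flaskapp create secret docker-registry regcred \
    --docker-server=https://index.docker.io/v1/ \
    --docker-username=<your-username> --docker-password=<your-password>
$ kubectl --namespace flaskapp patch deployment flaskapp \
    -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"regcred"}]}}}}'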

Kubernetes Cheat Sheet

Might be useful while exploring Kubernetes :-)

Checking that RBAC is enabled

The following command will output false if RBAC is disabled and true otherwise:

$ kubectl get clusterroles > /dev/null 2>&1 && echo true || echo false
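
With RBAC enabled, kubectl auth can-i is a quick way to check whether the current (or an impersonated) identity may perform a given action. The verbs, resources and namespace below are only examples:

$ kubectl auth can-i create deployments --namespace flaskapp
$ kubectl auth can-i list secrets --namespace flaskapp --as system:serviceaccount:flaskapp:default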

Kubectl Autocomplete

BASH: set up autocompletion in the current shell (the bash-completion package should be installed first).

$ source <(kubectl completion bash)

Add autocompletion permanently to your bash shell:

$ echo "source <(kubectl completion bash)" >> ~/.bashrc

ZSH: set up autocompletion in the current shell:

$ source <(kubectl completion zsh)

Add autocompletion permanently to your zsh shell:

$ echo "if [ $commands[kubectl] ]; then source <(kubectl completion zsh); fi" >> ~/.zshrc

Kubectl Context and Configuration

Set which Kubernetes cluster kubectl communicates with and modify its configuration. See the Authenticating Across Clusters with kubeconfig documentation for detailed config file information.

Select a specific namespace to work with:

$ kubectl config set-context $(kubectl config current-context) --namespace=monitoring

Verify:

$ kubectl config view | grep namespace:
    namespace: monitoring

Reset context to default:

$ kubectl config set-context kubernetes-admin@kubernetes --namespace=default
$ kubectl config view # Show Merged kubeconfig settings.
$
$ # use multiple kubeconfig files at the same time and view merged config
$ KUBECONFIG=~/.kube/config:~/.kube/kubconfig2 kubectl config view
$
$ # Get the password for the e2e user
$ kubectl config view -o jsonpath='{.users[?(@.name == "e2e")].user.password}'
$
$ kubectl config current-context              # Display the current-context
$ kubectl config use-context my-cluster-name  # set the default context to my-cluster-name
$
$ # add a new cluster to your kubeconf that supports basic auth
$ kubectl config set-credentials kubeuser/foo.kubernetes.com --username=kubeuser --password=kubepassword
$
$ # set a context utilizing a specific username and namespace.
$ kubectl config set-context gce --user=cluster-admin --namespace=foo && kubectl config use-context gce

Set the namespace back to default in the current context:

$ kubectl config set-context $(kubectl config current-context) --namespace=default

Creating Objects

Kubernetes manifests can be defined in JSON or YAML. The file extensions .yaml, .yml, and .json can be used.

$ kubectl create -f ./my-manifest.yaml           # create resource(s)
$ kubectl create -f ./my1.yaml -f ./my2.yaml     # create from multiple files
$ kubectl create -f ./dir                        # create resource(s) in all manifest files in dir
$ kubectl create -f https://git.io/vPieo         # create resource(s) from url
$ kubectl create deployment nginx --image=nginx  # start a single instance of nginx
$ kubectl explain pods,svc                       # get the documentation for pod and svc manifests
$
$ # Create multiple YAML objects from stdin
$ cat <<EOF | kubectl create -f -
  apiVersion: v1
  kind: Pod
  metadata:
    name: busybox-sleep
  spec:
    containers:
    - name: busybox
      image: busybox
      args:
      - sleep
      - "1000000"
  ---
  apiVersion: v1
  kind: Pod
  metadata:
    name: busybox-sleep-less
  spec:
    containers:
    - name: busybox
      image: busybox
      args:
      - sleep
      - "1000"
  EOF
$
$ # Create a secret with several keys
$ cat <<EOF | kubectl create -f -
  apiVersion: v1
  kind: Secret
  metadata:
    name: mysecret
  type: Opaque
  data:
    password: $(echo -n "s33msi4" | base64 -w0)
    username: $(echo -n "jane" | base64 -w0)
  EOF
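
If you prefer a declarative workflow, the same manifests can also be applied (and later re-applied with changes) using kubectl apply instead of kubectl create:

$ kubectl apply -f ./my-manifest.yaml            # create or update resource(s) from a manifest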

Viewing, Finding Resources

$ # Get commands with basic output
$ kubectl get services                          # List all services in the namespace
$ kubectl get pods --all-namespaces             # List all pods in all namespaces
$ kubectl get pods -o wide                      # List all pods in the namespace, with more details
$ kubectl get deployment my-dep                 # List a particular deployment
$ kubectl get pods --include-uninitialized      # List all pods in the namespace, including uninitialized ones
$
$ # Describe commands with verbose output
$ kubectl describe nodes my-node
$ kubectl describe pods my-pod
$
$ kubectl get services --sort-by=.metadata.name # List Services Sorted by Name
$
$ # List pods Sorted by Restart Count
$ kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'
$
$ # Get the version label of all pods with label app=cassandra
$ kubectl get pods --selector=app=cassandra -o \
    jsonpath='{.items[*].metadata.labels.version}'
$
$ # Get all running pods in the namespace
$ kubectl get pods --field-selector=status.phase=Running
$
$ # Get ExternalIPs of all nodes
$ kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
$
$ # List Names of Pods that belong to Particular RC
$ # "jq" command useful for transformations that are too complex for jsonpath, it can be found at https://stedolan.github.io/jq/
$ sel=${$(kubectl get rc my-rc --output=json | jq -j '.spec.selector | to_entries | .[] | "\(.key)=\(.value),"')%?}
$ echo $(kubectl get pods --selector=$sel --output=jsonpath={.items..metadata.name})
$
$ # Check which nodes are ready
$ JSONPATH='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' \
   && kubectl get nodes -o jsonpath="$JSONPATH" | grep "Ready=True"
$
$ # List all Secrets currently in use by a pod
$ kubectl get pods -o json | jq '.items[].spec.containers[].env[]?.valueFrom.secretKeyRef.name' | grep -v null | sort | uniq
$
$ # List Events sorted by timestamp
$ kubectl get events --sort-by=.metadata.creationTimestamp

Updating Resources

As of version 1.11, rolling-update has been deprecated (see CHANGELOG-1.11.md); use rollout instead.

$ kubectl set image deployment/frontend www=image:v2               # Rolling update "www" containers of "frontend" deployment, updating the image
$ kubectl rollout undo deployment/frontend                         # Rollback to the previous deployment
$ kubectl rollout status -w deployment/frontend                    # Watch rolling update status of "frontend" deployment until completion
$
$ # deprecated starting version 1.11
$ kubectl rolling-update frontend-v1 -f frontend-v2.json           # (deprecated) Rolling update pods of frontend-v1
$ kubectl rolling-update frontend-v1 frontend-v2 --image=image:v2  # (deprecated) Change the name of the resource and update the image
$ kubectl rolling-update frontend --image=image:v2                 # (deprecated) Update the pods image of frontend
$ kubectl rolling-update frontend-v1 frontend-v2 --rollback        # (deprecated) Abort existing rollout in progress
$
$ cat pod.json | kubectl replace -f -                              # Replace a pod based on the JSON passed into stdin
$
$ # Force replace, delete and then re-create the resource. Will cause a service outage.
$ kubectl replace --force -f ./pod.json
$
$ # Create a service for a replicated nginx, which serves on port 80 and connects to the containers on port 8000
$ kubectl expose rc nginx --port=80 --target-port=8000
$
$ # Update a single-container pod's image version (tag) to v4
$ kubectl get pod mypod -o yaml | sed 's/\(image: myimage\):.*$/\1:v4/' | kubectl replace -f -
$
$ kubectl label pods my-pod new-label=awesome                      # Add a Label
$ kubectl annotate pods my-pod icon-url=http://goo.gl/XXBTWq       # Add an annotation
$ kubectl autoscale deployment foo --min=2 --max=10                # Auto scale a deployment "foo"
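
The rollout subcommand also keeps a revision history, which allows targeted rollbacks. The deployment name follows the examples above:

$ kubectl rollout history deployment/frontend                      # Check the history of the deployment, including revisions
$ kubectl rollout undo deployment/frontend --to-revision=2         # Rollback to a specific revision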

Patching Resources

$ kubectl patch node k8s-node-1 -p '{"spec":{"unschedulable":true}}' # Partially update a node
$
$ # Update a container's image; spec.containers[*].name is required because it's a merge key
$ kubectl patch pod valid-pod -p '{"spec":{"containers":[{"name":"kubernetes-serve-hostname","image":"new image"}]}}'
$
$ # Update a container's image using a json patch with positional arrays
$ kubectl patch pod valid-pod --type='json' -p='[{"op": "replace", "path": "/spec/containers/0/image", "value":"new image"}]'
$
$ # Disable a deployment livenessProbe using a json patch with positional arrays
$ kubectl patch deployment valid-deployment  --type json   -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'
$
$ # Add a new element to a positional array
$ kubectl patch sa default --type='json' -p='[{"op": "add", "path": "/secrets/1", "value": {"name": "whatever" } }]'

Editing Resources

Edit any API resource in your preferred editor.

$ kubectl edit svc/docker-registry                      # Edit the service named docker-registry
$ KUBE_EDITOR="nano" kubectl edit svc/docker-registry   # Use an alternative editor

Scaling Resources

$ kubectl scale --replicas=3 rs/foo                                 # Scale a replicaset named 'foo' to 3
$ kubectl scale --replicas=3 -f foo.yaml                            # Scale a resource specified in "foo.yaml" to 3
$ kubectl scale --current-replicas=2 --replicas=3 deployment/mysql  # If the deployment named mysql's current size is 2, scale mysql to 3
$ kubectl scale --replicas=5 rc/foo rc/bar rc/baz                   # Scale multiple replication controllers

Deleting Resources

$ kubectl delete -f ./pod.json                                              # Delete a pod using the type and name specified in pod.json
$ kubectl delete pod,service baz foo                                        # Delete pods and services with same names "baz" and "foo"
$ kubectl delete pods,services -l name=myLabel                              # Delete pods and services with label name=myLabel
$ kubectl delete pods,services -l name=myLabel --include-uninitialized      # Delete pods and services, including uninitialized ones, with label name=myLabel
$ kubectl -n my-ns delete po,svc --all                                      # Delete all pods and services, including uninitialized ones, in namespace my-ns

Interacting with running Pods

$ kubectl logs my-pod                                 # dump pod logs (stdout)
$ kubectl logs my-pod --previous                      # dump pod logs (stdout) for a previous instantiation of a container
$ kubectl logs my-pod -c my-container                 # dump pod container logs (stdout, multi-container case)
$ kubectl logs my-pod -c my-container --previous      # dump pod container logs (stdout, multi-container case) for a previous instantiation of a container
$ kubectl logs -f my-pod                              # stream pod logs (stdout)
$ kubectl logs -f my-pod -c my-container              # stream pod container logs (stdout, multi-container case)
$ kubectl run -i --tty busybox --image=busybox -- sh  # Run pod as interactive shell
$ kubectl attach my-pod -i                            # Attach to Running Container
$ kubectl port-forward my-pod 5000:6000               # Listen on port 5000 on the local machine and forward to port 6000 on my-pod
$ kubectl exec my-pod -- ls /                         # Run command in existing pod (1 container case)
$ kubectl exec my-pod -c my-container -- ls /         # Run command in existing pod (multi-container case)
$ kubectl top pod POD_NAME --containers               # Show metrics for a given pod and its containers

Interacting with Nodes and Cluster

$ kubectl cordon my-node                                                # Mark my-node as unschedulable
$ kubectl drain my-node                                                 # Drain my-node in preparation for maintenance
$ kubectl uncordon my-node                                              # Mark my-node as schedulable
$ kubectl top node my-node                                              # Show metrics for a given node
$ kubectl cluster-info                                                  # Display addresses of the master and services
$ kubectl cluster-info dump                                             # Dump current cluster state to stdout
$ kubectl cluster-info dump --output-directory=/path/to/cluster-state   # Dump current cluster state to /path/to/cluster-state
$
$ # If a taint with that key and effect already exists, its value is replaced as specified.
$ kubectl taint nodes foo dedicated=special-user:NoSchedule

Resource types

List all supported resource types along with their shortnames, API group, whether they are namespaced, and Kind:

$ kubectl api-resources

Other operations for exploring API resources:

$ kubectl api-resources --namespaced=true      # All namespaced resources
$ kubectl api-resources --namespaced=false     # All non-namespaced resources
$ kubectl api-resources -o name                # All resources with simple output (just the resource name)
$ kubectl api-resources -o wide                # All resources with expanded (aka "wide") output
$ kubectl api-resources --verbs=list,get       # All resources that support the "list" and "get" request verbs
$ kubectl api-resources --api-group=extensions # All resources in the "extensions" API group
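
Once you know a resource's name, kubectl explain can drill further into its schema; dotted field paths are supported. The field below is just an example:

$ kubectl explain deployment.spec.strategy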