# Diagnostics

A few commands for inspecting what is going on in the cluster.

## Links

* [https://kubernetes.io/docs/concepts/architecture/](https://kubernetes.io/docs/concepts/architecture/)
* [https://kubernetes.io/docs/tasks/debug/debug-cluster/](https://kubernetes.io/docs/tasks/debug/debug-cluster/)

In [2]:
# Print the hosts.yml used for this cluster:
! cat ../mycluster/hosts.yml

all:
  hosts:
    node1:
      ansible_host: xxx.xxx.118.255
      ip: xxx.xxx.118.255
      access_ip: xxx.xxx.118.255
    node2:
      ansible_host: xxx.xxx.118.104
      ip: xxx.xxx.118.104
      access_ip: xxx.xxx.118.104
    node3:
      ansible_host: xxx.xxx.175.63
      ip: xxx.xxx.175.63
      access_ip: xxx.xxx.175.63
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node2:
        node3:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}


In [3]:
! kubectl cluster-info

Kubernetes control plane is running at https://xxx.xxx.118.255:6443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.


**Warning** The output of `kubectl cluster-info dump` is huge. 

In [4]:
! kubectl get all --all-namespaces

NAMESPACE            NAME                                           READY   STATUS    RESTARTS   AGE
kube-system          pod/calico-kube-controllers-6dd874f784-x2bbk   1/1     Running   0          16m
kube-system          pod/calico-node-gl8k2                          1/1     Running   0          16m
kube-system          pod/calico-node-spg5v                          1/1     Running   0          16m
kube-system          pod/calico-node-t5nsj                          1/1     Running   0          16m
kube-system          pod/coredns-76b4fb4578-jhsz6                   1/1     Running   0          15m
kube-system          pod/coredns-76b4fb4578-rk7zt                   1/1     Running   0          15m
kube-system          pod/dns-autoscaler-7979fb6659-h96tg            1/1     Running   0          15m
kube-system          pod/kube-apiserver-node1                       1/1     Running   1          18m
kube-system          pod/kube-controller-manager-node1              1/1     Running   1    

In [5]:
! kubectl get nodes

NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   18m   v1.23.7
node2   Ready    <none>                 17m   v1.23.7
node3   Ready    <none>                 17m   v1.23.7


**Overview of node status:** [https://kubernetes.io/docs/concepts/architecture/nodes/#node-status](https://kubernetes.io/docs/concepts/architecture/nodes/#node-status)
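
The `Ready` condition described on that page can also be read programmatically. The following is a minimal sketch, assuming `kubectl get nodes -o json` is available; the function names `node_readiness` and `fetch_nodes` and the sample data are made up for illustration and are not output from this cluster.

```python
import json
import subprocess


def node_readiness(node_list: dict) -> dict:
    """Map each node name to True/False based on its 'Ready' condition."""
    readiness = {}
    for item in node_list.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        # A node counts as ready only if a "Ready" condition has status "True".
        readiness[name] = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
    return readiness


def fetch_nodes() -> dict:
    """Ask kubectl for all nodes as JSON (needs a working kubeconfig)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hand-written sample in the shape of `kubectl get nodes -o json` (hypothetical data).
sample = {
    "items": [
        {"metadata": {"name": "node1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node2"},
         "status": {"conditions": [{"type": "MemoryPressure", "status": "False"},
                                   {"type": "Ready", "status": "Unknown"}]}},
    ]
}
print(node_readiness(sample))
```

On a live cluster, `node_readiness(fetch_nodes())` would replace the sample.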

In [6]:
! kubectl describe node node2

Name:               node2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node2
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: xxx.xxx.118.104/32
                    projectcalico.org/IPv4VXLANTunnelAddr: 10.233.96.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 21 Jun 2022 08:11:47 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  node2
  AcquireTime:     <unset>
  RenewTime:       Tue, 21 Jun 2022 08:28:49 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason          

In [7]:
# What's running in the default namespace? (-v=6 also logs every API request kubectl makes)
! kubectl get all -v=6

I0621 08:29:03.285330   17667 loader.go:372] Config loaded from file:  /root/.kube/config
I0621 08:29:03.383947   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/api/v1/namespaces/default/pods?limit=500 200 OK in 87 milliseconds
I0621 08:29:03.406591   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/api/v1/namespaces/default/replicationcontrollers?limit=500 200 OK in 22 milliseconds
I0621 08:29:03.429737   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/api/v1/namespaces/default/services?limit=500 200 OK in 22 milliseconds
I0621 08:29:03.453266   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/apis/apps/v1/namespaces/default/daemonsets?limit=500 200 OK in 22 milliseconds
I0621 08:29:03.477488   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/apis/apps/v1/namespaces/default/deployments?limit=500 200 OK in 23 milliseconds
I0621 08:29:03.504126   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/apis/apps/v1
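
The request log above lends itself to a quick latency summary. This is a rough sketch that assumes the `round_trippers.go` line format shown in the output; the regular expression and the name `parse_requests` are illustrative and may need adjusting for other kubectl versions.

```python
import re
from urllib.parse import urlsplit

# Matches e.g. "... round_trippers.go:553] GET <url> 200 OK in 87 milliseconds"
REQUEST_RE = re.compile(
    r"\] (?P<method>[A-Z]+) (?P<url>\S+) (?P<status>\d{3}) .*? in (?P<ms>\d+) milliseconds"
)


def parse_requests(log_text: str) -> list:
    """Return (method, path, status, milliseconds) for each logged API request."""
    requests = []
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line (e.g. the "Config loaded" line)
        path = urlsplit(match["url"]).path
        requests.append((match["method"], path, int(match["status"]), int(match["ms"])))
    return requests


# Three complete lines copied from the output above.
log = """\
I0621 08:29:03.285330   17667 loader.go:372] Config loaded from file:  /root/.kube/config
I0621 08:29:03.383947   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/api/v1/namespaces/default/pods?limit=500 200 OK in 87 milliseconds
I0621 08:29:03.406591   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/api/v1/namespaces/default/replicationcontrollers?limit=500 200 OK in 22 milliseconds
I0621 08:29:03.429737   17667 round_trippers.go:553] GET https://xxx.xxx.118.255:6443/api/v1/namespaces/default/services?limit=500 200 OK in 22 milliseconds
"""

requests = parse_requests(log)
slowest = max(requests, key=lambda r: r[3])
print("requests:", len(requests))
print("total ms:", sum(r[3] for r in requests))
print("slowest:", slowest[1], slowest[3], "ms")
```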

## systemd status

In [8]:
! ssh node1 systemctl status kubepods.slice

● kubepods.slice - libcontainer container kubepods.slice
     Loaded: loaded (/run/systemd/transient/kubepods.slice; transient)
  Transient: yes
    Drop-In: /run/systemd/transient/kubepods.slice.d
             └─50-CPUShares.conf, 50-MemoryLimit.conf, 50-TasksMax.conf
     Active: active since Tue 2022-06-21 08:10:25 UTC; 18min ago
      Tasks: 134 (limit: 4194304)
     Memory: 680.4M (limit: 3.2G)
        CPU: 4min 27.195s
     CGroup: /kubepods.slice
             ├─kubepods-besteffort.slice
             │ └─kubepods-besteffort-pod935dfb0c_6a7b_4e86_b869_791ffdc012fa.slice
             │   ├─cri-containerd-5cfe5ec7e799618675dbed1386150c46203789db0197c5a2de2c1ab7ca3b8033.scope
             │   │ └─14111 /pause
             │   └─cri-containerd-7fa84426bf9bbacb46509748554195e442b91a1b34fd95ef34d61196fbe72f71.scope
             │     └─14150 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=node1
             ├─kubepods-burstable.slice
             │

In [9]:
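# Anything running through containerd? Note that ctr only looks at its "default" namespace here;
# Kubernetes containers live in the "k8s.io" namespace (ctr -n k8s.io c ls), so this list is empty.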
! ssh node1 ctr c ls

CONTAINER    IMAGE    RUNTIME    


## etcd

In [10]:
# How many etcd instances are in my cluster?

! ssh -q node1 'etcdctl member list --cert="/etc/ssl/etcd/ssl/node-node1.pem" --key="/etc/ssl/etcd/ssl/node-node1-key.pem"'

e2315c91870fade, started, etcd1, https://xxx.xxx.118.255:2380, https://xxx.xxx.118.255:2379, false
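
Each line of the member list is a comma-separated record. The sketch below assumes the field order ID, status, name, peer URLs, client URLs, learner flag, and that fields are separated by `", "`; the names `EtcdMember`, `parse_member` and `fault_tolerance` are made up for illustration.

```python
from typing import List, NamedTuple


class EtcdMember(NamedTuple):
    member_id: str
    status: str
    name: str
    peer_urls: List[str]
    client_urls: List[str]
    is_learner: bool


def parse_member(line: str) -> EtcdMember:
    """Parse one line of `etcdctl member list` (default output format)."""
    fields = line.strip().split(", ")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    member_id, status, name, peers, clients, learner = fields
    # Assumption: several URLs in one field are joined by a bare comma.
    return EtcdMember(member_id, status, name,
                      peers.split(","), clients.split(","), learner == "true")


def fault_tolerance(member_count: int) -> int:
    """Number of members that may fail while a majority (quorum) remains."""
    quorum = member_count // 2 + 1
    return member_count - quorum


# The single line from the output above.
member = parse_member(
    "e2315c91870fade, started, etcd1, https://xxx.xxx.118.255:2380, "
    "https://xxx.xxx.118.255:2379, false"
)
print(member.name, member.status, "learner" if member.is_learner else "voting member")
print("failures tolerated with 1 member:", fault_tolerance(1))
```

With a single etcd member, as in this cluster, no member failure can be tolerated.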


In [11]:
# List the first keys from etcd (--keys-only prints a blank line after each key, so head -20 yields 10 keys)
! ssh -q node1 'etcdctl get --from-key "" --keys-only --cert="/etc/ssl/etcd/ssl/node-node1.pem" --key="/etc/ssl/etcd/ssl/node-node1-key.pem" | head -20'

/registry/apiextensions.k8s.io/customresourcedefinitions/bgpconfigurations.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/bgppeers.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/blockaffinities.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/caliconodestatuses.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/clusterinformations.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/felixconfigurations.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/globalnetworkpolicies.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/globalnetworksets.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/hostendpoints.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/ipamblocks.crd.projectcalico.org



In [12]:
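# List service keys. --from-key returns every key >= /registry/services, so later keys
# (storageclasses, compact_rev_key) show up too; --prefix would restrict the listing.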
! ssh -q node1 'etcdctl get --from-key "/registry/services" --keys-only --cert="/etc/ssl/etcd/ssl/node-node1.pem" --key="/etc/ssl/etcd/ssl/node-node1-key.pem" | head -20'

/registry/services/endpoints/default/kubernetes

/registry/services/endpoints/kube-system/coredns

/registry/services/endpoints/kube-system/metrics-server

/registry/services/specs/default/kubernetes

/registry/services/specs/kube-system/coredns

/registry/services/specs/kube-system/metrics-server

/registry/storageclasses/local-path

compact_rev_key
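
The keys in these listings follow a `/registry/<resource>/...` pattern, which makes it easy to count them per resource. This is a small sketch based only on the layout visible above; that layout is an internal detail of the API server, and `count_by_resource` is a name made up here.

```python
from collections import Counter


def count_by_resource(lines) -> Counter:
    """Count etcd keys per resource, i.e. the path segment after /registry/."""
    counts = Counter()
    for line in lines:
        key = line.strip()
        if not key.startswith("/registry/"):
            continue  # blank separator lines and keys such as compact_rev_key
        parts = key.split("/")  # ['', 'registry', '<resource>', ...]
        counts[parts[2]] += 1
    return counts


# Keys copied from the output above.
keys_output = """\
/registry/services/endpoints/default/kubernetes

/registry/services/endpoints/kube-system/coredns

/registry/services/endpoints/kube-system/metrics-server

/registry/services/specs/default/kubernetes

/registry/services/specs/kube-system/coredns

/registry/services/specs/kube-system/metrics-server

/registry/storageclasses/local-path

compact_rev_key
"""

counts = count_by_resource(keys_output.splitlines())
for resource, n in counts.most_common():
    print(resource, n)
```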



In [13]:
# What does etcd store for the kubernetes service endpoints? (values are protobuf-encoded, so the output is mostly binary)

! ssh -q node1 'etcdctl get /registry/services/endpoints/default/kubernetes --cert="/etc/ssl/etcd/ssl/node-node1.pem" --key="/etc/ssl/etcd/ssl/node-node1-key.pem" | head -20'

/registry/services/endpoints/default/kubernetes
k8s 

v1	Endpoints�
�

kubernetes default" *$55a15631-e868-426c-b1f8-43691d028e5c2 8��ŕ Z/
'endpointslice.kubernetes.io/skip-mirrortruez ��
kube-apiserverUpdatev��ŕ FieldsV1:d
b{"f:metadata":{"f:labels":{".":{},"f:endpointslice.kubernetes.io/skip-mirror":{}}},"f:subsets":{}}B &

xxx.xxx.118.255 
https�2TCP " 


## containerd

In [14]:
# Which containers are running on node2 ?
! ssh -q node2 "nerdctl ps"
! echo "\nPrinting just the images: \n"
! ssh -q node2 "nerdctl ps | tail -n +2 | tr -s ' ' | cut -d' ' -f2 | sort | uniq"

CONTAINER ID    IMAGE                                               COMMAND                   CREATED           STATUS    PORTS    NAMES
44be767ddda7    k8s.gcr.io/pause:3.3                                "/pause"                  18 minutes ago    Up                 k8s://kube-system/kube-proxy-5q4lc                                                         
4d413e3cdfd9    k8s.gcr.io/pause:3.3                                "/pause"                  16 minutes ago    Up                 k8s://local-path-storage/local-path-provisioner-6957789775-hxkbg                           
5820e4af380f    docker.io/rancher/local-path-provisioner:v0.0.21    "local-path-provisio…"    16 minutes ago    Up                 k8s://local-path-storage/local-path-provisioner-6957789775-hxkbg/local-path-provisioner    
5a03896ebe6b    k8s.gcr.io/kube-proxy:v1.23.7                       "/usr/local/bin/kube…"    18 minutes ago    Up                 k8s://kube-system/kube-proxy-5q4lc/kube-proxy                  

In [15]:
# Which containers are running on node3 ?
! ssh -q node3 "nerdctl ps"
! echo "\nPrinting just the images: \n"
! ssh -q node3 "nerdctl ps | tail -n +2 | tr -s ' ' | cut -d' ' -f2 | sort | uniq"

CONTAINER ID    IMAGE                                       COMMAND                   CREATED           STATUS    PORTS    NAMES
26c178c96135    k8s.gcr.io/pause:3.3                        "/pause"                  17 minutes ago    Up                 k8s://kube-system/calico-kube-controllers-6dd874f784-x2bbk                            
41f656a16a5f    docker.io/library/nginx:1.21.4              "/docker-entrypoint.…"    18 minutes ago    Up                 k8s://kube-system/nginx-proxy-node3/nginx-proxy                                       
543ce2ddc8de    k8s.gcr.io/pause:3.3                        "/pause"                  18 minutes ago    Up                 k8s://kube-system/kube-proxy-n6b5d                                                    
6b48de264141    k8s.gcr.io/coredns/coredns:v1.8.6           "/coredns -conf /etc…"    16 minutes ago    Up                 k8s://kube-system/coredns-76b4fb4578-rk7zt/coredns                                    
79531790844a    k8s.gcr.io/paus

* quay.io/calico/kube-controllers is only running on node3
* k8s.gcr.io/coredns/coredns is running on node3 (pod coredns-76b4fb4578-rk7zt)
* docker.io/rancher/local-path-provisioner is only running on node2
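
A comparison like this can be automated instead of read off by eye. The sketch below mirrors the `tail | tr | cut | sort | uniq` pipeline in Python and then computes which images appear on one node only. The function names are made up for illustration, and the sample uses just a few rows copied from the listings, so its result is illustrative rather than a full picture of the cluster.

```python
def images(ps_output: str) -> set:
    """Return the set of image names from `nerdctl ps` output (second column)."""
    result = set()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) >= 2:
            result.add(columns[1])
    return result


def exclusive_images(images_by_node: dict) -> dict:
    """For each node, the images that appear on no other node."""
    exclusive = {}
    for node, own in images_by_node.items():
        others = set()
        for other_node, other_images in images_by_node.items():
            if other_node != node:
                others |= other_images
        exclusive[node] = own - others
    return exclusive


# A few rows copied from the listings above (not the complete output).
NODE2_PS = """\
CONTAINER ID    IMAGE                                               COMMAND                   CREATED           STATUS    PORTS    NAMES
44be767ddda7    k8s.gcr.io/pause:3.3                                "/pause"                  18 minutes ago    Up                 k8s://kube-system/kube-proxy-5q4lc
5820e4af380f    docker.io/rancher/local-path-provisioner:v0.0.21    "local-path-provisio…"    16 minutes ago    Up                 k8s://local-path-storage/local-path-provisioner-6957789775-hxkbg/local-path-provisioner
"""
NODE3_PS = """\
CONTAINER ID    IMAGE                                       COMMAND                   CREATED           STATUS    PORTS    NAMES
543ce2ddc8de    k8s.gcr.io/pause:3.3                        "/pause"                  18 minutes ago    Up                 k8s://kube-system/kube-proxy-n6b5d
6b48de264141    k8s.gcr.io/coredns/coredns:v1.8.6           "/coredns -conf /etc…"    16 minutes ago    Up                 k8s://kube-system/coredns-76b4fb4578-rk7zt/coredns
"""

by_node = {"node2": images(NODE2_PS), "node3": images(NODE3_PS)}
for node, only_here in exclusive_images(by_node).items():
    print(node, sorted(only_here))
```

In the notebook, the strings would come from the `ssh -q nodeN "nerdctl ps"` calls rather than from pasted text.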


## metrics server

**The metrics pipeline needs to be installed in the cluster.**

See [https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/)

To activate it with kubespray, make sure `metrics_server_enabled: true` is set in `addons.yml`.

In [16]:
! kubectl top node node1

NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
node1   444m         24%    1540Mi          47%       
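
The `444m` and `1540Mi` values use Kubernetes quantity suffixes: `m` means thousandths of a CPU core and `Mi` means mebibytes. The following sketch converts them to plain numbers and derives the totals implied by the percentages. It handles only the suffixes listed in the code, and the function names are made up for illustration.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '444m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores
    return float(quantity)


# Binary suffixes only; decimal suffixes (k, M, G) are not handled in this sketch.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '1540Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes


cpu_cores = parse_cpu("444m")
memory_bytes = parse_memory("1540Mi")
print(f"CPU in use:    {cpu_cores:.3f} cores")
print(f"Memory in use: {memory_bytes / 1024**3:.2f} GiB")

# The percentages are rounded, so these totals are only approximate.
print(f"Implied CPU total:    {cpu_cores / 0.24:.2f} cores")
print(f"Implied memory total: {memory_bytes / 0.47 / 1024**3:.2f} GiB")
```

The implied memory total comes out close to the 3.2G limit reported for `kubepods.slice` earlier, which fits the percentages being relative to what the node makes available to pods rather than to its full capacity.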


In [17]:
# How much CPU and memory does the metrics-server pod itself use?
! kubectl top po $(kubectl get po -n kube-system | grep metrics | tr -s ' ' | cut -d' ' -f1) -n kube-system

NAME                              CPU(cores)   MEMORY(bytes)   
metrics-server-5c8c77d7b8-stkx7   8m           16Mi            
