# Diagnostics

Some commands to see what's going on in the cluster.

## Links

* [https://kubernetes.io/docs/concepts/architecture/](https://kubernetes.io/docs/concepts/architecture/)
* [https://kubernetes.io/docs/tasks/debug/debug-cluster/](https://kubernetes.io/docs/tasks/debug/debug-cluster/)

In [25]:
# Print the hosts.yml used for this cluster:
! cat mycluster/hosts.yml

all:
  hosts:
    node1:
      ansible_host: xxx.xxx.235.216
      ip: xxx.xxx.235.216
      access_ip: xxx.xxx.235.216
    node2:
      ansible_host: xxx.xxx.195.197
      ip: xxx.xxx.195.197
      access_ip: xxx.xxx.195.197
    node3:
      ansible_host: xxx.xxx.195.230
      ip: xxx.xxx.195.230
      access_ip: xxx.xxx.195.230
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node2:
        node3:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}


In [12]:
! kubectl cluster-info

Kubernetes control plane is running at https://xxx.xxx.235.216:6443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.


**Warning:** The output of `kubectl cluster-info dump` is huge.

In [4]:
! kubectl get all --all-namespaces

NAMESPACE     NAME                                           READY   STATUS    RESTARTS      AGE
kube-system   pod/calico-kube-controllers-6dd874f784-p7pt7   1/1     Running   0             24m
kube-system   pod/calico-node-5hvhv                          1/1     Running   0             25m
kube-system   pod/calico-node-mxmmd                          1/1     Running   0             25m
kube-system   pod/calico-node-wfnlr                          1/1     Running   0             25m
kube-system   pod/coredns-76b4fb4578-gf6rx                   1/1     Running   0             23m
kube-system   pod/coredns-76b4fb4578-nhwzg                   1/1     Running   0             24m
kube-system   pod/dns-autoscaler-7979fb6659-z68nd            1/1     Running   0             24m
kube-system   pod/kube-apiserver-node1                       1/1     Running   1             27m
kube-system   pod/kube-controller-manager-node1              1/1     Running   2 (23m ago)   27m
kube-system   pod/kube-proxy-l

In [22]:
! kubectl get nodes

NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   39m   v1.23.7
node2   Ready    <none>                 38m   v1.23.7
node3   Ready    <none>                 38m   v1.23.7
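
On a larger cluster it can help to filter this table from the notebook instead of reading it by eye. A minimal sketch, assuming the default column layout shown above; `parse_nodes` and `not_ready` are hypothetical helper names, and the sample text is copied from the output above:

```python
# Sketch: parse the plain-text table printed by `kubectl get nodes`.
# Assumes the default columns NAME, STATUS, ROLES, AGE, VERSION.

SAMPLE = """\
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   39m   v1.23.7
node2   Ready    <none>                 38m   v1.23.7
node3   Ready    <none>                 38m   v1.23.7
"""


def parse_nodes(text):
    """Return one dict per node, keyed by the lower-cased column headers."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].lower().split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def not_ready(nodes):
    """Names of nodes whose STATUS is anything other than plain 'Ready'."""
    return [n["name"] for n in nodes if n["status"] != "Ready"]


nodes = parse_nodes(SAMPLE)
print("workers:  ", [n["name"] for n in nodes if n["roles"] == "<none>"])
print("not ready:", not_ready(nodes))
```

Splitting on whitespace only works because none of these columns contain spaces; `kubectl get nodes -o json` is the more robust source if the data is needed for anything beyond a quick check.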


**Overview of node status:** [https://kubernetes.io/docs/concepts/architecture/nodes/#node-status](https://kubernetes.io/docs/concepts/architecture/nodes/#node-status)

In [10]:
! kubectl describe node node2

Name:               node2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node2
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: xxx.xxx.195.197/32
                    projectcalico.org/IPv4VXLANTunnelAddr: 10.233.96.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 08 Jun 2022 06:32:22 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  node2
  AcquireTime:     <unset>
  RenewTime:       Wed, 08 Jun 2022 09:34:46 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason          

In [5]:
# What's running in the default namespace ?
! kubectl get all -v=6

I0608 07:39:10.759981   17503 loader.go:372] Config loaded from file:  /root/.kube/config
I0608 07:39:10.905631   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/api/v1/namespaces/default/pods?limit=500 200 OK in 107 milliseconds
I0608 07:39:10.932693   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/api/v1/namespaces/default/replicationcontrollers?limit=500 200 OK in 26 milliseconds
I0608 07:39:10.961948   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/api/v1/namespaces/default/services?limit=500 200 OK in 29 milliseconds
I0608 07:39:10.988432   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/apis/apps/v1/namespaces/default/daemonsets?limit=500 200 OK in 25 milliseconds
I0608 07:39:11.013283   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/apis/apps/v1/namespaces/default/deployments?limit=500 200 OK in 24 milliseconds
I0608 07:39:11.038686   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/apis/apps/v
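
The `-v=6` log shows one API request per resource type, each with its own latency. A small sketch for pulling those numbers out of the log; the regular expression is an assumption based on the line format shown above, `request_timings` is a hypothetical helper, and the sample lines are copied from the output:

```python
import re

# Sketch: extract method, URL, status and latency from `kubectl -v=6` lines.
# The pattern is derived from the log lines above, not from a documented format.
LINE_RE = re.compile(
    r"\] (?P<method>[A-Z]+) (?P<url>\S+) (?P<status>\d{3}) "
    r".*? in (?P<ms>\d+) milliseconds"
)

SAMPLE = """\
I0608 07:39:10.759981   17503 loader.go:372] Config loaded from file:  /root/.kube/config
I0608 07:39:10.905631   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/api/v1/namespaces/default/pods?limit=500 200 OK in 107 milliseconds
I0608 07:39:10.932693   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/api/v1/namespaces/default/replicationcontrollers?limit=500 200 OK in 26 milliseconds
I0608 07:39:10.961948   17503 round_trippers.go:553] GET https://xxx.xxx.235.216:6443/api/v1/namespaces/default/services?limit=500 200 OK in 29 milliseconds
"""


def request_timings(log_text):
    """Return (resource, status, milliseconds) for every API request line."""
    timings = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # e.g. the "Config loaded from file" line
        # Last path segment without the query string, e.g. "pods".
        resource = match["url"].split("?")[0].rsplit("/", 1)[-1]
        timings.append((resource, int(match["status"]), int(match["ms"])))
    return timings


timings = request_timings(SAMPLE)
slowest = max(timings, key=lambda t: t[2])
print("requests:", len(timings), "slowest:", slowest)
```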

## Apparently, the Kubernetes pods are packed into a so-called "slice" (a systemd unit that groups processes into a cgroup). A new concept for me.

In [11]:
! ssh node1 systemctl status kubepods.slice

● kubepods.slice - libcontainer container kubepods.slice
     Loaded: loaded (/run/systemd/transient/kubepods.slice; transient)
  Transient: yes
    Drop-In: /run/systemd/transient/kubepods.slice.d
             └─50-CPUShares.conf, 50-MemoryLimit.conf, 50-TasksMax.conf
     Active: active since Mon 2022-06-06 08:23:30 UTC; 31min ago
      Tasks: 95 (limit: 4194304)
     Memory: 633.2M (limit: 1.3G)
        CPU: 4min 13.578s
     CGroup: /kubepods.slice
             ├─kubepods-besteffort.slice
             │ └─kubepods-besteffort-pod769636f9_4b3c_468c_81b0_a109e8f3563d.slice
             │   ├─cri-containerd-13cec8dabe40c4cece0c74c42f19c13c8463bc2887d15691b4c9149a08cb29f3.scope
             │   │ └─13900 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=node1
             │   └─cri-containerd-8887f221e20e74962a6202bef261956c22d93d4693c7374d38dc71581164e993.scope
             │     └─13861 /pause
             └─kubepods-burstable.slice
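
The slice names in this tree appear to encode the pod's QoS class and its UID, with the dashes of the UID replaced by underscores. A sketch for decoding them, assuming that naming scheme holds and that pods without a QoS segment in the name are "guaranteed" pods (both are assumptions read off the output, not taken from a specification); `parse_pod_slice` is a hypothetical helper:

```python
import re

# Sketch: decode a pod slice name as seen under kubepods.slice.
# Assumed scheme: kubepods[-<qos>]-pod<uid with "_" instead of "-">.slice
SLICE_RE = re.compile(
    r"kubepods(?:-(?P<qos>besteffort|burstable))?-pod(?P<uid>[0-9a-f_]+)\.slice"
)


def parse_pod_slice(name):
    """Return (qos_class, pod_uid), or None if the name is not a pod slice."""
    match = SLICE_RE.fullmatch(name)
    if match is None:
        return None
    qos = match["qos"] or "guaranteed"  # assumption: no QoS segment means guaranteed
    return qos, match["uid"].replace("_", "-")


print(parse_pod_slice(
    "kubepods-besteffort-pod769636f9_4b3c_468c_81b0_a109e8f3563d.slice"
))
print(parse_pod_slice("kubepods-burstable.slice"))  # a QoS parent slice, not a pod
```

If the assumption is right, the decoded UID should correspond to the `metadata.uid` of one of the pods scheduled on that node.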
               

In [23]:
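# Anything running through containerd? (ctr only lists the "default" namespace unless -n is given; Kubernetes containers live in the "k8s.io" namespace, hence the empty list)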
! ssh node1 ctr c ls

CONTAINER    IMAGE    RUNTIME    


## etcd

In [1]:
# How many etcd instances are in our cluster ?

! ssh -q node1 'etcdctl member list --cert="/etc/ssl/etcd/ssl/node-node1.pem" --key="/etc/ssl/etcd/ssl/node-node1-key.pem"'

562f5cd7801f1d8b, started, etcd1, https://xxx.xxx.235.216:2380, https://xxx.xxx.235.216:2379, false


In [2]:
# List the first keys from etcd (--keys-only prints a blank line after each key, so head -20 yields 10 keys)
! ssh -q node1 'etcdctl get --from-key "" --keys-only --cert="/etc/ssl/etcd/ssl/node-node1.pem" --key="/etc/ssl/etcd/ssl/node-node1-key.pem" | head -20'

/registry/apiextensions.k8s.io/customresourcedefinitions/bgpconfigurations.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/bgppeers.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/blockaffinities.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/caliconodestatuses.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/clusterinformations.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/felixconfigurations.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/globalnetworkpolicies.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/globalnetworksets.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/hostendpoints.crd.projectcalico.org

/registry/apiextensions.k8s.io/customresourcedefinitions/ipamblocks.crd.projectcalico.org



In [3]:
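# Get services (--from-key returns every key >= the given one, so later non-service keys show up too; --prefix would restrict the listing)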
! ssh -q node1 'etcdctl get --from-key "/registry/services" --keys-only --cert="/etc/ssl/etcd/ssl/node-node1.pem" --key="/etc/ssl/etcd/ssl/node-node1-key.pem" | head -20'

/registry/services/endpoints/default/kubernetes

/registry/services/endpoints/kube-system/coredns

/registry/services/specs/default/kubernetes

/registry/services/specs/kube-system/coredns

/registry/storageclasses/local-path

compact_rev_key
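
The listing contains `/registry/storageclasses/local-path` and `compact_rev_key` because `--from-key` selects a key range, not a prefix. A sketch that emulates both modes over a small key list; `from_key` and `with_prefix` are hypothetical helpers, the keys are copied from the outputs above, and Python's string comparison stands in for etcd's byte-wise comparison (equivalent for these ASCII keys):

```python
# Sketch: emulate etcd's range (--from-key) and prefix (--prefix) selection.
KEYS = sorted([
    "/registry/apiextensions.k8s.io/customresourcedefinitions/bgppeers.crd.projectcalico.org",
    "/registry/services/endpoints/default/kubernetes",
    "/registry/services/endpoints/kube-system/coredns",
    "/registry/services/specs/default/kubernetes",
    "/registry/services/specs/kube-system/coredns",
    "/registry/storageclasses/local-path",
    "compact_rev_key",
])


def from_key(keys, start):
    """All keys greater than or equal to `start`, like --from-key."""
    return [k for k in keys if k >= start]


def with_prefix(keys, prefix):
    """Only keys that begin with `prefix`, like --prefix."""
    return [k for k in keys if k.startswith(prefix)]


print(len(from_key(KEYS, "/registry/services")))     # includes the two stray keys
print(len(with_prefix(KEYS, "/registry/services")))  # services only
```

On the cluster, the prefix variant would be `etcdctl get --prefix "/registry/services" --keys-only` with the same certificate flags as above.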



In [4]:
# What's in our kubernetes service endpoint? (the value is stored in a binary encoding, so the output is only partly readable)

! ssh -q node1 'etcdctl get /registry/services/endpoints/default/kubernetes --cert="/etc/ssl/etcd/ssl/node-node1.pem" --key="/etc/ssl/etcd/ssl/node-node1-key.pem" | head -20'

/registry/services/endpoints/default/kubernetes
k8s 

v1	Endpoints�
�

kubernetes default" *$34a761b9-fdcf-4664-b7e2-dc7bcac180c52 8���� Z/
'endpointslice.kubernetes.io/skip-mirrortruez ��
kube-apiserverUpdatev���� FieldsV1:d
b{"f:metadata":{"f:labels":{".":{},"f:endpointslice.kubernetes.io/skip-mirror":{}}},"f:subsets":{}}B %

xxx.xxx.235.216 
https�2TCP " 
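
Since the stored value is binary, only the embedded text fragments are readable. A sketch that extracts printable runs from such a value, similar to the Unix `strings` tool; `printable_strings` is a hypothetical helper and `SAMPLE` is a hand-made byte string for illustration, not the real etcd value:

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII bytes found in `data`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


# Hand-made illustration: readable fragments separated by non-printable bytes.
SAMPLE = (
    b"k8s\x00\n\x0f\n\x02v1\x12\tEndpoints\x12\x9c\x01"
    b"\n\nkubernetes\x12\x00\x1a\x07default"
)

print(printable_strings(SAMPLE))
print(printable_strings(SAMPLE, min_len=2))
```

In the notebook, the bytes could come from `subprocess.run([...], capture_output=True).stdout` with the same `ssh`/`etcdctl` command as above.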


## containerd

In [6]:
# Which containers are running on node2 ?
! ssh -q node2 "nerdctl ps"
! echo "\nPrinting just the images: \n"
! ssh -q node2 "nerdctl ps | tail -n +2 | tr -s ' ' | cut -d' ' -f2 | sort | uniq"

CONTAINER ID    IMAGE                                       COMMAND                   CREATED        STATUS    PORTS    NAMES
03d415712a9f    k8s.gcr.io/kube-proxy:v1.23.7               "/usr/local/bin/kube…"    2 hours ago    Up                 k8s://kube-system/kube-proxy-xn9rx/kube-proxy                                         
0c12bf085273    k8s.gcr.io/pause:3.3                        "/pause"                  2 hours ago    Up                 k8s://kube-system/nodelocaldns-f29kg                                                  
143622dbd6d7    k8s.gcr.io/pause:3.3                        "/pause"                  2 hours ago    Up                 k8s://kube-system/nginx-proxy-node2                                                   
17b03d491dad    k8s.gcr.io/dns/k8s-dns-node-cache:1.21.1    "/node-cache -locali…"    2 hours ago    Up                 k8s://kube-system/nodelocaldns-f29kg/node-cache                                       
1d23a60e2b99    quay.io/calico/kube-controller

In [7]:
# Which containers are running on node3 ?
! ssh -q node3 "nerdctl ps"
! echo "\nPrinting just the images: \n"
! ssh -q node3 "nerdctl ps | tail -n +2 | tr -s ' ' | cut -d' ' -f2 | sort | uniq"

CONTAINER ID    IMAGE                                               COMMAND                   CREATED        STATUS    PORTS    NAMES
1210821f8181    k8s.gcr.io/dns/k8s-dns-node-cache:1.21.1            "/node-cache -locali…"    2 hours ago    Up                 k8s://kube-system/nodelocaldns-s728x/node-cache                                            
4b9c1c7a61af    k8s.gcr.io/pause:3.3                                "/pause"                  2 hours ago    Up                 k8s://kube-system/calico-node-h47zr                                                        
56abc426aabc    k8s.gcr.io/pause:3.3                                "/pause"                  2 hours ago    Up                 k8s://kube-system/kube-proxy-b9gh2                                                         
8a92fd9a6409    docker.io/library/nginx:1.21.4                      "/docker-entrypoint.…"    2 hours ago    Up                 k8s://kube-system/nginx-proxy-node3/nginx-proxy                               

* quay.io/calico/kube-controllers is only running on node2
* k8s.gcr.io/coredns/coredns is only running on node2
* docker.io/rancher/local-path-provisioner is only running on node3
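
A comparison like the one above can also be computed instead of read off the two listings. A sketch that mirrors the shell pipeline (`tail -n +2 | tr -s ' ' | cut -d' ' -f2 | sort | uniq`) and takes set differences; `images` and `only_on` are hypothetical helpers, and the samples contain only the rows visible in the truncated outputs above, so the result here reflects the truncation rather than the real cluster state:

```python
# Sketch: unique images per node and the difference between two nodes.
NODE2 = """\
CONTAINER ID    IMAGE    COMMAND    CREATED    STATUS    PORTS    NAMES
03d415712a9f    k8s.gcr.io/kube-proxy:v1.23.7    "/usr/local/bin/kube…"    2 hours ago    Up    k8s://kube-system/kube-proxy-xn9rx/kube-proxy
0c12bf085273    k8s.gcr.io/pause:3.3    "/pause"    2 hours ago    Up    k8s://kube-system/nodelocaldns-f29kg
143622dbd6d7    k8s.gcr.io/pause:3.3    "/pause"    2 hours ago    Up    k8s://kube-system/nginx-proxy-node2
17b03d491dad    k8s.gcr.io/dns/k8s-dns-node-cache:1.21.1    "/node-cache -locali…"    2 hours ago    Up    k8s://kube-system/nodelocaldns-f29kg/node-cache
"""

NODE3 = """\
CONTAINER ID    IMAGE    COMMAND    CREATED    STATUS    PORTS    NAMES
1210821f8181    k8s.gcr.io/dns/k8s-dns-node-cache:1.21.1    "/node-cache -locali…"    2 hours ago    Up    k8s://kube-system/nodelocaldns-s728x/node-cache
4b9c1c7a61af    k8s.gcr.io/pause:3.3    "/pause"    2 hours ago    Up    k8s://kube-system/calico-node-h47zr
56abc426aabc    k8s.gcr.io/pause:3.3    "/pause"    2 hours ago    Up    k8s://kube-system/kube-proxy-b9gh2
8a92fd9a6409    docker.io/library/nginx:1.21.4    "/docker-entrypoint.…"    2 hours ago    Up    k8s://kube-system/nginx-proxy-node3/nginx-proxy
"""


def images(ps_output):
    """Sorted unique values of the IMAGE column (second field of each row)."""
    rows = ps_output.splitlines()[1:]  # drop the header, like `tail -n +2`
    return sorted({row.split()[1] for row in rows if row.strip()})


def only_on(a_output, b_output):
    """Images that appear in the first listing but not in the second."""
    return sorted(set(images(a_output)) - set(images(b_output)))


print("only in NODE2 sample:", only_on(NODE2, NODE3))
print("only in NODE3 sample:", only_on(NODE3, NODE2))
```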


## metrics server

**The metrics pipeline needs to be installed in the cluster.**

See [https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/)

To activate it with Kubespray, make sure `metrics_server_enabled: true` is set in `addons.yml`.

In [14]:
! kubectl top node node1

NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
node1   330m         18%    1608Mi          49%       


In [22]:
# What's the top of the metrics-server ?
! kubectl top po $(kubectl get po -n kube-system | grep metrics | tr -s ' ' | cut -d' ' -f1) -n kube-system

NAME                              CPU(cores)   MEMORY(bytes)   
metrics-server-5c8c77d7b8-xz79p   7m           17Mi            
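
The values printed by `kubectl top` use Kubernetes quantity suffixes: `m` for millicores and `Mi` for mebibytes. A sketch for converting them to plain numbers, for example to sum usage across pods; `cpu_cores` and `memory_bytes` are hypothetical helpers and only cover the suffixes that appear in the outputs above plus `Ki` and `Gi`:

```python
# Sketch: convert `kubectl top` quantities into plain numbers.
BINARY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_cores(value):
    """'330m' -> 0.33 cores; a bare number such as '2' -> 2.0 cores."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000
    return float(value)


def memory_bytes(value):
    """'17Mi' -> 17 * 2**20 bytes; a bare number is taken as bytes."""
    for suffix, factor in BINARY_UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


# Values copied from the two outputs above.
print(cpu_cores("330m"), memory_bytes("1608Mi"))  # node1
print(cpu_cores("7m"), memory_bytes("17Mi"))      # metrics-server pod
```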
