Troubleshooting updates
superseb authored and Denise committed May 3, 2019
1 parent c4785f7 commit 408b829
Showing 2 changed files with 118 additions and 11 deletions.
@@ -58,16 +58,22 @@ The cluster state (`/var/lib/etcd`) contains wrong information to join the clust

### etcd cluster and connectivity checks

If any of the commands responds with `Error: context deadline exceeded`, the etcd instance is unhealthy (either quorum is lost or the instance is not correctly joined in the cluster).
The address etcd listens on depends on the address configuration of its host. If an internal address is configured for that host, the endpoint for `etcdctl` must be specified explicitly. If any of the commands responds with `Error: context deadline exceeded`, the etcd instance is unhealthy (either quorum is lost or the instance has not joined the cluster correctly).
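
As a rough sketch of the distinction above, the hypothetical helper below prints the `etcdctl` invocation to run inside the container, adding `--endpoints=$ETCDCTL_ENDPOINT` only when you tell it that an internal address is configured. The function name and the `HAS_INTERNAL_ADDRESS` variable are illustrative assumptions, not part of RKE or etcd.

```shell
# Hypothetical helper (not part of RKE or etcd): prints the etcdctl command
# line to run inside the etcd container.
# HAS_INTERNAL_ADDRESS is an assumed variable; set it to "true" yourself when
# the host is configured with an internal address.
etcdctl_in_container() {
  if [ "${HAS_INTERNAL_ADDRESS:-false}" = "true" ]; then
    # The format string is single-quoted on purpose: $ETCDCTL_ENDPOINT must be
    # expanded by the shell inside the container, not by the host shell.
    printf 'etcdctl --endpoints=$ETCDCTL_ENDPOINT %s\n' "$*"
  else
    printf 'etcdctl %s\n' "$*"
  fi
}
```

The printed string can then be handed to the container, for example `docker exec etcd sh -c "$(etcdctl_in_container member list)"`.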

* Check etcd members on all nodes

Output should contain all the nodes with the `etcd` role and the output should be identical on all nodes.

Command when no internal address is configured on the host:
```
docker exec etcd etcdctl member list
```

Command when internal address is configured on the host:
```
docker exec etcd sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list"
```

Example output:
```
xxx, started, etcd-xxx, https://IP:2380, https://IP:2379,https://IP:4001
@@ -79,10 +85,16 @@ xxx, started, etcd-xxx, https://IP:2380, https://IP:2379,https://IP:4001

The values for `RAFT TERM` should be equal and the values for `RAFT INDEX` should not be too far apart from each other.

Command when no internal address is configured on the host:
```
docker exec etcd etcdctl endpoint status --endpoints=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") --write-out table
```

Command when internal address is configured on the host:
```
docker exec etcd etcdctl endpoint status --endpoints=$(docker exec etcd /bin/sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") --write-out table
```

Example output:
```
+-----------------+------------------+---------+---------+-----------+-----------+------------+
@@ -96,10 +108,16 @@ Example output:

* Check endpoint health

Command when no internal address is configured on the host:
```
docker exec etcd etcdctl endpoint health --endpoints=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','")
```

Command when internal address is configured on the host:
```
docker exec etcd etcdctl endpoint health --endpoints=$(docker exec etcd /bin/sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','")
```

Example output:
```
https://IP:2379 is healthy: successfully committed proposal: took = 2.113189ms
@@ -109,22 +127,40 @@ https://IP:2379 is healthy: successfully committed proposal: took = 2.451201ms

* Check connectivity on port TCP/2379

Command when no internal address is configured on the host:
```
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5"); do
echo "Validating connection to ${endpoint}/health";
curl -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/health";
done
```

Command when internal address is configured on the host:
```
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list | cut -d, -f5"); do
echo "Validating connection to ${endpoint}/health";
curl -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/health";
done
```

If you are running on an operating system without `curl` (for example, RancherOS), you can use the following commands, which run `curl` in a Docker container.

Command when no internal address is configured on the host:
```
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5"); do
echo "Validating connection to ${endpoint}/health";
docker run --net=host -v /opt/rke/etc/kubernetes/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/health"
done
```

Command when internal address is configured on the host:
```
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list | cut -d, -f5"); do
echo "Validating connection to ${endpoint}/health";
docker run --net=host -v /opt/rke/etc/kubernetes/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/health"
done
```

Example output:
```
Validating connection to https://IP:2379/health
@@ -137,22 +173,40 @@ Validating connection to https://IP:2379/health

* Check connectivity on port TCP/2380

Command when no internal address is configured on the host:
```
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f4"); do
echo "Validating connection to ${endpoint}/version";
curl -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/version";
done
```

Command when internal address is configured on the host:
```
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list | cut -d, -f4"); do
echo "Validating connection to ${endpoint}/version";
curl -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/version";
done
```
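
The connectivity loops rely on the column layout of `etcdctl member list`: the fourth comma-separated field holds the peer URL (port 2380) and the fifth holds the first client URL (port 2379). Below is a minimal sketch of that extraction, run against sample text instead of a live cluster. The function name `member_field` and the sample addresses are assumptions made for illustration.

```shell
# Hypothetical helper: reads `etcdctl member list` output on stdin and prints
# one URL per line from the requested comma-separated field
# (4 = peer URL, 5 = first client URL), with spaces stripped.
member_field() {
  cut -d, -f"$1" | sed -e 's/ //g'
}

# Sample text in the format of the member list example output; the IPs are made up.
sample='aaa, started, etcd-a, https://10.0.0.1:2380, https://10.0.0.1:2379,https://10.0.0.1:4001
bbb, started, etcd-b, https://10.0.0.2:2380, https://10.0.0.2:2379,https://10.0.0.2:4001'

printf '%s\n' "$sample" | member_field 4   # peer URLs (TCP/2380)
printf '%s\n' "$sample" | member_field 5   # client URLs (TCP/2379)
```

Piping the field 5 result through `paste -sd ','` yields the comma-separated list that the `--endpoints` flag takes in the `endpoint status`, `endpoint health` and `defrag` commands.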

If you are running on an operating system without `curl` (for example, RancherOS), you can use the following commands, which run `curl` in a Docker container.

Command when no internal address is configured on the host:
```
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f4"); do
echo "Validating connection to ${endpoint}/version";
docker run --net=host -v /opt/rke/etc/kubernetes/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/version"
done
```

Command when internal address is configured on the host:
```
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list | cut -d, -f4"); do
echo "Validating connection to ${endpoint}/version";
docker run --net=host -v /opt/rke/etc/kubernetes/ssl:/etc/kubernetes/ssl:ro appropriate/curl -s -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/version"
done
```

Example output:
```
Validating connection to https://IP:2380/version
@@ -167,10 +221,16 @@ Validating connection to https://IP:2380/version

etcd will trigger alarms, for instance when it runs out of space.

Command when no internal address is configured on the host:
```
docker exec etcd etcdctl alarm list
```

Command when internal address is configured on the host:
```
docker exec etcd sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT alarm list"
```

Example output when NOSPACE alarm is triggered:
```
memberID:x alarm:NOSPACE
@@ -186,22 +246,35 @@ Resolution:

* Compact the keyspace

Command when no internal address is configured on the host:
```
rev=$(docker exec etcd etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
docker exec etcd etcdctl compact "$rev"
```

Command when internal address is configured on the host:
```
rev=$(docker exec etcd sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT endpoint status --write-out json | egrep -o '\"revision\":[0-9]*' | egrep -o '[0-9]*'")
docker exec etcd sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT compact \"$rev\""
```

Example output:
```
compacted revision xxx
```
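
The compaction commands extract the current revision from the JSON status with two `grep` passes. The sketch below runs the same extraction against a hand-written JSON fragment. The fragment is an assumed, abbreviated shape rather than real `etcdctl` output, the function name `extract_revision` is hypothetical, and `grep -Eo` stands in for `egrep -o`, which newer GNU grep releases flag as obsolescent.

```shell
# Hypothetical helper: reads JSON on stdin and prints the number that follows
# the first "revision" key.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Assumed, abbreviated sample; real `etcdctl endpoint status --write-out json`
# output contains more fields.
sample='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":1234,"raft_term":2}}}]'

rev=$(printf '%s\n' "$sample" | extract_revision)
echo "revision to compact: $rev"
```

With a live cluster, the input would come from `docker exec etcd etcdctl endpoint status --write-out json` instead of the sample string.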

* Defrag all etcd members

Command when no internal address is configured on the host:
```
docker exec etcd etcdctl defrag --endpoints=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','")
```

Command when internal address is configured on the host:
```
docker exec etcd sh -c "etcdctl defrag --endpoints=$(docker exec etcd /bin/sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','")"
```

Example output:
```
Finished defragmenting etcd member[https://IP:2379]
@@ -211,10 +284,16 @@ Finished defragmenting etcd member[https://IP:2379]

* Check endpoint status

Command when no internal address is configured on the host:
```
docker exec etcd etcdctl endpoint status --endpoints=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") --write-out table
```

Command when internal address is configured on the host:
```
docker exec etcd sh -c "etcdctl endpoint status --endpoints=$(docker exec etcd /bin/sh -c "etcdctl --endpoints=\$ETCDCTL_ENDPOINT member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") --write-out table"
```

Example output:
```
+-----------------+------------------+---------+---------+-----------+-----------+------------+
@@ -226,6 +305,32 @@ Example output:
+-----------------+------------------+---------+---------+-----------+-----------+------------+
```

### Log level

The log level of etcd can be changed dynamically via the API. You can configure debug logging using the commands below.

Command when no internal address is configured on the host:
```
curl -XPUT -d '{"Level":"DEBUG"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) https://localhost:2379/config/local/log
```

Command when internal address is configured on the host:
```
curl -XPUT -d '{"Level":"DEBUG"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) $(docker exec etcd printenv ETCDCTL_ENDPOINT)/config/local/log
```

To reset the log level back to the default (`INFO`), you can use the following command.

Command when no internal address is configured on the host:
```
curl -XPUT -d '{"Level":"INFO"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) https://localhost:2379/config/local/log
```

Command when internal address is configured on the host:
```
curl -XPUT -d '{"Level":"INFO"}' --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) $(docker exec etcd printenv ETCDCTL_ENDPOINT)/config/local/log
```

## controlplane

This section applies to nodes with the `controlplane` role.
@@ -20,28 +20,30 @@ Run the command below and check the following:


```
kubectl get nodes
kubectl get nodes -o wide
```

Example output:

```
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
etcd-0 Ready etcd 2m v1.11.5 <none> Ubuntu 16.04.5 LTS 4.4.0-138-generic docker://17.3.2
etcd-1 Ready etcd 2m v1.11.5 <none> Ubuntu 16.04.5 LTS 4.4.0-138-generic docker://17.3.2
etcd-2 Ready etcd 2m v1.11.5 <none> Ubuntu 16.04.5 LTS 4.4.0-138-generic docker://17.3.2
controlplane-0 Ready controlplane 2m v1.11.5 <none> Ubuntu 16.04.5 LTS 4.4.0-138-generic docker://17.3.2
controlplane-1 Ready controlplane 1m v1.11.5 <none> Ubuntu 16.04.5 LTS 4.4.0-138-generic docker://17.3.2
worker-0 Ready worker 2m v1.11.5 <none> Ubuntu 16.04.5 LTS 4.4.0-138-generic docker://17.3.2
worker-1 Ready worker 2m v1.11.5 <none> Ubuntu 16.04.5 LTS 4.4.0-138-generic docker://17.3.2
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
controlplane-0 Ready controlplane 31m v1.13.5 138.68.188.91 <none> Ubuntu 18.04.2 LTS 4.15.0-47-generic docker://18.9.5
etcd-0 Ready etcd 31m v1.13.5 138.68.180.33 <none> Ubuntu 18.04.2 LTS 4.15.0-47-generic docker://18.9.5
worker-0 Ready worker 30m v1.13.5 139.59.179.88 <none> Ubuntu 18.04.2 LTS 4.15.0-47-generic docker://18.9.5
```

#### Get node conditions

Run the command below to list nodes with their [Node Conditions](https://kubernetes.io/docs/concepts/architecture/nodes/#condition).

```
kubectl get nodes -o go-template='{{range .items}}{{$node := .}}{{range .status.conditions}}{{$node.metadata.name}}{{": "}}{{.type}}{{":"}}{{.status}}{{"\n"}}{{end}}{{end}}'
```

Run the command below to list nodes with active [Node Conditions](https://kubernetes.io/docs/concepts/architecture/nodes/#condition) that could prevent normal operation.

```
kubectl get nodes -o go-template='{{range .items}}{{$node := .}}{{range .status.conditions}}{{if ne .type "Ready"}}{{if eq .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{":"}}{{.status}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}'
kubectl get nodes -o go-template='{{range .items}}{{$node := .}}{{range .status.conditions}}{{if ne .type "Ready"}}{{if eq .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{":"}}{{.status}}{{"\n"}}{{end}}{{else}}{{if ne .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{": "}}{{.status}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}'
```

Example output: