
kube-apiserver 1.13.x refuses to work when first etcd-server is not available. #72102

Closed
Cytrian opened this issue Dec 17, 2018 · 52 comments · Fixed by etcd-io/etcd#10476, etcd-io/etcd#10911 or #81434

Comments

@Cytrian commented Dec 17, 2018

How to reproduce the problem:
Set up a new demo cluster with kubeadm 1.13.1.
Create the default configuration with kubeadm config print init-defaults
Initialize cluster as usual with kubeadm init

Change the --etcd-servers list in kube-apiserver manifest to --etcd-servers=https://127.0.0.2:2379,https://127.0.0.1:2379, so that the first etcd node is unavailable ("connection refused").

The kube-apiserver is then not able to connect to etcd any more.

Last message: Unable to create storage backend: config (&{ /registry [https://127.0.0.2:2379 https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt true 0xc000381dd0 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.2:2379: connect: connection refused)

kube-apiserver does not start.

If I upgrade etcd to version 3.3.10, it instead reports the error: remote error: tls: bad certificate", ServerName ""

Environment:

  • Kubernetes version 1.13.1
  • kubeadm in Vagrant box

I also experience this bug in an environment with a real etcd cluster.

/kind bug

@Cytrian (Author) commented Dec 17, 2018

/sig api-machinery

@yue9944882 (Member) commented Dec 17, 2018

/remove-sig api-machinery
/sig cluster-lifecycle

@yue9944882 (Member) commented Dec 17, 2018

/sig api-machinery

apologies, just had another look and it's indeed an api-machinery issue.

// Endpoints defines a set of URLs (schemes, hosts and ports only)
// that can be used to communicate with a logical etcd cluster. For
// example, a three-node cluster could be provided like so:
//
//   Endpoints: []string{
//       "http://node1.example.com:2379",
//       "http://node2.example.com:2379",
//       "http://node3.example.com:2379",
//   }
//
// If multiple endpoints are provided, the Client will attempt to
// use them all in the event that one or more of them are unusable.
//
// If Client.Sync is ever called, the Client may cache an alternate
// set of endpoints to continue operation.

We are passing the server list straight into the etcd v3 client, which returns the error you reported. Not sure if this is by design.

@JishanXing (Contributor) commented Dec 20, 2018

This is an etcdv3 client issue. See etcd-io/etcd#9949

@fedebongio (Contributor) commented Dec 20, 2018

/cc @jpbetz

@timothysc (Member) commented Feb 1, 2019

/assign @timothysc @detiber

Live-updating a static pod manifest is typically not recommended. Was this triggered via some other operation, or were you editing your static manifests?

@Cytrian (Author) commented Feb 1, 2019

No pod manifest involved here, just an etcd cluster and a kube-apiserver. The issue appeared when we rebooted the first etcd node.

@alexbrand (Member) commented Feb 4, 2019

I was able to repro this issue with the repro steps provided by @Cytrian. I also reproduced this issue with a real etcd cluster.

As @JishanXing previously mentioned, the problem is caused by a bug in the etcd v3 client library (or perhaps the grpc library). The vault project is also running into this: hashicorp/vault#4349

The problem seems to be that the etcd library uses the first node’s address as the ServerName for TLS. This means that all attempts to connect to any server other than the first will fail with a certificate validation error (i.e. cert has ${nameOfNode2} in SANs, but the client is expecting ${nameOfNode1}).

An important thing to highlight is that when the first etcd server goes down, it also takes the Kubernetes API servers down, because they fail to connect to the remaining etcd servers.

With that said, this all depends on what your etcd server certificates look like:

  • If you follow the kubeadm instructions to stand up a 3 node etcd cluster, you get a set of certificates that include the first node’s name and IP in the SANs (because all certs are generated on the first etcd node). Thus, you should not run into this issue.
  • If you have used another process to generate certificates for etcd, and the certs do not include the first node’s name and IP in the SANs, you will most likely run into this issue when the first etcd node goes down.
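One way to tell which case you are in is to inspect the SANs of each member's serving certificate. A sketch with hypothetical file paths (the self-signed cert below only stands in for a real etcd serving cert; `-addext` and `-ext` need OpenSSL 1.1.1+):

```shell
# Generate a throwaway cert whose SANs cover only "node2" (stand-in for a
# real etcd member's serving certificate; paths are hypothetical).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/etcd-node2.key -out /tmp/etcd-node2.crt \
  -subj "/CN=node2.example.com" \
  -addext "subjectAltName=DNS:node2.example.com,IP:127.0.0.1"

# Print the SANs. If the first etcd node's name/IP is missing from every
# other member's cert, you are in the vulnerable configuration.
openssl x509 -in /tmp/etcd-node2.crt -noout -ext subjectAltName
```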

To reproduce the issue with a real etcd cluster:

  1. Create a 3 node etcd cluster with TLS enabled. Each certificate should only contain the name/IP of the node that will be serving it.
  2. Start an API server that points to the etcd cluster.
  3. Stop the first etcd node.
  4. The API server crashes and fails to come back up.

Versions:

  • kubeadm version: v1.13.2
  • kubernetes api server version: v1.13.2
  • etcd image: k8s.gcr.io/etcd:3.2.24

API server crash log: https://gist.github.com/alexbrand/ba86f506e4278ed2ada4504ab44b525b

I was unable to reproduce this issue with API server v1.12.5 (n.b. this was somewhat of a non-scientific test => tested by updating the image field of the API server static pod produced by kubeadm v1.13.2)

@timothysc (Member) commented Feb 4, 2019

@liggitt ^ FYI.

@neolit123 (Member) commented Feb 6, 2019

thank you for the investigation @alexbrand

@gyuho (Member) commented Aug 1, 2019

@dims Sure. The earliest we can do is August 13, 2019.

  1. We need gRPC 1.23 to include upstream bug fixes https://github.com/grpc/grpc-go/milestone/21
  2. We need a day or two to rewrite 3.3 client balancer

Today, we do etcd 3.4 release code freeze, which means we will start running functional (failure injection) tests + kubemark to test new etcd client changes including etcd-io/etcd#10911.

If anything changes, we will post updates here.

@dims (Member) commented Aug 1, 2019

Thanks a ton @gyuho !

@DaniDD commented Aug 2, 2019

Hi, is this the same issue?
I have 3 control planes, each running one etcd server. The apiserver can only connect to one etcd server. If I list more than one etcd server in kube-apiserver.conf, I get warnings in the log:
clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {172.22.75.8:2379 0 }. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 172.22.75.8, 127.0.0.1, ::1, 172.22.75.8, not 172.22.75.7". Reconnecting...

The certificates are correct. I verified this with curl -v --cert apiserver-etcd-client.crt --key apiserver-etcd-client.key --cacert etcd/ca.crt ...

@figo (Member) commented Aug 2, 2019

> @dims Sure. The earliest we can do is August 13, 2019.
>
>   1. We need gRPC 1.23 to include upstream bug fixes https://github.com/grpc/grpc-go/milestone/21
>   2. We need a day or two to rewrite 3.3 client balancer
>
> Today, we do etcd 3.4 release code freeze, which means we will start running functional (failure injection) tests + kubemark to test new etcd client changes including etcd-io/etcd#10911.
>
> If anything changes, we will post updates here.

@gyuho a quick clarification, is August 13, 2019 the earliest day the fix going to land on 3.3? thx

@gyuho (Member) commented Aug 3, 2019

> August 13, 2019 the earliest day the fix going to land on 3.3

Yes

@gyuho (Member) commented Aug 13, 2019

Update: https://github.com/grpc/grpc-go/releases/tag/v1.23.0 is out. Bumping up gRPC etcd-io/etcd#11029 in etcd master branch, in addition to Go runtime upgrade https://groups.google.com/forum/#!topic/golang-announce/65QixT3tcmg. Once tests look good, we will start working on 3.3 backports.

@gyuho (Member) commented Aug 14, 2019

@dims @jpbetz https://github.com/etcd-io/etcd/releases/tag/v3.3.14-beta.0 has been released with all the fixes. Please try. Once tests look good in the next few days, I will release v3.3.14.

Update: https://github.com/etcd-io/etcd/releases/tag/v3.3.14 and https://github.com/etcd-io/etcd/releases/tag/v3.3.15 have been released.

@dims (Member) commented Aug 14, 2019

@gyuho i just kicked off a WIP on the k/k CI - #81434

@gyuho (Member) commented Aug 15, 2019

Some tests in k8s may break. We need to pass grpc.WithBlock with the new etcd client. Please take a look at #81435.

@ylhyh commented Aug 21, 2019

Is there a hotfix for v1.15?


@igcherkaev commented Aug 29, 2019

The issue is still there with Kubernetes 1.14.6 and etcd 3.3.15. Any changes to kubernetes libs or code needed to tackle this issue?


@igcherkaev commented Aug 30, 2019

@dims hmm, if changes are required in the k8s source code, why has this issue been closed? I'll try to build a custom image and test, though. Thanks.

@liggitt (Member) commented Aug 30, 2019

this will be resolved in 1.16 in #81434

the changes required (upgrading etcd client libraries) were too invasive to backport to patch releases

@igcherkaev commented Aug 30, 2019

@liggitt that leaves people on 1.13-1.15 without proper H/A? I think this issue deserves to be fixed in all three supported releases of Kubernetes. The hotfix mentioned here looks simple enough to be added, but you are saying the proper fix requires much more. So this is all sorta obscure and confusing to the community, IMO.

Not that I am complaining, don't get me wrong; it's just that I think everyone would welcome some clarity on this issue. Maybe document it somewhere and provide some workarounds for the people who are still on v1.13-1.15? Because right now, bring down the first etcd member and oops, the API is not working and the cluster is not working.

@liggitt (Member) commented Aug 30, 2019

agree on documenting the issue at the very least, and recommending any possible workarounds

@jpbetz @gyuho, any feedback on ^?

@gyuho (Member) commented Aug 30, 2019

@liggitt @igcherkaev I will work on the documentation.

@igcherkaev commented Aug 30, 2019

@dims I just built a custom 1.14 image from the release-1.14 branch, patched that credentials.go file, and it's so much better now when I bring the first etcd node down. Now I am confused: if the fix is so simple, why does it have to wait until 1.16?

@gyuho (Member) commented Aug 30, 2019

@igcherkaev The fix requires upgrading google.golang.org/grpc from v1.7.5 to v1.23.0, which is quite a big change :0

@gyuho (Member) commented Aug 30, 2019

I am adding a "Known issue" section to the etcd docs in kubernetes/website#16156.

@dims (Member) commented Aug 30, 2019

@igcherkaev see what @gyuho said :)

@javendo commented Sep 10, 2019

Some days ago I opened issue #81837, but reading this one I think it is related. Can anyone take a look at my issue and see if they are related? If so, I can close mine.
