In a three-node cluster, when /etc/kubernetes/manifests/etcd.yaml is changed on all nodes at the same time, one node's etcd static pod keeps stopping #114244

majiwhut · 2022-12-02T02:44:24Z

What happened?

In a three-node cluster, when /etc/kubernetes/manifests/etcd.yaml is changed on all nodes at the same time, the etcd static pod on one node keeps stopping and fails to restart.

What did you expect to happen?

All three nodes successfully restart their etcd static pods.

How can we reproduce it (as minimally and precisely as possible)?

Node info:
kubectl get nodes
NAME                STATUS     ROLES                  AGE   VERSION
dmoc-fa163e3cf94c   Ready      control-plane,master   10h   v1.23.9
dmoc-fa163e894ff5   Ready      control-plane,master   10h   v1.23.9
dmoc-fa163e8eaff1   NotReady   control-plane,master   10h   v1.23.9

systemctl cat kubelet.service

# /usr/lib/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=docker.service

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/00-admin.conf
[Service]
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/hugetlb/kubernetes.slice /sys/fs/cgroup/cpuset/kubernetes.slice
Slice=kubernetes.slice
CPUAccounting=true
MemoryAccounting=true

# /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

cat /var/lib/kubelet/config.yaml | grep staticPodPath
staticPodPath: /etc/kubernetes/manifests
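The kubelet recreates a static pod whenever its manifest under staticPodPath changes, so editing etcd.yaml on all three nodes at once restarts all three etcd members at roughly the same time, which can briefly leave the cluster without quorum. A common way to avoid that is to change one node at a time and wait for the local member to report healthy before moving on. A minimal sketch, assuming curl is installed and reusing the health endpoint that the etcd manifest later in this thread probes (http://127.0.0.1:2381/health); the function name and its retry defaults are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: poll a local etcd health endpoint until it reports healthy.
# Defaults mirror the startupProbe in the etcd manifest (30 attempts, 10 s apart).
wait_for_etcd_health() {
  local url="${1:-http://127.0.0.1:2381/health}"
  local attempts="${2:-30}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    # etcd 3.5 answers with JSON such as {"health":"true","reason":""} when healthy.
    if curl -fsS --max-time 5 "$url" 2>/dev/null | grep -q '"health":"true"'; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Run it on a node right after editing that node's manifest, for example `wait_for_etcd_health && echo "local etcd healthy"`, and only then edit the next node.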

kubectl get pod -n kube-system | grep etcd
etcd-dmoc-fa163e3cf94c   1/1   Running   0   10m
etcd-dmoc-fa163e894ff5   1/1   Running   0   10m
etcd-dmoc-fa163e8eaff1   1/1   Running   0   9h

kubectl describe pod -n kube-system etcd-dmoc-fa163e8eaff1
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   True
  PodScheduled      True
Volumes:
  timezone:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
  etcd-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd
    HostPathType:  DirectoryOrCreate
  etcd-data:
    Type:          HostPath (bare host directory volume)
    Path:          /data/k8s/etcd
    HostPathType:  DirectoryOrCreate
QoS Class:         Burstable
Node-Selectors:
Tolerations:       :NoExecute op=Exists
Events:
  Type    Reason   Age   From     Message
  Normal  Killing  15m   kubelet  Stopping container etcd

Anything else we need to know?

No response

Kubernetes version

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.9", GitCommit:"c1de2d70269039fe55efb98e737d9a29f9155246", GitTreeState:"clean", BuildDate:"2022-07-13T14:26:51Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.9", GitCommit:"c1de2d70269039fe55efb98e737d9a29f9155246", GitTreeState:"clean", BuildDate:"2022-07-13T14:19:57Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Three-node cluster; the etcd static pods on the other two nodes restarted successfully.

OS version

# On Linux:
$ cat /etc/os-release
NAME="PlatOS"
VERSION="1.2 (LTS)"
ID="PlatOS"
VERSION_ID="1.2"
PRETTY_NAME="PlatOS 1.2 (LTS)"
ANSI_COLOR="0;31"
$ uname -a
Linux dmoc-fa163e3cf94c 4.18.0-372.9.1.15.po1.x86_64 #1 SMP Mon Jul 4 13:53:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

k8s-ci-robot · 2022-12-02T02:44:31Z

@majiwhut: There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

/sig <group-name>
/wg <group-name>
/committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2022-12-02T02:44:32Z

@majiwhut: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


aimuz · 2022-12-02T09:46:57Z

Can you add the etcd logs and the etcd YAML file?

aimuz · 2022-12-02T09:47:19Z

/kind support

majiwhut · 2022-12-02T10:00:28Z

etcd.yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.10.10.193:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.10.10.193:2379
    - --auto-compaction-mode=periodic
    - --auto-compaction-retention=24h
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/data/k8s/etcd
    - --election-timeout=10000
    - --experimental-initial-corrupt-check=true
    - --heartbeat-interval=1000
    - --initial-advertise-peer-urls=https://10.10.10.193:2380
    - --initial-cluster=dmoc-fa163e8eaff1=https://10.10.10.194:2380,dmoc-fa163e894ff5=https://10.10.10.193:2380
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.10.10.193:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://10.10.10.193:2380
    - --max-snapshots=3
    - --max-wals=50
    - --name=dmoc-fa163e894ff5
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: proxy.harbor/local-xdr/google_containers/etcd:3.5.5-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
    startupProbe:
      failureThreshold: 30
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/localtime
      name: timezone
      readOnly: true
    - mountPath: /data/k8s/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/localtime
      type: ""
    name: timezone
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /data/k8s/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
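The --initial-cluster flag in this manifest names only two members (dmoc-fa163e8eaff1 and dmoc-fa163e894ff5), although the cluster has three control-plane nodes; kubeadm typically writes the member list as it stood when that node joined. Note also that, per --name, this copy comes from dmoc-fa163e894ff5, not from the failing node dmoc-fa163e8eaff1. According to the etcd documentation, the --initial-* flags are only used when a member is bootstrapped and are ignored when an existing member restarts from its data directory, so this list on its own probably does not explain the failed restart, but comparing it across the three nodes is a quick sanity check. A minimal sketch, assuming the flag appears once in the manifest; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list the members named in a manifest's --initial-cluster
# flag, one "name peer-url" pair per line.
initial_cluster_members() {
  # The '=' right after "cluster" keeps --initial-cluster-state from matching.
  grep -o -- '--initial-cluster=[^[:space:]]*' "$1" \
    | head -n 1 \
    | cut -d= -f2- \
    | tr ',' '\n' \
    | sed 's/=/ /'
}
```

For example, `initial_cluster_members /etc/kubernetes/manifests/etcd.yaml` on each node shows which peers that node's manifest lists.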

majiwhut · 2022-12-02T10:05:42Z

etcd logs: trying to fetch them returns
Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)
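That error is returned on the API server to kubelet path used for logs: the kubelet on the node could not complete its authorization check for user kube-apiserver-kubelet-client (nodes/proxy), plausibly because it cannot reach the API server, which would also fit that node being NotReady. When this path fails, the container logs can usually still be read directly on the affected node. A minimal sketch, assuming it runs as root on dmoc-fa163e8eaff1 and that either crictl or docker is installed (the kubelet unit above starts after docker.service, so Docker is likely); the function names are hypothetical, and the assumption that `crictl ps` and `docker ps` list the newest container first is worth double-checking on your versions:

```shell
#!/bin/sh
# Hypothetical helper: report which container CLI is available on this node.
detect_runtime_cli() {
  if command -v crictl >/dev/null 2>&1; then
    echo crictl
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo none
  fi
}

# Hypothetical helper: print the last N lines (default 200) of the most recently
# created etcd container's log, bypassing the API server and kubelet proxy.
etcd_logs_on_node() {
  local tail_lines="${1:-200}" cli id
  cli="$(detect_runtime_cli)"
  case "$cli" in
    crictl)
      # --name is a regular expression on the container name; -a includes exited ones.
      id="$(crictl ps -a -q --name '^etcd$' | head -n 1)" ;;
    docker)
      # dockershim names containers k8s_<container>_<pod>_<namespace>_<uid>_<attempt>.
      id="$(docker ps -a -q --filter 'name=k8s_etcd_etcd' | head -n 1)" ;;
    *)
      echo "neither crictl nor docker found" >&2
      return 1 ;;
  esac
  if [ -z "$id" ]; then
    echo "no etcd container found" >&2
    return 1
  fi
  "$cli" logs --tail "$tail_lines" "$id"
}
```

The kubelet's own view of why it keeps stopping the pod is in its journal, for example `journalctl -u kubelet --no-pager --since "1 hour ago"`.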

neolit123 · 2022-12-02T12:39:32Z

Better to ask in the support channels or in #etcd on the k8s Slack:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

We don't provide support in the k8s repos.

/kind support
/close

k8s-ci-robot · 2022-12-02T12:39:37Z

@neolit123: Closing this issue.

In response to this:

Better to ask in the support channels or in #etcd on the k8s Slack:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

We don't provide support in the k8s repos.

/kind support
/close


majiwhut added the kind/bug Categorizes issue or PR as related to a bug. label Dec 2, 2022

k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 2, 2022

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 2, 2022

k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Dec 2, 2022

k8s-ci-robot closed this as completed Dec 2, 2022
