In a three-node cluster, when /etc/kubernetes/manifests/etcd.yaml is changed on the nodes at the same time, one node's etcd static pod keeps stopping #114244

Closed
majiwhut opened this issue Dec 2, 2022 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/support Categorizes issue or PR as a support question. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@majiwhut

majiwhut commented Dec 2, 2022

What happened?

In a three-node cluster, when /etc/kubernetes/manifests/etcd.yaml is changed on the nodes at the same time, the etcd static pod on one node keeps stopping and fails to restart.

What did you expect to happen?

All three nodes restart their etcd static pods successfully.
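
For context on why changing every member at once is riskier than a rolling change: etcd needs a majority of its members (a quorum) to make progress, so a three-member cluster tolerates only one member being down. The sketch below is plain arithmetic under that assumption, not a diagnosis of this issue; the function names are made up for illustration.

```python
def quorum(members):
    """Smallest majority for a cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members):
    """How many members can be down while a majority is still up."""
    return members - quorum(members)


def has_quorum(members, up):
    """True if `up` running members still form a majority."""
    return up >= quorum(members)


if __name__ == "__main__":
    # Three members, as in this cluster.
    print("quorum:", quorum(3))
    print("tolerated failures:", fault_tolerance(3))
    print("quorum with 2 up:", has_quorum(3, 2))
    print("quorum with 1 up:", has_quorum(3, 1))
```

If the manifests on all three nodes are rewritten at the same moment, every member restarts together, so there is a window in which no majority is running. Changing one node at a time and waiting for it to become healthy avoids that window.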

How can we reproduce it (as minimally and precisely as possible)?

node info is:
kubectl get nodes
NAME                STATUS     ROLES                  AGE   VERSION
dmoc-fa163e3cf94c   Ready      control-plane,master   10h   v1.23.9
dmoc-fa163e894ff5   Ready      control-plane,master   10h   v1.23.9
dmoc-fa163e8eaff1   NotReady   control-plane,master   10h   v1.23.9

systemctl cat kubelet.service

# /usr/lib/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=docker.service

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/00-admin.conf
[Service]
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/hugetlb/kubernetes.slice /sys/fs/cgroup/cpuset/kubernetes.slice
Slice=kubernetes.slice
CPUAccounting=true
MemoryAccounting=true

# /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

cat /var/lib/kubelet/config.yaml | grep staticPodPath
staticPodPath: /etc/kubernetes/manifests

kubectl get pod -n kube-system | grep etcd
etcd-dmoc-fa163e3cf94c   1/1   Running   0   10m
etcd-dmoc-fa163e894ff5   1/1   Running   0   10m
etcd-dmoc-fa163e8eaff1   1/1   Running   0   9h

kubectl describe pod -n kube-system etcd-dmoc-fa163e8eaff1
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   True
  PodScheduled      True
Volumes:
  timezone:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
  etcd-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd
    HostPathType:  DirectoryOrCreate
  etcd-data:
    Type:          HostPath (bare host directory volume)
    Path:          /data/k8s/etcd
    HostPathType:  DirectoryOrCreate
QoS Class:         Burstable
Node-Selectors:
Tolerations:       :NoExecute op=Exists
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Killing  15m   kubelet  Stopping container etcd

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.9", GitCommit:"c1de2d70269039fe55efb98e737d9a29f9155246", GitTreeState:"clean", BuildDate:"2022-07-13T14:26:51Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.9", GitCommit:"c1de2d70269039fe55efb98e737d9a29f9155246", GitTreeState:"clean", BuildDate:"2022-07-13T14:19:57Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Three-node cluster; the etcd static pods on the other two nodes restarted successfully.

OS version

# On Linux:
$ cat /etc/os-release
NAME="PlatOS"
VERSION="1.2 (LTS)"
ID="PlatOS"
VERSION_ID="1.2"
PRETTY_NAME="PlatOS 1.2 (LTS)"
ANSI_COLOR="0;31"
$ uname -a
Linux dmoc-fa163e3cf94c 4.18.0-372.9.1.15.po1.x86_64 #1 SMP Mon Jul 4 13:53:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@majiwhut majiwhut added the kind/bug Categorizes issue or PR as related to a bug. label Dec 2, 2022
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 2, 2022
@k8s-ci-robot
Contributor

@majiwhut: There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 2, 2022
@k8s-ci-robot
Contributor

@majiwhut: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@aimuz
Contributor

aimuz commented Dec 2, 2022

Can you add the etcd logs? And the yaml file for etcd

@aimuz
Contributor

aimuz commented Dec 2, 2022

/kind support

@majiwhut
Author

majiwhut commented Dec 2, 2022

etcd.yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.10.10.193:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.10.10.193:2379
    - --auto-compaction-mode=periodic
    - --auto-compaction-retention=24h
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/data/k8s/etcd
    - --election-timeout=10000
    - --experimental-initial-corrupt-check=true
    - --heartbeat-interval=1000
    - --initial-advertise-peer-urls=https://10.10.10.193:2380
    - --initial-cluster=dmoc-fa163e8eaff1=https://10.10.10.194:2380,dmoc-fa163e894ff5=https://10.10.10.193:2380
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.10.10.193:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://10.10.10.193:2380
    - --max-snapshots=3
    - --max-wals=50
    - --name=dmoc-fa163e894ff5
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: proxy.harbor/local-xdr/google_containers/etcd:3.5.5-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
    startupProbe:
      failureThreshold: 30
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/localtime
      name: timezone
      readOnly: true
    - mountPath: /data/k8s/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/localtime
      type: ""
    name: timezone
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /data/k8s/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
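
One detail visible in this manifest is that `--initial-cluster` names two members, while `kubectl get nodes` lists three control-plane nodes. A minimal Python sketch for comparing the two follows. The helper names `parse_initial_cluster` and `missing_members` are hypothetical, and the inputs are copied from this thread. A missing name is not necessarily a fault: as far as I know, kubeadm writes only the members known at join time into this flag, and etcd ignores the `--initial-*` flags once a data directory already exists. Treat the result as a prompt for further checks, not as a diagnosis.

```python
def parse_initial_cluster(flag_value):
    """Split an etcd --initial-cluster value into {member_name: peer_url}."""
    members = {}
    for entry in flag_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry looks like "name=https://host:2380"; split on the first "=".
        name, _, url = entry.partition("=")
        members[name] = url
    return members


def missing_members(flag_value, node_names):
    """Return the node names that do not appear in the --initial-cluster value."""
    listed = parse_initial_cluster(flag_value)
    return [name for name in node_names if name not in listed]


# Values copied from the etcd.yaml and `kubectl get nodes` output in this thread.
INITIAL_CLUSTER = (
    "dmoc-fa163e8eaff1=https://10.10.10.194:2380,"
    "dmoc-fa163e894ff5=https://10.10.10.193:2380"
)
NODES = ["dmoc-fa163e3cf94c", "dmoc-fa163e894ff5", "dmoc-fa163e8eaff1"]

if __name__ == "__main__":
    print(parse_initial_cluster(INITIAL_CLUSTER))
    print("not listed:", missing_members(INITIAL_CLUSTER, NODES))
```

Running the same comparison against the manifest on each of the three nodes would show whether the stuck node's flags differ from the two that restarted.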

@majiwhut
Author

majiwhut commented Dec 2, 2022

etcd logs
Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)

@neolit123
Member

neolit123 commented Dec 2, 2022

better to ask at the support channels or #etcd on k8s slack
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

we don't provide support in the k8s repos

/kind support
/close

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Dec 2, 2022
@k8s-ci-robot
Contributor

@neolit123: Closing this issue.

