Graceful Node Shutdown does not update endpoints for terminating pods #116965

Open

mikekap opened this issue Mar 28, 2023 · 48 comments · May be fixed by #117822 or #125774
Labels
kind/bug Categorizes issue or PR as related to a bug.
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@mikekap
Contributor

mikekap commented Mar 28, 2023

What happened?

When graceful node shutdown is enabled, I expect that services are drained appropriately during pod shutdown (i.e. the same way that kubectl delete pod would work). However this doesn't seem to be the case - the pod never enters the Terminating state according to kubectl and doesn't get removed from the service endpoints until it enters the Completed or Failed state. This causes connection errors and lost requests in Cluster-routed services.

What did you expect to happen?

After pod termination starts, the EndpointSlice should be updated so that the terminating pod is removed or marked as terminating, even before the pod has fully shut down. This is what happens when a pod is removed for any other reason (e.g. kubectl delete pod).

How can we reproduce it (as minimally and precisely as possible)?

You can see the repro at https://github.com/mikekap/vagrant-kubeadm-kubernetes/tree/graceful-shutdown-repro . The repro sets up a 2 node cluster with graceful shutdown enabled and a simple deployment/service.

Here is the deployment/service YAML, pasted here for reference:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-deployment
spec:
  selector:
    matchLabels:
      app: echo
  replicas: 2
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
      - name: echo
        image: mendhak/http-https-echo
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "echo 'Begin shutdown' > /proc/1/fd/1; sleep 20; echo 'End shutdown' > /proc/1/fd/1; exit 0"]
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: echo
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

To reproduce:

  • Run the deployment, preferably one pod on each node.
  • Run watch -n 1 'kubectl describe endpointslice my-service-5j2nw' in a separate terminal (you'll need to figure out the right generated name, unfortunately; see the label-based alternative sketched below)
  • Use the virtualbox UI to send an ACPI shutdown to one of the nodes.

Within 20 seconds (the preStop hook duration), the EndpointSlice should be updated to have the pod that is terminating removed. This is what happens when a pod gets removed for any other reason (e.g. kubectl delete pod).
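
(As an aside, one way to avoid guessing the generated slice name in the watch step above: the endpointslice controller labels every slice with its owning service, so a label selector works. A minimal sketch, assuming the service is named my-service as in the YAML above:)

# Watch all EndpointSlices that belong to my-service, including each
# endpoint's ready/serving/terminating conditions, without knowing the
# generated name suffix.
watch -n 1 "kubectl describe endpointslice -l kubernetes.io/service-name=my-service"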

To confirm that everything else worked, you can reboot the node and check the journal logs from the previous boot (via something like journalctl -b -1) for the shutdown sequence triggering the kubelet's graceful shutdown logic.
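
For example, a quick check along those lines (assuming the kubelet runs as the systemd unit named kubelet, as it does in this repro):

# Show the previous boot's kubelet logs and filter for the graceful
# shutdown messages.
journalctl -b -1 -u kubelet | grep -i shutdown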

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:33:12Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

None

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux master-node 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@mikekap mikekap added the kind/bug Categorizes issue or PR as related to a bug. label Mar 28, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 28, 2023
@mikekap
Contributor Author

mikekap commented Mar 28, 2023

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 28, 2023
@olderTaoist
Contributor

/assign

@olderTaoist
Contributor

olderTaoist commented Mar 29, 2023

When graceful node shutdown is enabled, I expect that services are drained appropriately during pod shutdown (i.e. the same way that kubectl delete pod would work). However this doesn't seem to be the case - the pod never enters the Terminating state according to kubectl and doesn't get removed from the service endpoints until it enters the Completed or Failed state. This causes connection errors and lost requests in Cluster-routed services.

I don't know your service's backend implementation, but routing only to healthy backends is an important feature. The terminating pod in the EndpointSlice is not ready, as shown below, so new traffic shouldn't be routed to it.

Name:         my-service-rxsv4
Namespace:    default
Labels:       endpointslice.kubernetes.io/managed-by=endpointslice-controller.k8s.io
              kubernetes.io/service-name=my-service
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2023-03-29T08:41:01Z
AddressType:  IPv4
Ports:
  Name     Port  Protocol
  ----     ----  --------
  <unset>  8080  TCP
Endpoints:
  - Addresses:  10.244.2.2
    Conditions:
      Ready:    false
    Hostname:   <unset>
    TargetRef:  Pod/echo-deployment-754f54d797-hjptn
    Topology:   kubernetes.io/hostname=kubernetes1.23.8-worker3
  - Addresses:  10.244.1.16
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/echo-deployment-754f54d797-hstr7
    Topology:   kubernetes.io/hostname=kubernetes1.23.8-worker

But it is unreasonable for the terminating pod to remain in the EndpointSlice; it should behave like Endpoints, where the address is dropped.
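
A quick way to compare the two objects for the same service (a sketch, assuming it is named my-service):

# The legacy Endpoints object only keeps ready addresses under
# subsets[].addresses (not-ready ones move to notReadyAddresses), while the
# EndpointSlice keeps the endpoint and only flips its conditions.
kubectl get endpoints my-service -o yaml
kubectl get endpointslices -l kubernetes.io/service-name=my-service -o yaml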

@aojea
Member

aojea commented Mar 29, 2023

Within 5 seconds (the preStop hook duration), the EndpointSlice should be updated to have the pod that is terminating removed. This is what happens when a pod gets removed for any other reason (e.g. kubectl delete pod).

It will be important to check the logs of the endpointslice controller in the controller-manager and correlate them with the pod and node states.

@mikekap
Contributor Author

mikekap commented Mar 29, 2023

Terminating pods appearing in EndpointSlice might be intentional due to KEP-1672, though that's just a guess from the name.

However, I'm not seeing what you posted above. What I'm seeing is both Endpoints & EndpointSlice are not updated for the duration of the shutdown - only when the shutdown ends. The readiness check continues reporting true. The only thing that gets updated is the status of the Node.

I updated the repro a little bit to print the preStop hook sequence in the log and increase the grace period. It doesn't materially change anything. Here's a video of what happens when I run the repro I linked on my machine, with controller logs: https://drive.google.com/file/d/1-_hsvkJq3cOtLxEzQ2R8Fy4MEXka2Z82/view?usp=sharing . Both the Endpoints and EndpointSlice objects still have the terminating pod listed after "Begin shutdown" appears.

@SergeyKanzhelev
Member

Do you see the terminating flag value on the EndpointSlice (https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#terminating) change for the terminating pod?

/triage needs-information

Also, if you can share kubelet.log and the endpointslice controller logs that @aojea requested, that would be helpful.

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Mar 29, 2023
@mikekap
Contributor Author

mikekap commented Mar 29, 2023

The EndpointSlice conditions remain as these for the entire duration:

  conditions:
    ready: true
    serving: true
    terminating: false

I captured the logs in the video I posted above as well. Here are the logs pasted (from a second attempt). I shut down the node @ 18:02:58. Spoiler: there's nothing there. Is there a flag to turn on more specific logs for the endpoint slice controller?
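
(One knob that might help, assuming a kubeadm-style control plane like this repro: raise the controller-manager's klog verbosity, e.g. --v=4 or --v=5, by adding the flag to the command list in /etc/kubernetes/manifests/kube-controller-manager.yaml; the kubelet recreates the static pod automatically. Then follow its logs:)

# Tail the controller-manager static pod via its kubeadm-set component label.
kubectl -n kube-system logs -f -l component=kube-controller-manager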

kube-controller-manager.log
I0329 17:44:20.259182       1 serving.go:348] Generated self-signed cert in-memory
I0329 17:44:21.113389       1 controllermanager.go:182] Version: v1.26.3
I0329 17:44:21.113680       1 controllermanager.go:184] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0329 17:44:21.114684       1 secure_serving.go:210] Serving securely on 127.0.0.1:10257
I0329 17:44:21.115427       1 leaderelection.go:248] attempting to acquire leader lease kube-system/kube-controller-manager...
I0329 17:44:21.115695       1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/etc/kubernetes/pki/front-proxy-ca.crt"
I0329 17:44:21.115791       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0329 17:44:21.115866       1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
E0329 17:44:22.929615       1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: leases.coordination.k8s.io "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
I0329 17:44:25.227809       1 leaderelection.go:258] successfully acquired lease kube-system/kube-controller-manager
I0329 17:44:25.228405       1 event.go:294] "Event occurred" object="kube-system/kube-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="master-node_a9fb8898-c5ec-4850-8f26-8abe3edd4cb9 became leader"
I0329 17:44:25.253865       1 shared_informer.go:273] Waiting for caches to sync for tokens
I0329 17:44:25.273293       1 controllermanager.go:622] Started "podgc"
I0329 17:44:25.273722       1 gc_controller.go:102] Starting GC controller
I0329 17:44:25.273759       1 shared_informer.go:273] Waiting for caches to sync for GC
I0329 17:44:25.283313       1 node_lifecycle_controller.go:492] Controller will reconcile labels.
I0329 17:44:25.283388       1 controllermanager.go:622] Started "nodelifecycle"
W0329 17:44:25.283399       1 core.go:221] configure-cloud-routes is set, but no cloud provider specified. Will not configure cloud provider routes.
W0329 17:44:25.283404       1 controllermanager.go:600] Skipping "route"
I0329 17:44:25.283542       1 node_lifecycle_controller.go:527] Sending events to api server.
I0329 17:44:25.283598       1 node_lifecycle_controller.go:538] Starting node controller
I0329 17:44:25.283612       1 shared_informer.go:273] Waiting for caches to sync for taint
I0329 17:44:25.292067       1 controllermanager.go:622] Started "ttl-after-finished"
I0329 17:44:25.292225       1 ttlafterfinished_controller.go:104] Starting TTL after finished controller
I0329 17:44:25.292476       1 shared_informer.go:273] Waiting for caches to sync for TTL after finished
I0329 17:44:25.321570       1 controllermanager.go:622] Started "attachdetach"
I0329 17:44:25.322201       1 attach_detach_controller.go:328] Starting attach detach controller
I0329 17:44:25.322611       1 shared_informer.go:273] Waiting for caches to sync for attach detach
I0329 17:44:25.332003       1 controllermanager.go:622] Started "endpointslicemirroring"
I0329 17:44:25.332246       1 endpointslicemirroring_controller.go:211] Starting EndpointSliceMirroring controller
I0329 17:44:25.332261       1 shared_informer.go:273] Waiting for caches to sync for endpoint_slice_mirroring
I0329 17:44:25.342539       1 controllermanager.go:622] Started "daemonset"
I0329 17:44:25.343172       1 daemon_controller.go:265] Starting daemon sets controller
I0329 17:44:25.343239       1 shared_informer.go:273] Waiting for caches to sync for daemon sets
I0329 17:44:25.351166       1 controllermanager.go:622] Started "cronjob"
I0329 17:44:25.351346       1 cronjob_controllerv2.go:137] "Starting cronjob controller v2"
I0329 17:44:25.351678       1 shared_informer.go:273] Waiting for caches to sync for cronjob
I0329 17:44:25.354670       1 shared_informer.go:280] Caches are synced for tokens
E0329 17:44:25.360123       1 core.go:92] Failed to start service controller: WARNING: no cloud provider provided, services of type LoadBalancer will fail
W0329 17:44:25.360207       1 controllermanager.go:600] Skipping "service"
E0329 17:44:25.362385       1 core.go:207] failed to start cloud node lifecycle controller: no cloud provider provided
W0329 17:44:25.362429       1 controllermanager.go:600] Skipping "cloud-node-lifecycle"
I0329 17:44:25.368959       1 controllermanager.go:622] Started "clusterrole-aggregation"
I0329 17:44:25.369352       1 clusterroleaggregation_controller.go:188] Starting ClusterRoleAggregator
I0329 17:44:25.369655       1 shared_informer.go:273] Waiting for caches to sync for ClusterRoleAggregator
I0329 17:44:25.378420       1 controllermanager.go:622] Started "pv-protection"
I0329 17:44:25.378542       1 pv_protection_controller.go:75] Starting PV protection controller
I0329 17:44:25.378785       1 shared_informer.go:273] Waiting for caches to sync for PV protection
I0329 17:44:25.482651       1 controllermanager.go:622] Started "ephemeral-volume"
I0329 17:44:25.482726       1 controller.go:169] Starting ephemeral volume controller
I0329 17:44:25.482734       1 shared_informer.go:273] Waiting for caches to sync for ephemeral
I0329 17:44:25.633281       1 controllermanager.go:622] Started "endpoint"
I0329 17:44:25.633495       1 endpoints_controller.go:178] Starting endpoint controller
I0329 17:44:25.633539       1 shared_informer.go:273] Waiting for caches to sync for endpoint
I0329 17:44:25.975502       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for limitranges
I0329 17:44:25.975777       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for controllerrevisions.apps
I0329 17:44:25.976011       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for podtemplates
W0329 17:44:25.976097       1 shared_informer.go:550] resyncPeriod 16h24m13.897255872s is smaller than resyncCheckPeriod 20h37m54.968684142s and the informer has already started. Changing it to 20h37m54.968684142s
I0329 17:44:25.976825       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for serviceaccounts
I0329 17:44:25.976925       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for networkpolicies.networking.k8s.io
I0329 17:44:25.976951       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for ingresses.networking.k8s.io
I0329 17:44:25.976974       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for roles.rbac.authorization.k8s.io
I0329 17:44:25.977007       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for deployments.apps
I0329 17:44:25.977215       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for csistoragecapacities.storage.k8s.io
I0329 17:44:25.977547       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for leases.coordination.k8s.io
I0329 17:44:25.977938       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for endpoints
I0329 17:44:25.978454       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for horizontalpodautoscalers.autoscaling
I0329 17:44:25.978737       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for poddisruptionbudgets.policy
I0329 17:44:25.979088       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for daemonsets.apps
I0329 17:44:25.979255       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for cronjobs.batch
I0329 17:44:25.979450       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for jobs.batch
I0329 17:44:25.985947       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for rolebindings.rbac.authorization.k8s.io
I0329 17:44:25.986052       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for replicasets.apps
I0329 17:44:25.986079       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for statefulsets.apps
I0329 17:44:25.986106       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for endpointslices.discovery.k8s.io
I0329 17:44:25.986128       1 controllermanager.go:622] Started "resourcequota"
I0329 17:44:25.986550       1 resource_quota_controller.go:277] Starting resource quota controller
I0329 17:44:25.986560       1 shared_informer.go:273] Waiting for caches to sync for resource quota
I0329 17:44:25.986781       1 resource_quota_monitor.go:295] QuotaMonitor running
I0329 17:44:26.187306       1 controllermanager.go:622] Started "garbagecollector"
I0329 17:44:26.187498       1 garbagecollector.go:154] Starting garbage collector controller
I0329 17:44:26.187582       1 shared_informer.go:273] Waiting for caches to sync for garbage collector
I0329 17:44:26.187626       1 graph_builder.go:291] GraphBuilder running
I0329 17:44:26.434244       1 controllermanager.go:622] Started "replicaset"
I0329 17:44:26.434361       1 replica_set.go:201] Starting replicaset controller
I0329 17:44:26.434371       1 shared_informer.go:273] Waiting for caches to sync for ReplicaSet
I0329 17:44:26.635794       1 controllermanager.go:622] Started "disruption"
I0329 17:44:26.635830       1 disruption.go:424] Sending events to api server.
I0329 17:44:26.635867       1 disruption.go:435] Starting disruption controller
I0329 17:44:26.635873       1 shared_informer.go:273] Waiting for caches to sync for disruption
I0329 17:44:26.788951       1 controllermanager.go:622] Started "replicationcontroller"
I0329 17:44:26.789052       1 replica_set.go:201] Starting replicationcontroller controller
I0329 17:44:26.789060       1 shared_informer.go:273] Waiting for caches to sync for ReplicationController
I0329 17:44:26.935227       1 certificate_controller.go:112] Starting certificate controller "csrsigning-kubelet-serving"
I0329 17:44:26.935244       1 shared_informer.go:273] Waiting for caches to sync for certificate-csrsigning-kubelet-serving
I0329 17:44:26.935262       1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key"
I0329 17:44:26.935602       1 certificate_controller.go:112] Starting certificate controller "csrsigning-kubelet-client"
I0329 17:44:26.935611       1 shared_informer.go:273] Waiting for caches to sync for certificate-csrsigning-kubelet-client
I0329 17:44:26.935623       1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key"
I0329 17:44:26.935867       1 certificate_controller.go:112] Starting certificate controller "csrsigning-kube-apiserver-client"
I0329 17:44:26.935876       1 shared_informer.go:273] Waiting for caches to sync for certificate-csrsigning-kube-apiserver-client
I0329 17:44:26.935889       1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key"
I0329 17:44:26.936108       1 controllermanager.go:622] Started "csrsigning"
I0329 17:44:26.936161       1 certificate_controller.go:112] Starting certificate controller "csrsigning-legacy-unknown"
I0329 17:44:26.936166       1 shared_informer.go:273] Waiting for caches to sync for certificate-csrsigning-legacy-unknown
I0329 17:44:26.936183       1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key"
I0329 17:44:36.998464       1 range_allocator.go:109] No Secondary Service CIDR provided. Skipping filtering out secondary service addresses.
I0329 17:44:36.998764       1 controllermanager.go:622] Started "nodeipam"
I0329 17:44:36.999126       1 node_ipam_controller.go:155] Starting ipam controller
I0329 17:44:36.999503       1 shared_informer.go:273] Waiting for caches to sync for node
E0329 17:44:37.082324       1 namespaced_resources_deleter.go:162] unable to get all supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0329 17:44:37.082557       1 controllermanager.go:622] Started "namespace"
I0329 17:44:37.082702       1 namespace_controller.go:195] Starting namespace controller
I0329 17:44:37.083406       1 shared_informer.go:273] Waiting for caches to sync for namespace
I0329 17:44:37.104763       1 controllermanager.go:622] Started "horizontalpodautoscaling"
I0329 17:44:37.105222       1 horizontal.go:181] Starting HPA controller
I0329 17:44:37.105300       1 shared_informer.go:273] Waiting for caches to sync for HPA
I0329 17:44:37.116242       1 controllermanager.go:622] Started "pvc-protection"
I0329 17:44:37.116599       1 pvc_protection_controller.go:99] "Starting PVC protection controller"
I0329 17:44:37.117124       1 shared_informer.go:273] Waiting for caches to sync for PVC protection
I0329 17:44:37.130583       1 controllermanager.go:622] Started "statefulset"
I0329 17:44:37.130734       1 stateful_set.go:152] Starting stateful set controller
I0329 17:44:37.130891       1 shared_informer.go:273] Waiting for caches to sync for stateful set
I0329 17:44:37.132483       1 controllermanager.go:622] Started "csrcleaner"
I0329 17:44:37.132702       1 cleaner.go:82] Starting CSR cleaner controller
I0329 17:44:37.147976       1 controllermanager.go:622] Started "bootstrapsigner"
I0329 17:44:37.148274       1 shared_informer.go:273] Waiting for caches to sync for bootstrap_signer
I0329 17:44:37.160552       1 controllermanager.go:622] Started "persistentvolume-expander"
I0329 17:44:37.160972       1 expand_controller.go:340] Starting expand controller
I0329 17:44:37.161019       1 shared_informer.go:273] Waiting for caches to sync for expand
I0329 17:44:37.170562       1 controllermanager.go:622] Started "serviceaccount"
I0329 17:44:37.170773       1 serviceaccounts_controller.go:111] Starting service account controller
I0329 17:44:37.170989       1 shared_informer.go:273] Waiting for caches to sync for service account
I0329 17:44:37.181090       1 controllermanager.go:622] Started "job"
I0329 17:44:37.181387       1 job_controller.go:191] Starting job controller
I0329 17:44:37.181481       1 shared_informer.go:273] Waiting for caches to sync for job
I0329 17:44:37.184259       1 controllermanager.go:622] Started "csrapproving"
I0329 17:44:37.184403       1 certificate_controller.go:112] Starting certificate controller "csrapproving"
I0329 17:44:37.184444       1 shared_informer.go:273] Waiting for caches to sync for certificate-csrapproving
I0329 17:44:37.289238       1 controllermanager.go:622] Started "tokencleaner"
I0329 17:44:37.289427       1 tokencleaner.go:111] Starting token cleaner controller
I0329 17:44:37.289563       1 shared_informer.go:273] Waiting for caches to sync for token_cleaner
I0329 17:44:37.289922       1 shared_informer.go:280] Caches are synced for token_cleaner
I0329 17:44:37.442330       1 controllermanager.go:622] Started "persistentvolume-binder"
I0329 17:44:37.442464       1 pv_controller_base.go:318] Starting persistent volume controller
I0329 17:44:37.442476       1 shared_informer.go:273] Waiting for caches to sync for persistent volume
I0329 17:44:37.591165       1 controllermanager.go:622] Started "endpointslice"
I0329 17:44:37.591430       1 endpointslice_controller.go:257] Starting endpoint slice controller
I0329 17:44:37.591695       1 shared_informer.go:273] Waiting for caches to sync for endpoint_slice
I0329 17:44:37.738674       1 controllermanager.go:622] Started "deployment"
I0329 17:44:37.738835       1 deployment_controller.go:154] "Starting controller" controller="deployment"
I0329 17:44:37.738984       1 shared_informer.go:273] Waiting for caches to sync for deployment
I0329 17:44:37.889823       1 controllermanager.go:622] Started "ttl"
I0329 17:44:37.889984       1 ttl_controller.go:120] Starting TTL controller
I0329 17:44:37.890032       1 shared_informer.go:273] Waiting for caches to sync for TTL
I0329 17:44:38.038471       1 controllermanager.go:622] Started "root-ca-cert-publisher"
I0329 17:44:38.038762       1 publisher.go:101] Starting root CA certificate configmap publisher
I0329 17:44:38.038810       1 shared_informer.go:273] Waiting for caches to sync for crt configmap
I0329 17:44:38.041054       1 shared_informer.go:273] Waiting for caches to sync for resource quota
W0329 17:44:38.057002       1 actual_state_of_world.go:541] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="master-node" does not exist
I0329 17:44:38.074322       1 shared_informer.go:280] Caches are synced for ClusterRoleAggregator
I0329 17:44:38.081546       1 shared_informer.go:280] Caches are synced for job
I0329 17:44:38.087283       1 shared_informer.go:280] Caches are synced for certificate-csrapproving
I0329 17:44:38.089101       1 shared_informer.go:280] Caches are synced for ReplicationController
I0329 17:44:38.091465       1 shared_informer.go:280] Caches are synced for TTL
I0329 17:44:38.094515       1 shared_informer.go:280] Caches are synced for namespace
I0329 17:44:38.095778       1 shared_informer.go:280] Caches are synced for ephemeral
I0329 17:44:38.095964       1 shared_informer.go:280] Caches are synced for taint
I0329 17:44:38.096046       1 node_lifecycle_controller.go:1438] Initializing eviction metric for zone: 
W0329 17:44:38.096119       1 node_lifecycle_controller.go:1053] Missing timestamp for Node master-node. Assuming now as a timestamp.
I0329 17:44:38.096164       1 node_lifecycle_controller.go:1254] Controller detected that zone  is now in state Normal.
I0329 17:44:38.096607       1 taint_manager.go:206] "Starting NoExecuteTaintManager"
I0329 17:44:38.096667       1 taint_manager.go:211] "Sending events to api server"
I0329 17:44:38.096945       1 event.go:294] "Event occurred" object="master-node" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node master-node event: Registered Node master-node in Controller"
E0329 17:44:38.097146       1 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0329 17:44:38.087915       1 shared_informer.go:280] Caches are synced for service account
I0329 17:44:38.087927       1 shared_informer.go:280] Caches are synced for GC
I0329 17:44:38.087985       1 shared_informer.go:280] Caches are synced for PV protection
I0329 17:44:38.099052       1 shared_informer.go:280] Caches are synced for TTL after finished
I0329 17:44:38.100587       1 shared_informer.go:280] Caches are synced for node
I0329 17:44:38.100655       1 range_allocator.go:167] Sending events to api server.
I0329 17:44:38.100701       1 range_allocator.go:171] Starting range CIDR allocator
I0329 17:44:38.100725       1 shared_informer.go:273] Waiting for caches to sync for cidrallocator
I0329 17:44:38.100748       1 shared_informer.go:280] Caches are synced for cidrallocator
I0329 17:44:38.105938       1 shared_informer.go:280] Caches are synced for HPA
E0329 17:44:38.114577       1 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0329 17:44:38.117716       1 shared_informer.go:280] Caches are synced for PVC protection
I0329 17:44:38.119761       1 shared_informer.go:273] Waiting for caches to sync for garbage collector
I0329 17:44:38.124153       1 range_allocator.go:372] Set node master-node PodCIDR to [172.16.0.0/24]
I0329 17:44:38.133904       1 shared_informer.go:280] Caches are synced for stateful set
I0329 17:44:38.137337       1 shared_informer.go:280] Caches are synced for certificate-csrsigning-legacy-unknown
I0329 17:44:38.137488       1 shared_informer.go:280] Caches are synced for ReplicaSet
I0329 17:44:38.137560       1 shared_informer.go:280] Caches are synced for disruption
I0329 17:44:38.137669       1 shared_informer.go:280] Caches are synced for certificate-csrsigning-kube-apiserver-client
I0329 17:44:38.137897       1 shared_informer.go:280] Caches are synced for certificate-csrsigning-kubelet-serving
I0329 17:44:38.137973       1 shared_informer.go:280] Caches are synced for certificate-csrsigning-kubelet-client
I0329 17:44:38.139471       1 shared_informer.go:280] Caches are synced for crt configmap
I0329 17:44:38.139529       1 shared_informer.go:280] Caches are synced for deployment
I0329 17:44:38.144566       1 shared_informer.go:280] Caches are synced for daemon sets
I0329 17:44:38.144690       1 shared_informer.go:280] Caches are synced for persistent volume
I0329 17:44:38.149458       1 shared_informer.go:280] Caches are synced for bootstrap_signer
I0329 17:44:38.152381       1 shared_informer.go:280] Caches are synced for cronjob
I0329 17:44:38.161708       1 shared_informer.go:280] Caches are synced for expand
I0329 17:44:38.223092       1 shared_informer.go:280] Caches are synced for attach detach
I0329 17:44:38.291183       1 shared_informer.go:280] Caches are synced for resource quota
I0329 17:44:38.297172       1 shared_informer.go:280] Caches are synced for endpoint_slice
I0329 17:44:38.333085       1 shared_informer.go:280] Caches are synced for endpoint_slice_mirroring
I0329 17:44:38.334962       1 shared_informer.go:280] Caches are synced for endpoint
I0329 17:44:38.342374       1 shared_informer.go:280] Caches are synced for resource quota
I0329 17:44:38.688777       1 shared_informer.go:280] Caches are synced for garbage collector
I0329 17:44:38.688868       1 garbagecollector.go:163] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I0329 17:44:38.721936       1 shared_informer.go:280] Caches are synced for garbage collector
I0329 17:44:38.962531       1 event.go:294] "Event occurred" object="kube-system/calico-kube-controllers" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set calico-kube-controllers-57b57c56f to 1"
I0329 17:44:38.967324       1 event.go:294] "Event occurred" object="kube-system/metrics-server" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set metrics-server-85dfd9f587 to 1"
I0329 17:44:38.967373       1 event.go:294] "Event occurred" object="default/echo-deployment" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set echo-deployment-d658654dc to 2"
I0329 17:44:38.967568       1 event.go:294] "Event occurred" object="kube-system/coredns" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set coredns-787d4945fb to 2"
I0329 17:44:39.045181       1 event.go:294] "Event occurred" object="kube-system/kube-proxy" fieldPath="" kind="DaemonSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: kube-proxy-r4z4s"
I0329 17:44:39.052824       1 event.go:294] "Event occurred" object="kube-system/calico-node" fieldPath="" kind="DaemonSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: calico-node-gmchc"
I0329 17:44:39.272219       1 event.go:294] "Event occurred" object="kube-system/calico-kube-controllers-57b57c56f" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: calico-kube-controllers-57b57c56f-2mld7"
I0329 17:44:39.281771       1 event.go:294] "Event occurred" object="kube-system/metrics-server-85dfd9f587" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: metrics-server-85dfd9f587-526gn"
I0329 17:44:39.298651       1 event.go:294] "Event occurred" object="kube-system/coredns-787d4945fb" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: coredns-787d4945fb-cbjgc"
I0329 17:44:39.376524       1 event.go:294] "Event occurred" object="kube-system/coredns-787d4945fb" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: coredns-787d4945fb-4wj8k"
I0329 17:44:39.377219       1 event.go:294] "Event occurred" object="default/echo-deployment-d658654dc" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: echo-deployment-d658654dc-nq5wx"
I0329 17:44:39.402610       1 event.go:294] "Event occurred" object="default/echo-deployment-d658654dc" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: echo-deployment-d658654dc-n96q7"
E0329 17:45:08.386017       1 resource_quota_controller.go:417] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0329 17:45:08.386237       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for networksets.crd.projectcalico.org
I0329 17:45:08.386282       1 resource_quota_monitor.go:218] QuotaMonitor created object count evaluator for networkpolicies.crd.projectcalico.org
I0329 17:45:08.386528       1 shared_informer.go:273] Waiting for caches to sync for resource quota
I0329 17:45:08.690887       1 shared_informer.go:280] Caches are synced for resource quota
W0329 17:45:08.767657       1 garbagecollector.go:752] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0329 17:45:08.775842       1 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0329 17:45:08.789507       1 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0329 17:45:08.790986       1 shared_informer.go:273] Waiting for caches to sync for garbage collector
I0329 17:45:09.491928       1 shared_informer.go:280] Caches are synced for garbage collector
E0329 17:45:38.707225       1 resource_quota_controller.go:417] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0329 17:45:39.587820       1 garbagecollector.go:752] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0329 17:46:08.731545       1 resource_quota_controller.go:417] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0329 17:46:09.642296       1 garbagecollector.go:752] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0329 17:46:38.746253       1 resource_quota_controller.go:417] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0329 17:46:39.715956       1 garbagecollector.go:752] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
W0329 17:46:42.059229       1 actual_state_of_world.go:541] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="worker-node01" does not exist
I0329 17:46:42.103010       1 range_allocator.go:372] Set node worker-node01 PodCIDR to [172.16.1.0/24]
I0329 17:46:42.127848       1 event.go:294] "Event occurred" object="kube-system/kube-proxy" fieldPath="" kind="DaemonSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: kube-proxy-d94qh"
I0329 17:46:42.132599       1 event.go:294] "Event occurred" object="kube-system/calico-node" fieldPath="" kind="DaemonSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: calico-node-ffp7j"
W0329 17:46:42.503162       1 topologycache.go:232] Can't get CPU or zone information for worker-node01 node
W0329 17:46:43.122804       1 node_lifecycle_controller.go:1053] Missing timestamp for Node worker-node01. Assuming now as a timestamp.
I0329 17:46:43.123023       1 event.go:294] "Event occurred" object="worker-node01" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node worker-node01 event: Registered Node worker-node01 in Controller"
E0329 17:47:08.779181       1 resource_quota_controller.go:417] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0329 17:47:09.787004       1 garbagecollector.go:752] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0329 17:47:38.803273       1 resource_quota_controller.go:417] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0329 17:47:39.815692       1 garbagecollector.go:752] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
W0329 17:49:36.642933       1 topologycache.go:232] Can't get CPU or zone information for worker-node01 node
W0329 17:49:36.643821       1 actual_state_of_world.go:541] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="worker-node02" does not exist
I0329 17:49:36.664881       1 range_allocator.go:372] Set node worker-node02 PodCIDR to [172.16.2.0/24]
I0329 17:49:36.697228       1 event.go:294] "Event occurred" object="kube-system/kube-proxy" fieldPath="" kind="DaemonSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: kube-proxy-dkw9j"
I0329 17:49:36.720105       1 event.go:294] "Event occurred" object="kube-system/calico-node" fieldPath="" kind="DaemonSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: calico-node-g5mcr"
W0329 17:49:38.161727       1 node_lifecycle_controller.go:1053] Missing timestamp for Node worker-node02. Assuming now as a timestamp.
I0329 17:49:38.162203       1 event.go:294] "Event occurred" object="worker-node02" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node worker-node02 event: Registered Node worker-node02 in Controller"
W0329 17:49:39.691377       1 topologycache.go:232] Can't get CPU or zone information for worker-node01 node
I0329 17:49:40.793578       1 event.go:294] "Event occurred" object="kubernetes-dashboard/kubernetes-dashboard" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set kubernetes-dashboard-6c7ccbcf87 to 1"
I0329 17:49:40.821788       1 event.go:294] "Event occurred" object="kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: kubernetes-dashboard-6c7ccbcf87-vssvh"
I0329 17:49:40.927636       1 event.go:294] "Event occurred" object="kubernetes-dashboard/dashboard-metrics-scraper" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set dashboard-metrics-scraper-7bc864c59 to 1"
I0329 17:49:40.960028       1 event.go:294] "Event occurred" object="kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: dashboard-metrics-scraper-7bc864c59-zz598"
I0329 17:49:43.177221       1 event.go:294] "Event occurred" object="kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87-vssvh" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Cancelling deletion of Pod kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87-vssvh"
I0329 17:49:43.177311       1 event.go:294] "Event occurred" object="kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59-zz598" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Cancelling deletion of Pod kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59-zz598"
I0329 17:49:52.172015       1 event.go:294] "Event occurred" object="default/echo-deployment-d658654dc" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: echo-deployment-d658654dc-8gkvk"
W0329 18:02:58.062499       1 topologycache.go:232] Can't get CPU or zone information for worker-node01 node
I0329 18:02:59.691439       1 event.go:294] "Event occurred" object="kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: dashboard-metrics-scraper-7bc864c59-gc8tt"
I0329 18:02:59.762007       1 event.go:294] "Event occurred" object="kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: kubernetes-dashboard-6c7ccbcf87-tfx44"
I0329 18:03:18.692228       1 event.go:294] "Event occurred" object="default/echo-deployment-d658654dc" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: echo-deployment-d658654dc-6qbvk"
W0329 18:05:59.952210       1 topologycache.go:232] Can't get CPU or zone information for worker-node02 node
I0329 18:06:03.461551       1 event.go:294] "Event occurred" object="kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87-vssvh" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Cancelling deletion of Pod kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87-vssvh"
I0329 18:06:03.462453       1 event.go:294] "Event occurred" object="kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59-zz598" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Cancelling deletion of Pod kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59-zz598"
I0329 18:06:03.462743       1 event.go:294] "Event occurred" object="default/echo-deployment-d658654dc-8gkvk" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Cancelling deletion of Pod default/echo-deployment-d658654dc-8gkvk"
kubelet.log
Mar 29 17:49:13 worker-node02 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Mar 29 17:49:13 worker-node02 kubelet[4596]: E0329 17:49:13.196597    4596 run.go:74] "command failed" err="failed to validate kubelet flags: the container runtime endpoint address was not specified or empty, use --container-runtime-endpoint to set"
Mar 29 17:49:13 worker-node02 systemd[1]: kubelet.service: Current command vanished from the unit file, execution of the command list won't be resumed.
Mar 29 17:49:13 worker-node02 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Mar 29 17:49:13 worker-node02 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Mar 29 17:49:13 worker-node02 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Mar 29 17:49:13 worker-node02 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Mar 29 17:49:13 worker-node02 kubelet[4634]: E0329 17:49:13.760008    4634 run.go:74] "command failed" err="failed to validate kubelet flags: the container runtime endpoint address was not specified or empty, use --container-runtime-endpoint to set"
Mar 29 17:49:13 worker-node02 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Mar 29 17:49:13 worker-node02 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Mar 29 17:49:22 worker-node02 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Mar 29 17:49:22 worker-node02 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Mar 29 17:49:22 worker-node02 kubelet[5319]: Flag --pod-infra-container-image has been deprecated, will be removed in 1.27. Image garbage collector will get sandbox image information from CRI.
Mar 29 17:49:22 worker-node02 kubelet[5319]: I0329 17:49:22.999361    5319 server.go:198] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Mar 29 17:49:23 worker-node02 kubelet[5319]: Flag --pod-infra-container-image has been deprecated, will be removed in 1.27. Image garbage collector will get sandbox image information from CRI.
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.650221    5319 server.go:412] "Kubelet version" kubeletVersion="v1.26.1"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.650505    5319 server.go:414] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.651809    5319 server.go:836] "Client rotation is on, will bootstrap in background"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.681671    5319 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.694583    5319 server.go:659] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.697917    5319 container_manager_linux.go:267] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.698203    5319 container_manager_linux.go:272] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.699195    5319 topology_manager.go:134] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.699413    5319 container_manager_linux.go:308] "Creating device plugin manager"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.700517    5319 state_mem.go:36] "Initialized new in-memory state store"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.722092    5319 kubelet.go:398] "Attempting to sync node with API server"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.722455    5319 kubelet.go:286] "Adding static pod path" path="/etc/kubernetes/manifests"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.724164    5319 kubelet.go:297] "Adding apiserver pod source"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.725532    5319 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.739636    5319 kuberuntime_manager.go:244] "Container runtime initialized" containerRuntime="cri-o" version="1.26.1" apiVersion="v1"
Mar 29 17:49:23 worker-node02 kubelet[5319]: W0329 17:49:23.743400    5319 probe.go:268] Flexvolume plugin directory at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ does not exist. Recreating.
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.747169    5319 nodeshutdown_manager_linux.go:137] "Creating node shutdown manager" shutdownGracePeriodRequested="1m0s" shutdownGracePeriodCriticalPods="10s" shutdownGracePeriodByPodPriority=[{Priority:0 ShutdownGracePeriodSeconds:50} {Priority:2000000000 ShutdownGracePeriodSeconds:10}]
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.747510    5319 server.go:1186] "Started kubelet"
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.755795    5319 kubelet.go:1386] "Image garbage collection failed once. Stats initialization may not have completed yet" err="failed to get imageFs info: unable to find data in memory cache"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.758748    5319 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.770366    5319 server.go:161] "Starting to listen" address="0.0.0.0" port=10250
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.772856    5319 server.go:451] "Adding debug handlers to kubelet server"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.781367    5319 volume_manager.go:293] "Starting Kubelet Volume Manager"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.785647    5319 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.885447    5319 container_manager_linux.go:945] "Unable to get rootfs data from cAdvisor interface" err="unable to find data in memory cache"
Mar 29 17:49:23 worker-node02 kubelet[5319]: W0329 17:49:23.943444    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.949633    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 29 17:49:23 worker-node02 kubelet[5319]: W0329 17:49:23.950041    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.966374    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 29 17:49:23 worker-node02 kubelet[5319]: W0329 17:49:23.951435    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: nodes "worker-node02" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.966929    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: nodes "worker-node02" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.952464    5319 controller.go:146] failed to ensure lease exists, will retry in 200ms, error: leases.coordination.k8s.io "worker-node02" is forbidden: User "system:anonymous" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease"
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.952493    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c7efae8d0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 747457232, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 747457232, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.961713    5319 kubelet_node_status.go:70] "Attempting to register node" node="worker-node02"
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.974850    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc34f1", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node worker-node02 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961451761, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961451761, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.975278    5319 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="worker-node02"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.980926    5319 cpu_manager.go:214] "Starting CPU manager" policy="none"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.981572    5319 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Mar 29 17:49:23 worker-node02 kubelet[5319]: I0329 17:49:23.981757    5319 state_mem.go:36] "Initialized new in-memory state store"
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.987763    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc673f", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node worker-node02 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961464639, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961464639, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:23 worker-node02 kubelet[5319]: E0329 17:49:23.993583    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc7668", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node worker-node02 status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961468520, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961468520, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.003505    5319 policy_none.go:49] "None policy: Start"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.005208    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc34f1", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node worker-node02 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961451761, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961673297, time.Local), Count:2, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc34f1" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.008529    5319 memory_manager.go:169] "Starting memorymanager" policy="None"
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.008742    5319 state_mem.go:35] "Initializing new in-memory state store"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.013124    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc673f", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node worker-node02 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961464639, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961684197, time.Local), Count:2, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc673f" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.044883    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc7668", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node worker-node02 status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961468520, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961686568, time.Local), Count:2, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc7668" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.088591    5319 manager.go:455] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.091117    5319 eviction_manager.go:261] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"worker-node02\" not found"
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.091400    5319 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.102309    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c93c31c08", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeAllocatableEnforced", Message:"Updated Node Allocatable limit across pods", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 24, 96121864, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 24, 96121864, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.105903    5319 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv4
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.165022    5319 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv6
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.165382    5319 status_manager.go:176] "Starting to sync pod status with apiserver"
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.165659    5319 kubelet.go:2113] "Starting kubelet main sync loop"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.165831    5319 kubelet.go:2137] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.173853    5319 controller.go:146] failed to ensure lease exists, will retry in 400ms, error: leases.coordination.k8s.io "worker-node02" is forbidden: User "system:anonymous" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease"
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.176533    5319 kubelet_node_status.go:70] "Attempting to register node" node="worker-node02"
Mar 29 17:49:24 worker-node02 kubelet[5319]: W0329 17:49:24.179374    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.179584    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.180428    5319 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="worker-node02"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.181089    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc34f1", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node worker-node02 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961451761, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 24, 176506536, time.Local), Count:3, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc34f1" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.186811    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc673f", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node worker-node02 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961464639, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 24, 176512538, time.Local), Count:3, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc673f" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.190136    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc7668", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node worker-node02 status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961468520, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 24, 176514599, time.Local), Count:3, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc7668" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.578556    5319 controller.go:146] failed to ensure lease exists, will retry in 800ms, error: leases.coordination.k8s.io "worker-node02" is forbidden: User "system:anonymous" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease"
Mar 29 17:49:24 worker-node02 kubelet[5319]: I0329 17:49:24.582160    5319 kubelet_node_status.go:70] "Attempting to register node" node="worker-node02"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.587346    5319 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="worker-node02"
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.588007    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc34f1", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node worker-node02 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961451761, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 24, 582116830, time.Local), Count:4, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc34f1" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.591801    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc673f", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node worker-node02 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961464639, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 24, 582122798, time.Local), Count:4, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc673f" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.595349    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc7668", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node worker-node02 status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961468520, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 24, 582127009, time.Local), Count:4, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc7668" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:24 worker-node02 kubelet[5319]: W0329 17:49:24.817128    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: nodes "worker-node02" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 29 17:49:24 worker-node02 kubelet[5319]: E0329 17:49:24.817439    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: nodes "worker-node02" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 29 17:49:25 worker-node02 kubelet[5319]: W0329 17:49:25.197126    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 29 17:49:25 worker-node02 kubelet[5319]: E0329 17:49:25.197747    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 29 17:49:25 worker-node02 kubelet[5319]: W0329 17:49:25.323987    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 29 17:49:25 worker-node02 kubelet[5319]: E0329 17:49:25.324376    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 29 17:49:25 worker-node02 kubelet[5319]: E0329 17:49:25.383015    5319 controller.go:146] failed to ensure lease exists, will retry in 1.6s, error: leases.coordination.k8s.io "worker-node02" is forbidden: User "system:anonymous" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease"
Mar 29 17:49:25 worker-node02 kubelet[5319]: I0329 17:49:25.388362    5319 kubelet_node_status.go:70] "Attempting to register node" node="worker-node02"
Mar 29 17:49:25 worker-node02 kubelet[5319]: E0329 17:49:25.393135    5319 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="worker-node02"
Mar 29 17:49:25 worker-node02 kubelet[5319]: E0329 17:49:25.394930    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc34f1", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node worker-node02 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961451761, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 25, 388324240, time.Local), Count:5, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc34f1" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:25 worker-node02 kubelet[5319]: E0329 17:49:25.399977    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc673f", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node worker-node02 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961464639, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 25, 388331417, time.Local), Count:5, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc673f" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:25 worker-node02 kubelet[5319]: E0329 17:49:25.403524    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc7668", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node worker-node02 status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961468520, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 25, 388334787, time.Local), Count:5, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc7668" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:25 worker-node02 kubelet[5319]: W0329 17:49:25.722736    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Mar 29 17:49:25 worker-node02 kubelet[5319]: E0329 17:49:25.723084    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Mar 29 17:49:26 worker-node02 kubelet[5319]: W0329 17:49:26.768736    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: nodes "worker-node02" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 29 17:49:26 worker-node02 kubelet[5319]: E0329 17:49:26.769902    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: nodes "worker-node02" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 29 17:49:26 worker-node02 kubelet[5319]: E0329 17:49:26.986715    5319 controller.go:146] failed to ensure lease exists, will retry in 3.2s, error: leases.coordination.k8s.io "worker-node02" is forbidden: User "system:anonymous" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease"
Mar 29 17:49:26 worker-node02 kubelet[5319]: I0329 17:49:26.994921    5319 kubelet_node_status.go:70] "Attempting to register node" node="worker-node02"
Mar 29 17:49:26 worker-node02 kubelet[5319]: E0329 17:49:26.999112    5319 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="worker-node02"
Mar 29 17:49:26 worker-node02 kubelet[5319]: E0329 17:49:26.999691    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc34f1", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node worker-node02 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961451761, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 26, 994885992, time.Local), Count:6, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc34f1" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:27 worker-node02 kubelet[5319]: E0329 17:49:27.002684    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc673f", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node worker-node02 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961464639, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 26, 994894464, time.Local), Count:6, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc673f" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:27 worker-node02 kubelet[5319]: E0329 17:49:27.005487    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc7668", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node worker-node02 status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961468520, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 26, 994896749, time.Local), Count:6, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc7668" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:27 worker-node02 kubelet[5319]: W0329 17:49:27.113162    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 29 17:49:27 worker-node02 kubelet[5319]: E0329 17:49:27.113556    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 29 17:49:27 worker-node02 kubelet[5319]: W0329 17:49:27.734969    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 29 17:49:27 worker-node02 kubelet[5319]: E0329 17:49:27.735290    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 29 17:49:28 worker-node02 kubelet[5319]: W0329 17:49:28.292273    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Mar 29 17:49:28 worker-node02 kubelet[5319]: E0329 17:49:28.292777    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Mar 29 17:49:30 worker-node02 kubelet[5319]: E0329 17:49:30.190412    5319 controller.go:146] failed to ensure lease exists, will retry in 6.4s, error: leases.coordination.k8s.io "worker-node02" is forbidden: User "system:anonymous" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease"
Mar 29 17:49:30 worker-node02 kubelet[5319]: I0329 17:49:30.201525    5319 kubelet_node_status.go:70] "Attempting to register node" node="worker-node02"
Mar 29 17:49:30 worker-node02 kubelet[5319]: E0329 17:49:30.206234    5319 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="worker-node02"
Mar 29 17:49:30 worker-node02 kubelet[5319]: E0329 17:49:30.206553    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc34f1", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node worker-node02 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961451761, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 30, 201473598, time.Local), Count:7, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc34f1" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:30 worker-node02 kubelet[5319]: E0329 17:49:30.210005    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc673f", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node worker-node02 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961464639, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 30, 201492432, time.Local), Count:7, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc673f" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:30 worker-node02 kubelet[5319]: E0329 17:49:30.212623    5319 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-node02.1750f51c8bbc7668", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-node02", UID:"worker-node02", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node worker-node02 status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"worker-node02"}, FirstTimestamp:time.Date(2023, time.March, 29, 17, 49, 23, 961468520, time.Local), LastTimestamp:time.Date(2023, time.March, 29, 17, 49, 30, 201495623, time.Local), Count:7, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-node02.1750f51c8bbc7668" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 29 17:49:32 worker-node02 kubelet[5319]: W0329 17:49:32.068738    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: nodes "worker-node02" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 29 17:49:32 worker-node02 kubelet[5319]: E0329 17:49:32.069206    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: nodes "worker-node02" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 29 17:49:32 worker-node02 kubelet[5319]: W0329 17:49:32.119492    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 29 17:49:32 worker-node02 kubelet[5319]: E0329 17:49:32.119785    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 29 17:49:33 worker-node02 kubelet[5319]: W0329 17:49:33.120012    5319 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 29 17:49:33 worker-node02 kubelet[5319]: E0329 17:49:33.120700    5319 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 29 17:49:33 worker-node02 kubelet[5319]: I0329 17:49:33.674904    5319 transport.go:135] "Certificate rotation detected, shutting down client connections to start using new credentials"
Mar 29 17:49:34 worker-node02 kubelet[5319]: E0329 17:49:34.091952    5319 eviction_manager.go:261] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"worker-node02\" not found"
Mar 29 17:49:34 worker-node02 kubelet[5319]: E0329 17:49:34.316024    5319 csi_plugin.go:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "worker-node02" not found
Mar 29 17:49:35 worker-node02 kubelet[5319]: E0329 17:49:35.557139    5319 csi_plugin.go:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "worker-node02" not found
Mar 29 17:49:36 worker-node02 kubelet[5319]: E0329 17:49:36.601473    5319 nodelease.go:49] "Failed to get node when trying to set owner ref to the node lease" err="nodes \"worker-node02\" not found" node="worker-node02"
Mar 29 17:49:36 worker-node02 kubelet[5319]: I0329 17:49:36.608079    5319 kubelet_node_status.go:70] "Attempting to register node" node="worker-node02"
Mar 29 17:49:36 worker-node02 kubelet[5319]: I0329 17:49:36.960982    5319 kubelet_node_status.go:73] "Successfully registered node" node="worker-node02"
Mar 29 17:49:36 worker-node02 kubelet[5319]: E0329 17:49:36.999045    5319 kubelet_node_status.go:458] "Error getting the current node from lister" err="node \"worker-node02\" not found"
[... the same "Error getting the current node from lister" / node "worker-node02" not found message repeats at roughly 100 ms intervals; intermediate lines omitted ...]
Mar 29 17:49:39 worker-node02 kubelet[5319]: E0329 17:49:39.884676    5319 kubelet_node_status.go:458] "Error getting the current node from lister" err="node \"worker-node02\" not found"
Mar 29 17:49:39 worker-node02 kubelet[5319]: I0329 17:49:39.997503    5319 kuberuntime_manager.go:1114] "Updating runtime config through cri with podcidr" CIDR="172.16.2.0/24"
Mar 29 17:49:39 worker-node02 kubelet[5319]: I0329 17:49:39.998942    5319 kubelet_network.go:61] "Updating Pod CIDR" originalPodCIDR="" newPodCIDR="172.16.2.0/24"
Mar 29 17:49:39 worker-node02 kubelet[5319]: I0329 17:49:39.999957    5319 kubelet_node_status.go:493] "Fast updating node status as it just became ready"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.756302    5319 apiserver.go:52] "Watching apiserver"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.783470    5319 topology_manager.go:210] "Topology Admit Handler"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.783848    5319 topology_manager.go:210] "Topology Admit Handler"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.791625    5319 desired_state_of_world_populator.go:159] "Finished populating initial desired state of world"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.798996    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/075134a2-8dbf-46d2-98b1-6f2d722cfee7-kube-proxy\") pod \"kube-proxy-dkw9j\" (UID: \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\") " pod="kube-system/kube-proxy-dkw9j"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.799358    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"bpffs\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-bpffs\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.799594    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-log-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-log-dir\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.800396    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-net-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-net-dir\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.800650    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"host-local-net-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-host-local-net-dir\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.800895    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"policysync\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-policysync\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.801166    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/075134a2-8dbf-46d2-98b1-6f2d722cfee7-lib-modules\") pod \"kube-proxy-dkw9j\" (UID: \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\") " pod="kube-system/kube-proxy-dkw9j"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.801413    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-cxprl\" (UniqueName: \"kubernetes.io/projected/075134a2-8dbf-46d2-98b1-6f2d722cfee7-kube-api-access-cxprl\") pod \"kube-proxy-dkw9j\" (UID: \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\") " pod="kube-system/kube-proxy-dkw9j"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.801761    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-lib-modules\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.802031    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"nodeproc\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-nodeproc\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.802204    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-6wb2h\" (UniqueName: \"kubernetes.io/projected/8402f0e8-dc01-4de3-8c0f-585b491d49a1-kube-api-access-6wb2h\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.802401    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"var-run-calico\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-var-run-calico\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.803471    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"var-lib-calico\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-var-lib-calico\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.803705    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-xtables-lock\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.804434    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-bin-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-bin-dir\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.804683    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/075134a2-8dbf-46d2-98b1-6f2d722cfee7-xtables-lock\") pod \"kube-proxy-dkw9j\" (UID: \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\") " pod="kube-system/kube-proxy-dkw9j"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.805348    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"sys-fs\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-sys-fs\") pod \"calico-node-g5mcr\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") " pod="kube-system/calico-node-g5mcr"
Mar 29 17:49:40 worker-node02 kubelet[5319]: I0329 17:49:40.805534    5319 reconciler.go:41] "Reconciler: start to sync state"
Mar 29 17:49:41 worker-node02 kubelet[5319]: I0329 17:49:41.165122    5319 topology_manager.go:210] "Topology Admit Handler"
Mar 29 17:49:41 worker-node02 kubelet[5319]: I0329 17:49:41.208568    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"tmp-volume\" (UniqueName: \"kubernetes.io/empty-dir/b210cff4-254b-4942-9c41-82388850e7a8-tmp-volume\") pod \"kubernetes-dashboard-6c7ccbcf87-vssvh\" (UID: \"b210cff4-254b-4942-9c41-82388850e7a8\") " pod="kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87-vssvh"
Mar 29 17:49:41 worker-node02 kubelet[5319]: I0329 17:49:41.208639    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kubernetes-dashboard-certs\" (UniqueName: \"kubernetes.io/secret/b210cff4-254b-4942-9c41-82388850e7a8-kubernetes-dashboard-certs\") pod \"kubernetes-dashboard-6c7ccbcf87-vssvh\" (UID: \"b210cff4-254b-4942-9c41-82388850e7a8\") " pod="kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87-vssvh"
Mar 29 17:49:41 worker-node02 kubelet[5319]: I0329 17:49:41.208671    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-gzsch\" (UniqueName: \"kubernetes.io/projected/b210cff4-254b-4942-9c41-82388850e7a8-kube-api-access-gzsch\") pod \"kubernetes-dashboard-6c7ccbcf87-vssvh\" (UID: \"b210cff4-254b-4942-9c41-82388850e7a8\") " pod="kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87-vssvh"
Mar 29 17:49:41 worker-node02 kubelet[5319]: I0329 17:49:41.303068    5319 topology_manager.go:210] "Topology Admit Handler"
Mar 29 17:49:41 worker-node02 kubelet[5319]: I0329 17:49:41.414820    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"tmp-volume\" (UniqueName: \"kubernetes.io/empty-dir/27512b05-aa14-4653-8feb-99a4ee019dbf-tmp-volume\") pod \"dashboard-metrics-scraper-7bc864c59-zz598\" (UID: \"27512b05-aa14-4653-8feb-99a4ee019dbf\") " pod="kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59-zz598"
Mar 29 17:49:41 worker-node02 kubelet[5319]: I0329 17:49:41.415362    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-k66lp\" (UniqueName: \"kubernetes.io/projected/27512b05-aa14-4653-8feb-99a4ee019dbf-kube-api-access-k66lp\") pod \"dashboard-metrics-scraper-7bc864c59-zz598\" (UID: \"27512b05-aa14-4653-8feb-99a4ee019dbf\") " pod="kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59-zz598"
Mar 29 17:49:52 worker-node02 kubelet[5319]: I0329 17:49:52.545781    5319 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/kube-proxy-dkw9j" podStartSLOduration=-9.223372020309105e+09 pod.CreationTimestamp="2023-03-29 17:49:36 +0000 UTC" firstStartedPulling="2023-03-29 17:49:42.909102202 +0000 UTC m=+19.983180697" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-03-29 17:49:52.337600649 +0000 UTC m=+29.411679157" watchObservedRunningTime="2023-03-29 17:49:52.545671383 +0000 UTC m=+29.619749883"
Mar 29 17:49:52 worker-node02 kubelet[5319]: I0329 17:49:52.547325    5319 topology_manager.go:210] "Topology Admit Handler"
Mar 29 17:49:52 worker-node02 kubelet[5319]: W0329 17:49:52.566908    5319 reflector.go:424] object-"default"/"kube-root-ca.crt": failed to list *v1.ConfigMap: configmaps "kube-root-ca.crt" is forbidden: User "system:node:worker-node02" cannot list resource "configmaps" in API group "" in the namespace "default": no relationship found between node 'worker-node02' and this object
Mar 29 17:49:52 worker-node02 kubelet[5319]: E0329 17:49:52.567402    5319 reflector.go:140] object-"default"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "kube-root-ca.crt" is forbidden: User "system:node:worker-node02" cannot list resource "configmaps" in API group "" in the namespace "default": no relationship found between node 'worker-node02' and this object
Mar 29 17:49:52 worker-node02 kubelet[5319]: I0329 17:49:52.624118    5319 reconciler_common.go:253] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-jr2b5\" (UniqueName: \"kubernetes.io/projected/86bce07d-98ce-4c7c-a3eb-bb23753bcaa0-kube-api-access-jr2b5\") pod \"echo-deployment-d658654dc-8gkvk\" (UID: \"86bce07d-98ce-4c7c-a3eb-bb23753bcaa0\") " pod="default/echo-deployment-d658654dc-8gkvk"
Mar 29 17:49:53 worker-node02 kubelet[5319]: E0329 17:49:53.744368    5319 projected.go:292] Couldn't get configMap default/kube-root-ca.crt: failed to sync configmap cache: timed out waiting for the condition
Mar 29 17:49:53 worker-node02 kubelet[5319]: E0329 17:49:53.745381    5319 projected.go:198] Error preparing data for projected volume kube-api-access-jr2b5 for pod default/echo-deployment-d658654dc-8gkvk: failed to sync configmap cache: timed out waiting for the condition
Mar 29 17:49:53 worker-node02 kubelet[5319]: E0329 17:49:53.745678    5319 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/86bce07d-98ce-4c7c-a3eb-bb23753bcaa0-kube-api-access-jr2b5 podName:86bce07d-98ce-4c7c-a3eb-bb23753bcaa0 nodeName:}" failed. No retries permitted until 2023-03-29 17:49:54.245642839 +0000 UTC m=+31.319721339 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "kube-api-access-jr2b5" (UniqueName: "kubernetes.io/projected/86bce07d-98ce-4c7c-a3eb-bb23753bcaa0-kube-api-access-jr2b5") pod "echo-deployment-d658654dc-8gkvk" (UID: "86bce07d-98ce-4c7c-a3eb-bb23753bcaa0") : failed to sync configmap cache: timed out waiting for the condition
Mar 29 17:50:22 worker-node02 kubelet[5319]: I0329 17:50:22.913276    5319 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kubernetes-dashboard/kubernetes-dashboard-6c7ccbcf87-vssvh" podStartSLOduration=-9.223371993952316e+09 pod.CreationTimestamp="2023-03-29 17:49:40 +0000 UTC" firstStartedPulling="2023-03-29 17:49:43.341007377 +0000 UTC m=+20.415085872" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-03-29 17:50:17.847190216 +0000 UTC m=+54.921268720" watchObservedRunningTime="2023-03-29 17:50:22.902460351 +0000 UTC m=+59.976538869"
Mar 29 17:50:32 worker-node02 kubelet[5319]: I0329 17:50:32.994132    5319 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kubernetes-dashboard/dashboard-metrics-scraper-7bc864c59-zz598" podStartSLOduration=-9.223371983860693e+09 pod.CreationTimestamp="2023-03-29 17:49:40 +0000 UTC" firstStartedPulling="2023-03-29 17:49:43.601677507 +0000 UTC m=+20.675756002" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-03-29 17:50:22.91946549 +0000 UTC m=+59.993544001" watchObservedRunningTime="2023-03-29 17:50:32.994082485 +0000 UTC m=+70.068160987"
Mar 29 17:50:32 worker-node02 kubelet[5319]: I0329 17:50:32.995737    5319 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="default/echo-deployment-d658654dc-8gkvk" podStartSLOduration=-9.223371995859076e+09 pod.CreationTimestamp="2023-03-29 17:49:52 +0000 UTC" firstStartedPulling="2023-03-29 17:49:54.579967652 +0000 UTC m=+31.654046144" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-03-29 17:50:32.993814844 +0000 UTC m=+70.067893343" watchObservedRunningTime="2023-03-29 17:50:32.995700441 +0000 UTC m=+70.069778943"
Mar 29 17:51:01 worker-node02 kubelet[5319]: I0329 17:51:01.462434    5319 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/calico-node-g5mcr" podStartSLOduration=-9.22337195139845e+09 pod.CreationTimestamp="2023-03-29 17:49:36 +0000 UTC" firstStartedPulling="2023-03-29 17:49:42.911898155 +0000 UTC m=+19.985976645" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-03-29 17:50:48.290684359 +0000 UTC m=+85.364762862" watchObservedRunningTime="2023-03-29 17:51:01.456324583 +0000 UTC m=+98.530403089"
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.231150    5319 setters.go:548] "Node became not ready" node="worker-node02" condition={Type:Ready Status:False LastHeartbeatTime:2023-03-29 18:03:00.226360686 +0000 UTC m=+817.300439173 LastTransitionTime:2023-03-29 18:03:00.226360686 +0000 UTC m=+817.300439173 Reason:KubeletNotReady Message:node is shutting down}
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.938975    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"kube-api-access-gzsch\" (UniqueName: \"kubernetes.io/projected/b210cff4-254b-4942-9c41-82388850e7a8-kube-api-access-gzsch\") pod \"b210cff4-254b-4942-9c41-82388850e7a8\" (UID: \"b210cff4-254b-4942-9c41-82388850e7a8\") "
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.939598    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"kube-api-access-k66lp\" (UniqueName: \"kubernetes.io/projected/27512b05-aa14-4653-8feb-99a4ee019dbf-kube-api-access-k66lp\") pod \"27512b05-aa14-4653-8feb-99a4ee019dbf\" (UID: \"27512b05-aa14-4653-8feb-99a4ee019dbf\") "
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.939829    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"tmp-volume\" (UniqueName: \"kubernetes.io/empty-dir/27512b05-aa14-4653-8feb-99a4ee019dbf-tmp-volume\") pod \"27512b05-aa14-4653-8feb-99a4ee019dbf\" (UID: \"27512b05-aa14-4653-8feb-99a4ee019dbf\") "
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.939999    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"tmp-volume\" (UniqueName: \"kubernetes.io/empty-dir/b210cff4-254b-4942-9c41-82388850e7a8-tmp-volume\") pod \"b210cff4-254b-4942-9c41-82388850e7a8\" (UID: \"b210cff4-254b-4942-9c41-82388850e7a8\") "
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.940174    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"kubernetes-dashboard-certs\" (UniqueName: \"kubernetes.io/secret/b210cff4-254b-4942-9c41-82388850e7a8-kubernetes-dashboard-certs\") pod \"b210cff4-254b-4942-9c41-82388850e7a8\" (UID: \"b210cff4-254b-4942-9c41-82388850e7a8\") "
Mar 29 18:03:00 worker-node02 kubelet[5319]: W0329 18:03:00.947464    5319 empty_dir.go:525] Warning: Failed to clear quota on /var/lib/kubelet/pods/27512b05-aa14-4653-8feb-99a4ee019dbf/volumes/kubernetes.io~empty-dir/tmp-volume: clearQuota called, but quotas disabled
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.948423    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/empty-dir/27512b05-aa14-4653-8feb-99a4ee019dbf-tmp-volume" (OuterVolumeSpecName: "tmp-volume") pod "27512b05-aa14-4653-8feb-99a4ee019dbf" (UID: "27512b05-aa14-4653-8feb-99a4ee019dbf"). InnerVolumeSpecName "tmp-volume". PluginName "kubernetes.io/empty-dir", VolumeGidValue ""
Mar 29 18:03:00 worker-node02 kubelet[5319]: W0329 18:03:00.950924    5319 empty_dir.go:525] Warning: Failed to clear quota on /var/lib/kubelet/pods/b210cff4-254b-4942-9c41-82388850e7a8/volumes/kubernetes.io~empty-dir/tmp-volume: clearQuota called, but quotas disabled
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.955372    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/27512b05-aa14-4653-8feb-99a4ee019dbf-kube-api-access-k66lp" (OuterVolumeSpecName: "kube-api-access-k66lp") pod "27512b05-aa14-4653-8feb-99a4ee019dbf" (UID: "27512b05-aa14-4653-8feb-99a4ee019dbf"). InnerVolumeSpecName "kube-api-access-k66lp". PluginName "kubernetes.io/projected", VolumeGidValue ""
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.955973    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/empty-dir/b210cff4-254b-4942-9c41-82388850e7a8-tmp-volume" (OuterVolumeSpecName: "tmp-volume") pod "b210cff4-254b-4942-9c41-82388850e7a8" (UID: "b210cff4-254b-4942-9c41-82388850e7a8"). InnerVolumeSpecName "tmp-volume". PluginName "kubernetes.io/empty-dir", VolumeGidValue ""
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.957225    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/b210cff4-254b-4942-9c41-82388850e7a8-kube-api-access-gzsch" (OuterVolumeSpecName: "kube-api-access-gzsch") pod "b210cff4-254b-4942-9c41-82388850e7a8" (UID: "b210cff4-254b-4942-9c41-82388850e7a8"). InnerVolumeSpecName "kube-api-access-gzsch". PluginName "kubernetes.io/projected", VolumeGidValue ""
Mar 29 18:03:00 worker-node02 kubelet[5319]: I0329 18:03:00.961317    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/b210cff4-254b-4942-9c41-82388850e7a8-kubernetes-dashboard-certs" (OuterVolumeSpecName: "kubernetes-dashboard-certs") pod "b210cff4-254b-4942-9c41-82388850e7a8" (UID: "b210cff4-254b-4942-9c41-82388850e7a8"). InnerVolumeSpecName "kubernetes-dashboard-certs". PluginName "kubernetes.io/secret", VolumeGidValue ""
Mar 29 18:03:01 worker-node02 kubelet[5319]: I0329 18:03:01.041237    5319 reconciler_common.go:295] "Volume detached for volume \"kube-api-access-k66lp\" (UniqueName: \"kubernetes.io/projected/27512b05-aa14-4653-8feb-99a4ee019dbf-kube-api-access-k66lp\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:01 worker-node02 kubelet[5319]: I0329 18:03:01.041530    5319 reconciler_common.go:295] "Volume detached for volume \"tmp-volume\" (UniqueName: \"kubernetes.io/empty-dir/27512b05-aa14-4653-8feb-99a4ee019dbf-tmp-volume\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:01 worker-node02 kubelet[5319]: I0329 18:03:01.041678    5319 reconciler_common.go:295] "Volume detached for volume \"tmp-volume\" (UniqueName: \"kubernetes.io/empty-dir/b210cff4-254b-4942-9c41-82388850e7a8-tmp-volume\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:01 worker-node02 kubelet[5319]: I0329 18:03:01.041978    5319 reconciler_common.go:295] "Volume detached for volume \"kubernetes-dashboard-certs\" (UniqueName: \"kubernetes.io/secret/b210cff4-254b-4942-9c41-82388850e7a8-kubernetes-dashboard-certs\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:01 worker-node02 kubelet[5319]: I0329 18:03:01.041997    5319 reconciler_common.go:295] "Volume detached for volume \"kube-api-access-gzsch\" (UniqueName: \"kubernetes.io/projected/b210cff4-254b-4942-9c41-82388850e7a8-kube-api-access-gzsch\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:01 worker-node02 kubelet[5319]: I0329 18:03:01.803460    5319 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="cda7376715405fa75d64824c51d5325938fabeb33f41c40c5f810c379331725a"
Mar 29 18:03:01 worker-node02 kubelet[5319]: I0329 18:03:01.816393    5319 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="0c9f84ba5e7fc3375cd9ab700747b8e78b2f915bd25f32dd2258c24899b944de"
Mar 29 18:03:20 worker-node02 kubelet[5319]: I0329 18:03:20.843305    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"kube-api-access-jr2b5\" (UniqueName: \"kubernetes.io/projected/86bce07d-98ce-4c7c-a3eb-bb23753bcaa0-kube-api-access-jr2b5\") pod \"86bce07d-98ce-4c7c-a3eb-bb23753bcaa0\" (UID: \"86bce07d-98ce-4c7c-a3eb-bb23753bcaa0\") "
Mar 29 18:03:20 worker-node02 kubelet[5319]: I0329 18:03:20.853974    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/86bce07d-98ce-4c7c-a3eb-bb23753bcaa0-kube-api-access-jr2b5" (OuterVolumeSpecName: "kube-api-access-jr2b5") pod "86bce07d-98ce-4c7c-a3eb-bb23753bcaa0" (UID: "86bce07d-98ce-4c7c-a3eb-bb23753bcaa0"). InnerVolumeSpecName "kube-api-access-jr2b5". PluginName "kubernetes.io/projected", VolumeGidValue ""
Mar 29 18:03:20 worker-node02 kubelet[5319]: I0329 18:03:20.856434    5319 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="9aea2a5d0aea33642a08b84685ec6ece93ecebc138398a2e79e228111efd772b"
Mar 29 18:03:20 worker-node02 kubelet[5319]: I0329 18:03:20.945286    5319 reconciler_common.go:295] "Volume detached for volume \"kube-api-access-jr2b5\" (UniqueName: \"kubernetes.io/projected/86bce07d-98ce-4c7c-a3eb-bb23753bcaa0-kube-api-access-jr2b5\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252513    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-xtables-lock\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252572    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"cni-bin-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-bin-dir\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252593    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"cni-log-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-log-dir\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252611    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"host-local-net-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-host-local-net-dir\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252629    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/075134a2-8dbf-46d2-98b1-6f2d722cfee7-lib-modules\") pod \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\" (UID: \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252648    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-lib-modules\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252674    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"kube-api-access-6wb2h\" (UniqueName: \"kubernetes.io/projected/8402f0e8-dc01-4de3-8c0f-585b491d49a1-kube-api-access-6wb2h\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252691    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"var-run-calico\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-var-run-calico\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252709    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"sys-fs\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-sys-fs\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252743    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"bpffs\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-bpffs\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252784    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"kube-api-access-cxprl\" (UniqueName: \"kubernetes.io/projected/075134a2-8dbf-46d2-98b1-6f2d722cfee7-kube-api-access-cxprl\") pod \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\" (UID: \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252813    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/075134a2-8dbf-46d2-98b1-6f2d722cfee7-xtables-lock\") pod \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\" (UID: \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252842    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"cni-net-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-net-dir\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252888    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"policysync\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-policysync\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252918    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"var-lib-calico\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-var-lib-calico\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252959    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/075134a2-8dbf-46d2-98b1-6f2d722cfee7-kube-proxy\") pod \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\" (UID: \"075134a2-8dbf-46d2-98b1-6f2d722cfee7\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.252992    5319 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"nodeproc\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-nodeproc\") pod \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\" (UID: \"8402f0e8-dc01-4de3-8c0f-585b491d49a1\") "
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.268690    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-nodeproc" (OuterVolumeSpecName: "nodeproc") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "nodeproc". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.268801    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-bin-dir" (OuterVolumeSpecName: "cni-bin-dir") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "cni-bin-dir". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.268824    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-log-dir" (OuterVolumeSpecName: "cni-log-dir") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "cni-log-dir". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.268842    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-host-local-net-dir" (OuterVolumeSpecName: "host-local-net-dir") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "host-local-net-dir". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.268967    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-xtables-lock" (OuterVolumeSpecName: "xtables-lock") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "xtables-lock". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.269004    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-lib-modules" (OuterVolumeSpecName: "lib-modules") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "lib-modules". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.269780    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/075134a2-8dbf-46d2-98b1-6f2d722cfee7-lib-modules" (OuterVolumeSpecName: "lib-modules") pod "075134a2-8dbf-46d2-98b1-6f2d722cfee7" (UID: "075134a2-8dbf-46d2-98b1-6f2d722cfee7"). InnerVolumeSpecName "lib-modules". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.269829    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-var-run-calico" (OuterVolumeSpecName: "var-run-calico") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "var-run-calico". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.269867    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-sys-fs" (OuterVolumeSpecName: "sys-fs") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "sys-fs". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.269892    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-bpffs" (OuterVolumeSpecName: "bpffs") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "bpffs". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405165    5319 reconciler_common.go:295] "Volume detached for volume \"nodeproc\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-nodeproc\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405220    5319 reconciler_common.go:295] "Volume detached for volume \"cni-bin-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-bin-dir\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405248    5319 reconciler_common.go:295] "Volume detached for volume \"cni-log-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-log-dir\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405281    5319 reconciler_common.go:295] "Volume detached for volume \"host-local-net-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-host-local-net-dir\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405311    5319 reconciler_common.go:295] "Volume detached for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-xtables-lock\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405331    5319 reconciler_common.go:295] "Volume detached for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-lib-modules\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405343    5319 reconciler_common.go:295] "Volume detached for volume \"var-run-calico\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-var-run-calico\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405357    5319 reconciler_common.go:295] "Volume detached for volume \"sys-fs\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-sys-fs\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405370    5319 reconciler_common.go:295] "Volume detached for volume \"bpffs\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-bpffs\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405379    5319 reconciler_common.go:295] "Volume detached for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/075134a2-8dbf-46d2-98b1-6f2d722cfee7-lib-modules\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405433    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/075134a2-8dbf-46d2-98b1-6f2d722cfee7-xtables-lock" (OuterVolumeSpecName: "xtables-lock") pod "075134a2-8dbf-46d2-98b1-6f2d722cfee7" (UID: "075134a2-8dbf-46d2-98b1-6f2d722cfee7"). InnerVolumeSpecName "xtables-lock". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405493    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-net-dir" (OuterVolumeSpecName: "cni-net-dir") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "cni-net-dir". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.405522    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-policysync" (OuterVolumeSpecName: "policysync") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "policysync". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.449886    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-var-lib-calico" (OuterVolumeSpecName: "var-lib-calico") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "var-lib-calico". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: W0329 18:03:21.455137    5319 empty_dir.go:525] Warning: Failed to clear quota on /var/lib/kubelet/pods/075134a2-8dbf-46d2-98b1-6f2d722cfee7/volumes/kubernetes.io~configmap/kube-proxy: clearQuota called, but quotas disabled
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.457759    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/075134a2-8dbf-46d2-98b1-6f2d722cfee7-kube-proxy" (OuterVolumeSpecName: "kube-proxy") pod "075134a2-8dbf-46d2-98b1-6f2d722cfee7" (UID: "075134a2-8dbf-46d2-98b1-6f2d722cfee7"). InnerVolumeSpecName "kube-proxy". PluginName "kubernetes.io/configmap", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: E0329 18:03:21.473124    5319 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/075134a2-8dbf-46d2-98b1-6f2d722cfee7-kube-api-access-cxprl podName:075134a2-8dbf-46d2-98b1-6f2d722cfee7 nodeName:}" failed. No retries permitted until 2023-03-29 18:03:21.957929295 +0000 UTC m=+839.032007790 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "kube-api-access-cxprl" (UniqueName: "kubernetes.io/projected/075134a2-8dbf-46d2-98b1-6f2d722cfee7-kube-api-access-cxprl") pod "075134a2-8dbf-46d2-98b1-6f2d722cfee7" (UID: "075134a2-8dbf-46d2-98b1-6f2d722cfee7") : unmount failed: signal: terminated
Mar 29 18:03:21 worker-node02 kubelet[5319]: Unmounting arguments: /var/lib/kubelet/pods/075134a2-8dbf-46d2-98b1-6f2d722cfee7/volumes/kubernetes.io~projected/kube-api-access-cxprl
Mar 29 18:03:21 worker-node02 kubelet[5319]: Output:
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.631595    5319 operation_generator.go:890] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/8402f0e8-dc01-4de3-8c0f-585b491d49a1-kube-api-access-6wb2h" (OuterVolumeSpecName: "kube-api-access-6wb2h") pod "8402f0e8-dc01-4de3-8c0f-585b491d49a1" (UID: "8402f0e8-dc01-4de3-8c0f-585b491d49a1"). InnerVolumeSpecName "kube-api-access-6wb2h". PluginName "kubernetes.io/projected", VolumeGidValue ""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.631745    5319 reconciler_common.go:295] "Volume detached for volume \"kube-api-access-6wb2h\" (UniqueName: \"kubernetes.io/projected/8402f0e8-dc01-4de3-8c0f-585b491d49a1-kube-api-access-6wb2h\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.631764    5319 reconciler_common.go:295] "Volume detached for volume \"policysync\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-policysync\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.631777    5319 reconciler_common.go:295] "Volume detached for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/075134a2-8dbf-46d2-98b1-6f2d722cfee7-xtables-lock\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.631788    5319 reconciler_common.go:295] "Volume detached for volume \"cni-net-dir\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-cni-net-dir\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.631798    5319 reconciler_common.go:295] "Volume detached for volume \"var-lib-calico\" (UniqueName: \"kubernetes.io/host-path/8402f0e8-dc01-4de3-8c0f-585b491d49a1-var-lib-calico\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 kubelet[5319]: I0329 18:03:21.632347    5319 reconciler_common.go:295] "Volume detached for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/075134a2-8dbf-46d2-98b1-6f2d722cfee7-kube-proxy\") on node \"worker-node02\" DevicePath \"\""
Mar 29 18:03:21 worker-node02 systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
Mar 29 18:03:21 worker-node02 systemd[1]: kubelet.service: Deactivated successfully.
Mar 29 18:03:21 worker-node02 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Mar 29 18:03:21 worker-node02 systemd[1]: kubelet.service: Consumed 13.964s CPU time.

@aojea
Copy link
Member

aojea commented Mar 29, 2023

Spoiler: there's nothing there. Is there a flag to turn on more specific logs for the endpoint slice controller?

All the components can increase their log verbosity with the -v flag; -v=4 should be enough to see some important data.
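For example, on a kubeadm-style control plane (file paths assumed), the EndpointSlice controller runs inside kube-controller-manager, so its verbosity can be raised in the static pod manifest; the kubelet's verbosity can be raised through its KubeletConfiguration. A minimal sketch:

  # Fragment of the kube-controller-manager static pod manifest
  # (/etc/kubernetes/manifests/kube-controller-manager.yaml on kubeadm clusters; path assumed).
  # The EndpointSlice controller runs inside this component.
  spec:
    containers:
    - name: kube-controller-manager
      command:
      - kube-controller-manager
      - --v=4          # higher verbosity; 4 shows most controller sync decisions
      # ...keep the existing flags unchanged...
  ---
  # Kubelet side: raise verbosity through the kubelet configuration file.
  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  logging:
    verbosity: 4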

@aojea
Copy link
Member

aojea commented Mar 29, 2023

Within 5 seconds (the preStop hook duration), the EndpointSlice should be updated to have the pod that is terminating removed. This is what happens when a pod gets removed for any other reason (e.g. kubectl delete pod).

The EndpointSlice reflects the state of the Pod; it literally checks the pod state, the same as you are doing with kubectl describe pod. If it is reflecting conditions like

  conditions:
    ready: true
    serving: true
    terminating: false

you'll need to check the pod at that time: it should be ready, and if it is not, there is a problem we need to investigate further.

@mikekap
Copy link
Contributor Author

mikekap commented Mar 29, 2023

The pod also reports everything is great while shutting down:

Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True

Afaict the pod never even gets marked as Terminating -- only the containers get killed. Here are the same logs with -v 4: https://gist.github.com/mikekap/f55753e23d050bdaf39cd2b06ad747a0 . I don't fully grok the logs, but this seemed like an interesting snippet from the kubelet logs:

Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.041733     783 nodeshutdown_manager_linux.go:265] "Shutdown manager detected new shutdown event, isNodeShuttingDownNow" event=true
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.042229     783 nodeshutdown_manager_linux.go:273] "Shutdown manager detected new shutdown event" event="shutdown"
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.042396     783 nodeshutdown_manager_linux.go:322] "Shutdown manager processing shutdown event"
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.047298     783 nodeshutdown_manager_linux.go:375] "Shutdown manager killing pod with gracePeriod" pod="default/echo-deployment-d658654dc-k8srg" gracePeriod=30
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.047490     783 pod_workers.go:652] "Pod is being removed by the kubelet, begin teardown" pod="default/echo-deployment-d658654dc-k8srg" podUID=bc22c80a-a843-49da-b7a2-5d70512bb54d
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.047676     783 pod_workers.go:902] "Processing pod event" pod="default/echo-deployment-d658654dc-k8srg" podUID=bc22c80a-a843-49da-b7a2-5d70512bb54d updateType=1
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.047829     783 pod_workers.go:1019] "Pod worker has observed request to terminate" pod="default/echo-deployment-d658654dc-k8srg" podUID=bc22c80a-a843-49da-b7a2-5d70512bb54d
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.049317     783 kubelet.go:1858] "syncTerminatingPod enter" pod="default/echo-deployment-d658654dc-k8srg" podUID=bc22c80a-a843-49da-b7a2-5d70512bb54d
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.049448     783 kubelet_pods.go:1461] "Generating pod status" pod="default/echo-deployment-d658654dc-k8srg"
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.050428     783 kubelet_pods.go:1471] "Got phase for pod" pod="default/echo-deployment-d658654dc-k8srg" oldPhase=Running phase=Running
Mar 29 19:04:15 worker-node02 kubelet[783]: I0329 19:04:15.050671     783 kubelet.go:1888] "Pod terminating with grace period" pod="default/echo-deployment-d658654dc-k8srg" podUID=bc22c80a-a843-49da-b7a2-5d70512bb54d gracePeriod=30
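
(For context, these shutdown-manager lines only appear when graceful node shutdown is enabled in the kubelet configuration; a minimal sketch, with illustrative durations rather than the exact values from the repro:)

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  # Total time the kubelet asks systemd-logind to delay the shutdown.
  shutdownGracePeriod: 30s
  # Portion of that time reserved for critical pods; regular pods get the remainder.
  shutdownGracePeriodCriticalPods: 10s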

@mikekap
Copy link
Contributor Author

mikekap commented Mar 29, 2023

It seems the code around pod status merging is at least partially responsible for the dropped API server update. The shutdown manager code only sets Phase, Message, and Reason, and adds a DisruptionTarget condition. The status-merging code mentioned above disregards updates to all of these fields while the pod still has running containers (i.e. during draining).
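
For illustration, the status fragment the shutdown manager intends to write looks roughly like this (my reading of the code, so treat the field values as approximate; the phase value is omitted):

  status:
    reason: Terminated                  # approximate; the exact reason string may differ
    message: Pod was terminated in response to imminent node shutdown.
    conditions:
    - type: DisruptionTarget
      status: "True"
      reason: TerminationByKubelet      # approximate condition reason
      message: Pod was terminated in response to imminent node shutdown.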

Not sure what the right fix is here, but I think that's the root cause? If I'm understanding the code right, this would affect pod eviction and any other kubelet-led graceful pod shutdown.

@olderTaoist olderTaoist removed their assignment Mar 30, 2023
@mikekap
Copy link
Contributor Author

mikekap commented Apr 5, 2023

@SergeyKanzhelev would you need anything else from me to triage this?

This may be a bit presumptuous but I noticed @mimowo & @smarterclayton worked on this area recently. Would either of you have the time to see if my diagnosis in the previous comment is correct/useful?

@SergeyKanzhelev
Copy link
Member

/triage accepted

xref to the KEP: kubernetes/enhancements#2000 to consider when going GA

Will this problem be the same when we try to delete a Pod for eviction reasons, then? The PR changes existing tests - we need to understand that the proposed fix is not regressing some other behavior.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 24, 2023
@SergeyKanzhelev
Copy link
Member

/remove-triage needs-information

@k8s-ci-robot k8s-ci-robot removed the triage/needs-information Indicates an issue needs more information in order to work on it. label Jun 28, 2023
@noony
Copy link

noony commented Apr 5, 2024

Hi, we also encounter this issue on 1.27.6.

What we see is that after node deletion is triggered, the pod is still Running during the preStop lifecycle hook and also when SIGTERM is sent. Then the pod transitions to Completed.

The endpoint is updated when the pod passes into the terminating state, but because during shutdown it never transitions to Terminating, traffic is still sent to this pod until the pod is Completed, which causes errors.

@alculquicondor
Copy link
Member

/remove-triage accepted

Can someone from SIG node re-triage this?

IIUC, the question is how/when the kubelet reports to the apiserver that the Pod is terminating during a graceful node shutdown, so that other controllers can react.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Apr 5, 2024
@rphillips
Copy link
Member

rphillips commented Apr 17, 2024

Double check that your service has a sigterm handler that sets the readiness endpoint to false. Some of the behavior described on this ticket is expected behavior. If the readiness endpoint is never set to false, then the pod is expected to stay ready until the entire lifecycle is terminated.

The kubelet also needs to be set up so that, if it is running under systemd, the dependency chain has the kubelet terminate before networking does.

@rphillips
Copy link
Member

Also note that the preStop hook blocks pod termination until it has completed.

Hook handler calls are synchronous within the context of the Pod containing the Container.

https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution

@mikekap
Copy link
Contributor Author

mikekap commented Apr 22, 2024

@rphillips I don't think that is the specced behavior: readiness auto-fails when a pod starts shutting down normally. See this page: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes

Note: If you want to be able to drain requests when the Pod is deleted, you do not necessarily need a readiness probe; on deletion, the Pod automatically puts itself into an unready state regardless of whether the readiness probe exists. The Pod remains in the unready state while it waits for the containers in the Pod to stop.

@hshiina
Copy link

hshiina commented May 1, 2024

Double check that your service has a sigterm handler that sets the readiness endpoint to false. Some of the behavior described on this ticket is expected behavior. If the readiness endpoint is never set to false, then the pod is expected to stay ready until the entire lifecycle is terminated.

Because of a bug (#124648), readiness probes don't work during pod termination triggered in the kubelet, such as eviction and node shutdown. This may be one of the reasons why an endpoint is not updated early.

@aojea
Copy link
Member

aojea commented May 15, 2024

The endpoint is updated when the pod passes into the terminating state, but because during shutdown it never transitions to Terminating, traffic is still sent to this pod until the pod is Completed, which causes errors.

@rphillips @smarterclayton it seems the main problem here is the lack of a status update: once the status on eviction is propagated as Terminating or Failing, Services will work correctly. We had a similar issue in the past: #110255

@smarterclayton
Copy link
Contributor

So zooming out:

GracefulShutdown by itself does not say whether the node is coming back or not - it is the responsibility of the agent that "owns" the node (cloud provider, human admin, VM orchestrator, whatever) to determine that. So graceful shutdown by itself can make no decision on whether to delete the pods; it can only correctly represent that the pod is shutting down and not ready.

GracefulShutdown was not intended to replace a controlled drain operation on the node. If the use case described here is assuming that graceful shutdown will guarantee that pods are terminated, that is not generally true and not the design we intended for GracefulShutdown. Any "set delete timestamp" action would have to be opt in and probably not the responsibility of the kubelet generally - the responsibility today belongs to the initiator of the "node going away permanently" action (cloud provider, orchestrator, human) and I could argue it should stay there.

I think we can say that GracefulShutdown is intended to end the processes of pods as best effort within a specific timeframe and report the status of those pods appropriately. At minimum, the pods should be described as not ready, and the containers should be described as stopped with terminal exit codes and/or phase (RestartAlways pods should be left in the Running phase, because there was no signal that the node is not coming back). But it is best effort - being part of a distributed system, it cannot guarantee that those status updates will propagate.

The remaining question is whether the node is doing a good enough job during graceful shutdown. I think the answer given this thread is "no" - I would expect the following (as I commented on the other bug):

  1. Pod deletion and pod readiness are orthogonal, they should be kept orthogonal
  2. When the kubelet shuts down a pod gracefully for any reason (including deletion, static pod update, graceful node shutdown) the kubelet should continue to run readiness probes until a readiness check is "not ready" or the container stops, then the kubelet should stop readiness probes and not restart them until the container is started again
    a. if the container has no readiness probe, the ready state should go to False after the prestop completes
  3. The kubelet should aggressively attempt to report the most important status transitions during graceful shutdown, prioritizing ready -> false and container -> stop. This is challenging today because the kubelet status loop is single threaded, but we have discussed improving it.
  4. The initiator of node graceful shutdown is responsible for deciding whether the pod will never come back - if so invoke pod deletion before or concurrent with node graceful shutdown.

Anything I'm missing? Does this match everyone's experience and expectations?

@mikekap
Copy link
Contributor Author

mikekap commented May 16, 2024

One piece I might want to clarify is point 2:

  2. When the kubelet shuts down a pod gracefully for any reason (including deletion, static pod update, graceful node shutdown) the kubelet should continue to run readiness probes until a readiness check is "not ready" or the container stops, then the kubelet should stop readiness probes and not restart them until the container is started again

    a. if the container has no readiness probe, the ready state should go to False after the prestop completes

This isn't entirely complete - the endpoint slices that include these pods need to be updated correctly. Specifically, there are 3 booleans currently exposed: "ready", "serving", and "terminating". For pods on nodes undergoing graceful node shutdown these should be:

  • serving=(normal readiness probe result) - but this should be true if no probe is defined - same as it is when the pod is running normally.
  • ready=false - regardless of probe result; see the note about terminating pods here.
  • terminating=true - this is sort of debatable - but I think most ingress controllers use this as an indicator to drain traffic permanently (eg shut down persistent connections). This helps maintain their correctness.
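
As a concrete illustration, an EndpointSlice entry for a pod whose node is undergoing graceful shutdown would then look something like this (the slice name, IP, and port are illustrative; the pod name is taken from the logs above):

  apiVersion: discovery.k8s.io/v1
  kind: EndpointSlice
  metadata:
    name: echo-service-abcde            # generated name, illustrative
    labels:
      kubernetes.io/service-name: echo-service
  addressType: IPv4
  ports:
  - protocol: TCP
    port: 8080                          # illustrative target port
  endpoints:
  - addresses:
    - 172.16.158.10                     # illustrative pod IP
    nodeName: worker-node02
    targetRef:
      kind: Pod
      name: echo-deployment-d658654dc-8gkvk   # pod name from the kubelet logs above
    conditions:
      ready: false        # drop from normal load balancing right away
      serving: true       # still able to answer while draining (probe still passing)
      terminating: true   # lets ingress controllers drain long-lived connections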

Unfortunately doing this in the endpoint controller is impossible since the kubelet doesn't expose any indicator that the pod is shutting down. There may be a disruption condition now - so maybe this bug is about having the endpoint controller use that.

Separately - and this is just a non-advanced end user opinion that you can definitely ignore - IMO kubelet eviction should use pod deletion. The problem ultimately is that doing anything else violates the "principle of least surprise". During eviction the pod is "Terminating", but unless everyone remembers that there are 2 ways of checking that - deletion timestamp & a disruption condition (or something similar) - there will be bugs everywhere. Even kubectl doesn't do this right now (to show "Terminating"). Nobody expects "kubectl drain" and "sudo halt" to do anything different - it's a nasty surprise.

I'm not totally sure I understand what's gained by leaving the undead pods around. In theory you might be able to restart them if the node comes back up, but even if that doesn't happen - the parent controller should be recreating them, right? Then is this done to protect "fully uncontrolled" pods (ie non-static and non-daemonset/replicaset/statefulset/etc)? These are super weird - you can't even use "kubectl delete" to restart them - right? I readily admit I have not run kube on bare metal or with pet nodes (only cattle nodes) - so I'd love to understand how this feature helps folks.

Ultimately if allowing kubelets to actually delete pods is a breaking change in the kube API guarantees, IMO it's worth doing. It likely fixes more bugs than it creates.

All that said, the endpoint slice controller behavior is what this bug report is about. So long as that readiness starts reporting false when termination begins - that works to fix this issue.

@smarterclayton
Copy link
Contributor

smarterclayton commented Jun 17, 2024

Addressing this first:

IMO kubelet eviction should use pod deletion. The problem ultimately is doing anything else violates the "principle of least surprise" ... I'm not totally sure I understand what's gained by leaving the undead pods around. In theory you might be able to restart them if the node comes back up, but even if that doesn't happen - the parent controller should be recreating them, right?

On the specific point: we can't rely on there being a parent controller, because there are many scenarios where automation creates singular pods - all pods are mortal, but we try to avoid disrupting them unnecessarily (which is indeed surprising to users when they do get disrupted). A node restart is not a guarantee that a pod can't happily continue running past that restart.

More generally, I should clarify that graceful node shutdown is an additive feature to Kubernetes - it added the concept of a node being shut down with the intent that the node may not come back (i.e. primarily to support spot nodes on cloud). However, the default and historical behavior of Kubernetes is the exact opposite - nodes can go away and come back at any time, and Kubernetes was designed to be resilient to all of those scenarios (so we're paranoid about specific aspects of "stopping early" to avoid failing to restart things).

The addition of graceful shutdown thus had to support both scenarios, and does so with two different intersecting configurables:

  1. Timeframe - Is the shutdown under a short deadline (spot) or not (any other shutdown) - do we have the necessary time to propagate the node's information
  2. Temporary or Permanent - Does the node expect to come back after the restart (historically all nodes, most on-premise, most cloud up until a year or two ago) or not (spot, some cloud VMs)

The design thus had to support both of these and the full intersection with features like graceful termination of endpoints (which serving=x was added to support).

While the kubelet has never initiated graceful deletions for pods, there's nothing that says it couldn't when the node isn't coming back. We discussed it in the graceful shutdown implementation; several of us, myself included, were nervous because we didn't understand all of the issues. It's certainly possible, and might be something we can consider in the future.

However, if we had, it would have masked the particular issue you found for restartable workloads, and we probably would not have detected the issue here, which is:

There is no reliable signal sent by the kubelet during graceful node shutdown to ensure that the endpoints controller can eagerly remove pods from rotation.

@smarterclayton
Copy link
Contributor

smarterclayton commented Jun 17, 2024

Now addressing the second part:

The key difference with spot instances is that they are time limited - we have a fixed budget of time in which to ensure all of the right signals are sent to other components before permanent disruption occurs (e.g. 25s before the machine is forcibly powered off in some examples).

Appropriate signals:

  1. Indicating the pod is destined to terminate via graceful deletion of the pod
  2. The kubelet aggressively communicating that the pod is not ready, once we are certain that removing it from rotation is appropriate

We need a solution for permanent graceful node shutdown that maximizes the chance that the endpoints controller can eagerly remove pods from rotation - we need to ensure both that a low-latency signal is sent in the normal case and that a high-reliability signal is sent.

Problem 1: The Kubelet may not be able to reliably update all pods in time

The Kubelet has a lot of loops and work going on in it, and it is possible that the Kubelet receives the signal but is unable to act upon it before the 25s are over. The kubelet status loop is single threaded today - it updates each pod status sequentially. A single blip of apiserver connectivity, a bad load balancer, or a network error on the Kubelet could contribute tens of seconds of delay, which, if there are enough pods on the node, may mean that some pods are not signaled before the time limit is up.

Problem 2: The Kubelet may not receive the signal

Depending on the cloud and environment, the signal for graceful termination may not be delivered to the node, so the actor that does receive the signal also needs to be capable of sending it. If different environments have different signals, it is hard to justify doing this only in the Kubelet.

Problem 3: Once the node is terminated, the Kubelet can no longer act

Other actors in the system are responsible, post termination, for mitigating the Kubelet's failure to act (specifically pod garbage collection and the node controller). A concern with adding deletion support to the Kubelet is that it would be duplicative of the above.

@smarterclayton
Copy link
Contributor

Between the two signals, graceful deletion is only appropriate for nodes that aren't ever coming back, so if we changed that we would only fix a portion of the problem you identified. We should consider improving that, but given the scope of the change it's probably better to leverage the other signal and look at how we could propagate ready=false more correctly.

Propagating ready=false does make problem 1 worse if the implementation of post-node cleanup for a given provider is lacking (because it increases the chance that the completed pod update does not make it out). But I think practically that's a gap in the provider and correct propagation of ready=false is necessary for all footprints no matter what. Providers should be aware of that.

My recommendation

The minimum Kube fix here seems to be to ensure that once readiness probes are stopped for a container, the container is marked as not ready, and the kubelet propagates that status rapidly to the API. By definition, when we are no longer probing we are assuming the pod is not ready. A risk involved in this fix that may need to be mitigated is that we'll be marking the pod not ready more rapidly during node-initiated shutdown, and then if we fix the early probe stopping at a later time we'd have another impact on users. We might need to identify which early probe-stopping improvements we make at the same time.

@smarterclayton
Copy link
Contributor

Something @bobbypage noted - graceful node shutdown should be triggering the node to go not ready, which should propagate to the control plane, which should cause the node lifecycle controller to detect the node going unready, which should then result in the pods being marked as not ready (should = we see the code for that, it should be working).

That would then at least temporarily result in the pod status changing to not ready, which should then trigger an endpoints update. It might not if the kubelet sees the status change and fixes it rapidly, but a watch would still show the ready state changing.

@mikekap, when you ran your reproducer, did you see the pod status changing during a watch during live shutdown?

@mikekap
Copy link
Contributor Author

mikekap commented Jun 18, 2024

@mikekap, when you ran your reproducer, did you see the pod status changing during a watch during live shutdown?

No, I didn't see anything change in the pod description (except the disruption condition).

In the original PR I had traced the reason for that to

// Delay transitioning a pod to a terminal status unless the pod is actually terminal.
// The Kubelet should never transition a pod to terminal status that could have running
// containers and thus actively be leveraging exclusive resources. Note that resources
// like volumes are reconciled by a subsystem in the Kubelet and will converge if a new
// pod reuses an exclusive resource (unmount -> free -> mount), which means we do not
// need to wait for those resources to be detached by the Kubelet. In general, resources
// the Kubelet exclusively owns must be released prior to a pod being reported terminal,
// while resources that have participating components above the API use the pod's
// transition to a terminal phase (or full deletion) to release those resources.
if transitioningToTerminalPhase {
    if couldHaveRunningContainers {
        newPodStatus.Phase = oldPodStatus.Phase
        newPodStatus.Reason = oldPodStatus.Reason
        newPodStatus.Message = oldPodStatus.Message
    }
}
- though I'm not sure if changing that is the right fix. For what it's worth, we've been running a cluster with the linked PR and it works to fix this particular issue (though maybe incorrectly).

@smarterclayton
Copy link
Contributor

Re: node controller disruption - David indicates there seems to be a bug in the controller that has prevented this from ever working, so we can discount it for now.

Re: your other code change

I tried to think about all the ways the linked fix could interact with the cluster, and I believe it corrects the issue you are seeing because of an unintended side effect of setting the state to failed in the kubelet. However, I believe (but have not tested) that it breaks the expected behavior of the Kubelet allowing readiness probes to keep a pod in the serving rotation after it is marked deleted, so it works to fix your scenario but breaks others (please correct me if my review is inaccurate).

Re: this issue

My triage of this issue is that as described, "kubernetes is working as intended" in this case. There is a gap in that there is no recommendation for cloud-portable shutdown for workloads, but expecting delete pod semantics is a provider specific behavior that graceful shutdown does not provide today (it's not an unreasonable ask, it's just not supported by Kubernetes but is instead something provider specific).

The Kubernetes-portable behavior is to use the readiness endpoint to control being in rotation during graceful shutdown - being deleted is not required to be drained in a conformant distro today. So in the scenario you describe, your readiness endpoint should start returning false during your preStop hook if you want to be taken out of rotation fast.
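
A minimal sketch of that pattern, assuming the container image has a shell and using a drain marker file (the paths, probe timings, and names are illustrative, not taken from the reproducer):

apiVersion: v1
kind: Pod
metadata:
  name: echo-lameduck
  labels:
    app: echo
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: echo
      image: mendhak/http-https-echo
      ports:
        - containerPort: 8080
      readinessProbe:
        # Fail readiness as soon as the drain marker exists.
        exec:
          command: ["/bin/sh", "-c", "test ! -f /tmp/draining"]
        periodSeconds: 2
        failureThreshold: 1
      lifecycle:
        preStop:
          exec:
            # Enter lame-duck mode first, then keep serving while endpoints converge.
            command: ["/bin/sh", "-c", "touch /tmp/draining; sleep 20"]

The intent is that ready=false is reported and propagated while the sleep keeps the server alive, so the pod leaves the ready addresses of the EndpointSlice before it actually stops (subject to the probe-stopping caveat discussed further down the thread).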


Details:

When graceful node shutdown is enabled, I expect that services are drained appropriately during pod shutdown (i.e. the same way that kubectl delete pod would work).

Graceful shutdown as a feature is intended to work for all node shutdown scenarios, not just spot instances. One scenario is "node isn't coming back", and the other is "node is coming back". The current design of graceful shutdown does not include a way for the kubelet to know for certain that "node shutting down" = "not coming back", and so the kubelet can't take the action of "deleting pods" today (and thus can't behave like kubectl delete pod).

In general a reboot signal is probably not sufficient to result in a delete pods action even on spot instances - there may be other use cases for transient spot-like nodes where there may be multiple types of shutdown (live migration, suspend for forensic analysis, etc). At the current time, it's the responsibility of the infrastructure provider using the graceful shutdown feature to implement the "delete pod if node isn't coming back" semantic.

I think we as a project should be open to an improvement to graceful node shutdown that standardized that signal (like out-of-service) and standardized the behavior of "we know this node is going away and thus needs to be drained". However, this is effectively asking for an implementation of declarative standardized node drain and is too large for the scope of an issue.

However this doesn't seem to be the case - the pod never enters the Terminating state according to kubectl and doesn't get removed from the service endpoints until it enters the Completed or Failed state. This causes connection errors and lost requests in Cluster-routed services.

The behavior of graceful node shutdown is that Kubernetes respects the readiness probe during the shutdown process. To be removed from rotation eagerly, you must return false from your readiness probe during the preStop phase.

Deleting the pod during a "node isn't coming back" is a separate signal that has the same behavior as above, but would not be portable to all scenarios (naive node reboots when graceful shutdown is used to avoid disruption of workloads without draining).

@mikekap
Copy link
Contributor Author

mikekap commented Jun 19, 2024

To be specific, we can't rely on there being a parent controller because there are many scenarios where automation creates singular pods - all pods are mortal, but we try to avoid disrupting them unnecessarily (which is indeed surprising to users when they do get disrupted). A node restart is not a guarantee that a pod can't happily continue running past that restart.

Would you happen to have an example of where this is used? I can't claim to have run a lot of different things on kube, but I have never seen naked, unmanaged pods used for anything other than demos (i.e. kubectl run). Or is this more about recovering contents of emptyDir volumes and the like after a restart?

One issue here might be that, if you want this to continue working as it did before graceful shutdown, the container might need to know whether it has a chance of coming back. For example, some sort of long-running distributed data-crunching task would likely treat a SIGTERM as a "full disconnect", since that's the deletion signal too.

More generally, I should clarify that graceful node shutdown is an additive feature to Kubernetes - it added the concept of a node being shut down with the intent that the node may not come back (i.e. primarily to support spot nodes in the cloud). However, the default and historical behavior of Kubernetes is the exact opposite - nodes can go away and come back at any time, and Kubernetes was designed to be resilient to all of those scenarios (so we're paranoid about specific aspects of "stopping early" to avoid failing to restart things).

This is what I meant by "if this is a breaking change, IMO it's worth doing." This default does not match the majority of kube clusters deployed today. Not to say the new default here should be to "break" the current behavior and prevent nodes from rejoining clusters entirely under their old identity -- the breaking change would be to start with no pods after a graceful restart.

Re: node controller disruption - David indicates there seems to be a bug in the controller that has prevented this from working ever, so can discount for now.

Re: your other code change

I tried to think about all the ways the linked fix could interact with the cluster, and I believe it corrects the issue you are seeing because of an unintended side effect of setting the state to failed in the kubelet. However, I believe (but have not tested) that it breaks the expected behavior of the Kubelet allowing readiness probes to keep a pod in the serving rotation after it is marked deleted, so it works to fix your scenario but breaks others (please correct me if my review is inaccurate).

That sounds entirely right - we don't use unready endpoints anywhere in our clusters so "breaking" it probably just doesn't affect us. Your review seems accurate to me. Definitely happy to close this PR - I realized I was wrong in my diagnosis later in the thread here.

Re: this issue

My triage of this issue is that as described, "kubernetes is working as intended" in this case. There is a gap in that there is no recommendation for cloud-portable shutdown for workloads, but expecting delete pod semantics is a provider specific behavior that graceful shutdown does not provide today (it's not an unreasonable ask, it's just not supported by Kubernetes but is instead something provider specific).

The Kubernetes-portable behavior is to use the readiness endpoint to control being in rotation during graceful shutdown - being deleted is not required to be drained in a conformant distro today. So in the scenario you describe, your readiness endpoint should start returning false during your preStop hook if you want to be taken out of rotation fast.

A few points here:

  • Most tactically, this page needs a serious caveat that readiness probes are load-bearing for shutdown - you need a lame duck mode or similar. A simple sleep in a preStop hook won't work anymore (or rather, already doesn't, though graceful shutdown is not GA).
  • This stance probably defeats the purpose of KEP-3960 (a fragment of it is shown after this list). "This enhancement aims to provide a more straightforward way to manage graceful [pod] shutdowns." It cannot fulfill that in a world where lame duck mode is required. The only real way I could see keeping it is saying "graceful NODE shutdown does not mean graceful POD shutdown", but that sounds ridiculous, and even the graceful shutdown KEP says "Gracefully terminate all ... pods."
  • This stance is against the "motivation" of the graceful shutdown KEP - which was to REDUCE the variety of behaviors of processes that kube manages. https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2000-graceful-node-shutdown/README.md#motivation : "Users and cluster administrators expect that pods will adhere to expected pod lifecycle including pod termination."
  • Ultimately this is your decision (i.e. to try to salvage pods post node restart over letting the node start empty), but this choice is definitely user-hostile and controller-author hostile. Kubelet-led evictions are all corner cases (e.g. node shutdown, resource exhaustion eviction, etc) and nobody will test these. The decision to divide "pod shutdown" and "container shutdown" does not help the majority of users: e.g. the sleep preStop hook is fundamentally broken in cloud clusters under these assumptions.
  • Regardless of the outcome, thank you @smarterclayton for the in-depth explanations and rationale.
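
For reference, the declarative sleep that KEP-3960 adds (behind the PodLifecycleSleepAction feature gate in newer releases) is just the fragment below; on its own it does nothing to take the pod out of rotation, which is the point of the KEP-3960 bullet above.

lifecycle:
  preStop:
    sleep:
      seconds: 20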

@smarterclayton
Copy link
Contributor

smarterclayton commented Jun 20, 2024

shutdown KEP says "Gracefully terminate all ... pods."

Correct - what the KEP failed to implement (and it is obvious in retrospect) was a design for the desirable property that "gracefully shutdown pods are eagerly removed from load balanced services unless the workload owner opts in to a slower removal". Deleting the pod isn't the right general mechanism for that, but in the KEP we failed to design the mechanism that would cover pods gracefully and temporarily stopping.

The right fix is to correct that oversight via a new signal that indicates the kubelet will be stopping the pod (desirable for other use cases) and have endpoints treat that as identical to “deleting the pod”.

A secondary fix would be to standardize node drain via a control plane signal, which then all implementations could honor (being discussed on sig arch mailing list for other reasons). A provider could then correctly map their specific instance termination signals to that request, avoiding divergent drain implementations.

Regarding pods without controllers, it's fairly common to orchestrate pods directly via non-kube-centric orchestrators (batch frameworks, CI/CD systems) rather than workload controllers, and in many cases those are run in environments where pods survive power outages. Deleting the pod would violate current behavior and potentially lead to data loss for any metadata attached to the pod.

@smarterclayton
Copy link
Contributor

(i.e. to try to salvage pods post node restart over letting the node start empty), but this choice is definitely user-hostile and controller-author hostile

Kube as a project has chosen to not change default behavior over time so as to reduce "surprise" for existing users, and that puts a higher burden on new users (more new specific things to learn). Because we have existing users with large fleets in "forever" node environments, we can't just delete the pods - it would be user-hostile and hostile to their non-controller automation.

I agree with you that this is surprising, and I agree we should be finding a way to either improve graceful node shutdown via the KEP for GA or spawn a new KEP to formally define a way to indicate workloads will be temporarily disrupted and endpoints should properly react (that'll be new API surface area).

@ervikrant06
Copy link

Not sure whether the issue we are seeing is similar to this report or not.

Using 1.24 version

kubectl get pod databend-meta-1
NAME              READY   STATUS        RESTARTS   AGE
databend-meta-1   1/1     Terminating   0          54d

kubectl get pod databend-meta-1 -oyaml | grep -i deletionTimestamp:
  deletionTimestamp: "2024-06-16T19:12:58Z"

kubectl get endpoints databend-meta -oyaml | grep hostname
  - hostname: databend-meta-2
  - hostname: databend-meta-0
  - hostname: databend-meta-1

A describe on the endpoint slice shows:

  - Addresses:  172.28.xx.xx
    Conditions:
      Ready:    true
    Hostname:   databend-meta-1
    TargetRef:  Pod/databend-meta-1
    NodeName:   kube5wkr0099.example.com
    Zone:       <unset>

@smarterclayton
Copy link
Contributor

smarterclayton commented Jun 27, 2024

In reviewing the behavior of the kubelet in this scenario, I'm taking a look at the code paths for termination to identify whether we should be marking the pod unready from the kubelet if we know it will be removed. My previous concern was around static pods, but static pods should propagate local deletion to the apiserver consistent with regular pods, and marking the pod not ready aggressively might be correct for the graceful case.

@smarterclayton
Copy link
Contributor

Updated:

@mikekap your triage in #116965 (comment) is accurate - the deferral of reporting a terminal phase implemented by #108366 (which is what allows components on the cluster to know for certain when a pod has cleaned up all of its running resources, which is required for safety and scheduling new capacity to the node) exposed a flaw in how the pod prober manager (which owns readiness state) tracks readiness. The pod is still considered running, so prober_manager.go#UpdatePodStatus continues to leave the pod as ready because it doesn't know that the pod will be terminal.

As mentioned earlier up thread, when we are certain a pod will be terminating, that means it will eventually be disrupted no matter how long the pod reports ready (in graceful node shutdown, for some period of time). I was originally concerned that static pods could be disrupted if we were to return unready too early, but we have a mitigation for that, publishNotReadyAddresses, which those components already need to use (because static pods can have deletion propagated in some cases).
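
(For context, publishNotReadyAddresses is a field on the Service spec; the sketch below is a hypothetical illustration of the mitigation being referenced, not a specific system component.)

apiVersion: v1
kind: Service
metadata:
  name: peer-discovery              # hypothetical headless Service for static/system pods
spec:
  clusterIP: None
  publishNotReadyAddresses: true    # keep addresses published even when pods report not ready
  selector:
    component: etcd                 # hypothetical selector
  ports:
    - port: 2380
      targetPort: 2380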

But a different and more important argument that we neglected to focus on is that ready propagation is a one way signal for basic services - it should be fast, but we cannot be certain when it will fully propagate to all consuming network components. This requires us to design unready propagation to be "eager" so that we maximize the chance that clients are updated before an expected disruption happens.

Endpoint controller originally added the deletionTimestamp as a signal for removing a pod from endpoints eagerly for this reason - the pod will eventually be disrupted, in advance of when the controller observes the kubelet set the ready state (the kubelet may also be down). However, the ready state on the pod is owned by the kubelet, so if we can be certain that there is no existing use case that depends on pods remaining ready during graceful termination, we can eagerly set ready=false.

  • preStop is a way to defer pod shutdown so that we can continue to serve traffic until we are removed from rotation. A user can set ready=false from a readiness probe during this period (except that we have to fix this bug first: Readiness probe stops too early at eviction #124648). However, preStop delaying the Kubelet's reporting of ready=false is not ideal for this use case, because it delays the removal from rotation through exactly the period of time we were hoping to be removed.
  • A pod that expects to remain serving during non-graceful node shutdown (relying on current behavior) will take disruption when the node stops, and so changing behavior during graceful termination won't impact those workloads
  • Graceful node shutdown is configured by administrators or platform admins to mitigate disruption to serving workloads, so without eager unready, it can't accomplish its goals

I need some additional review to ensure that there aren't legitimate workloads that would be impacted by eagerly going unready when the kubelet knows the pod will be shut down, but if we get that certainty then the fix will probably be:

  1. SyncTerminatingPod knows that the pod will eventually be stopped, so the first pod status update in that method can pass a new boolean podIsTerminating to generateAPIPodStatus
  2. generateAPIPodStatus can pass that to probeManager.UpdatePodStatus, which then knows that the pod should be considered unready (and further probe requests can stop, which fixes Readiness probe stops too early at eviction #124648)
  3. The kubelet will then propagate ready false to the API, which will remove the pod from the endpoint's ready addresses

Will open a PoC PR to see what else might break. The e2e test would be a simple variation of the yaml at the top of this issue, confirming that ready propagates quickly and potentially verifying the same conditions as #125404.

@dims
Copy link
Member

dims commented Jun 27, 2024

cc @cartermckinnon

@smarterclayton
Copy link
Contributor

As part of the exploration of this, I've created a design proposal for the solution that would address this - please read https://docs.google.com/document/d/1t25jgO_-LRHhjRXf4KJ5xY_t8BZYdapv7MDAxVGY6R8/edit?usp=sharing

Projects
Status: Triaged