kubelet is not able to delete pod with mounted secret/configmap after restart #96635

Closed
mborsz opened this issue Nov 17, 2020 · 24 comments · Fixed by #110670
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • kind/flake: Categorizes issue or PR as related to a flaky test.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@mborsz
Member

mborsz commented Nov 17, 2020

From #96038 (comment)

What happened:
In https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1328563866136743936 one of the nodes (gce-scale-cluster-minion-group-bddx) restarted, apparently due to a kernel panic.

The last kubelet log entry is at 08:13:05.362859 and the first one after restart is 08:15:40.361826.

In the meantime (at 08:13:17.496033), one of the pods running on that node (small-deployment-167-56c965c4cf-9pw8k) was deleted by the generic-garbage-collector (i.e. its deletionTimestamp was set).

After restarting, the kubelet was never able to finish deleting this pod (i.e. the object was never actually removed).

What you expected to happen:
After the kubelet's restart, the pod object should be deleted from etcd.

How to reproduce it (as minimally and precisely as possible):
Based on our logs, stopping the kubelet for a while, deleting a pod running on that node, and then restarting the kubelet should trigger this issue.

It may be important that the pod was using a configmap that had already been deleted.
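
A minimal sketch of those steps, assuming a systemd-managed kubelet and a pod "demo-pod" that mounts a configmap "demo-config" (both names are hypothetical, not from the test run above):

systemctl stop kubelet
kubectl delete configmap demo-config
kubectl delete pod demo-pod --wait=false   # sets deletionTimestamp while the kubelet is down
systemctl start kubelet
kubectl get pod demo-pod                   # expected to remain in Terminating indefinitely
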
Anything else we need to know?:

Link to the test run: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1328563866136743936
Kubelet's logs: http://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1328563866136743936/artifacts/gce-scale-cluster-minion-group-bddx/kubelet.log
Pod name: test-sk0eco-5/small-deployment-167-56c965c4cf-9pw8k
More logs (e.g. all kube-apiserver logs for that pod) can be found here: #96038 (comment)

Environment:

  • Kubernetes version (use kubectl version): v1.20.0-beta.1.663+147a120948482e
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

/cc @kubernetes/sig-node-bugs

@mborsz mborsz added the kind/bug Categorizes issue or PR as related to a bug. label Nov 17, 2020
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 17, 2020
@mborsz
Member Author

mborsz commented Nov 17, 2020

It looks like the kubelet is continuously trying to mount a secret that has been deleted in the meantime:

I1117 08:15:58.345135    2017 reconciler.go:269] operationExecutor.MountVolume started for volume "secret" (UniqueName: "kubernetes.io/secret/7b4d462c-23ad-4a45-9e1e-ee73fee9e1da-secret") pod "small-deployment-167-56c965c4cf-9pw8k" (UID: "7b4d462c-23ad-4a45-9e1e-ee73fee9e1da")
E1117 08:15:58.345326    2017 secret.go:195] Couldn't get secret test-sk0eco-5/small-deployment-167: secret "small-deployment-167" not found
E1117 08:15:58.345419    2017 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/7b4d462c-23ad-4a45-9e1e-ee73fee9e1da-secret podName:7b4d462c-23ad-4a45-9e1e-ee73fee9e1da nodeName:}" failed. No retries permitted until 2020-11-17 08:16:06.345386223 +0000 UTC m=+26.080620409 (durationBeforeRetry 8s). Error: "MountVolume.SetUp failed for volume \"secret\" (UniqueName: \"kubernetes.io/secret/7b4d462c-23ad-4a45-9e1e-ee73fee9e1da-secret\") pod \"small-deployment-167-56c965c4cf-9pw8k\" (UID: \"7b4d462c-23ad-4a45-9e1e-ee73fee9e1da\") : secret \"small-deployment-167\" not found"
I1117 08:15:58.445474    2017 reconciler.go:269] operationExecutor.MountVolume started for volume "configmap" (UniqueName: "kubernetes.io/configmap/7b4d462c-23ad-4a45-9e1e-ee73fee9e1da-configmap") pod "small-deployment-167-56c965c4cf-9pw8k" (UID: "7b4d462c-23ad-4a45-9e1e-ee73fee9e1da")
E1117 08:15:58.445673    2017 configmap.go:200] Couldn't get configMap test-sk0eco-5/small-deployment-167: configmap "small-deployment-167" not found

Is it possible to delete a configmap or secret that is still mounted somewhere?
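
For reference: unlike PVCs, which are protected by the kubernetes.io/pvc-protection finalizer while in use, Secrets and ConfigMaps have no in-use protection, so deleting one that is still mounted succeeds immediately and the running pod just keeps the copy already projected to disk. A minimal sketch (names are hypothetical):

kubectl create secret generic demo-secret --from-literal=key=value
# ... create a pod "demo-pod" that mounts demo-secret as a volume and wait for it to be Running ...
kubectl delete secret demo-secret   # succeeds right away; no finalizer blocks it
kubectl get pod demo-pod            # the pod keeps running with the already-mounted copy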

@chenyw1990
Contributor

mark

@mborsz
Member Author

mborsz commented Nov 17, 2020

The secret and configmap were deleted at 08:13:17.372114 and 08:13:17.407192:

I1117 08:13:17.372114      11 httplog.go:89] "HTTP" verb="DELETE" URI="/api/v1/namespaces/test-sk0eco-5/secrets/small-deployment-167" latency="4.841432ms" userAgent="clusterloader/v0.0.0 (linux/amd64) kubernetes/$Format" srcIP="34.123.35.171:58728" resp=200
I1117 08:13:17.407192      11 httplog.go:89] "HTTP" verb="DELETE" URI="/api/v1/namespaces/test-sk0eco-5/configmaps/small-deployment-167" latency="2.654193ms" userAgent="clusterloader/v0.0.0 (linux/amd64) kubernetes/$Format" srcIP="34.123.35.171:58732" resp=200

The pod was deleted at 08:13:17.496033:

I1117 08:13:17.496033      11 httplog.go:89] "HTTP" verb="DELETE" URI="/api/v1/namespaces/test-sk0eco-5/pods/small-deployment-167-56c965c4cf-9pw8k" latency="7.49838ms" userAgent="kube-controller-manager/v1.20.0 (linux/amd64) kubernetes/147a120/system:serviceaccount:kube-system:generic-garbage-collector" srcIP="[::1]:52478" resp=200

So the secret and configmap were deleted before the pod. This ordering seems incorrect.

So most likely the question is: should we support pod deletion after a mounted secret/configmap has been deleted?

I don't have much kubelet context, but I think we should (e.g. if a user accidentally removes a configmap before the pod, they should still have a way to delete the pod without recreating the configmap), and this bug shows that this is not always the case.

@mborsz mborsz changed the title kubelet is not able to delete pod after restart kubelet is not able to delete pod with mounted secret/configmap after restart Nov 17, 2020
@liggitt
Member

liggitt commented Nov 17, 2020

Should we support pod deletion after mounted secret/configmap has been deleted?

Yes, the kubelet should not block cleanup of the pod on the unavailability of secret/configmap resources.

/sig storage

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Nov 17, 2020
@chenyw1990
Contributor

/assign

I have found the reason; I will fix it.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 19, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 21, 2021
@marseel
Member

marseel commented Apr 8, 2021

/remove-lifecycle rotten
We are still observing this flake in our tests; here is a recent run in which it occurred: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1377667417462804480

This time the pod "test-vil5th-43/small-deployment-227-5cd8b577c9-mfhx5" was not deleted.
Logs confirming the kubelet restart:

13:I0401 17:09:11.320405    1908 flags.go:59] FLAG: --add-dir-header="false"
14:I0401 17:09:11.320452    1908 flags.go:59] FLAG: --address="0.0.0.0"
...
2440:I0401 20:23:03.271231  151431 flags.go:59] FLAG: --add-dir-header="false"
2441:I0401 20:23:03.271285  151431 flags.go:59] FLAG: --address="0.0.0.0"

Example kubelet errors while trying to read deleted configmap/secret for this pod:

3337:I0401 21:28:24.861587  151431 reconciler.go:269] "operationExecutor.MountVolume started for volume \"configmap\" (UniqueName: \"kubernetes.io/configmap/8f73aa25-79dd-4222-b5b7-0d29c06f948c-configmap\") pod \"small-deployment-227-5cd8b577c9-mfhx5\" (UID: \"8f73aa25-79dd-4222-b5b7-0d29c06f948c\") "
3338:I0401 21:28:24.862309  151431 reconciler.go:269] "operationExecutor.MountVolume started for volume \"secret\" (UniqueName: \"kubernetes.io/secret/8f73aa25-79dd-4222-b5b7-0d29c06f948c-secret\") pod \"small-deployment-227-5cd8b577c9-mfhx5\" (UID: \"8f73aa25-79dd-4222-b5b7-0d29c06f948c\") "
3341:E0401 21:28:24.862654  151431 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/8f73aa25-79dd-4222-b5b7-0d29c06f948c-secret podName:8f73aa25-79dd-4222-b5b7-0d29c06f948c nodeName:}" failed. No retries permitted until 2021-04-01 21:30:26.862598616 +0000 UTC m=+4043.677596199 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"secret\" (UniqueName: \"kubernetes.io/secret/8f73aa25-79dd-4222-b5b7-0d29c06f948c-secret\") pod \"small-deployment-227-5cd8b577c9-mfhx5\" (UID: \"8f73aa25-79dd-4222-b5b7-0d29c06f948c\") : secret \"small-deployment-227\" not found"
3342:E0401 21:28:24.862703  151431 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/8f73aa25-79dd-4222-b5b7-0d29c06f948c-configmap podName:8f73aa25-79dd-4222-b5b7-0d29c06f948c nodeName:}" failed. No retries permitted until 2021-04-01 21:30:26.862675209 +0000 UTC m=+4043.677672777 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"configmap\" (UniqueName: \"kubernetes.io/configmap/8f73aa25-79dd-4222-b5b7-0d29c06f948c-configmap\") pod \"small-deployment-227-5cd8b577c9-mfhx5\" (UID: \"8f73aa25-79dd-4222-b5b7-0d29c06f948c\") : configmap \"small-deployment-227\" not found"

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 8, 2021
@mm4tt
Contributor

mm4tt commented Apr 13, 2021

Ping, any updates?

@tosi3k
Member

tosi3k commented May 10, 2021

Friendly ping for an update. We came across this issue once again in a recent scale test job run: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1390713430448541696.

@mborsz
Member Author

mborsz commented Jun 7, 2021

@ehashman
Member

/triage accepted
/kind flake

I think this bug was known (we've been tracking similar issues in OpenShift), but no one had looked at this particular issue yet. Let's get it on our CI board.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. kind/flake Categorizes issue or PR as related to a flaky test. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 25, 2021
@Oats87

Oats87 commented Oct 14, 2021

This looks like a "duplicate" of the issue here: #70044

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 13, 2021
@joshimoo

joshimoo commented Dec 7, 2021

/remove-lifecycle rotten
Related issue: #70044

A similar failure scenario exists for CSI volumes.
I'm currently checking to see whether I can track down the fault and create a reliable reproducer :)

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 7, 2021
@wojtek-t
Member

wojtek-t commented Dec 9, 2021

FTR: it seems that #96790 is dead, so this is still an issue.

@msau42
Member

msau42 commented Jan 29, 2022

I think the key here is that this issue occurs when the kubelet restarts. This points to a problem with the volume reconstruction logic. cc @jingxu97 @gnufied
@kubernetes/sig-storage-bugs

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2022
@mborsz
Member Author

mborsz commented Apr 29, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2022
@gnufied
Member

gnufied commented Jun 1, 2022

Is this still an issue? (See below: yes, it is still reproducible.) I wonder if the fixes @liggitt made in #105204 helped solve it. I wrote the following script to try to reproduce it; I let it run for 3-4 hours and did not see a single instance of a pod stuck in the Terminating state:

#!/bin/bash

# Local kubelet build used for the test (path specific to my machine).
KUBELET_BIN=/home/hekumar/redhat/kubernetes/_output/local/bin/linux/amd64/kubelet

# Start the kubelet in the background and remember its pid.
start_kubelet() {
    "${KUBELET_BIN}" --v=3 --vmodule= --container-runtime=remote --hostname-override=127.0.0.1 --cloud-provider= --cloud-config= --bootstrap-kubeconfig=/var/run/kubernetes/kubelet.kubeconfig --kubeconfig=/var/run/kubernetes/kubelet-rotated.kubeconfig --container-runtime-endpoint=unix:///var/run/crio/crio.sock --config=/tmp/kubelet.yaml > /tmp/kubelet.log 2>&1 &
    kubelet_pid=$!
}

start_kubelet
sleep 20
echo "Kubelet is running with pid ${kubelet_pid}"

while true
do
    echo "creating pods"
    # config_map_path_pod.yaml defines a pod (dapi-test-pod) that mounts a configmap volume.
    kubectl create -f config_map_path_pod.yaml
    kubectl wait --for=condition=ready pod -l app=test
    # wait for 10s
    sleep 10
    # stop kubelet
    echo "Killing kubelet with pid ${kubelet_pid}"
    kill -9 "${kubelet_pid}"
    # Issue the delete while the kubelet is down; the 2s timeout is expected to expire.
    kubectl delete -f config_map_path_pod.yaml --wait=true --timeout=2s
    echo "Starting kubelet back up"
    start_kubelet
    echo "Kubelet is running with pid ${kubelet_pid}"
    # After the restart, the pod should still become fully deleted.
    if kubectl delete pod dapi-test-pod --wait=true --timeout=40s
    then
        echo "Pod deleted successfully"
    else
        echo "Failed to delete pod"
        exit 1
    fi
done

The job history looks clean too - https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance

@gnufied
Member

gnufied commented Jun 1, 2022

Hmm, I take my previous comment back. It appears that the error condition is still a valid bug and reproducible.

@gnufied
Member

gnufied commented Jun 2, 2022

I was taking a look at reproducing and fixing this issue and wanted to post my findings. Basically what is happening is:

  1. Pod gets created.
  2. While the pod is running, the secret is deleted.
  3. Kubelet gets restarted.
  4. Kubelet does reconstruction and the volume gets added to the DSOW (desired state of the world). It would have been added to the DSOW anyway, because the pod was still running. The key thing is that the volume does not get added to the ASOW (actual state of the world) during reconcile, because the secret/configmap is not found.
  5. Pod is deleted.
  6. Now the code here (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go#L249) prevents the volume+pod from being removed from the DSOW, because the pod never made it into the ASOW.
  7. So the pod worker tries to terminate the container and pod, and most of that works fine. But the kubelet is unable to fully terminate the pod, because https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_volumes.go#L77 returns true.
  8. And hence the pod is stuck in the Terminating state.

The problem is that we simply can't skip adding volumes to the DSOW when the pod has a deletionTimestamp, because that would result in the volume never getting cleaned up. So the fix proposed in #96790 is not foolproof.

A real solution IMO is to add all pods+volumes in an uncertain state during reconstruction, so that volumes can be removed from the DSOW while still being required to be cleaned up before the pod can be terminated. @jsafrane has a PR that implements part of this solution (#108180); I am looking into using it to fix this bug.
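
A rough way to observe the stuck state from step 7 on an affected node (assuming the default kubelet --root-dir of /var/lib/kubelet, the pod UID from the logs earlier in this thread, and a node that logs to /var/log/kubelet.log like the test nodes here; use journalctl -u kubelet otherwise):

POD_UID=8f73aa25-79dd-4222-b5b7-0d29c06f948c
ls /var/lib/kubelet/pods/${POD_UID}/volumes/                  # kubernetes.io~secret / kubernetes.io~configmap dirs still present, so the check in step 7 keeps returning true
grep "MountVolume.SetUp failed" /var/log/kubelet.log | tail   # mount retries keep failing with "not found"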

@gnufied
Member

gnufied commented Jun 2, 2022

/assign
