Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set TerminationMessagePath to a Path Under Bidirectional Mounted EmptyDir Blocking the Pod Deleting. #115054

Open
yuyue9284 opened this issue Jan 13, 2023 · 16 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@yuyue9284
Copy link

What happened?

I have a pod writing the termination message into an empty dir volume with bidirectional mount propagation, every time I delete the pod, the pod will stuck at terminating, and I have to do a force deletion. the logs of kubelet showing something like

UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName:
"kubernetes.io/empty-dir/44ae6082-160d-442e-8234-a2b85905b19c-cache-volume") pod "44ae6082-160d-442e-8234-a2b85905b19c" (UID: "44ae6082-160d-442e-8234-a2b85905b19c") : unlinkat /var/lib/kubelet/pods/44ae6082-160d-442e-8234-a2b85905b19c/volumes/kubernetes.io~empty-dir/cache-volume/log.txt: device or resource busy

What did you expect to happen?

The pod can be deleted without force deletion, and the mount point, empty directory on the node running the pod should be cleared.

How can we reproduce it (as minimally and precisely as possible)?

Apply the following yaml, and delete the pod.

apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  restartPolicy: Never
  containers:
  - image: ubuntu
    name: test-container
    command:
    - bash
    - "-c"
    - "/script/test.sh"
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
      mountPropagation: Bidirectional
    - mountPath: /script
      name: script
    terminationMessagePath: /cache/log.txt
    terminationMessagePolicy: FallbackToLogsOnError
    securityContext:
      privileged: true
  volumes:
  - name: cache-volume
    emptyDir: {}
  - name: script
    configMap:
      name: test-config
      items:
      - key: test.sh
        path: test.sh
        mode: 0777
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-config
data:
  test.sh: |
    #!/bin/bash
    trap "echo 'SIGTERM received, gracefully shutting down'; echo 'shutdown logs...' > /cache/log.txt; exit 0" SIGTERM
    for i in {1..100000000}
    do
      echo "test $i"
      sleep 1
    done

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# paste output here
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.6", GitCommit:"c86d003ea699ec4bcffee10ad563a26b63561c0e", GitTreeState:"clean", BuildDate:"2022-12-17T10:31:53Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Azure AKS

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux xxx-microsoft-standard-WSL2 #1 SMP Wed Nov 23 01:01:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@yuyue9284 yuyue9284 added the kind/bug Categorizes issue or PR as related to a bug. label Jan 13, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 13, 2023
@neolit123
Copy link
Member

/sig node storage

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 15, 2023
@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs Jan 18, 2023
@SergeyKanzhelev
Copy link
Member

Can it be a side effect of this bug: #106735?

@harche can you please take a look?

@SergeyKanzhelev
Copy link
Member

/triage accepted

this bug has enough details. If you can attach the whole kubelet.log it may help troubleshooting.

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 18, 2023
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Jan 18, 2023
@SergeyKanzhelev
Copy link
Member

@yuyue9284 why do you make the /cache Bidirectional?

@yuyue9284
Copy link
Author

yuyue9284 commented Jan 19, 2023

@yuyue9284 why do you make the /cache Bidirectional?

Hi @SergeyKanzhelev , sometimes we need to create mountpoint in one container and share the mountpoint to another container in the same pod using the bidirectional, we found only setting the terminationMessagePath will trigger this bug.

@yuyue9284
Copy link
Author

Part of the kubelet logs.

Jan 19 07:17:20 agentpool-17631869 kubelet[2597]: I0119 07:17:20.597429 2597 reconciler.go:201] "operationExecutor.UnmountVolume started for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/43b392cd-6bf3-4733-ae2a-6ce8e375293d-cache-volume") pod "43b392cd-6bf3-4733-ae2a-6ce8e375293d" (UID: "43b392cd-6bf3-4733-ae2a-6ce8e375293d") "
Jan 19 07:17:20 agentpool-17631869 kubelet[2597]: W0119 07:17:20.597531 2597 empty_dir.go:519] Warning: Failed to clear quota on /var/lib/kubelet/pods/43b392cd-6bf3-4733-ae2a-6ce8e375293d/volumes/kubernetes.io~empty-dir/cache-volume: clearQuota called, but quotas disabled
Jan 19 07:17:20 agentpool-17631869 kubelet[2597]: E0119 07:17:20.597907 2597 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/43b392cd-6bf3-4733-ae2a-6ce8e375293d-cache-volume podName:43b392cd-6bf3-4733-ae2a-6ce8e375293d nodeName:}" failed. No retries permitted until 2023-01-19 07:19:22.597870946 +0000 UTC m=+2353813.337359038 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/43b392cd-6bf3-4733-ae2a-6ce8e375293d-cache-volume") pod "43b392cd-6bf3-4733-ae2a-6ce8e375293d" (UID: "43b392cd-6bf3-4733-ae2a-6ce8e375293d") : unlinkat /var/lib/kubelet/pods/43b392cd-6bf3-4733-ae2a-6ce8e375293d/volumes/kubernetes.io~empty-dir/cache-volume/log.txt: device or resource busy

@harche
Copy link
Contributor

harche commented Jan 24, 2023

I was able to reproduce this with master branch k8s and crio,

E0124 10:59:07.051716   38591 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/a50d8417-4de3-4046-99e2-fc21435293a0-cache-volume podName:a50d8417-4de3-4046-99e2-fc21435293a0 nodeName:}" failed. No retries permitted until 2023-01-24 10:59:07.551685171 -0500 EST m=+167.220155170 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/a50d8417-4de3-4046-99e2-fc21435293a0-cache-volume") pod "a50d8417-4de3-4046-99e2-fc21435293a0" (UID: "a50d8417-4de3-4046-99e2-fc21435293a0") : unlinkat /var/lib/kubelet/pods/a50d8417-4de3-4046-99e2-fc21435293a0/volumes/kubernetes.io~empty-dir/cache-volume/log.txt: device or resource busy
E0124 10:59:07.553126   38591 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/a50d8417-4de3-4046-99e2-fc21435293a0-cache-volume podName:a50d8417-4de3-4046-99e2-fc21435293a0 nodeName:}" failed. No retries permitted until 2023-01-24 10:59:08.55311909 -0500 EST m=+168.221589046 (durationBeforeRetry 1s). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/a50d8417-4de3-4046-99e2-fc21435293a0-cache-volume") pod "a50d8417-4de3-4046-99e2-fc21435293a0" (UID: "a50d8417-4de3-4046-99e2-fc21435293a0") : unlinkat /var/lib/kubelet/pods/a50d8417-4de3-4046-99e2-fc21435293a0/volumes/kubernetes.io~empty-dir/cache-volume/log.txt: device or resource busy

@harche
Copy link
Contributor

harche commented Jan 24, 2023

However, if I change the terminationMessagePath to not use the cache volume, e.g. terminationMessagePath: /tmp/log.txt then the pod terminates without an issue.

$ time kubectl delete -f pod.yaml 
pod "test-pd" deleted
configmap "test-config" deleted

real	0m27.914s
user	0m0.035s
sys	0m0.013s

So it seems like the volume manager is trying to delete the volume while the pod is still terminating and that volume is still in use by the terminating pod.

@harche
Copy link
Contributor

harche commented Jan 24, 2023

terminationMessagePath is mapped to ContainerPath at,

volumeMounts = append(volumeMounts, &runtimeapi.Mount{
HostPath: containerLogPath,
ContainerPath: terminationMessagePath,
SelinuxRelabel: selinuxRelabel,
})

@smarterclayton
Copy link
Contributor

terminationMessage writing should definitely block container volume unmounting. Where is that being written? By CRI?

@harche
Copy link
Contributor

harche commented Jan 24, 2023

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-config
data:
  test.sh: |
    #!/bin/bash
    trap "echo 'SIGTERM received, gracefully shutting down'; echo 'shutdown logs...' > /cache/log.txt; exit 0" SIGTERM
    for i in {1..100000000}
    do
      echo "test $i"
      sleep 1
    done

from the pod yaml in the description, test-container is executing that script which is writing to /cache

@smarterclayton
Copy link
Contributor

I just meant, is there any chance the container write to the volume is doing something unusual such that the OS thinks the volume is still busy even though the container is stopped?

If not, the question would be whether the kubelet is reading the volume post container stop that would cause this. There is a terminationMessagePath read in the status manager and the status manager is asynchronous to the core pod loop. So this may be an unintentional race between kubelet unmounting and kubelet trying to read the message path to update status.

@harche
Copy link
Contributor

harche commented Jan 24, 2023

If not, the question would be whether the kubelet is reading the volume post container stop that would cause this. There is a terminationMessagePath read in the status manager and the status manager is asynchronous to the core pod loop. So this may be an unintentional race between kubelet unmounting and kubelet trying to read the message path to update status.

That's a good pointer. Thanks.

@harche
Copy link
Contributor

harche commented Jan 30, 2023

The bug surfaces only if you specify mountPropagation: Bidirectional in volumeMounts.volumePath section in the pod yaml.

@harche
Copy link
Contributor

harche commented Feb 23, 2023

/assign

@harche
Copy link
Contributor

harche commented Feb 27, 2023

@SergeyKanzhelev Do you know whom to approach in sig storage to explore this further?

@harche harche removed their assignment Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

No branches or pull requests

6 participants