Set TerminationMessagePath to a Path Under Bidirectional Mounted EmptyDir Blocking the Pod Deleting. #115054

yuyue9284 · 2023-01-13T10:13:05Z

What happened?

I have a pod writing the termination message into an empty dir volume with bidirectional mount propagation, every time I delete the pod, the pod will stuck at terminating, and I have to do a force deletion. the logs of kubelet showing something like

UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName:
"kubernetes.io/empty-dir/44ae6082-160d-442e-8234-a2b85905b19c-cache-volume") pod "44ae6082-160d-442e-8234-a2b85905b19c" (UID: "44ae6082-160d-442e-8234-a2b85905b19c") : unlinkat /var/lib/kubelet/pods/44ae6082-160d-442e-8234-a2b85905b19c/volumes/kubernetes.io~empty-dir/cache-volume/log.txt: device or resource busy

What did you expect to happen?

The pod can be deleted without force deletion, and the mount point, empty directory on the node running the pod should be cleared.

How can we reproduce it (as minimally and precisely as possible)?

Apply the following yaml, and delete the pod.

apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  restartPolicy: Never
  containers:
  - image: ubuntu
    name: test-container
    command:
    - bash
    - "-c"
    - "/script/test.sh"
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
      mountPropagation: Bidirectional
    - mountPath: /script
      name: script
    terminationMessagePath: /cache/log.txt
    terminationMessagePolicy: FallbackToLogsOnError
    securityContext:
      privileged: true
  volumes:
  - name: cache-volume
    emptyDir: {}
  - name: script
    configMap:
      name: test-config
      items:
      - key: test.sh
        path: test.sh
        mode: 0777
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-config
data:
  test.sh: |
    #!/bin/bash
    trap "echo 'SIGTERM received, gracefully shutting down'; echo 'shutdown logs...' > /cache/log.txt; exit 0" SIGTERM
    for i in {1..100000000}
    do
      echo "test $i"
      sleep 1
    done

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# paste output here
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.6", GitCommit:"c86d003ea699ec4bcffee10ad563a26b63561c0e", GitTreeState:"clean", BuildDate:"2022-12-17T10:31:53Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Azure AKS

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux xxx-microsoft-standard-WSL2 #1 SMP Wed Nov 23 01:01:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

neolit123 · 2023-01-15T15:12:34Z

/sig node storage

SergeyKanzhelev · 2023-01-18T18:52:45Z

Can it be a side effect of this bug: #106735?

@harche can you please take a look?

SergeyKanzhelev · 2023-01-18T18:53:41Z

/triage accepted

this bug has enough details. If you can attach the whole kubelet.log it may help troubleshooting.

/priority important-longterm

SergeyKanzhelev · 2023-01-18T22:32:58Z

@yuyue9284 why do you make the /cache Bidirectional?

yuyue9284 · 2023-01-19T07:15:31Z

@yuyue9284 why do you make the /cache Bidirectional?

Hi @SergeyKanzhelev , sometimes we need to create mountpoint in one container and share the mountpoint to another container in the same pod using the bidirectional, we found only setting the terminationMessagePath will trigger this bug.

yuyue9284 · 2023-01-19T07:21:10Z

Part of the kubelet logs.

Jan 19 07:17:20 agentpool-17631869 kubelet[2597]: I0119 07:17:20.597429 2597 reconciler.go:201] "operationExecutor.UnmountVolume started for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/43b392cd-6bf3-4733-ae2a-6ce8e375293d-cache-volume") pod "43b392cd-6bf3-4733-ae2a-6ce8e375293d" (UID: "43b392cd-6bf3-4733-ae2a-6ce8e375293d") "
Jan 19 07:17:20 agentpool-17631869 kubelet[2597]: W0119 07:17:20.597531 2597 empty_dir.go:519] Warning: Failed to clear quota on /var/lib/kubelet/pods/43b392cd-6bf3-4733-ae2a-6ce8e375293d/volumes/kubernetes.io~empty-dir/cache-volume: clearQuota called, but quotas disabled
Jan 19 07:17:20 agentpool-17631869 kubelet[2597]: E0119 07:17:20.597907 2597 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/43b392cd-6bf3-4733-ae2a-6ce8e375293d-cache-volume podName:43b392cd-6bf3-4733-ae2a-6ce8e375293d nodeName:}" failed. No retries permitted until 2023-01-19 07:19:22.597870946 +0000 UTC m=+2353813.337359038 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/43b392cd-6bf3-4733-ae2a-6ce8e375293d-cache-volume") pod "43b392cd-6bf3-4733-ae2a-6ce8e375293d" (UID: "43b392cd-6bf3-4733-ae2a-6ce8e375293d") : unlinkat /var/lib/kubelet/pods/43b392cd-6bf3-4733-ae2a-6ce8e375293d/volumes/kubernetes.io~empty-dir/cache-volume/log.txt: device or resource busy

harche · 2023-01-24T16:00:22Z

I was able to reproduce this with master branch k8s and crio,

E0124 10:59:07.051716   38591 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/a50d8417-4de3-4046-99e2-fc21435293a0-cache-volume podName:a50d8417-4de3-4046-99e2-fc21435293a0 nodeName:}" failed. No retries permitted until 2023-01-24 10:59:07.551685171 -0500 EST m=+167.220155170 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/a50d8417-4de3-4046-99e2-fc21435293a0-cache-volume") pod "a50d8417-4de3-4046-99e2-fc21435293a0" (UID: "a50d8417-4de3-4046-99e2-fc21435293a0") : unlinkat /var/lib/kubelet/pods/a50d8417-4de3-4046-99e2-fc21435293a0/volumes/kubernetes.io~empty-dir/cache-volume/log.txt: device or resource busy
E0124 10:59:07.553126   38591 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/empty-dir/a50d8417-4de3-4046-99e2-fc21435293a0-cache-volume podName:a50d8417-4de3-4046-99e2-fc21435293a0 nodeName:}" failed. No retries permitted until 2023-01-24 10:59:08.55311909 -0500 EST m=+168.221589046 (durationBeforeRetry 1s). Error: UnmountVolume.TearDown failed for volume "cache-volume" (UniqueName: "kubernetes.io/empty-dir/a50d8417-4de3-4046-99e2-fc21435293a0-cache-volume") pod "a50d8417-4de3-4046-99e2-fc21435293a0" (UID: "a50d8417-4de3-4046-99e2-fc21435293a0") : unlinkat /var/lib/kubelet/pods/a50d8417-4de3-4046-99e2-fc21435293a0/volumes/kubernetes.io~empty-dir/cache-volume/log.txt: device or resource busy

harche · 2023-01-24T17:27:56Z

However, if I change the terminationMessagePath to not use the cache volume, e.g. terminationMessagePath: /tmp/log.txt then the pod terminates without an issue.

$ time kubectl delete -f pod.yaml 
pod "test-pd" deleted
configmap "test-config" deleted

real	0m27.914s
user	0m0.035s
sys	0m0.013s

So it seems like the volume manager is trying to delete the volume while the pod is still terminating and that volume is still in use by the terminating pod.

harche · 2023-01-24T17:52:01Z

terminationMessagePath is mapped to ContainerPath at,

kubernetes/pkg/kubelet/kuberuntime/kuberuntime_container.go

Lines 420 to 424 in 703361f

    
           volumeMounts = append(volumeMounts, &runtimeapi.Mount{ 
        
           	HostPath:       containerLogPath, 
        
           	ContainerPath:  terminationMessagePath, 
        
           	SelinuxRelabel: selinuxRelabel, 
        
           })

smarterclayton · 2023-01-24T18:50:17Z

terminationMessage writing should definitely block container volume unmounting. Where is that being written? By CRI?

harche · 2023-01-24T18:56:41Z

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-config
data:
  test.sh: |
    #!/bin/bash
    trap "echo 'SIGTERM received, gracefully shutting down'; echo 'shutdown logs...' > /cache/log.txt; exit 0" SIGTERM
    for i in {1..100000000}
    do
      echo "test $i"
      sleep 1
    done

from the pod yaml in the description, test-container is executing that script which is writing to /cache

smarterclayton · 2023-01-24T19:41:31Z

I just meant, is there any chance the container write to the volume is doing something unusual such that the OS thinks the volume is still busy even though the container is stopped?

If not, the question would be whether the kubelet is reading the volume post container stop that would cause this. There is a terminationMessagePath read in the status manager and the status manager is asynchronous to the core pod loop. So this may be an unintentional race between kubelet unmounting and kubelet trying to read the message path to update status.

harche · 2023-01-24T19:47:35Z

If not, the question would be whether the kubelet is reading the volume post container stop that would cause this. There is a terminationMessagePath read in the status manager and the status manager is asynchronous to the core pod loop. So this may be an unintentional race between kubelet unmounting and kubelet trying to read the message path to update status.

That's a good pointer. Thanks.

harche · 2023-01-30T19:23:12Z

The bug surfaces only if you specify mountPropagation: Bidirectional in volumeMounts.volumePath section in the pod yaml.

harche · 2023-02-23T14:55:03Z

/assign

harche · 2023-02-27T14:57:52Z

@SergeyKanzhelev Do you know whom to approach in sig storage to explore this further?

yuyue9284 added the kind/bug Categorizes issue or PR as related to a bug. label Jan 13, 2023

k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 13, 2023

k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 15, 2023

SergeyKanzhelev added this to Triage in SIG Node Bugs Jan 18, 2023

SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Jan 18, 2023

k8s-ci-robot assigned harche Feb 23, 2023

harche removed their assignment Sep 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set TerminationMessagePath to a Path Under Bidirectional Mounted EmptyDir Blocking the Pod Deleting. #115054

Set TerminationMessagePath to a Path Under Bidirectional Mounted EmptyDir Blocking the Pod Deleting. #115054

yuyue9284 commented Jan 13, 2023

neolit123 commented Jan 15, 2023

SergeyKanzhelev commented Jan 18, 2023

SergeyKanzhelev commented Jan 18, 2023

SergeyKanzhelev commented Jan 18, 2023

yuyue9284 commented Jan 19, 2023 •

edited

yuyue9284 commented Jan 19, 2023

harche commented Jan 24, 2023

harche commented Jan 24, 2023 •

edited

harche commented Jan 24, 2023

smarterclayton commented Jan 24, 2023

harche commented Jan 24, 2023

smarterclayton commented Jan 24, 2023

harche commented Jan 24, 2023

harche commented Jan 30, 2023 •

edited

harche commented Feb 23, 2023

harche commented Feb 27, 2023

Set TerminationMessagePath to a Path Under Bidirectional Mounted EmptyDir Blocking the Pod Deleting. #115054

Set TerminationMessagePath to a Path Under Bidirectional Mounted EmptyDir Blocking the Pod Deleting. #115054

Comments

yuyue9284 commented Jan 13, 2023

What happened?

What did you expect to happen?

How can we reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

Kubernetes version

Cloud provider

OS version

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

neolit123 commented Jan 15, 2023

SergeyKanzhelev commented Jan 18, 2023

SergeyKanzhelev commented Jan 18, 2023

SergeyKanzhelev commented Jan 18, 2023

yuyue9284 commented Jan 19, 2023 • edited

yuyue9284 commented Jan 19, 2023

harche commented Jan 24, 2023

harche commented Jan 24, 2023 • edited

harche commented Jan 24, 2023

smarterclayton commented Jan 24, 2023

harche commented Jan 24, 2023

smarterclayton commented Jan 24, 2023

harche commented Jan 24, 2023

harche commented Jan 30, 2023 • edited

harche commented Feb 23, 2023

harche commented Feb 27, 2023

yuyue9284 commented Jan 19, 2023 •

edited

harche commented Jan 24, 2023 •

edited

harche commented Jan 30, 2023 •

edited