Volumes stuck in Released state #9833

Open
zc-devs opened this issue Mar 29, 2024 · 6 comments

Comments

@zc-devs
Contributor

zc-devs commented Mar 29, 2024

Environmental Info:
K3s Version: v1.29.3+k3s1
Local path provisioner: v0.0.26

Node(s) CPU architecture, OS, and Version:
Linux 5.14.0-362.24.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 20 04:52:13 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 servers, embedded etcd

Describe the bug:
After deploying a new 1.29.3 cluster, I noticed that PVs are not deleted and get stuck in the Released state, even though they have persistentVolumeReclaimPolicy: Delete.

Steps To Reproduce:

  1. Install K3s cluster.
  2. Check Local path provisioner version:
# kubectl get deployment -n kube-system local-path-provisioner -o=jsonpath='{$.spec.template.spec.containers[:1].image}'
rancher/local-path-provisioner:v0.0.26
  3. Create test pvc.yaml.
kubectl apply -f pvc.yaml
  4. Create test pod.yaml.
kubectl apply -f pod.yaml
  5. Delete Pod.
kubectl delete -f pod.yaml
  6. Delete PVC.
kubectl delete -f pvc.yaml
  7. Check volume.
# kubectl get persistentvolume | grep test-pvc
pvc-3f2233a9-795e-4ba0-a52f-e4bf335979a4   10Mi       RWO            Delete           Released   kube-system/test-pvc             local-ssd      <unset>                          17m
  8. Check logs of Local path provisioner (after applying a workaround from Local path provisioner disallowed from reading Pods logs #9834).
    local-path-provisioner.log
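The volume check in step 7 can be made a little more general than grepping for one claim name. The following is only a sketch: released_pvs is a hypothetical helper name, and it assumes the default column layout of kubectl get persistentvolume, where STATUS is the fifth column and CLAIM the sixth, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of K3s or kubectl).
# Reads rows of `kubectl get persistentvolume --no-headers` on stdin and
# prints the name and claim of every volume whose STATUS column is "Released".
# Assumes the default column order: NAME CAPACITY ACCESS-MODES RECLAIM-POLICY STATUS CLAIM ...
released_pvs() {
    awk '$5 == "Released" { print $1, $6 }'
}

# Typical use against a live cluster (not executed here):
#   kubectl get persistentvolume --no-headers | released_pvs
```

Any row it prints for a volume with reclaim policy Delete, after the PVC is gone, points at the behaviour described in this issue.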

Expected behavior:
The persistent volume is deleted, there are no errors in the Local path provisioner logs, and there are no failed helper-pod-delete-pvc-* Pods.

Actual behavior:
The persistent volume is not deleted but stays stuck in the Released state, errors appear in the Local path provisioner logs, and failed helper-pod-delete-pvc-* Pods appear.

Additional context / logs:
helper-pod-delete-pvc-3f2233a9-795e-4ba0-a52f-e4bf335979a4.yml

Workaround:
After downgrading Local path provisioner to v0.0.24, the PVs previously stuck in the Released state are deleted automatically.

@zc-devs
Contributor Author

zc-devs commented Mar 29, 2024

A first change might be to add the new option to the setup and teardown scripts:

  setup: |-
    #!/bin/sh
    while getopts "m:s:p:a:" opt
    do
        case $opt in
            p)
            absolutePath=$OPTARG
            ;;
            s)
            sizeInBytes=$OPTARG
            ;;
            m)
            volMode=$OPTARG
            ;;
            a)
            action=$OPTARG
            ;;
        esac
    done
    if [ "$action" = "create" ]
    then
      mkdir -m 0777 -p ${absolutePath}
      chmod 700 ${absolutePath}/..
    fi
  teardown: |-
    #!/bin/sh
    set -x
    while getopts "m:s:p:a:" opt
    do
        case $opt in
            p)
            absolutePath=$OPTARG
            ;;
            s)
            sizeInBytes=$OPTARG
            ;;
            m)
            volMode=$OPTARG
            ;;
            a)
            action=$OPTARG
            ;;
        esac
    done
    if [ "$action" = "delete" ]
    then
      rm -rf ${absolutePath}
    fi

With that change I no longer get the Illegal option -a error, but the helper pod fails anyway.
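To illustrate why adding a: to the getopts option string makes the Illegal option -a error go away, here is a small standalone sketch. It assumes the earlier scripts used the option string "m:s:p:" (that string is not shown in this thread), and parse_with is a hypothetical helper name used only for this demonstration.

```shell
#!/bin/sh
# parse_with OPTSTRING ARGS...
# Runs getopts over ARGS with the given option string and prints
# "rejected" if any option is unknown, "ok" otherwise.
parse_with() {
    optstring=$1
    shift
    OPTIND=1                      # reset so the function can be called repeatedly
    while getopts "$optstring" opt 2>/dev/null
    do
        case $opt in
            \?) echo "rejected"; return 0 ;;   # getopts sets opt to "?" for unknown options
        esac
    done
    echo "ok"
}

# Assumed old option string: -a is unknown, so parsing is rejected.
parse_with "m:s:p:"   -p /tmp/vol -s 1024 -m Filesystem -a delete
# Option string from the scripts above: -a is accepted.
parse_with "m:s:p:a:" -p /tmp/vol -s 1024 -m Filesystem -a delete
```

This only covers option parsing; it says nothing about why the helper pod still fails afterwards.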
Then I tried to debug:

  teardown: |-
    #!/bin/sh
    sleep infinity
/ # ls -lah /var/lib/rancher/k3s/storage/local/ssd/pvc-a9809075-743a-4a86-ba28-38ca3d8256cd_kube-system_test-pvc
ls: can't open '/var/lib/rancher/k3s/storage/local/ssd/pvc-a9809075-743a-4a86-ba28-38ca3d8256cd_kube-system_test-pvc': Permission denied
total 0

/ # ls -lah /var/lib/rancher/k3s/storage/local/ssd
total 3G
drwx------   14 root     root        4.0K Mar 29 22:12 .
drwxr-xr-x    3 root     root        4.0K Mar 29 22:33 ..
drwxrwxrwx    2 root     root        4.0K Mar 29 22:13 pvc-a9809075-743a-4a86-ba28-38ca3d8256cd_kube-system_test-pvc

/ # id
uid=0(root) gid=0(root) groups=0(root),10(wheel)

Then I compared the Pod definitions between v0.0.24 and v0.0.26:
Screenshot 2024-03-30

@VestigeJ

TL;DR from the link above, where I encountered this issue while testing the latest COMMIT_ID for the v1.28 branch:

$ kg pv -A

NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
checking-path   5Gi        RWO            Recycle          Failed   default/test-pvc   local-path              50m

@0xMALVEE
Contributor

0xMALVEE commented May 15, 2024

Is this resolved already?

@brandond
Contributor

Nope. The issue is still open and the PR is not merged. Waiting for the end of code freeze.

@zc-devs
Contributor Author

zc-devs commented May 21, 2024

Hi, I've just tested #9964 with Local path config: local-storage.yaml.

While the PV is created, it cannot be deleted; the helper pod fails.

helper-pod-create-pvc-v0.24.yaml
helper-pod-create-pvc-v0.26.yaml
helper-pod-delete-pvc-v0.24.yaml
helper-pod-delete-pvc-v0.26.yaml

I think the main difference is that v0.0.26 doesn't use the privileged security context flag, as I noted in #9833 (comment). I should also mention that I use Oracle Linux 9 with SELinux enabled. Directory permissions are:

# ls -lanZ /var/lib/rancher/k3s/storage/
total 48
drwx------. 12    0    0 system_u:object_r:container_file_t:s0           4096 May 21 19:24 .
drwxr-xr-x.  6    0    0 system_u:object_r:container_var_lib_t:s0        4096 Jan 18 18:32 ..

Volume permissions (v0.0.24):

# ls -lanZ pvc-9391085e-05dc-42f4-9c0d-25a1b4e6fe4a_kube-system_test-pvc/
total 12
drwxrwxrwx.  2 0 0 system_u:object_r:container_file_t:s0:c131,c199 4096 May 21 19:26 .
drwx------. 12 0 0 system_u:object_r:container_file_t:s0           4096 May 21 19:24 ..
-rw-r--r--.  1 0 0 system_u:object_r:container_file_t:s0:c131,c199    5 May 21 19:29 test.txt

Volume permissions (v0.0.26):

# ls -lanZ pvc-1fdbed66-a718-463d-908e-21bd731934c4_kube-system_test-pvc/
total 12
drwxrwxrwx.  2 0 0 system_u:object_r:container_file_t:s0:c143,c741 4096 May 21 19:34 .
drwx------. 11 0 0 system_u:object_r:container_file_t:s0           4096 May 21 19:33 ..
-rw-r--r--.  1 0 0 system_u:object_r:container_file_t:s0:c143,c741    5 May 21 19:34 test.txt

Should I file a separate issue?

@brandond
Contributor

Yes, that sounds like a separate issue. I don't see any difference in the volume permissions or contexts between the two versions, though. It is expected that the numeric portion at the end will differ.
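The comparison described here can be sketched in shell by dropping the trailing MCS categories (the c131,c199 style suffix) before comparing the two contexts. context_without_categories is a hypothetical helper name, and the two context strings are copied from the ls -lanZ listings above.

```shell
#!/bin/sh
# context_without_categories CONTEXT
# Prints the user:role:type:sensitivity part of an SELinux context,
# dropping the trailing MCS categories if present.
context_without_categories() {
    printf '%s\n' "$1" | cut -d: -f1-4
}

# Contexts of the volume directories as listed for the two versions.
v024='system_u:object_r:container_file_t:s0:c131,c199'
v026='system_u:object_r:container_file_t:s0:c143,c741'

if [ "$(context_without_categories "$v024")" = "$(context_without_categories "$v026")" ]
then
    echo "contexts match apart from MCS categories"
else
    echo "contexts differ"
fi
```

For the two listings in this thread the remaining fields are identical, which is consistent with the observation that only the numeric portion differs.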
