
[BUG] Pod mount took a long time even PV/PVC bound #2590

Closed
jenting opened this issue May 13, 2021 · 11 comments
Labels: area/csi, area/kubernetes, area/stability, component/longhorn-manager, investigation-needed, kind/bug, priority/0, require/manual-test-plan, severity/3

Comments

@jenting
Contributor

jenting commented May 13, 2021

Describe the bug

Pod mount takes a long time even though the PV/PVC are bound.
The way to troubleshoot it is to check the kubelet log, which generally shows what is happening.

To Reproduce

I can make it happen when longhorn-manager is busy handling requests. My setup is 1 control-plane node + 3 worker nodes, with Longhorn running on all 4 nodes on k3s v1.21.0+k3s1.

Steps to reproduce the behavior:

  1. Create lots of storage classes (around 30), each with a 1-minute recurring job (cron * * * * *); a sketch of one such StorageClass follows the log excerpts below.
  2. Create Pods that use these storage classes.
  3. The PV/PVC bind successfully, but the Pod takes a long time to mount the volume, or never manages to mount it.
  4. In the kubelet log, you will see either
2021-05-12T05:03:40.341728+00:00 jenting-longhorn-master-0 k3s[17498]: E0512 05:03:40.341637   17498 remote_runtime.go:394] "ExecSync cmd from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 1s exceeded: context deadline exceeded" containerID="2772def98866331ae76c100ab412ce1961a1e35e19bf4f9f9345077a0d733f9b" cmd=[sh -c ls /data/longhorn && /data/longhorn version --client-only]

or

2021-05-12T04:41:01.854806+00:00 jenting-longhorn-master-0 k3s[17498]: I0512 04:41:01.854710   17498 reconciler.go:295] attacherDetacher.AttachVolume started for volume "pvc-2f923abe-9a0d-4262-8d71-a85e1245724a" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-2f923abe-9a0d-4262-8d71-a85e1245724a") from node "jenting-longhorn-master-0"
2021-05-12T04:41:02.442548+00:00 jenting-longhorn-master-0 k3s[17498]: E0512 04:41:02.441322   17498 csi_attacher.go:698] kubernetes.io/csi: attachment for pvc-2f923abe-9a0d-4262-8d71-a85e1245724a failed: rpc error: code = DeadlineExceeded desc = Failed to attach volume pvc-2f923abe-9a0d-4262-8d71-a85e1245724a to node jenting-longhorn-master-0
2021-05-12T04:41:02.442891+00:00 jenting-longhorn-master-0 k3s[17498]: E0512 04:41:02.441429   17498 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-2f923abe-9a0d-4262-8d71-a85e1245724a podName: nodeName:}" failed. No retries permitted until 2021-05-12 04:43:04.441401417 +0000 UTC m=+71475.522372066 (durationBeforeRetry 2m2s). Error: "AttachVolume.Attach failed for volume \"pvc-2f923abe-9a0d-4262-8d71-a85e1245724a\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-2f923abe-9a0d-4262-8d71-a85e1245724a\") from node \"jenting-longhorn-master-0\" : rpc error: code = DeadlineExceeded desc = Failed to attach volume pvc-2f923abe-9a0d-4262-8d71-a85e1245724a to node jenting-longhorn-master-0"
2021-05-12T04:41:02.443059+00:00 jenting-longhorn-master-0 k3s[17498]: I0512 04:41:02.441842   17498 event.go:291] "Event occurred" object="default/volume-test-1" kind="Pod" apiVersion="v1" type="Warning" reason="FailedAttachVolume" message="AttachVolume.Attach failed for volume \"pvc-2f923abe-9a0d-4262-8d71-a85e1245724a\" : rpc error: code = DeadlineExceeded desc = Failed to attach volume pvc-2f923abe-9a0d-4262-8d71-a85e1245724a to node jenting-longhorn-master-0"
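
For reference, a minimal sketch of one such StorageClass, assuming the v1.1-era recurringJobs StorageClass parameter (the class name, job name, and retain count below are placeholders):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-recurring-01
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  # assumed v1.1-style recurring job definition: one snapshot job every minute
  recurringJobs: '[{"name":"snap-1m", "task":"snapshot", "cron":"* * * * *", "retain":1}]'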

Expected behavior

The Pod should be able to mount the volume quickly.

Log
If applicable, add the Longhorn managers' log when the issue happens.

You can also attach a Support Bundle here. You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment:

  • Longhorn version: v1.1.1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher Catalog App
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s v1.21.0+k3s1
    • Number of management node in the cluster: 1
    • Number of worker node in the cluster: 3
  • Node config
    • OS type and version: SLES15 SP2
    • CPU per node: 4
    • Memory per node: 8G
    • Disk type(e.g. SSD/NVMe): SSD
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): VMWare
  • Number of Longhorn volumes in the cluster: 0

Additional context

Since longhorn-manager is the control plane of the Longhorn cluster, I'm thinking about how to keep it from being affected by things like recurring jobs that run very frequently.

For example, if a recurring job is scheduled every 3 minutes but takes 5 minutes to finish, this leads to resource starvation. When we launch a new run of a recurring job, we should make sure the volume has no previous run still in progress.

Furthermore, the snapshot backup currently runs inside the longhorn-manager Pod; we should make it run as a separate job so it does not occupy longhorn-manager's memory.

Or, the simplest way is to run the longhorn-manager Pods on the control-plane nodes while the other Longhorn components run on the worker nodes, e.g. via a node selector as sketched below.
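
A rough illustration of that placement idea (not Longhorn's actual chart values; the control-plane label and toleration below are assumptions and vary by distro):

spec:
  template:
    spec:
      # hypothetical: pin the Pods to control-plane nodes
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule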

Furthermore, if possible, we should enhance the Longhorn support bundle to collect the host's kubelet log as well.

@jenting jenting changed the title [TASBUG] Pod mount took a long time even PV/PVC bound [BUG] Pod mount took a long time even PV/PVC bound May 13, 2021
@innobead innobead added area/stability System or volume stability component/longhorn-manager Longhorn manager (control plane) labels May 17, 2021
@surenraju-zz

I am adding the same comment from #2583

I notice this issue often. On random nodes, the CSI plugin gets stuck on the mount command. I also noticed that after restarting the longhorn-csi-plugin-____ Pod for the problematic node, the issue gets resolved, but it appears on another node after a few days.

Could this be related to the amount of data (about 10 GB) the RWX volume holds, or to the number of Pods associated with this volume? We are using RWX volumes with Argo Workflows: hundreds of Pods are launched against a specific RWX volume, run to complete some steps, and then new Pods are launched. All those hundreds of Pods consume the same RWX volume.

@innobead
Member

cc @joshimoo

@innobead innobead added the area/csi CSI related like control/node driver, sidecars label May 26, 2021
@innobead
Member

@khushboo-rancher please help reproduce this one and attach the support bundle as a debugging reference.

cc @kaxing

@jenting
Contributor Author

jenting commented Jun 8, 2021

Besides that, I encountered another issue on Kubernetes v1.21.0.

2021-05-12T05:03:40.341728+00:00 jenting-longhorn-master-0 k3s[17498]: E0512 05:03:40.341637   17498 remote_runtime.go:394] "ExecSync cmd from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 1s exceeded: context deadline exceeded" containerID="2772def98866331ae76c100ab412ce1961a1e35e19bf4f9f9345077a0d733f9b" cmd=[sh -c ls /data/longhorn && /data/longhorn version --client-only]

The workaround is to increase readinessProbe.timeoutSeconds on the engine-image-ei-xxx DaemonSet from 1 second to 3 seconds. According to https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes, starting from Kubernetes 1.20 the exec probe timeoutSeconds is actually enforced. Therefore, on a cluster where the readinessProbe command takes longer than 1 second, the Pod can't mount the volume successfully. The only way to find the root cause is to dump the kubelet log.
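A sketch of that workaround on the engine-image DaemonSet (the probe command is the one shown in the kubelet log above; only timeoutSeconds changes):

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - ls /data/longhorn && /data/longhorn version --client-only
  timeoutSeconds: 3   # increased from 1s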

@innobead innobead added this to the Planning milestone Jun 8, 2021
@innobead innobead added severity/3 Function working but has a major issue w/ workaround area/performance System, volume performance and removed area/stability System or volume stability labels Jun 8, 2021
@innobead innobead modified the milestones: Planning, v1.2.0 Jun 8, 2021
@joshimoo
Contributor

joshimoo commented Jun 8, 2021

@innobead @jenting If the current liveness probes that we use require more than 1s to return, we might want to consider using alternative probes. In a previous issue we discussed that exec probes are pretty heavy, while an HTTP probe is pretty cheap.
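
A sketch of what an HTTP-based readiness probe could look like (the path and port below are placeholders, not an existing Longhorn endpoint):

readinessProbe:
  httpGet:
    path: /healthz   # placeholder endpoint
    port: 9500       # placeholder port
  timeoutSeconds: 1
  periodSeconds: 10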

@innobead
Member

innobead commented Jun 17, 2021

@innobead @jenting If the current liveness probes that we use require more than 1s to return, we might want to consider using alternative probes. In a previous issue we discussed that exec probes are pretty heavy, while an HTTP probe is pretty cheap.

@jenting increased the readiness probe timeout and changed some liveness probes from exec to socket in longhorn/longhorn-manager#928. This is not a direct fix for this issue, but rather another finding from testing on 1.21.0.

@innobead innobead added the require/manual-test-plan Require adding/updating manual test cases if they can't be automated label Jun 17, 2021
@innobead innobead modified the milestones: v1.2.0, v1.3.0 Aug 12, 2021
@innobead innobead modified the milestones: v1.3.0, v1.4.0 Mar 31, 2022
@innobead innobead removed this from the v1.4.0 milestone Nov 7, 2022
@innobead innobead added this to the v1.5.0 milestone Nov 7, 2022
@innobead innobead modified the milestones: v1.5.0, v1.6.0 May 3, 2023
@innobead innobead added investigation-needed Need to identify the case before estimating and starting the development priority/0 Must be fixed in this release (managed by PO) area/stability System or volume stability area/kubernetes Kubernetes related like K8s version compatibility and removed area/performance System, volume performance labels Sep 14, 2023
@mantissahz
Contributor

When we launch a new run of a recurring job, we should make sure the volume has no previous run still in progress.
Furthermore, the snapshot backup currently runs inside the longhorn-manager Pod; we should make it run as a separate job so it does not occupy longhorn-manager's memory.

Recurring jobs have since been removed from the storage class.
Now I will try to change all readinessProbes and livenessProbes from executing a longhorn command to HTTP or TCP sockets.
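
For example, a TCP-socket probe of roughly the following shape (the port below is a placeholder):

livenessProbe:
  tcpSocket:
    port: 9500   # placeholder port
  initialDelaySeconds: 5
  periodSeconds: 10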

Related: #3907
@longhorn/qa Could anyone help reproduce this issue? I will continue trying to reproduce it as well.

@innobead innobead modified the milestones: v1.6.0, v1.7.0 Nov 29, 2023
@derekbit
Member

@longhorn/qa Can you help check if the issue is still valid? Thank you.

cc @roger-ryao @chriscchien @yangchiu

@khushboo-rancher
Contributor

@roger-ryao Could you please help with this whenever you get a chance?

@yangchiu
Member

yangchiu commented May 8, 2024

Verified passed on master-head (longhorn-manager 1f86d76) with the following steps:

  1. Create recurring jobs for backup and snapshot with cron * * * * *.
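A minimal sketch of these two RecurringJob CRs (assuming the v1beta2 RecurringJob schema; retain and concurrency values are arbitrary):

apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot
  namespace: longhorn-system
spec:
  task: snapshot
  cron: "* * * * *"
  retain: 1
  concurrency: 1
  groups: []
  labels: {}
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup
  namespace: longhorn-system
spec:
  task: backup
  cron: "* * * * *"
  retain: 1
  concurrency: 1
  groups: []
  labels: {}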
  2. Create storage class with the recurring jobs:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test-recurring
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  recurringJobSelector: '[{"name":"snapshot", "isGroup":false},{"name":"backup", "isGroup":false}]'
  3. Create a statefulset with 50 replicas to create lots of volumes with the storage class:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset
  namespace: default
spec:
  selector:
    matchLabels:
      app: test-statefulset
  serviceName: test-statefulset
  replicas: 50
  template:
    metadata:
      labels:
        app: test-statefulset
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - image: ubuntu:22.04
          imagePullPolicy: IfNotPresent
          name: sleep
          args: ['/bin/sh', '-c', 'while true;do date;sleep 5; done']
          volumeMounts:
            - name: pod-data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: pod-data
      spec:
        accessModes: ['ReadWriteOnce']
        storageClassName: 'longhorn-test-recurring'
        resources:
          requests:
            storage: 1Gi

  4. All 50 volumes can be created and attached without problems, and all 50 statefulset replica pods run without problems. Checking the k3s logs on each worker node with sudo journalctl -u k3s-agent.service --no-pager | grep -i deadline shows no "failed to attach" related errors.

@yangchiu yangchiu closed this as completed May 8, 2024
@yangchiu yangchiu self-assigned this May 8, 2024
Projects: Resolved/Scheduled
Development: No branches or pull requests
8 participants