
[IMPROVEMENT] Support both NFS hard and soft with custom timeo and retrans options for RWX volumes #6655

Closed
derekbit opened this issue Sep 11, 2023 · 12 comments
Assignees
Labels
area/volume-rwx Volume RWX related backport/1.4.4 backport/1.5.2 component/longhorn-share-manager Longhorn share manager (control plane for NFS server, RWX) kind/improvement Request for improvement of existing function priority/0 Must be fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation
Milestone

Comments

@derekbit
Member

derekbit commented Sep 11, 2023

Is your improvement request related to a feature? Please describe (👍 if you like this request)

When an RWX volume is attached, a share-manager pod embedding a userspace NFS server is created and the volume is exported. The remote exported share is hard mounted by Longhorn and then provided to the workload. If the share-manager pod or the embedded NFS server crashes or becomes unreachable, the 'hard' mount option keeps the client retrying the connection to the NFS server and prevents data loss.

It has been reported that a node reboot hangs when the NFS share is hard mounted and the connection to the NFS server is lost during I/O operations.

The root cause is that the Linux kernel tries to maintain filesystem stability: it will not allow a filesystem to be unmounted until all of its pending IO has been written back to storage, and the system cannot shut down until all filesystems are unmounted. This kernel behavior is currently unresolved.

A feasible workaround is to use the soft mount option instead. The resulting risk of data loss could be mitigated with the sync mount option, but sync makes RWX volumes unusable in most practical applications because of its poor IO performance.

As discussed with @innobead, to balance the trade-off between potential data loss and IO performance, Longhorn can keep using the "hard" option by default and also allow users to opt into the "soft" option with long custom timeo and retrans values.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

cc @james-munson

@derekbit derekbit added require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation area/volume-rwx Volume RWX related kind/improvement Request for improvement of existing function component/longhorn-share-manager Longhorn share manager (control plane for NFS server, RWX) labels Sep 11, 2023
@derekbit derekbit added this to the v1.6.0 milestone Sep 11, 2023
@innobead innobead added priority/0 Must be fixed in this release (managed by PO) backport/1.4.4 backport/1.5.2 labels Sep 11, 2023
@james-munson
Contributor

This is for use as a mounted volume, but a similar ticket exists for backup target mount options: #6608

They are separate cases, and would not share a common configuration, but is the answer in either case to allow a list of mount options (or maybe even a customized mount command) and if specified, use that string instead of the default set of mount options?

@derekbit
Member Author

They are separate cases, and would not share a common configuration, but is the answer in either case to allow a list of mount options (or maybe even a customized mount command) and if specified, use that string instead of the default set of mount options?

Yes, I think we can allow a list of mount options.

@derekbit
Member Author

derekbit commented Sep 20, 2023

Options

There are two mount-option settings in a storage class:

mountOptions

The options are recorded in pv.spec.mountOptions. The share-manager controller fetches them from the PV and appends them to the share-manager pod's container args, so the Longhorn block device is mounted with the custom options and exported as an NFS share. (code) A minimal sketch is shown below.
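For context, mountOptions is the standard Kubernetes StorageClass field, and the provisioner copies its values into pv.spec.mountOptions. A minimal sketch, assuming an illustrative class name and option (neither is a Longhorn default):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-custom-export   # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
# Copied into pv.spec.mountOptions and applied when the share-manager pod
# mounts the Longhorn block device before exporting it over NFS.
mountOptions:
  - noatime   # illustrative filesystem mount option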

parameters.nfsOptions

For an RWX volume, the CSI NodeStageVolume call retrieves nfsOptions from the storage class parameters. (code)

  • If the value is empty, the NFS client uses the default options:

    	mountOptions = []string{
    		"vers=4.1",
    		"noresvport",
    		"intr",
    		"hard",
    	}

  • If a value is given, the NFS client uses those options instead.

v1.4.0-v1.4.3 and v1.5.0-v1.5.1

Users can create a storage class with nfsOptions if they want to switch to soft mode to avoid the reboot hang issue, for example:

vers=4.1,noresvport,soft,timeo=150,retrans=3

Here, timeo=150 means a 15-second timeout (timeo is specified in deciseconds) and retrans=3 allows three retransmissions.

timeo and retrans should be large enough that, if the share-manager pod comes back or a replacement is created, the client can reconnect to it without data loss.
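A storage class for this workaround could look like the following sketch (the class name is illustrative; the structure follows the verification examples later in this issue):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-soft-nfs   # illustrative name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fsType: "ext4"
  # Long timeo/retrans values so the client can ride out a share-manager restart.
  nfsOptions: "vers=4.1,noresvport,soft,timeo=150,retrans=3"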

v1.4.4+, v1.5.2+ and v1.6+

The reboot hang issue in the Linux kernel is known but currently has no fix. To improve stability, soft mode with a long timeout will be used by default for mounting an NFS share.

@innobead
Member

innobead commented Sep 20, 2023

@derekbit Can we make soft mode the default in the upcoming 1.4.4 and 1.5.2 if there are no compatibility concerns?

It seems this happens at runtime when mounting the volume served by the share-manager pod, so we would expect the NFS mount (share manager) to switch to soft mode after the share-manager pod is restarted, correct?

@derekbit
Member Author

Can we make soft mode the default in the upcoming 1.4.4 and 1.5.2 if there are no compatibility concerns?

Sure. There is no compatibility concern.
The setting will be applied after the share-manager pod is restarted.

@innobead
Member

After discussing with @derekbit, we will investigate the feasibility of reintroducing hard mode when share-manager HA is introduced in 1.7. #6205

@longhorn-io-github-bot

longhorn-io-github-bot commented Sep 20, 2023

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

#6655 (comment)

  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at

longhorn/longhorn-manager#2167
longhorn/longhorn-manager#2170

  • Which areas/issues this PR might have potential impacts on?
    Area: RWX volume
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

longhorn/website#777

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

@roger-ryao

Hi @derekbit

  1. Create a directory to mount the NFS share: mkdir -p /mnt/nfs_share
  2. Mount the NFS share from the server with this command: mount -t nfs XX.XX.XX.XX:/var/nfs /mnt/nfs_share
  3. Start a data writing process in the background: dd if=/dev/zero of=/mnt/nfs_share/a &
  4. Shutdown the NFS server.
  5. Reboot the client machine.
  6. After rebooting the client, check the syslog on the Ubuntu 22.04 machine for any relevant information.
     We did not see the "umount.nfs: /mnt: device is busy" message in the syslog.

@innobead
Member

Hi @derekbit

  1. Create a directory to mount the NFS share: mkdir -p /mnt/nfs_share
  2. Mount the NFS share from the server with this command: mount -t nfs XX.XX.XX.XX:/var/nfs /mnt/nfs_share
  3. Start a data writing process in the background: dd if=/dev/zero of=/mnt/nfs_share/a &
  4. Shutdown the NFS server.
  5. Reboot the client machine.

Before rebooting the client machine, did you wait until the IO was stuck? I assume what you tested is the current implementation, hard mode, correct?

@derekbit should we include the above criteria when testing? It seems this case is not always reproducible.

@derekbit
Member Author

derekbit commented Sep 20, 2023

should we include the above criteria when testing? It seems this case is not always reproducible.

On-the-fly IO is required; if the server goes down while IO is in flight, the IO gets stuck. Yes, it is not easy to reproduce...

@innobead
Member

@roger-ryao you could probably use a script that runs the steps n times to see whether it reproduces, if you are running this locally.

@roger-ryao

roger-ryao commented Sep 26, 2023

Verified on master-head 20230926

The test steps

Scenario 1: RWX Volume with Hard Mount

  1. Deploy an RWX volume with the following storage class settings (nfsOptions: "hard,timeo=50,retrans=1")
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "hard,timeo=50,retrans=1"
  2. Create a Pod that mounts this volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
#  annotations:
#    volume.beta.kubernetes.io/storage-class: longhorn-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn-test
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: volume-test
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
          - ls
          - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
  3. Verify that the remote export mounted by Longhorn is in hard mode.
kubectl exec -it volume-test -- /bin/bash -c "mount | grep nfs"
10.43.78.52:/pvc-433cb644-d339-4158-a477-6e1b82a903c1 on /data type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=50,retrans=1,sec=sys,clientaddr=10.0.2.43,local_lock=none,addr=10.43.78.52)

Scenario 2: RWX Volume with Soft Mount

  1. Deploy an RWX volume with the following storage class settings (nfsOptions: "soft,timeo=250,retrans=5")
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "soft,timeo=250,retrans=5"
  2. Create a Pod that mounts this volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
#  annotations:
#    volume.beta.kubernetes.io/storage-class: longhorn-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn-test
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: volume-test
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
          - ls
          - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
  3. Verify that the remote export mounted by Longhorn is in soft mode.
kubectl exec -it volume-test -- /bin/bash -c "mount | grep nfs"
10.43.22.27:/pvc-93725304-9c5f-46e6-bb04-ab2ae6bfc6dd on /data type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=250,retrans=5,sec=sys,clientaddr=10.0.2.43,local_lock=none,addr=10.43.22.27)

Result: Passed

@derekbit derekbit reopened this Oct 3, 2023
@derekbit derekbit closed this as completed Oct 3, 2023
roger-ryao added a commit to roger-ryao/longhorn-tests that referenced this issue Oct 6, 2023