[IMPROVEMENT] Support both NFS `hard` and `soft` with custom `timeo` and `retrans` options for RWX volumes #6655

Comments
This is for use as a mounted volume, but a similar ticket exists for backup target mount options: #6608. They are separate cases and would not share a common configuration, but is the answer in either case to allow a list of mount options (or maybe even a customized mount command) and, if specified, use that string instead of the default set of mount options?
Yes, I think we can allow a list of mount options.
Options

There are two mount options in a storage class:
- `mountOptions`: the options are recorded in the parameters.
- `nfsOptions`: for a RWX volume, CSI's …

v1.4.0-v1.4.3 and v1.5.0-v1.5.1

Users can create a storage class with …
Here, the timeout is 15 seconds and retransmission is 3.

v1.4.4+, v1.5.2+ and v1.6+

The reboot hang issue in the Linux kernel is known but without a fix currently. To improve the stability, …
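For reference, NFS's `timeo` option is expressed in deciseconds (tenths of a second) per nfs(5), so the 15-second timeout mentioned above would correspond to `timeo=150`. A quick sanity check of the conversion:

```shell
# timeo is in deciseconds (tenths of a second); see nfs(5).
timeo=150    # a 15-second timeout
retrans=3
echo "timeout: $(( timeo / 10 ))s, retransmissions: ${retrans}"
# prints: timeout: 15s, retransmissions: 3
```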
@derekbit Can we make soft mode the default in the upcoming 1.4.4 and 1.5.2 if there are no compatibility concerns? It seems this happens at runtime when mounting the volume used by the share-manager pod, so we would expect the NFS mount (share manager) to switch to soft mode after restarting the share-manager pod, correct?

Sure. No compatibility concern.
Pre Ready-For-Testing Checklist
longhorn/longhorn-manager#2167
Hi @derekbit
Before rebooting the client machine, did you wait until the IO was stuck? I assume what you tested is the current implementation, hard mode, correct? @derekbit should we include the above criteria when testing? It seems this case is not always reproducible.

On-the-fly IO is required. If the server is down, the IO should get stuck. Yes, it is not easy to reproduce...

@roger-ryao you could probably use a script that runs the steps n times to see whether it reproduces, if you are running this locally.
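A minimal sketch of such a harness, assuming the actual reproduction steps (start IO, take down the NFS server, reboot the client) are plugged in by hand; the function name and the stand-in command are mine, not from this issue:

```shell
#!/bin/sh
# Hypothetical harness: run a reproduction command n times and report how
# many runs hit the failure (non-zero exit status).
run_n_times() {
  n=$1; shift
  failures=0
  i=1
  while [ "$i" -le "$n" ]; do
    "$@" || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}

# Example with a stand-in command; replace 'false' with the real repro script.
run_n_times 5 false   # prints 5: every run "failed"
```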
Verified on master-head 20230926
The test steps

Scenario 1: RWX Volume with Hard Mount

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "hard,timeo=50,retrans=1"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
  # annotations:
  #   volume.beta.kubernetes.io/storage-class: longhorn-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn-test
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: volume-test
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
          - ls
          - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
```
```shell
kubectl exec -it volume-test -- /bin/bash -c "mount | grep nfs"
```
```
10.43.78.52:/pvc-433cb644-d339-4158-a477-6e1b82a903c1 on /data type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=50,retrans=1,sec=sys,clientaddr=10.0.2.43,local_lock=none,addr=10.43.78.52)
```

Scenario 2: RWX Volume with Soft Mount
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "soft,timeo=250,retrans=5"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
  # annotations:
  #   volume.beta.kubernetes.io/storage-class: longhorn-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn-test
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: volume-test
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
          - ls
          - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
```
```shell
kubectl exec -it volume-test -- /bin/bash -c "mount | grep nfs"
```
```
10.43.22.27:/pvc-93725304-9c5f-46e6-bb04-ab2ae6bfc6dd on /data type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=250,retrans=5,sec=sys,clientaddr=10.0.2.43,local_lock=none,addr=10.43.22.27)
```

Result: Passed
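To make this verification scriptable, one could extract individual option values from the `mount` line. This helper is a sketch (the function name is mine; the parsing assumes the `opt=value` form shown in the output above):

```shell
# Hypothetical helper: pull one option value (e.g. timeo) out of a
# `mount | grep nfs` line by splitting on '(', ')' and ','.
get_nfs_opt() {
  echo "$1" | tr '(),' '\n\n\n' | sed -n "s/^$2=//p"
}

line='10.43.22.27:/pvc-93725304 on /data type nfs4 (rw,relatime,soft,timeo=250,retrans=5)'
get_nfs_opt "$line" timeo     # prints 250
get_nfs_opt "$line" retrans   # prints 5
```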
…/longhorn#6655
This reverts commit longhorn@1423656
Ref: longhorn/longhorn#6838
Signed-off-by: Roger Yao <roger.yao@suse.com>
Is your improvement request related to a feature? Please describe (👍 if you like this request)

When a RWX volume is attached, a share-manager pod embedded with a userspace NFS server is created and the volume is exported. The remote exported share is `hard` mounted by Longhorn and then provided to the workload. When the share-manager pod or the embedded NFS server crashes or becomes unreachable, the `hard` mount option keeps the client retrying the connection to the NFS server and prevents data loss.

It has been reported that a reboot hangs when the NFS share is hard mounted and the connection to the NFS server is lost during I/O operations. The root cause is that the Linux kernel tries to maintain filesystem stability: it will not allow a filesystem to be unmounted until all its pending IO is written back to storage, and the system cannot shut down until all filesystems are unmounted. Currently, this kernel issue is not resolved.

A feasible workaround is using `soft` instead. The potential data loss could be mitigated with the `sync` mount option; however, `sync` makes RWX volumes unusable in most practical applications due to poor IO performance. As discussed with @innobead, trading off potential data loss against IO performance, the Longhorn system can keep using the `hard` option and also allow users to use the `soft` option with long custom `timeo` and `retrans` values.

Describe the solution you'd like
Describe alternatives you've considered
Additional context
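As a rough estimate of how long a soft mount keeps retrying before returning an error: with `timeo` in deciseconds and `retrans` retransmissions, a lower bound on the failure window (ignoring any per-retry backoff, which is a simplification of the nfs(5) behavior) is timeo/10 * (retrans + 1) seconds. For the `soft,timeo=250,retrans=5` values used in the test scenario:

```shell
# Approximate lower bound on the soft-mount failure window, in seconds.
# Assumes a fixed retry interval (no backoff), which is a simplification.
timeo=250    # deciseconds
retrans=5
echo $(( timeo / 10 * (retrans + 1) ))   # prints 150
```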
cc @james-munson