
[BUG] Backup NFS - Operation not permitted during mount #6114

Open
adampetrovic opened this issue Jun 13, 2023 · 15 comments
Labels
area/backup-store (Remote backup store related) · area/environment-issue (User-specific related issues, ex: network, DNS, host packages, etc.) · investigation-needed (Need to identify the case before estimating and starting the development) · kind/bug · priority/0 (Must be fixed in this release, managed by PO) · require/knowledge-base (Require adding knowledge base document)


adampetrovic commented Jun 13, 2023

Describe the bug (🐛 if you encounter this issue)

Setting backupTarget to an NFS share gives an "Operation not permitted" error in the UI.

I am using a Synology NAS that only supports NFS up to v4.1.

# cat /proc/fs/nfsd/versions
+2 +3 +4 +4.1
ash-4.4# cat /etc/exports
/volume2/k8s-backup	10.0.80.0/21(rw,async,no_wdelay,no_root_squash,insecure_locks,sec=sys,anonuid=1025,anongid=100)

My Kubernetes nodes are within the subnet above.

To Reproduce

Manually run the mount command from within a longhorn-manager pod:

$ mkdir -p /mnt/nfs
$ mount -t nfs4 -o nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2 <nas url>:/volume2/k8s-backup /mnt/nfs
mount.nfs4: Operation not permitted
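
For reference, one way to get a shell inside a longhorn-manager pod to run the commands above; the label selector is the one the Longhorn manager DaemonSet typically uses, and the pod name is a placeholder:

$ kubectl get pods -n longhorn-system -l app=longhorn-manager
$ kubectl exec -it -n longhorn-system longhorn-manager-xxxxx -- sh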

Executing the same command directly on a k8s node works fine:

$ sudo mount -t nfs4 -o nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2 <nas url>:/volume2/k8s-backup /tmp/nas
$

Expected behavior

The NFS backup target should mount successfully.

Log or Support bundle

If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment

  • Longhorn version: 1.4.2
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm / Flux
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: K3s
    • Number of management node in the cluster: 1
    • Number of worker node in the cluster: 4
  • Node config
    • OS type and version: Ubuntu 22.04
    • CPU per node: 2
    • Memory per node: 30GB
    • Disk type(e.g. SSD/NVMe): NVMe
    • Network bandwidth between the nodes: 1Gbe
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Proxmox
  • Number of Longhorn volumes in the cluster: N/A

Additional context

Add any other context about the problem here.

@derekbit (Member)

@adampetrovic
Can you provide a support bundle for further investigation?


ozid commented Nov 11, 2023

I have the exact same issue on Longhorn v1.5.1 installed via Helm.
@derekbit here is my support bundle:
[removed]
The mount works on the host but not inside the pod. This is the error in longhorn-manager:
mount.nfs4: Operation not permitted
On the host, the exact same mount command works without issue.

I don't personally have access to the NFS server.


@DanielG0721

I am also struggling with this issue on v1.5.1.

@derekbit (Member)

@ozid @DanielG0721
It looks related to the server's or client's permission configuration. Would you be able to check https://longhorn.io/kb/troubleshooting-unable-to-mount-an-nfs-backup-target/?


ozid commented Nov 12, 2023

Hello @derekbit, I checked a bit more and I see this in a tcpdump capture:
[tcpdump screenshot]

The first line is the pod making the request, and the second line is the NFS server replying with Operation not permitted, so I guess you are right.

I find it strange that this works on the host itself but not in the pod; after testing from another privileged pod I get the exact same issue. I'll keep digging, but on the server side I can only whitelist our IP, nothing else.


ozid commented Nov 12, 2023

I just noticed that when the traffic goes from the pod to the NFS server, it uses an unprivileged source port (above 1023).
On the k8s hosts it always uses a source port below 1024 instead. This is probably why the NFS server answers with Operation not permitted.

....

After testing a rule on my gateway to rewrite the source port to a port between 1 and 1023:
ip daddr (ip nfs server) tcp dport 2049 snat to (my source nat ip) :1-1023
the mount is working :)
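
For readers who want to try the same workaround, here is a minimal sketch of how such a rule could look in a full nftables ruleset on a Linux gateway; the NFS server address (192.0.2.10) and SNAT address (198.51.100.1) are placeholders, not values from this thread:

# /etc/nftables.conf on the gateway (hypothetical addresses)
table ip nat {
    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        # rewrite NFS traffic so the server sees a privileged (<1024) source port
        ip daddr 192.0.2.10 tcp dport 2049 snat to 198.51.100.1:1-1023
    }
}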

derekbit (Member) commented Nov 13, 2023

> I just noticed that when the traffic goes from the pod to the NFS server, it uses an unprivileged source port (above 1023). On the k8s hosts it always uses a source port below 1024 instead. This is probably why the NFS server answers with Operation not permitted.
>
> ....
>
> After testing a rule on my gateway to rewrite the source port to a port between 1 and 1023: ip daddr (ip nfs server) tcp dport 2049 snat to (my source nat ip) :1-1023 the mount is working :)

@ozid
Cool! We were not aware that this could be caused by the source-port usage in the k8s system.
However, why does it work without issue in most environments?

@derekbit derekbit added this to the v1.7.0 milestone Nov 13, 2023
@derekbit derekbit added investigation-needed Need to identify the case before estimating and starting the development component/longhorn-share-manager Longhorn share manager (control plane for NFS server, RWX) labels Nov 13, 2023
@derekbit (Member)

cc @james-munson

@derekbit derekbit added priority/0 Must be fixed in this release (managed by PO) area/backup-store Remote backup store related and removed component/longhorn-share-manager Longhorn share manager (control plane for NFS server, RWX) labels Nov 13, 2023

ozid commented Nov 13, 2023

> I just noticed that when the traffic goes from the pod to the NFS server, it uses an unprivileged source port (above 1023). On the k8s hosts it always uses a source port below 1024 instead. This is probably why the NFS server answers with Operation not permitted.
> ....
> After testing a rule on my gateway to rewrite the source port to a port between 1 and 1023: ip daddr (ip nfs server) tcp dport 2049 snat to (my source nat ip) :1-1023 the mount is working :)

> @ozid Cool! We were not aware that this could be caused by the source-port usage in the k8s system. However, why does it work without issue in most environments?

I would love to know why as well, but there are so many ways of doing things...
Personally I use Cilium as the CNI without kube-proxy, so maybe the eBPF program rewrites the source port to a random higher range and that is causing the issue.

Maybe it is worth mentioning this in the documentation here:
https://longhorn.io/kb/troubleshooting-unable-to-mount-an-nfs-backup-target/
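
For anyone who does control the NFS server, a common server-side alternative is to allow non-privileged (>1023) client source ports with the insecure export option. A sketch based on the export from the original report (only insecure is added; everything else is unchanged):

# /etc/exports -- "insecure" lets the server accept requests from source ports above 1023
/volume2/k8s-backup 10.0.80.0/21(rw,async,no_wdelay,no_root_squash,insecure,insecure_locks,sec=sys,anonuid=1025,anongid=100)
# reload the export table
exportfs -ra

On a Synology NAS this is typically exposed as the "Allow connections from non-privileged ports" checkbox in the shared folder's NFS permissions, rather than by editing /etc/exports by hand.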

@derekbit (Member)

cc @mantissahz


dotdiego commented Dec 8, 2023

> I just noticed that when the traffic goes from the pod to the NFS server, it uses an unprivileged source port (above 1023). On the k8s hosts it always uses a source port below 1024 instead. This is probably why the NFS server answers with Operation not permitted.
>
> ....
>
> After testing a rule on my gateway to rewrite the source port to a port between 1 and 1023: ip daddr (ip nfs server) tcp dport 2049 snat to (my source nat ip) :1-1023 the mount is working :)

Could you share more about how you managed to fix this issue?
I'm currently in this situation and don't know how to fix it.

Thanks


ozid commented Dec 8, 2023

Hello @dotdiego, yes of course. Basically, your NFS server expects the client to use a source port in the privileged range; in simple terms, it must be under 1024.

So you must somehow make sure your client (in this case, the Longhorn pod) sends its NFS requests from a source port between 1 and 1023.

In my case, my Kubernetes cluster needs to go through my gateway server to reach the NFS server. The good news for me is that the gateway is fully managed with Linux tools, so I can manipulate the traffic as I want. This is why I added this rule on my gateway:
ip daddr (ip nfs server) tcp dport 2049 snat to (my source nat ip) :1-1023

After adding this, my gateway rewrites the original source port, which was 54850, to a random port between 1 and 1023.

Before adding the rule, tcpdump showed something like this:
my-longhorn-pod-ip:54850 -> nfs_server-ip:2049   (not working)

After adding the rule on my gateway:
my-longhorn-pod-ip:1022 -> nfs_server-ip:2049   (working)

Hope that makes it clearer for you.
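
A quick way to confirm which source port the NFS connection is using from inside the longhorn-manager pod, assuming the ss tool from iproute2 is available in the container:

# run inside the longhorn-manager pod while a mount attempt is in progress
ss -tn | grep ':2049'
# the local address column shows the source port used inside the pod
# (any NAT on the path to the server can still rewrite it)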

@derekbit derekbit added require/knowledge-base Require adding knowledge base document area/environment-issue User-specific related issues, ex: network, DNS, host packages, etc. and removed investigation-needed Need to identify the case before estimating and starting the development labels Dec 8, 2023
@derekbit derekbit modified the milestones: v1.7.0, v1.6.0 Dec 8, 2023
@innobead innobead added the investigation-needed Need to identify the case before estimating and starting the development label Dec 13, 2023
@innobead (Member)

@james-munson Please help with the doc as @derekbit mentioned. We need a KB doc.

@innobead innobead removed this from the v1.6.0 milestone Jan 2, 2024
@innobead innobead added this to the v1.7.0 milestone Jan 2, 2024

npawelek commented Jan 7, 2024

I stumbled across this while attempting to debug the issue on my own K3s cluster. I'm running into the same mount.nfs4: mount(2): Operation not permitted error when attempting to mount the NFS backup target from the longhorn-manager container. This appears to work without issue on Ubuntu 22.04 with K8s (deployed via kubeadm), but not on Debian 12 with K3s. I'm deploying a migration cluster and can't mount the backupstore, which is a bit frustrating. AppArmor is disabled, so as not to add any complications. From the underlying host I can mount fine, but not from within the longhorn-manager container. The Longhorn version is 1.5.3, installed via Helm.

k exec -it -n longhorn-system longhorn-manager-m7ph7 -- sh
sh-4.4# showmount -e 192.168.0.151
Export list for 192.168.0.151:
/volume1/LonghornBackupstore 10.32.0.0/12,192.168.0.34,192.168.0.33,192.168.0.32,192.168.0.31

sh-4.4# ip a sh eth0
127: eth0@if128: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:ae:93:a9:e1:96 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.32.2.86/32 scope global eth0
       valid_lft forever preferred_lft forever

sh-4.4# mount -v -t nfs4 -o nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2 192.168.0.151:/volume1/LonghornBackupstore /var/lib/longhorn-backupstore-mounts/192_168_0_151/volume1/LonghornBackupstore
mount.nfs4: timeout set for Sun Jan  7 17:07:16 2024
mount.nfs4: trying text-based options 'nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2,addr=192.168.0.151,clientaddr=10.32.2.86'
mount.nfs4: mount(2): Operation not permitted
mount.nfs4: Operation not permitted for 192.168.0.151:/volume1/LonghornBackupstore on /var/lib/longhorn-backupstore-mounts/192_168_0_151/volume1/LonghornBackupstore

Nothing is logged in dmesg. I'm also using the latest kernel in Debian 12 stable (6.1.0-17-amd64). The CNI plugin is Cilium 1.14.5, and I can see that traffic from the longhorn-manager pod has a source port < 1024.

k exec -n kube-system cilium-jxfqt -- cilium monitor -n | grep 192.168.0.151
...
Policy verdict log: flow 0x614bba26 local EP ID 1360, remote ID 16777263, proto 6, egress, action allow, auth: disabled, match L3-Only, 10.32.2.86:842 -> 192.168.0.151:2049 tcp SYN
-> network flow 0x614bba26 , identity 67641->16777263 state new ifindex eth0 orig-ip 0.0.0.0: 10.32.2.86:842 -> 192.168.0.151:2049 tcp SYN
-> endpoint 1360 flow 0x0 , identity 16777263->67641 state reply ifindex lxcd72247c18a9c orig-ip 192.168.0.151: 192.168.0.151:2049 -> 10.32.2.86:842 tcp SYN, ACK
-> network flow 0x614bba26 , identity 67641->16777263 state established ifindex eth0 orig-ip 0.0.0.0: 10.32.2.86:842 -> 192.168.0.151:2049 tcp ACK
...

Here are my environment details; I will submit a support bundle as well.

Longhorn version: 1.5.3
Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm / Flux
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: K3s v1.29.0+k3s1
Number of management node in the cluster: 3
Number of worker node in the cluster: 1
Node config
OS type and version: Debian 12 (stable kernel 6.1.0-17-amd64; apparmor disabled)
CPU per node: 2
Memory per node: 32GB
Disk type(e.g. SSD/NVMe): NVMe
Network bandwidth between the nodes: 1Gbe
Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal


ozid commented Apr 29, 2024

@npawelek maybe you could try running tcpdump at the host level, just to be 100% sure no NAT is applied once the traffic leaves the CNI.
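
A minimal sketch of such a host-level capture, assuming the node's outgoing interface is eth0 and reusing the NFS server address from the comment above:

# run on the k8s node while retrying the mount from the longhorn-manager pod
tcpdump -ni eth0 'host 192.168.0.151 and tcp port 2049'
# check whether the source port of the outgoing SYN packets is below 1024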
