Unable to use NFS shares with Zpool #12509

Closed
MetallicTsun opened this issue Aug 24, 2021 · 4 comments

Comments

@MetallicTsun

System information

Type Version/Name
Distribution Name Debian Bullseye
Distribution Version 11.0
Kernel Version 4.9.0-12-amd64
Architecture x86_64
OpenZFS Version zfs-2.1.0-1

Describe the problem you're observing

I have a single raidz2 pool that, after upgrading to the latest version, no longer works properly with NFS shares. This affects both nfs-server and sharenfs= in ZFS. The issue is that when I share a folder or dataset over NFS, I cannot copy or move files FROM the share. I am, however, able to read and write to the share 'normally', although it's slow.
If I copy a file from the share, the process basically hangs, creates a zero-byte file, and never completes the operation; no data is moved. This does not occur on ZFS version 0.7 from Debian's repos. It also affects newly created pools.
I did find the discussion below, which sounds like my issue:

#12370

Here is the output from zfs get all mediaStorage/Files:
https://pastebin.com/NffcaeUs

Describe how to reproduce the problem
Compile ZFS 2.1.0-1 using these scripts:
https://github.com/kneutron/ansitest/blob/master/debian-compile-zfs--boojum.sh
https://github.com/kneutron/ansitest/blob/master/ubuntu_zfs_build_install.sh
on a Debian machine. Create a pool and share it with nfs-server or sharenfs=. Mount the share on a networked machine, or locally via 127.0.0.1, and try to copy a file from the share to the local machine.
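
Roughly, the sequence is something like this (the pool name, devices, and paths below are placeholders, not from my actual setup):

$ zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde
$ zfs create -o sharenfs=on tank/files
$ cp /path/to/somefile /tank/files/
$ mkdir -p /mnt/nfs
$ mount -t nfs 127.0.0.1:/tank/files /mnt/nfs
$ cp /mnt/nfs/somefile /tmp/    # hangs and leaves a zero-byte file in /tmp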

@rincebrain
Contributor

Right, I was looking at that, then got sidetracked by something catastrophic.

I guess I'll go back to looking at it...

Aside: I wouldn't call Debian 11 with a Debian 9 kernel Debian 11...

@MetallicTsun
Author

Aside: I wouldn't call Debian 11 with a Debian 9 kernel Debian 11...

I would have to agree, but due to driver issues with the host bus adapter that interfaces with the disks in this pool, I am unable to upgrade to a kernel newer than the installed one. I haven't been able to diagnose it, as it cripples my ZFS pool.

@rincebrain
Copy link
Contributor

rincebrain commented Aug 25, 2021

Hey, check this out.

$ dd if=/mntdumb/zfs_vanilla/config.log of=/tmp/blackhole/zfs_vanilla/config.log bs=131072 iflag=direct
1+1 records in
1+1 records out
160105 bytes (160 kB, 156 KiB) copied, 0.0006609 s, 242 MB/s
$ dd if=/mntdumb/zfs_vanilla/config.log of=/tmp/blackhole/zfs_vanilla/config.log bs=131073 iflag=direct
[wait 30s]
^C
[requires kicking the mount with umount -f to get it to come back...]
dd: error reading '/mntdumb/zfs_vanilla/config.log': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 1.5938 s, 0.0 kB/s
$

So apparently a workaround is {r,w}size=131072, for now.
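
In mount-option terms, that would be something like the following (the server export path and local mountpoint here are placeholders):

$ mount -t nfs -o rsize=131072,wsize=131072 server:/export/dataset /mnt/share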

Because if I mount with those (I was mounted with =256k before)...

$ dd if=/mntdumb/zfs_vanilla/config.log of=/tmp/blackhole/zfs_vanilla/config.log bs=131073 iflag=direct
1+1 records in
1+1 records out
160105 bytes (160 kB, 156 KiB) copied, 0.00061106 s, 262 MB/s
$ dd if=/mntdumb/zfs_vanilla/config.log of=/tmp/blackhole/zfs_vanilla/config.log bs=131072 iflag=direct
1+1 records in
1+1 records out
160105 bytes (160 kB, 156 KiB) copied, 0.000608195 s, 263 MB/s
$

Still digging...

e: 1c2358c seems to be the rotten commit. That's unfortunate...

e2: So let's try to narrow down where it's broken - it works on 5.10.48, broken on 4.9.0-12-amd64, broken on 4.14.100, broken on 4.14.232 - gonna test it on 4.19.0-17-amd64 and then examine the config.logs to see what the differing codepaths involved might be...
e2a: I wonder if 83b91ae was wrong about how long it's broken until...
e3: Yup, it behaves the same on 4.9. 4.19 works fine though.

FYI @behlendorf it seems like 83b91ae is still pretty broken on at least 4.14 and 4.9. (In particular, zfs_uiomove_iter is spitting back EFAULT on what I presume to be the second iteration, since it always appears to follow the pattern of "one call succeeds, one immediately following call fails with EFAULT".)

I'm going back to printf debugging after spending far too much time trying and failing to convince systemtap to look at the struct definitions...

@rincebrain
Contributor

It seems like what happens when I try doing bs=1M iflag=direct over NFS with [rw]size=1048576, on my Debian 9 and 10 testbeds, is something like:

  • on 4.9.x, we go into dmu_read_uio_dnode initially with size=1M, dmu_buf_hold_array_by_dnode sets numbufs=8 and db->db_size=128k, but our poor uio->uio_iter is an ITER_PIPE, and consists of 16 4k buffers. So we go into zfs_uiomove_iter, read 16x4k and return - but because numbufs=8, we then go through the loop again, and EFAULT out because, shockingly, our 4k-sized buffers are still full... which generic_file_splice_read turns into EAGAIN, which bubbles up... and then somebody just requests the thing from the start again and the comedy begins anew.
  • On 4.19, this doesn't happen...because dmu_read_uio_dnode (always? at least in my NFS experimenting) gets invoked with size=64k, so numbufs=1 and we never run into this.

If we hardcode size to 64k, then we don't run into this on 4.9 or 4.19. If we hardcode size to 128k, then we run into it on both.

Next experiment is going to be patching all that debug information into the pre-1c2358c tree and see how it functioned, and run this on e.g. 5.10 and see how it looks there, and then go look at how this looks for local accesses. I'm tempted to just suggest not returning EAGAIN if we read any bytes, but don't feel confident that won't have some side effect somewhere...
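
To make that failure mode concrete, here is a toy standalone C model of the loop described above. None of it is actual OpenZFS code; the constants just mirror the 1M request / 64k iovec / 128k record sizes, and read_loop() is a made-up stand-in for dmu_read_uio_dnode:

/* partial_read_demo.c - toy model only, not OpenZFS code. */
#include <errno.h>
#include <stdio.h>

#define REQUEST  (1024 * 1024)  /* the caller asks for 1 MiB            */
#define IOV_ROOM (64 * 1024)    /* but its iovecs only hold 64 KiB      */
#define RECSIZE  (128 * 1024)   /* and ZFS hands back 128 KiB records   */

/*
 * One pass over the record buffers.  With report_partial == 0 this mimics
 * the current behavior: once the destination is full, the whole call fails
 * with EFAULT (which the splice path turns into EAGAIN and retries from
 * scratch).  With report_partial == 1 it returns success for whatever was
 * already copied, which is the "don't fail a partial read" idea.
 */
static int
read_loop(int report_partial, size_t *copied)
{
        size_t room = IOV_ROOM;

        *copied = 0;
        for (size_t off = 0; off < REQUEST; off += RECSIZE) {
                size_t tocpy = RECSIZE < room ? RECSIZE : room;

                if (tocpy == 0) /* no room left: zfs_uiomove would EFAULT */
                        return ((report_partial && *copied > 0) ? 0 : EFAULT);
                *copied += tocpy;
                room -= tocpy;
        }
        return (0);
}

int
main(void)
{
        size_t n;

        int err = read_loop(0, &n);
        printf("old: err=%d (EFAULT), caller discards the %zu bytes and retries forever\n",
            err, n);

        err = read_loop(1, &n);
        printf("partial-read handling: err=%d, caller gets %zu bytes and makes progress\n",
            err, n);
        return (0);
}

Compiled and run, the first call reports EFAULT even though 64k was already copied (so the caller retries from the start forever), while the second call reports the 64k partial read and lets the caller make progress.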

rincebrain added a commit to rincebrain/zfs that referenced this issue Aug 28, 2021
Currently, dmu_read_uio_dnode can read 64K of a requested 1M in
one loop, get EFAULT back from zfs_uiomove() (because the iovec
only holds 64k), and return EFAULT, which turns into EAGAIN on the
way out. EAGAIN gets interpreted as "I didn't read anything", the
caller tries again without consuming the 64k we already read, and we're
stuck.

This apparently works on newer kernels because the caller which breaks
on older Linux kernels by happily passing along a 1M read request and a
64k iovec just requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Fixes: openzfs#12509

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
behlendorf pushed a commit that referenced this issue Aug 2, 2022
Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller which breaks
on older Linux kernels by happily passing along a 1M read request and a
64k iovec just requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12370
Closes #12509
Closes #12516