
Fix NFS and large reads on older kernels #12516

Merged: 1 commit merged into openzfs:master on Sep 20, 2021

Conversation


@rincebrain rincebrain commented Aug 28, 2021

Motivation and Context

As #12509 pointed out, we currently hang forever on NFS servers running older kernels when mounts use [rw]size > 64k and reads are larger than 64k.

You can demonstrate this by mounting -o rsize=1048576 and then doing dd if=/nfsmount/bigfile of=/dev/null bs=1M iflag=direct on an affected kernel.

Description

Pretty simple. Just don't return EFAULT (which turns into EAGAIN on the way out) if we got some bytes back.

(An alternate fix would be to apply the same logic just after the call returns in zfs_read(), to parallel a similar check in zfs_write(). I couldn't decide which one I preferred, and only noticed the zfs_write() check after testing, so I went with the version I had already run tests on.)

How Has This Been Tested?

I could reproduce the problem from #12509 reliably before this change and never after on 4.9.0-17-amd64 (Debian 9). (Behavior is also unchanged on 4.19.0-17-amd64 on Debian 10.)

Passed -r sanity with no complaints on both Debian 9 and 10.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


@behlendorf

Pretty simple. Just don't return EFAULT (which turns into EAGAIN on the way out) if we got some bytes back.

Where does this get converted to EAGAIN? I didn't see anything on the ZFS side which would do this so presumably it happens somewhere in the VFS? Or maybe the NFS code?

An alternate fix would be to do the same logic just after the call returns in zfs_read()

zfs_read() seems like the right place to handle this since that's where the rest of the logic for partial reads is kept. As long as we've made some progress, zfs_uio_resid(uio) != start_resid, that's a success from zfs_read()'s perspective.

Sorry about needing to retest!

@behlendorf behlendorf added the "Status: Code Review Needed (Ready for review and testing)" label Aug 30, 2021
@rincebrain

Pretty simple. Just don't return EFAULT (which turns into EAGAIN on the way out) if we got some bytes back.

Where does this get converted to EAGAIN? I didn't see anything on the ZFS side which would do this so presumably it happens somewhere in the VFS? Or maybe the NFS code?

According to you (based on git blame), generic_file_splice_read does it:
https://github.com/openzfs/zfs/blame/master/module/os/linux/zfs/zfs_uio.c#L177-L182

An alternate fix would be to do the same logic just after the call returns in zfs_read()

zfs_read() seems like the right place to handle this since that's where the rest of the logic for partial reads is kept. As long as we've made some progress, zfs_uio_resid(uio) != start_resid, that's a success from zfs_read()'s perspective.

Yeah, that seems reasonable to me. I'll go fire it up.

Sorry about needing to retest!

No worries, getting the fix in the right place is the important thing. :)

@behlendorf

Ahh, so it does but only for ->splice_read(). I'm glad I left a note!

Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller that breaks on
older Linux kernels (by happily passing along a 1M read request with a
64k iovec) instead requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Fixes: openzfs#12509

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@behlendorf behlendorf added the "Status: Accepted (Ready to integrate: reviewed, tested)" label and removed the "Status: Code Review Needed (Ready for review and testing)" label Sep 20, 2021
@behlendorf behlendorf merged commit 59eab10 into openzfs:master Sep 20, 2021
rincebrain added a commit to rincebrain/zfs that referenced this pull request Sep 22, 2021
Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller that breaks on
older Linux kernels (by happily passing along a 1M read request with a
64k iovec) instead requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12370 
Closes openzfs#12509
Closes openzfs#12516
rincebrain added a commit to rincebrain/zfs that referenced this pull request Nov 5, 2021
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Nov 6, 2021
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Nov 13, 2021
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Nov 13, 2021
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 1, 2022
behlendorf pushed a commit that referenced this pull request Aug 2, 2022