Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReadDirPlus, "This shouldn't happen" is happening (input.Offset > len(d.stream)) #297

Closed
zarmin opened this issue May 7, 2019 · 13 comments

Comments

@zarmin
Copy link

zarmin commented May 7, 2019

I've caught this error:

if input.Offset > uint64(len(d.stream)) {
// This shouldn't happen, but let's not crash.
return fuse.EINVAL
}

Setup:

  • NFS export a top on a fuse
  • Debian, 4.9.144 (both client + server)
  • Using the last commit (link above) from the repo.

It's happening in a few times in a minute during a higher I/O load. I'm doing a stress test with npm install commands, because it is operating on a lot of small files and directories parallel.

It seems to me an off-by-one issue, here is some logs:

Tuesday, 07-May-19 05:11:03 UTC  ReadDirPlus, input.Offset > len(d.stream): 26 > 25
Tuesday, 07-May-19 05:11:03 UTC  ReadDirPlus, input.Offset > len(d.stream): 25 > 24
Tuesday, 07-May-19 05:12:22 UTC  ReadDirPlus, input.Offset > len(d.stream): 23 > 22
Tuesday, 07-May-19 05:13:54 UTC  ReadDirPlus, input.Offset > len(d.stream): 24 > 23
Tuesday, 07-May-19 05:14:53 UTC  ReadDirPlus, input.Offset > len(d.stream): 22 > 21
Tuesday, 07-May-19 05:14:53 UTC  ReadDirPlus, input.Offset > len(d.stream): 22 > 21
Tuesday, 07-May-19 05:15:21 UTC  ReadDirPlus, input.Offset > len(d.stream): 23 > 22
Tuesday, 07-May-19 05:15:27 UTC  ReadDirPlus, input.Offset > len(d.stream): 21 > 20
Tuesday, 07-May-19 05:15:27 UTC  ReadDirPlus, input.Offset > len(d.stream): 21 > 20

I'm getting EINVAL on the NFS client side in these situations.

It's completely reproducible in my test environment.

@zarmin
Copy link
Author

zarmin commented May 7, 2019

I'm also attaching an NFS network packet dump about the error.

nfs4-error-cut2.pcapng.zip

It's quite strange, why the NFS client is calling an another READDIR, the first one had lower number of the results than the request's dircount / maxcount values.

Also interesting why the cookie value (which is the offset - https://github.com/torvalds/linux/blob/master/fs/nfsd/nfs3proc.c#L526) is skipping values (in first readdir response, the last 2 files - rc and readable-stream - has cookie values of 18 and 21), so the next READDIR will request the next cookie value, which is 22.

Maybe it's an NFS feature (bug?), which need to be handled in the fuse lib, by returning an empty directory list in this case...

@hanwen
Copy link
Owner

hanwen commented May 7, 2019

i am not so familiar with NFS export. Is this a user process using syscalls, or is this exported directly from the kernel?

@hanwen
Copy link
Owner

hanwen commented May 7, 2019

does it reproduce if you use debug? that would be easier to read for me than pcap (which I have never used before.)

@zarmin
Copy link
Author

zarmin commented May 7, 2019

In NFS everything is in kernelspace, so it's exported directly from the kernel.
This packet dump is the only reliable way to dump what NFSd is doing, because you can't strace or use any debug tool which is useful in a userspace application.

I don't know how this helps you, but I've filtered the 2GB log file by 'Fh'. As I mentioned this is only happening under a high I/O load, which needs some operation.

2019/05/07 09:19:36 rx 2521985: OPEN i44901 {O_RDONLY,0x8000}
2019/05/07 09:19:36 tx 2521985:     OK, {Fh 4037 }
2019/05/07 09:19:51 rx 2804902: RELEASE i44901 {Fh 4037 0x8000  L0}
2019/05/07 09:19:51 tx 2804902:     OK
2019/05/07 09:19:51 rx 2804904: OPENDIR i19766
2019/05/07 09:19:51 tx 2804904:     OK, {Fh 4037 }
2019/05/07 09:19:51 rx 2804914: READDIRPLUS i19766 {Fh 4037 [23 +4096)  L 0 RDONLY,0x8000}
2019/05/07 09:19:51 tx 2804914:     22=invalid argument
2019/05/07 09:19:51 rx 2805076: RELEASEDIR i19766 {Fh 4037 0x8000  L0}
2019/05/07 09:19:51 tx 2805076:     OK

The 22=invalid argument is coming from the mentioned part of the code.
I've forked your codebase and put a logging code into the 'This shouldn't happen' part, to be absolutely sure the error is happening there.

@rfjakob
Copy link
Contributor

rfjakob commented May 7, 2019

Interesting. This looks like the nfs client is trying to find out if there are more entries available (but why only under load?).

If you edit the code to return ENOENT instead of EINVAL, does that fix the problem on the nfs client side?

@rfjakob
Copy link
Contributor

rfjakob commented May 7, 2019

Scratch ENOENT, we should probably return zero bytes like read() beyond end of file.

@zarmin
Copy link
Author

zarmin commented May 7, 2019

In this case ENOENT means the directory itself does not exists, and it's not working.
If I just return with fuse.OK, then the error disappears, because I'm just returning an empty directory list.

My stress test is running the same task in parallel threads.
I've found out this error is always happening with the same directory.... so i'm digging further...

@navytux
Copy link
Contributor

navytux commented May 7, 2019

This packet dump is the only reliable way to dump what NFSd is doing, because you can't strace or use any debug tool which is useful in a userspace application.

@zarmin, for the reference you can use bpftrace to investigate what is going on inside the kernel. I would start with something like the following to obtain complete kernel stack that triggers EINVAL readdir go-fuse return:

#!/usr/bin/env -S bpftrace

#include <linux/errno.h>

kprobe:fuse_readdir {
        @readdir[tid] = kstack;
}

kretprobe:fuse_readdir {
        // XXX kstack is not working in kretprobe https://github.com/iovisor/bpftrace/issues/101
        if (retval == EINVAL) {
                printf("fuse.readdir -> %d: %s\n", retval, @readdir[tid]);
        }
        delete(@readdir[tid]);
}

Having that working one can add more debug printing which usually helps to understand the cause of the issue.

@navytux
Copy link
Contributor

navytux commented May 7, 2019

By the way: the fact that the issue happens only under load suggests that it is maybe related to kernel dentires being evicted and then further loaded.

Also:

Debian, 4.9.144 (both client + server)

4.9.144 is not the latest kernel in linux-4.9.y series (latest as of today is 4.9.173). Looking at the v4.9.144..v4.9.173 changes I see there are both some fuse and nfsd patches:

kirr@deco:~/src/linux/linux$ git log --oneline v4.9.144..v4.9.173 -- fs/fuse/ fs/nfs*
986fa92bc3a0 NFS: Fix a typo in nfs_init_timeout_values()
22aed03ea558 NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
28154b865fad nfsd: Don't release the callback slot unless it was actually held
22a8711fa5aa NFSv4.1 don't free interrupted slot on open
33c164d59f3c nfsd: fix wrong check in write_v4_end_grace()
e4ea22f928da nfsd: fix memory corruption caused by readdir
7ed608268e69 NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
18ae814610c5 NFS: Fix an I/O request leakage in nfs_do_recoalesce
0da4596d6756 NFS: Fix I/O request leakages
6f976ecaf133 nfs: Fix NULL pointer dereference of dev_name
877362fd15f0 nfsd4: catch some false session retries
f92c45b79824 nfsd4: fix cached replies to solo SEQUENCE compounds
64702dea0600 fuse: handle zero sized retrieve correctly
4f3d69898e1a fuse: decrement NR_WRITEBACK_TEMP on the right page
50449aafb818 fuse: call pipe_buf_release() under pipe lock
a37706aca54c NFS: nfs_compare_mount_options always compare auth flavors.
60846115bbed nfsd4: fix crash on writing v4_end_grace before nfsd startup
de42cd219802 nfs: don't dirty kernel pages read by direct-io

It is likely that generic in-kernel VFS fixes (not shown above) could be also related.

@zarmin
Copy link
Author

zarmin commented Jun 4, 2019

Thanks for all of the answers, I'm back on this topic.

I tried a bunch of things, I've involved the nfs developers too in this thread...
https://marc.info/?l=linux-nfs&m=155966546212349&w=2

@zarmin
Copy link
Author

zarmin commented Jun 5, 2019

By the answer of J. Bruce Fields, "Off the top of my head that client behaviour sounds weird and suboptimal, but also harmless.". So I think the good method is to return fuse.OK instead of fuse.EINVAL.

@zarmin
Copy link
Author

zarmin commented Jun 18, 2019

By this simple code, https://gist.github.com/zarmin/68e546e53a3a38e6612cc074befd477b

This reproduces the issue via NFS, and it is caused by the folder list is changed during the listing, because of a delete.
So returning fuse.EINVAL is wrong and it should return a fuse.OK with an empty dirlist in this cases.

@hanwen
Copy link
Owner

hanwen commented Jul 3, 2019

fixed in 19887fb

sorry for the delay. Things have been busy at work.

@hanwen hanwen closed this as completed Jul 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants