ReadDirPlus, "This shouldn't happen" is happening (input.Offset > len(d.stream)) #297
Comments
I'm also attaching an NFS network packet dump about the error. It's quite strange that the NFS client is calling another
Also interesting is why the cookie value (which is the offset - https://github.com/torvalds/linux/blob/master/fs/nfsd/nfs3proc.c#L526) is skipping values (in the first readdir response, the last 2 files -
Maybe it's an NFS feature (bug?) which needs to be handled in the fuse lib by returning an empty directory list in this case...
I am not so familiar with NFS export. Is this a user process using syscalls, or is this exported directly from the kernel?
Does it reproduce if you use debug? That would be easier for me to read than pcap (which I have never used before).
In NFS everything is in kernel space, so it's exported directly from the kernel. I don't know whether this helps you, but I've filtered the 2GB log file by 'Fh'. As I mentioned, this only happens under a high I/O load, which needs some operation.
The |
Interesting. This looks like the NFS client is trying to find out if there are more entries available (but why only under load?). If you edit the code to return ENOENT instead of EINVAL, does that fix the problem on the NFS client side?
Scratch ENOENT; we should probably return zero bytes, like read() beyond end of file.
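The "zero bytes, like read() beyond end of file" idea can be sketched as follows. This is only an illustration, not go-fuse's code: `dirEntry` and `entriesFrom` are hypothetical names, and the cached directory stream is modeled as a plain slice. An offset at or past the end yields an empty list instead of an error.

```go
package main

import "fmt"

// dirEntry is a hypothetical stand-in for one cached directory entry.
type dirEntry struct {
	Name string
}

// entriesFrom returns the entries starting at offset. An offset at or
// beyond the end of the stream yields an empty result rather than an
// error, mirroring read() past end of file.
func entriesFrom(stream []dirEntry, offset uint64) []dirEntry {
	if offset >= uint64(len(stream)) {
		return nil
	}
	return stream[offset:]
}

func main() {
	stream := []dirEntry{{"a"}, {"b"}, {"c"}}
	fmt.Println(len(entriesFrom(stream, 1))) // remaining entries after the first
	fmt.Println(len(entriesFrom(stream, 3))) // exactly at the end: empty
	fmt.Println(len(entriesFrom(stream, 7))) // past the end: empty, no error
}
```

With this shape the caller cannot tell "end of directory" from "stale offset", which is the trade-off being accepted in the comment above.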
In this case
My stress test is running the same task in parallel threads.
@zarmin, for reference, you can use bpftrace to investigate what is going on inside the kernel. I would start with something like the following to obtain the complete kernel stack that leads to the EINVAL that go-fuse returns for readdir:

#!/usr/bin/env -S bpftrace
#include <linux/errno.h>
kprobe:fuse_readdir {
	@readdir[tid] = kstack;
}
kretprobe:fuse_readdir {
	// XXX kstack is not working in kretprobe https://github.com/iovisor/bpftrace/issues/101
	if (retval == EINVAL) {
		printf("fuse.readdir -> %d: %s\n", retval, @readdir[tid]);
	}
	delete(@readdir[tid]);
}

Having that working, one can add more debug printing, which usually helps to understand the cause of the issue.
By the way: the fact that the issue happens only under load suggests that it may be related to kernel dentries being evicted and then loaded again. Also:
4.9.144 is not the latest kernel in the linux-4.9.y series (the latest as of today is 4.9.173). Looking at the v4.9.144..v4.9.173 changes, I see there are both some fuse and nfsd patches:
Generic in-kernel VFS fixes (not shown above) could also be related.
Thanks for all of the answers, I'm back on this topic. I tried a bunch of things, and I've also involved the NFS developers in this thread...
According to J. Bruce Fields' answer, "Off the top of my head that client behaviour sounds weird and suboptimal, but also harmless." So I think the right approach is to return
With this simple code: https://gist.github.com/zarmin/68e546e53a3a38e6612cc074befd477b
This reproduces the issue via NFS; it is caused by the folder list changing during the listing because of a delete.
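How a delete during a listing can leave a reader holding an offset larger than the cached listing can be illustrated with a small simulation. It is a sketch under stated assumptions, not go-fuse's implementation: `handle`, `read`, and `errBadOffset` are hypothetical, and the sketch assumes the handle refreshes its cached listing whenever a read starts over at offset 0.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadOffset is a hypothetical error for an offset past the cached listing.
var errBadOffset = errors.New("offset beyond cached listing")

// handle is a hypothetical directory handle that caches one listing.
type handle struct {
	cached []string
}

// read refreshes the cache when a listing starts over (offset 0), then
// returns the names from offset on. Offsets past the cache are rejected.
func (h *handle) read(offset int, list func() []string) ([]string, error) {
	if offset == 0 || h.cached == nil {
		h.cached = list()
	}
	if offset > len(h.cached) {
		return nil, errBadOffset
	}
	return h.cached[offset:], nil
}

func main() {
	dir := []string{"a", "b", "c", "d", "e"}
	list := func() []string { return append([]string(nil), dir...) }

	h := &handle{}
	first, _ := h.read(0, list)
	cookie := len(first) // the client remembers where it stopped

	dir = dir[:len(dir)-1] // a file is deleted while the listing is in progress
	h.read(0, list)        // the listing is restarted, so the cache shrinks

	_, err := h.read(cookie, list) // the old cookie now points past the end
	fmt.Println(err)
}
```

In this model the final read fails only because the saved cookie outlived the listing it was issued for, which matches the observation that the error needs concurrent deletes to appear.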
Fixed in 19887fb. Sorry for the delay; things have been busy at work.
I've caught this error:
go-fuse/fuse/nodefs/dir.go
Lines 77 to 80 in 509d146
Setup:
It happens a few times a minute under a higher I/O load. I'm doing a stress test with npm install commands, because they operate on a lot of small files and directories in parallel.
It seems to me to be an off-by-one issue; here are some logs:
I'm getting EINVAL on the NFS client side in these situations.
It's completely reproducible in my test environment.