Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: Readdir swallows partial results if it fails to Lstat any file in the listing #27416

scudette opened this issue Aug 31, 2018 · 2 comments


Copy link

@scudette scudette commented Aug 31, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?


What operating system and processor architecture are you using (go env)?


What did you do?

Test program here

It is hard to reproduce but when using fuse sometimes the mount point becomes disconnected. In this case when you try to do a lstat() of it you get "transport endpoint is not connected" (Google for that to see that it is a common thing). Anyway the issue is really that when a directory contains a disconnected mount point like this, then Golang's readdir variants (os.Open + Readdir and ioutils.ReadDir()) break out of the loop as soon as they can not Lstat one of the files in the directory and this either returns no results in the case of ioutils or randomly less results in the case of os.Open + Readdir.

This is very surprising and leads to weird program failures because an external event (unclean mount point) suddenly causes the entire directory listing to fail in Go programs. The directory is fine for e.g. ls - you can see how /bin/ls shows it:

$ ls -l /
ls: cannot access '/mnt': Transport endpoint is not connected
total 2235132
drwxr-xr-x   2 root root      12288 Aug 24 07:35 bin
drwxr-xr-x   3 root root       4096 Aug 30 06:38 boot
drwxr-xr-x   2 root root       4096 May  3 15:48 lib64
drwx------   2 root root      16384 Sep 26  2017 lost+found
drwxr-xr-x   5 root root       4096 Feb  9  2018 media
d?????????   ? ?    ?             ?            ? mnt
drwxr-xr-x   4 root root       4096 Jun 28 12:41 opt
dr-xr-xr-x 362 root root          0 Jun 27 23:02 proc
-rw-------   1 root root 2147483648 Dec 17  2017 swapfile
dr-xr-xr-x  13 root root          0 Sep  1 01:28 sys
drwxrwxrwt  89 root root     102400 Sep  1 01:51 tmp
drwxr-xr-x  14 root root       4096 Dec 29  2017 usr

The issue is in os/dir_unix.go exiting out of the loop in case of an lstat error.

A more robust implementation is possible by copying out the code from dir_unix.go and modifying it ( so maybe this is a valid workaround but IMHO almost every user of ioutil.ReadDir() expects to get some results back - it is unexpected to just have the entire directory listing fail because someone has put a fuse mount inside it.

Semantically to me at least, when I call ReadDir on a directory (say "/"), then any error that I get back should relate to the directory itself. I was surprised that I was unable to ReadDir("/") when a subdir of "/" was actually un-stat'able. The error does not seem to relate to the thing I was calling the function on.

What did you expect to see?

I expected partial results with all the results that could be stat'ed. Maybe add an empty FileInfo for the bad mount point or omit it.

What did you see instead?

ReadDir() returns an error and no results - weird since I can totally do an "ls -l /" and this works fine.

@FiloSottile FiloSottile changed the title os.Readdir swallows partial results if it fails to Lstat any file in the listing. os: Readdir swallows partial results if it fails to Lstat any file in the listing Aug 31, 2018
@FiloSottile FiloSottile added this to the Unplanned milestone Aug 31, 2018
Copy link

@scudette scudette commented Mar 28, 2020

Just spent 2+ hours debugging a problem around this bug. It should not happen!

In our project we needed to reimplement ReadDir() ourselves to work around this bug

But there was an odd place where ioutils.ReadDir() was used instead of our fixed version.

We should not need to reinvent parts of the standard library - please consider fixing this for good. It is extremely frustrating to spend time chasing bugs in standard libraries - resorting to very careful analysis of strace output to notice that getdents64 returns 82 entries but we only have 52 newfstatat() calls!

This is a high severity bug it should get more attention.

@ianlancetaylor ianlancetaylor removed this from the Unplanned milestone Mar 28, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Mar 28, 2020
Copy link

@ianlancetaylor ianlancetaylor commented Mar 28, 2020

I don't know what the right thing to do is here. What should happen if there is more than one error calling lstat?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants