Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: small-count Readdirnames followed by seek-to-zero can lead to duplicate names #37161

Open
chris3torek opened this issue Feb 10, 2020 · 4 comments
Assignees
Labels
Milestone

Comments

@chris3torek
Copy link

@chris3torek chris3torek commented Feb 10, 2020

What version of Go are you using (go version)?

$ go version
go version go1.11.2 linux/amd64
go version go1.13.5 freebsd/amd64

Does this issue reproduce with the latest release?

I think so, since the code has not changed. (It's kind of a minor bug too...)

What operating system and processor architecture are you using (go env)?

(I'll just include the freebsd one here - as long as the calls go through the file_unix.go code the OS is not particularly relevant)

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/torek/.cache/go-build"
GOENV="/home/torek/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GONOPROXY=""
GONOSUMDB=""
GOOS="freebsd"
GOPATH="/home/torek/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build370595268=/tmp/go-build -gno-record-gcc-switches"

What did you do?

This has to be run where there is a file system with some direntries (and probably it's specific to Unix-like systems with getdirentries calls). It does reproduce even on the Go Playground though.

package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	df, err := os.Open(".")
	if err != nil {
		log.Fatal(err)
	}
	names, err := df.Readdirnames(1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("first Readdirnames(1):", names)
	if _, err = df.Seek(0, 0); err != nil {
		log.Fatal(err)
	}
	names, err = df.Readdirnames(0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("second Readdirnames:", names)
}

What did you expect to see?

Output in which file names don't repeat, in the second output line.

What did you see instead?

Output in which file names do repeat. Here's what happens in the Playground:

first Readdirnames(1): [dev]
second Readdirnames: [tmp etc usr dev tmp etc usr]

The root cause

The actual cause of the bug is pretty straightforward. If you use Readdirnames to the end of the directory, the block holding getdirentries data is fully consumed, but if you call it with a small count (such as the 1 in the example above), it's not. So in os/dir_unix.go, the dirinfo data has d.bufp < d.nbuf.

The Seek function has a test in it:

    r, e := f.seek(offset, whence)
    if e == nil && f.dirinfo != nil && r != 0 {
            e = syscall.EISDIR
    }

which makes sure that the only allowable seek on a directory is to zero, but this never clears out the dirinfo. So the buffered entries remain, and we get them on the next call to Readdirnames(-1)—on the Playground this is tmp etc and usr. Then the buffer is drained so we call the underlying getdirentries system call again, filling the buffer with dev, tmp, etc, and usr, and we retrieve them, resulting in the doubled-up entries.

One proposed fix would be to move f.dirinfo checking from os/file.go's Seek to the next level down. If seeking on a directory, check the offset-and-whence there (perhaps seeking directories to nonzero locations and/or to the end should be allowed on some systems; this could also allow relative seek by 0 bytes to return an offset, if desirable) and zap any buffered data there.

@ianlancetaylor ianlancetaylor changed the title Small-count Readdirnames followed by seek-to-zero can lead to duplicate names os: small-count Readdirnames followed by seek-to-zero can lead to duplicate names Feb 11, 2020
@ianlancetaylor ianlancetaylor added this to the Go1.15 milestone Feb 11, 2020
@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Feb 11, 2020

Thank you for reporting this issue @chris3torek and welcome to the Go project!

So this looks quite similar to bug #35767 that involved cached results on Darwin and that @randall77 fixed with CL https://go-review.googlesource.com/c/go/+/209961 for Go1.14.
Your bug reproduces on my Mac for Go1.13 and below but on Go1.14rc* I don't get repeated entries.

Thus:
a) Could you please try Go1.14rc1 https://golang.org/dl/#unstable so

go get golang.org/dl/go1.14rc1 && go1.14rc1 run main.go

b) Kindly putting this on @randall77's radar as perhaps we might need to implement the previous polyfill for

func (f *File) seekInvalidate()

on non-GOOS=darwin and non-GOOS=js systems.

Thank you.

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Feb 12, 2020

Yes, looks like we need a similar fix to CL 209961 for non-darwin unix.
CL coming.
It's been broken since at least 1.11, so this will wait until 1.15.

@randall77 randall77 self-assigned this Feb 12, 2020
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Feb 12, 2020

Change https://golang.org/cl/219143 mentions this issue: os: seek should invalidate any cached directory reads

@chris3torek

This comment has been minimized.

Copy link
Author

@chris3torek chris3torek commented Feb 12, 2020

I won't have a chance to try any of this for a few days at least, but the current fix looks fine and I'm certain it will work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.