Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io/fs,os: fs.ReadDir with an os.DirFS can produce invalid paths #44166

Closed
bcmills opened this issue Feb 8, 2021 · 16 comments
Closed

io/fs,os: fs.ReadDir with an os.DirFS can produce invalid paths #44166

bcmills opened this issue Feb 8, 2021 · 16 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Feb 8, 2021

What version of Go are you using (go version)?

$ go version
go version devel +c7494cf477 Fri Feb 5 10:32:19 2021 -0500 linux/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/bcmills/.cache/go-build"
GOENV="/usr/local/google/home/bcmills/.config/go/env"
GOEXE=""
GOFLAGS="-benchtime=1x"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/usr/local/google/home/bcmills/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/google/home/bcmills"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/google/home/bcmills/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/google/home/bcmills/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel +c7494cf477 Fri Feb 5 10:32:19 2021 -0500"
GCCGO="/usr/local/google/home/bcmills/bin/gccgo"
AR="ar"
CC="gcc"
CXX="c++"
CGO_ENABLED="1"
GOMOD="/tmp/tmp.K8mIsoRRRV/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3900520840=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran the test in https://play.golang.org/p/HY3-ANaECG-.

package main

import (
	"io/fs"
	"os"
	"path/filepath"
	"testing"
)

func TestDirFSPathsValid(t *testing.T) {
	if filepath.Separator == '\\' {
		t.Skipf(`Can't create filename containing a backslash because path.Separator is '\'.`)
	}

	d := t.TempDir()
	if err := os.WriteFile(filepath.Join(d, "control.txt"), []byte(string("Hello, world!")), 0644); err != nil {
		t.Fatal(err)
	}
	if err := os.WriteFile(filepath.Join(d, `experi\ment.txt`), []byte(string("Hello, backslash!")), 0644); err != nil {
		t.Fatal(err)
	}

	fsys := os.DirFS(d)
	err := fs.WalkDir(fsys, ".", func(path string, e fs.DirEntry, err error) error {
		if fs.ValidPath(e.Name()) {
			t.Logf("%q ok", e.Name())
		} else {
			t.Errorf("%q INVALID", e.Name())
		}
		return nil
	})
	if err != nil {
		t.Fatal(err)
	}
}

What did you expect to see?

All paths returned by fs.ReadDir (and, by extension, fs.WalkDIr) satisfy fs.ValidPath.

What did you see instead?

--- FAIL: TestDirFSPathsValid (0.00s)
    dirfs_test.go:26: "." ok
    dirfs_test.go:26: "control.txt" ok
    dirfs_test.go:28: "experi\\ment.txt" INVALID
FAIL
@bcmills bcmills added this to the Go1.16 milestone Feb 8, 2021
@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 8, 2021

Marking as release-blocker pending triage from @rsc.

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Feb 8, 2021

It seems like the two options are:

  • leave things as they are, returning a name that can't be opened
  • OR drop that name from the directory listing, pretending it's not there at all

In general I would err on the side of showing that there's an inaccessible file than pretending it doesn't exist at all, but I'd be happy to listen to good arguments for the other.

Loading

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 8, 2021

Two more options:

  • relax ValidPath such that it allows backslashes (and report a “file not found” error instead of an “invalid path” error for names containing backslashes on Windows)
  • define some kind of backslash-escaping for the paths reported by (and accepted by) the os.DirFS implementation of fs.FS.

Loading

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 8, 2021

I think the reason for rejecting backslashes in the first place is so that Open (and, possibly, a future fs.Create) can produce more useful error messages for Windows users.

But we can always reject backslashes on Windows in the FS-specific Create implementation — it will have to reject additional characters and reserved names anyway, per the Windows naming conventions.

So that leaves Open. I wonder if there is some way we could tweak the error returned by Open on Windows such that it satisfies os.IsNotExist (or at least errors.Is(err, os.ErrNotExist)?) and also gives a hint about the wrong slash..?

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Feb 8, 2021

The reason to reject backslashes is not quite to cater to Windows users but instead to define a simpler subset of possible names for all implementations to handle. Backslashes are a perennial special case in portable file system implementations, because of Windows, and it simplifies all future implementations to remove them from the required interface entirely.

It seems clear that approximately no one actually uses files named experi\ment.txt in their day-to-day activity on Unix. And since FS has not existed to date, there is certainly no actual usage of that name with existing FS implementations. This is our chance to keep it that way and save all future implementors a specific headache.

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Feb 8, 2021

Also, in my experience, if you do have a file named experi\ment.txt, it's almost always a bug. For example one program on my computer insists on creating a directory called /Users/rsc/Library/Application\ Support (with an actual backslash in the name), apparently because someone was confused about the difference between shell syntax escaping and actual file names. There's no Windows involved here at all, yet still there are bugs involving backslash.

If we accept backslash in ValidPath and in os.DirFS and also at some later point allow Create, then those bugs will continue to be created. If we keep the no-backslashes rule, then we prevent them.

Loading

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 8, 2021

Hmm. But if, say, someone implements a file browser on top of io/fs in Go, I would rather they be able to show the existence (and contents) of the erroneous Application\ Support directory than not.

If we disallow the path, then being able to successfully traverse that path would require a custom fs.FS implementation and some kind of additional path-escaping scheme, plus the insight that that workaround is needed in the first place.

Loading

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 8, 2021

(To be clear: I agree that paths involving backslashes are usually symptoms of other bugs. I just think they almost always occur on the Create side, and I would rather address them there instead of everywhere.)

Loading

@bcmills
Copy link
Member Author

@bcmills bcmills commented Feb 8, 2021

Thinking about this issue in relation to #44092: we're returning an empty list from Glob for arguments that are technically valid but probably bugs (like "/" and "a/").

If we want to raise the bar on diagnosing “things that are probably bugs”, why raise it for Open but not for Glob?

Loading

@seebs
Copy link
Contributor

@seebs seebs commented Feb 8, 2021

If fs is expected to be able to model filesystems, it has to be able to model the paths those filesystems can model, even if some of those paths are bad paths that shouldn't be used. If it's not going to do that, then it's not going to be able to express interactions with those real filesystems.

There's very often strange workarounds and translation layers. I've been told in the past that Windows syscalls use forward slashes, and that the translation to backslashes happens closer to the user interface. I know MacOS's internal representation used to use : for the actual directory separator, requiring fancy workarounds to create files which superficially appeared to contain colons in their names. And once upon a time, I saw an Apple file-sharing protocol (around MacOS 6 or 7) implementation that talked over NFS, and a series of bugs such that a user on a Mac trying to create a file named a/b could in fact cause a directory entry to be created that had a/b as its single directory entry, which could not in any way be opened or interacted with, or even fixed by the standard filesystem tools, but luckily someone had fsdb available and they didn't have to reformat the drive.

What should io/fs do with the path name C:aux.txt? Should it reject this, as it's commonly an error? Should it reject it because at least one major target system will explode horribly if you actually try to use it? What about a terminal $1 or ;1 on a filename?

It seems to me that probably the right answer is for io/fs to handle completely-arbitrary strings, with special cases for slashes and maybe NUL, but for some APIs to have or impose stricter requirements. But in general, if the name can actually exist and raw syscalls on the target platform could talk to it, it should be valid-ish.

Given a plain file x, is x/..namedfork/data a valid path? Should it work with any of these packages? Should it not work with them? The more I think about this the more I am unsure.

Loading

@gazerro
Copy link
Contributor

@gazerro gazerro commented Feb 8, 2021

Investigating this issue I found the issue #44171 that might be interesting for this discussion.

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Feb 9, 2021

@seebs You are making the "any model must provide the union of all the features of the possible underlying systems being modeled" argument. But we have avoided that for package os - instead, we provide a simplified model that does not capture everything about all operating systems. The io/fs package aims to do the same.

Loading

@seebs
Copy link
Contributor

@seebs seebs commented Feb 9, 2021

Hmm.

Well, the model can just be target-specific, in which case, valid paths would vary between systems, but that seems like it would be bad. In which case, there's either at least one path which is acceptable to the model but can't be used on a particular target, or at least one path which is unacceptable to the model but can be used on a particular target. Or both, I guess.

I think part of the issue is that the other tools, like Create or Open, can still manipulate these paths, which creates two conflicting and incompatible ways to think about file names. So at least some names valid to Open will be rejected as InvalidFile by io/fs. And at least some names which io/fs considers valid will probably be invalid for Open. That seems bad too...

Loading

@MatejLach
Copy link

@MatejLach MatejLach commented Feb 9, 2021

While I am very sympathetic to the argument of not wanting to model every possible file system behavior and then having to maintain the complexity forever, if someone were to build antivirus software in Go and relied on the standard library for file system access as stdlib is a certain 'guarantee' of quality, presumably it wouldn't be ideal if all it took for malware to hide from a scan is to have a backslash in its name. Certainly not if the headline would be along the lines of 'due to a bug in the Go standard library'.

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 9, 2021

Change https://golang.org/cl/290709 mentions this issue: io/fs: allow backslash in ValidPath, reject in os.DirFS.Open

Loading

@Cyberax
Copy link

@Cyberax Cyberax commented Feb 10, 2021

If we accept backslash in ValidPath and in os.DirFS and also at some later point allow Create, then those bugs will continue to be created. If we keep the no-backslashes rule, then we prevent them.

As of now, Go actually can't reliably do Create/Open with Windows filenames that contain invalid UTF-8 sequences (e.g. two UTF-16 surrogates side by side).

Loading

@gopherbot gopherbot closed this in e5b08e6 Feb 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants