Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syscall: Windows filenames with unpaired surrogates are not handled correctly #32334

Open
hundt opened this issue May 30, 2019 · 3 comments

Comments

@hundt
Copy link

commented May 30, 2019

There is a well-known issue with Windows/NTFS (see rust-lang/rust#12056 and https://lwn.net/Articles/684181/) where filenames are treated as UTF-16 but are allowed to contain unpaired surrogates. But syscall_windows.go assumes that the input and output to the Windows syscalls is valid UTF-16. This breaks some of the high-level APIs; for example, File.Readdir on a directory containing files with unpaired surrogates in the names will return FileInfo results with incorrect names (valid filenames but referring to different or nonexistent files). A demonstration is included below.

I'm not sure what a reasonable solution would be. I guess essentially something like WTF-8 where the strings that come back from these syscalls on Windows are generally valid UTF-8 but might not be?

I'm not a Windows developer so I'm not sure how often this issue comes up in real life, but I happened to notice it so I thought I'd flag it in case anyone finds it worth taking action, or so people can find this documentation of the issue if they encounter it.

What version of Go are you using (go version)?

$ go version
go version go1.12.5 windows/386

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GOARCH=386
set GOBIN=
set GOCACHE=C:\Users\IEUser\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=386
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\IEUser\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_386
set GCCGO=gccgo
set GO386=sse2
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m32 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessag
e-length=0 -fdebug-prefix-map=C:\Users\IEUser\AppData\Local\Temp\go-build4397225
02=/tmp/go-build -gno-record-gcc-switches

What did you do?

Created a file named <unpaired surrogate>.txt ([]uint16=[0xdcc0 0x2e 0x74 0x78 0x74]) and attempt to read it by calling ioutil.ReadDir and reading all the files that come back.

Code snippet
package main

import "log"
import "syscall"
import "io/ioutil"
import "os"

// Create a file at c:\test\files\<w>.txt
func write(w uint16, content []byte) {
	pathb := []byte("c:\\test\\files\\")
	path := []uint16{}
	for _, c := range pathb {
		path = append(path, uint16(c))
	}
	fileName := []uint16{w, '.', 't', 'x', 't'}
	log.Printf("  Converted filename: %#x", fileName)
	path = append(path, fileName...)
	pathp := &path[0]
	access := uint32(syscall.GENERIC_READ | syscall.GENERIC_WRITE)
	sharemode := uint32(syscall.FILE_SHARE_READ | syscall.FILE_SHARE_WRITE)
	createmode := uint32(syscall.CREATE_ALWAYS)
	h, e := syscall.CreateFile(pathp, access, sharemode, nil, createmode, syscall.FILE_ATTRIBUTE_NORMAL, 0)
	if e != nil {
		log.Printf("  CreateFile failed: %s", e)
	} else {
		n, e := syscall.Write(h, content)
		if n != len(content) || e != nil {
			log.Printf("  Write failed: n=%d e=%s", n, e)
		}
	}
	syscall.Close(h)
}

func main() {
	testCases := []uint16{'a', 0xdc00}

	for _, w := range testCases {
		b := []byte{byte(w)}
		if w > 0xff {
			b = []byte{byte(w >> 8), byte(w)}
		}
		fileName := string(b) + ".txt"
		log.Printf("Writing %#x to %#x", b, fileName)
		write(w, b)
	}

	log.Printf("Reading from directory...")
	files, err := ioutil.ReadDir("c:\\test\\files\\")
	if err != nil {
		log.Fatal(err)
	}

	for _, file := range files {
		filePath := "c:\\test\\files\\" + file.Name()
		log.Printf("  Filename string: %#x", []byte(file.Name()))
		f, err := os.Open(filePath)
		if err != nil {
			log.Printf("    Error opening: %s", err)
		} else {
			buf, err := ioutil.ReadAll(f)
			if err != nil {
				log.Printf("    error reading: %s", err)
			} else {
				log.Printf("    successfully opened; content=%#x", buf)
			}
			f.Close()
		}
	}
}

What did you expect to see?

The code successfully opens the file.

What did you see instead?

The code attempts to open a file with name [0xfffd 0x2e 0x74 0x78 0x74] instead.

With the test cases above it fails with an error:

c:\test>go run files.go
2019/05/30 11:14:14 Writing 0x61 to 0x612e747874
2019/05/30 11:14:14   Converted filename: [0x61 0x2e 0x74 0x78 0x74]
2019/05/30 11:14:14 Writing 0xdc00 to 0xdc002e747874
2019/05/30 11:14:14   Converted filename: [0xdc00 0x2e 0x74 0x78 0x74]
2019/05/30 11:14:14 Reading from directory...
2019/05/30 11:14:14   Filename string: 0x612e747874
2019/05/30 11:14:14     successfully opened; content=0x61
2019/05/30 11:14:14   Filename string: 0xefbfbd2e747874
2019/05/30 11:14:14     Error opening: open c:\test\files\?.txt: The system cannot find the file specified.

If you add 0xfffd to testCases (to create the "replacement character" file that it is looking for) it will actually open that same file twice:

c:\test>go run files.go
2019/05/30 11:16:21 Writing 0x61 to 0x612e747874
2019/05/30 11:16:21   Converted filename: [0x61 0x2e 0x74 0x78 0x74]
2019/05/30 11:16:21 Writing 0xdc00 to 0xdc002e747874
2019/05/30 11:16:21   Converted filename: [0xdc00 0x2e 0x74 0x78 0x74]
2019/05/30 11:16:21 Writing 0xfffd to 0xfffd2e747874
2019/05/30 11:16:21   Converted filename: [0xfffd 0x2e 0x74 0x78 0x74]
2019/05/30 11:16:21 Reading from directory...
2019/05/30 11:16:21   Filename string: 0x612e747874
2019/05/30 11:16:21     successfully opened; content=0x61
2019/05/30 11:16:21   Filename string: 0xefbfbd2e747874
2019/05/30 11:16:21     successfully opened; content=0xfffd
2019/05/30 11:16:21   Filename string: 0xefbfbd2e747874
2019/05/30 11:16:21     successfully opened; content=0xfffd

@mdempsky mdempsky added the OS-Windows label May 30, 2019

@mdempsky mdempsky changed the title Windows filenames with unpaired surrogates are not handled correctly syscall: Windows filenames with unpaired surrogates are not handled correctly May 30, 2019

@mdempsky

This comment has been minimized.

Copy link
Member

commented May 30, 2019

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Jun 2, 2019

@hundt thank you very much for creating this issue.

I can reproduce your problem here on my Windows 10. I also tried opening file with 'corrupted' file name with notepead++, and notepead++ can read and write the file. Mind you notepead++ uses standard system 'Open' dialogue to get the filename.

So, I agree. Go should, probably, deal with these file names somehow.

Unfortunately I don't have time to deal with this issue. So leaving for others.

Alex

@hundt

This comment has been minimized.

Copy link
Author

commented Jul 30, 2019

I lack the ability to tag issues, but I wonder if this issue has security implications. I am thinking of things like:

  • a server validates a username by reading a directory listing, and an invalid username containing the unicode replacement character is accepted because of the presence of a valid username with an unpaired surrogate in its place
  • a server writes a file to some location based on username, and multiple different usernames with unpaired surrogates end up overwriting or gaining access to each other's data

I am not a security expert so maybe these scenarios are far-fetched or not that important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.