os: RemoveAll hangs on Windows when the directory tree contains files with non-UTF8-representable filenames #59971
Comments
I think it would make sense for the For backward compatibility, perhaps we could add a GODEBUG controlling this behavior. @TBBle, shall we convert this to a proposal for a change in the Windows |
This seems like the path of least resistance. Yet...
Today it is reasonably safe to assume that a Go string is well-formed UTF-8. If we change The WTF-8 talks about this issue in wtf-8/#intended-audience:
I don't think that is actually true, though. Go's In particular, the |
(2003 is RFC 3629, when encoding unpaired surrogates became invalid UTF-8 as part of shrinking Unicode from 31 bits to 21 bits. I've never checked to see if WTF-8 is just reinventing pre-2003 UTF-8, but I'd be mildly amused if that was the case.) Also, per the documented "For information about UTF-8 strings in Go":
So in Go's case, they apparently intended "
I was thinking more of a new set of functions that explicitly convert to and from WTF-8, and using those where the current UTF-16 functions are used for Windows filenames only. This is also how the Windows API works (since Vista): apart from the filesystem APIs (which take raw uint16 arrays), the Win32 API is in UTF-16 (or a "globally"-specified 8-bit encoding for legacy APIs), so the only way for a non-UTF-16-valid filename to exist is by deliberately building one in a uint16 array (or reading an existing directory entry). Which is exactly what Cygwin has done, purely for the lolz. In contrast, on Linux setting Anyway, on-topic, if |
Its specification is:
// UTF16ToString returns the UTF-8 encoding of the UTF-16 sequence s,
// with a terminating NUL and any bytes after the NUL removed.
which to me says that the behavior for a non-UTF-16 sequence is unspecified. So I would argue that changing it to return a WTF-8 string is simply defining behavior that was previously (defined as) a programmer error: that is, the programmer failed to validate the UTF-16 string as required before passing it to Similarly, today
// UTF16FromString returns the UTF-16 encoding of the UTF-8 string
// s, with a terminating NUL added. If s contains a NUL byte at any
// location, it returns (nil, EINVAL).
which does not seem to define any particular behavior for non-UTF-8 strings. |
The arguments in favor of using WTF-8 in |
Change https://go.dev/cl/493036 mentions this issue: |
I've submitted CL 493036 with a minimal WTF-8 implementation that has been plugged into the syscall package. It looks good, although I had to copy a bunch of code from the utf8 package. The issue seems to be solved, though. @kevpar @TBBle could you try CL 493036 to see if it fixes your cases? You can download it using |
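For readers unfamiliar with WTF-8, the encoding direction can be sketched as below. This is not the code from CL 493036; encodeWTF8 is a hypothetical name, and the sketch only assumes the WTF-8 rule that well-formed surrogate pairs are encoded as in UTF-8 while unpaired surrogates are kept as 3-byte sequences instead of being replaced with U+FFFD.

```go
package main

import "fmt"

// encodeWTF8 (hypothetical helper, not part of any Go package) converts
// potentially ill-formed UTF-16 into WTF-8 bytes.
func encodeWTF8(u []uint16) []byte {
	var out []byte
	for i := 0; i < len(u); i++ {
		r := rune(u[i])
		// Combine a high surrogate with an immediately following low surrogate.
		if r >= 0xD800 && r <= 0xDBFF && i+1 < len(u) && u[i+1] >= 0xDC00 && u[i+1] <= 0xDFFF {
			r = 0x10000 + (r-0xD800)<<10 + (rune(u[i+1]) - 0xDC00)
			i++
		}
		switch {
		case r < 0x80:
			out = append(out, byte(r))
		case r < 0x800:
			out = append(out, 0xC0|byte(r>>6), 0x80|byte(r&0x3F))
		case r < 0x10000:
			// This branch also covers unpaired surrogates, which plain
			// UTF-8 encoders would reject or replace.
			out = append(out, 0xE0|byte(r>>12), 0x80|byte(r>>6&0x3F), 0x80|byte(r&0x3F))
		default:
			out = append(out, 0xF0|byte(r>>18), 0x80|byte(r>>12&0x3F), 0x80|byte(r>>6&0x3F), 0x80|byte(r&0x3F))
		}
	}
	return out
}

func main() {
	// 0xD800 is an unpaired high surrogate, as in the problematic filenames.
	name := []uint16{'a', 0xD800, 'b'}
	fmt.Printf("% x\n", encodeWTF8(name))
}
```

In this sketch the unpaired surrogate 0xD800 is emitted as the bytes ED A0 80, so the original UTF-16 units can be recovered later rather than collapsing into the replacement character.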
Good news. I confirmed a dockerd.exe built with Go 1.20.4 reproduces the hang (and unhangs when I delete the two problematic files), and when built with Hang:
No hang:
So the fix works for me. (More details on the test at microsoft/hcsshim#696 (comment) if desired) Is there any chance this fix will migrate backwards to older Go versions? Docker currently builds with Go 1.18.10 and |
I'm not even sure if this will get into go1.21: the code freeze is happening in a couple of weeks and this issue hasn't got much traction yet. If we backport something, IMO it shouldn't be CL 493036 but something less invasive, like avoiding the infinite recursion in @golang/windows @alexbrainman |
That makes sense. And just avoiding the infinite hang (with presumably some kind of error we can recognise) will give callers something to react to now. That said, perhaps if we want a less-intrusive patch for backporting, having an "unprocessible name" error or something would be nice and explicit; this could perhaps be a round-trip-test immediately after UTF-16 decoding. A "this call got stuck" error might still have other triggers lying around, although I guess the caller would handle it the same way anyway no matter what the cause? (In the case I'm involved with, it's a cleanup that fails after a successful operation, I'm not sure we even check the error). |
Change https://go.dev/cl/498600 mentions this issue: |
Also mention WTF-8 support in the syscall package. For #32558 For #58977 For #59971 Change-Id: Id1627889b5e498add498748d9bfc69fb58030b35 Reviewed-on: https://go-review.googlesource.com/c/go/+/498600 Reviewed-by: Eli Bendersky <eliben@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com>
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Probably, I haven't tested it personally.
What operating system and processor architecture are you using (go env)?
What did you do?
Called os.RemoveAll on a directory tree that contained a file with a name that is valid on Windows NTFS, but not representable in UTF-8.
A source of such files is the MSYS/Cygwin implementation of POSIX unlink in the case where the file is a running executable, e.g., an executable deleting itself or one of its DLLs, such as a pacman self-upgrade.
An example of such a call to os.RemoveAll is cleaning up temporary directories created during export of a Windows Containers filesystem layer.
TODO: Create a minimal repro. An existing unit test demonstrating this is https://github.com/microsoft/go-winio/pull/261/files#diff-fe0a7b5d46479d166aa0b717fbf33f61139084553170813c8d215edd960af249R117-R118
What did you expect to see?
The directory is deleted (or the call fails, I guess) and the code continues executing.
What did you see instead?
The same hang (infinite loop) as #36375, for roughly the same reason.
There are more details in that ticket, particularly #36375 (comment), which explains how it applies in this case and links on to the constellation of tickets where this issue has affected users.
In summary:
- If os.RemoveAll sees IsNotExist when given a specific file, it assumes the file was already deleted by some other process.
- os.RemoveAll runs an unconditional for loop as long as the directory has contents and no errors were seen while deleting the contents, rescanning the directory on each iteration.
- os.readdir on Windows (called via os.Readdirnames) incorrectly assumes all filenames are valid UTF-16. Specifically, unpaired surrogates are valid in Windows NTFS filenames, but are folded into the replacement character by this assumption.
- The resulting names, passed to the inner os.RemoveAll and converted back to the OS-level format, are hence different from the filenames the OS knows of, and produce the IsNotExist seen in the first point, breaking that code's assumption and hence allowing the loop to repeat without forward progress.
The difference from #36375 is only that the reason the inner os.RemoveAll returns IsNotExist in that case was passing a too-long DOS-style path rather than an incorrect path. (As it happens, Win32 DeleteFile returns ERROR_FILE_NOT_FOUND when given a too-long path. #36375 was resolved by ensuring filenames are not too long to be passed to the Win32 DeleteFile API, but did not resolve the underlying infinite loop potential.)
I haven't tested this, but I'm pretty sure that a directory with the same kind of unusual name would trigger the same behaviour.
One amusing note is that if you delete the problematic file while the loop is hung, it'll unhang and resume without error.
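The lossy conversion at the heart of this can be reproduced on any platform. The sketch below uses unicode/utf16 as a stand-in for the Windows-only syscall conversion helpers; roundTrip is a hypothetical helper that mimics a name travelling from a directory listing into a Go string and back into the form passed to the delete call.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// roundTrip (hypothetical helper) decodes a UTF-16 name to a Go string and
// encodes it back to UTF-16.
func roundTrip(name []uint16) []uint16 {
	s := string(utf16.Decode(name)) // unpaired surrogates become U+FFFD here
	return utf16.Encode([]rune(s))
}

func main() {
	// 0xD800 is an unpaired high surrogate: legal in an NTFS filename,
	// but not valid UTF-16.
	name := []uint16{'a', 0xD800, 'b'}
	fmt.Printf("on disk:    %04x\n", name)
	fmt.Printf("round trip: %04x\n", roundTrip(name))
}
```

After the round trip the middle unit is 0xFFFD rather than 0xD800, so the name no longer matches any directory entry, which is how the delete ends up reporting that the file does not exist.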
A few solutions come to mind:
- [...] syscall.UTF16ToString or similar, instead call UTF16ToStringButFailInsteadOfUsingTheReplacementCharacter.
- [...] syscall.UTF16ToString is not small, and x/sys/windows would need to be looked at too.
- [...] os.Readdir can pass the resulting file names into other os APIs successfully.
- [...] os.RemoveAll implementation (e.g. take "pkg/fs: Add RemoveAll function", microsoft/go-winio#261, upstream) would probably fix this entire class of errors by keeping the filenames in native format between reading the directory entries and recursing/deleting them. It would also take care of the long file path issue by using the \\?\ prefix naturally, rather than patching it in after the fact.
- [...] Path in any generated PathError would need to be converted to some byte-oriented format, which would still fail if error-handling code attempts to, e.g., os.Lstat it.
- [...] os.RemoveAll when the loop is not making forward progress in a way that is robust against why.
- [...] strings can be used to manipulate filenames freely.
My preferred in-Go solution would be encoding all Windows filenames in string in WTF-8 (or a different, covers-all-possible-byte-strings encoding, of course. I'm not particularly tied to WTF-8 itself.)
I particularly dislike the options that push people to external reimplementations, as I'd prefer a solution that made microsoft/go-winio#261 obsolete, not one that makes it widely necessary, including requiring platform checks on multi-platform codebases.
Just to highlight, this problem only occurs on Windows because it's only on Windows that os is assuming the filenames can be roundtripped via UTF-8.
User code that attempted to implement something like os.RemoveAll or os.Walk by directly calling os.Readdir and also used strings to manipulate the filename strings would be vulnerable to similar issues on all platforms in the presence of non-UTF-8 filenames.
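A small example of how user code can hit the same problem on any platform: truncateName below is a hypothetical helper of the kind often written to shorten names for display, and the byte 0xFF stands in for a filename that is not valid UTF-8 (which, for example, Linux filesystems permit).

```go
package main

import "fmt"

// truncateName (hypothetical helper) shortens a name to at most max runes.
// Converting to []rune replaces every invalid UTF-8 byte with U+FFFD, so the
// result may no longer name the same file even when nothing was cut off.
func truncateName(name string, max int) string {
	r := []rune(name)
	if len(r) > max {
		r = r[:max]
	}
	return string(r)
}

func main() {
	onDisk := "caf\xff.txt" // not valid UTF-8
	shown := truncateName(onDisk, 64)
	fmt.Printf("%q -> %q, same file name: %v\n", onDisk, shown, onDisk == shown)
}
```

Passing the result of such a helper back into os.Remove or os.Lstat would fail in the same way as the Windows case, because the bytes no longer match what the filesystem stores.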