syscall: Windows filenames with unpaired surrogates are not handled correctly #32334
There is a well-known issue with Windows/NTFS (see rust-lang/rust#12056 and https://lwn.net/Articles/684181/) where filenames are treated as UTF-16 but are allowed to contain unpaired surrogates. But
I'm not sure what a reasonable solution would be. I guess essentially something like WTF-8 where the strings that come back from these syscalls on Windows are generally valid UTF-8 but might not be?
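To make the problem concrete, here is a minimal Go sketch, not a proposed fix. It shows how a conversion through `utf16.Decode` collapses two different unpaired surrogates into the same replacement character, and how a WTF-8-style encoding could keep them distinct. The helpers `roundTrip` and `encodeWTF8` are hypothetical names invented for this illustration; they are not part of `syscall` or any Go package, and `utf8.AppendRune` assumes Go 1.18 or later.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// roundTrip decodes UTF-16 code units to runes and encodes them back.
// utf16.Decode replaces every unpaired surrogate with U+FFFD, so the
// original code unit cannot be recovered afterwards.
func roundTrip(units []uint16) []uint16 {
	return utf16.Encode(utf16.Decode(units))
}

// encodeWTF8 is a hypothetical WTF-8-style encoder: valid surrogate pairs
// become ordinary 4-byte UTF-8, while a lone surrogate is written with the
// 3-byte UTF-8 bit layout instead of being replaced.
func encodeWTF8(units []uint16) []byte {
	var out []byte
	for i := 0; i < len(units); i++ {
		r := rune(units[i])
		if !utf16.IsSurrogate(r) {
			out = utf8.AppendRune(out, r)
			continue
		}
		if i+1 < len(units) {
			// DecodeRune returns U+FFFD unless (r, next) is a valid pair.
			if p := utf16.DecodeRune(r, rune(units[i+1])); p != utf8.RuneError {
				out = utf8.AppendRune(out, p)
				i++ // consume the low surrogate as well
				continue
			}
		}
		// Lone surrogate: 1110xxxx 10xxxxxx 10xxxxxx, without the surrogate check.
		out = append(out,
			0xE0|byte(r>>12),
			0x80|(byte(r>>6)&0x3F),
			0x80|(byte(r)&0x3F))
	}
	return out
}

func main() {
	a := []uint16{'a', 0xD800} // "a" + unpaired high surrogate
	b := []uint16{'a', 0xDC00} // "a" + unpaired low surrogate
	fmt.Printf("lossy: %04X -> %04X\n", a, roundTrip(a))
	fmt.Printf("lossy: %04X -> %04X\n", b, roundTrip(b))
	fmt.Printf("wtf-8: % X vs % X\n", encodeWTF8(a), encodeWTF8(b))
}
```

The output of `encodeWTF8` is deliberately not valid UTF-8 when the input contains a lone surrogate, which is the trade-off raised above: such strings would be "generally valid UTF-8 but might not be".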
I'm not a Windows developer so I'm not sure how often this issue comes up in real life, but I happened to notice it so I thought I'd flag it in case anyone finds it worth taking action, or so people can find this documentation of the issue if they encounter it.
What version of Go are you using (
@hundt thank you very much for creating this issue.
I can reproduce your problem here on my Windows 10 machine. I also tried opening a file with a 'corrupted' file name in Notepad++, and Notepad++ can read and write the file. Mind you, Notepad++ uses the standard system 'Open' dialog to get the filename.
So, I agree. Go should, probably, deal with these file names somehow.
Unfortunately I don't have time to deal with this issue. So leaving for others.
I lack the ability to tag issues, but I wonder if this issue has security implications. I am thinking of things like:
I am not a security expert so maybe these scenarios are far-fetched or not that important.
This happens because Go converts invalid sequences to 0xFFFD with
This is the right behavior in UTF-8. A lead byte of 0xED restricts the next byte to the range 0x80-0x9F, but 0xB0 is outside that range. If utf16.Decode accepted this invalid range for the second byte (by any chance?), then, since 0x80 is within 0x80-0xBF, the code point would be:
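The byte-level reasoning in this comment can be checked with a short Go sketch. It assumes the sequence under discussion is 0xED 0xB0 0x80 (the three byte values mentioned above) and compares Go's strict decoder with a lenient decoder that applies the 3-byte bit layout without the range restriction. `strictDecode` and `decodeThreeByte` are hypothetical helpers written for this illustration only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// strictDecode wraps utf8.DecodeRune, which rejects encoded surrogates and
// reports them as (U+FFFD, 1).
func strictDecode(seq []byte) (rune, int) {
	return utf8.DecodeRune(seq)
}

// decodeThreeByte applies the 3-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx)
// with no check on the second byte's range, so surrogates pass through.
func decodeThreeByte(b0, b1, b2 byte) rune {
	return rune(b0&0x0F)<<12 | rune(b1&0x3F)<<6 | rune(b2&0x3F)
}

func main() {
	seq := []byte{0xED, 0xB0, 0x80} // assumed example sequence
	r, size := strictDecode(seq)
	fmt.Printf("strict:  %U (size %d, valid=%t)\n", r, size, utf8.Valid(seq))
	fmt.Printf("lenient: %U\n", decodeThreeByte(seq[0], seq[1], seq[2]))
}
```

Under these assumptions the strict path yields the replacement character, while the lenient path yields a value in the surrogate range, which is what a WTF-8-style scheme would need to preserve.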