rust-lang/rfcs

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Support long path names on all Windows versions #2188

Closed
wants to merge 2 commits into
from

Conversation

Projects
None yet
10 participants

DemiMarie commented Oct 24, 2017

 Initial RFC commit 
 bdf262a 

Member

retep998 commented Oct 25, 2017 • edited

If you want to be able to provide useful comments on this RFC, it is essential that you actually understand the various forms of paths on Windows. Therefore, please read this article before commenting:

Thank you.

Member

chris-morgan commented Oct 25, 2017

 Could we expose the transformation function publicly in std::os::windows? Some people are inevitably going to want to call an NT API directly at some point, it’d be nice if we had a function that said “pass your paths through here before calling the NT API directly, to make sure they’ll work on long paths”. (On the other hand, I don’t think we expose the UTF-16 conversion function that is also needed, and this isn’t quite so useful without that. I do know that I’ve copied code from the standard library to achieve these things before. But exposing that is definitely out of scope for this RFC.)

gurry commented Oct 26, 2017 • edited

 I'm building a P2P filesharing application which would be deemed seriously crippled if it cannot support longer paths. So this is very much a needed fix. @chris-morgan I, for one, would vote for making the std APIs automatically and seemlessly work with longer paths. Requiring the caller to call extra functions for longer paths would be awkward.
Member

chris-morgan commented Oct 26, 2017

 @gurry I do not describe a function required for normal use, but rather a low-level function for cases where the user is calling the NT API directly.
Member

retep998 commented Oct 26, 2017 • edited

 Calling NT directly is very special because you're passing NT paths, not win32 paths. A path that starts with \\?\ is a win32 path, not an NT path, and win32 transforms that into a path that starts with \??\ before calling an NT api. However, NT paths are not required to start with \??\ and merely need to start with \ (?? is a directory in the object manager). However I am still interested in these API additions: Path normalization. On Windows this is different from canonicalization. Normalization involves calling GetFullPathName. Turning an absolute path into a root local device path, by adding the \\?\ prefix as necessary if appropriate. Getting rid of the \\?\ prefix and turning the path into a normal absolute path for compatibility with software that breaks on paths starting with \\?\.
 CoreFX doesn’t use GetFullPathNameW 
I thought that CoreFX (.NET Core standard library) used GetFullPathNameW, but it doesn’t.
 b7f95dd 

Closed

Contributor

pitdicker commented Oct 27, 2017

 I once wrote the beginnings of an RFC for this, but never finished it. It goes a different route, doing the conversion ourselves. It is not very difficult, and also works for files that do not (yet) exist in the filesystem. for reference (I did not recently re-read it)
Member

retep998 commented Nov 6, 2017 • edited

 It is not very difficult, and also works for files that do not (yet) exist in the filesystem. GetFullPathName does not touch the filesystem so it works perfectly fine on paths that don't exist.

DemiMarie commented Nov 12, 2017

As I am looking at implementation strategies, I am finding that:

We don’t really want bug-for-bug compatibility with Windows.

The Windows behavior:

• Is poorly documented (or sometimes not documented at all).
• May change between Windows versions, as has already happened with paths that begin with \??\ (the behavior of which changed between XP and Vista).
• Error-prone: C:\Users\Lightning\CON opening \??\CON (at the NT level) is not expected, and difficult to work around.
• Limits our options. Windows supports the equivalent of Unix’s openat(2) via NtCreateFile, but that takes NT paths.

So, I propose that Rust do its own normalization. This will require me to write LOTS of test cases!

ssokolow commented Nov 12, 2017

 @DemiMarie When I read that big statement, my initial expectation was that you were going to continue with a list of points in favour of using the Windows normalization function to avoid introducing bugs where Rust's behaviour diverges from the system behaviour. (ie. "We don't want to be responsible for having to match Windows bug-for-bug to avoid problems.")
Member

retep998 commented Nov 12, 2017 • edited

 Regardless of what decision we end up taking, we will need a ridiculous number of test cases. I also recommend first putting the normalization implementation into an external crate and getting it tested in all sorts of zany setups to make sure it behaves as people expect in the most obscure of corner cases.

DemiMarie commented Nov 13, 2017

 @ssokolow I would much rather use the system implementation. However, parts of Windows pathname handling are such security footguns (such as DOS devices) that I really really don’t want Rust programs to inherit them. Personally, I think that we should act just like the Windows API, except for a certain list of special cases, each of which needs to be documented and justified. The ones that come to mind are: Long path names should be transparently supported by libstd. This one is almost certainly the least controversial. DOS devices should not be recognized transparently, although one can always access them using \\.\ or the like. This is for security reasons. Named pipe names should not be normalized. Likewise. Trailing whitespace and periods should not be stripped from filenames. This is so that one can access any filename on the system, and so that one can feed the results from (say) listing a directory directly into the path-opening functions. 2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards. 1 is easily accomplished by ensuring that every path passed to CreateFileW et al has already been normalized and prefixed with \\?\. 3 is a matter of checking (before passing to GetFullPathNameW) if the path is that of a named pipe (\\.\pipe\something or \\server\pipe\something) and skipping the GetFullPathNameW call if it is.

ssokolow commented Nov 13, 2017 • edited

 I certainly agree. I was more expressing that, to someone like me who hasn't used Windows on anything other than retro-gaming PCs in over a decade and has only started to learn the depth of the rabbit hole more recently, the degree to which Windows pathname handling is broken boggles the mind. The part about trailing periods and spaces didn't even come to mind for me. (ie. It was more intended as a very sloppy attempt to indicate that more supporting detail for the rationale wouldn't go amiss.)
Member

retep998 commented Nov 13, 2017 • edited

 2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards. X:\COM1.blah will still get transformed into \\.\COM1 or \??\COM1, so simply tacking on a suffix won't work. Have I mentioned the sheer number of test cases we're going to need for this?

DemiMarie commented Nov 13, 2017

 You can add a suffix that begins with a backslash. … On Nov 12, 2017 9:36 PM, "Peter Atashian" ***@***.***> wrote: 2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards. X:\COM1.blah will still get transformed into \\.\COM1 or \??\COM1, so simply tacking on a suffix won't work. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2188 (comment)>, or mute the thread .

DemiMarie commented Nov 16, 2017

 @retep998 I agree with the huge number of test cases! Here is the algorithm I am leaning towards right now: If the path begins with \\?\GLOBALROOT\ or \??\GLOBALROOT\, delete everything up to but not including the last backslash in the prefix. Pass the rest directly to NT. If the path begins with \\?\ or \??\, replace the prefix with \??\ and pass to NT. If the first four characters match [\\/][\\/].[\\/]: replace prefix with \??\ and pass to NT (??? should we normalize here ???) If the first two characters match [\\/][\\/]: TODO TODO
Member

retep998 commented Nov 16, 2017

 We don't pass paths directly to NT though. We always call win32 functions so stripping the \\?\GLOBALROOT prefix would result in a path like \foo\bar which win32 interprets as a rooted path relative to the current drive. Also, because \\?\ is the officially documented prefix for verbatim paths, I'd rather normalize to that than \??\.

Closed

DemiMarie commented Dec 6, 2017

I have thought more on this and done some testing, and the following seems to work for every case I have come across:

Subroutine canon_with_suffix:

This adds a suffix (such as \rust) containing a \ to the end of the path, passes it to GetFullPathNameW, and finally strips the suffix. The suffix is needed to prevent unwanted transformations.

Subroutine num_slashes:

This is easiest to just show in code:

fn num_slashes(path: &[u8]) -> usize {
let mut count = 0;
for i in path {
match i {
b'\\' | b'/' => { count += 1; }
_ => break
}
}
count
}
The algorithm:
if path.len() > 4 {
match path[..4] {
br"\\?\" | br"\??\" => return Ok(to_u16(path)),
_                   => (),
}
}
match num_slashes(path) {
0 | 1 => canon_with_suffix(path),
2     => process_unc(path),
_     => Err("The path is invalid"),
}

I haven’t finished process_unc, and it’s late, so I will fill that in later.

Contributor

KodrAus commented Jan 23, 2018 • edited

 Just leaving a comment so I can come back shortly and spend some more time digging in to this! For reference (mostly my own), looks like CoreFX (.NET's std) does use GetFullPathNameW when expanding short path names during normalisation, which in turn is called when getting the full path name, which in turn is called by pretty much everything that uses paths. I like @retep998's suggestion of getting an implementation out in the wild and iterating on it there.
Member

alexcrichton commented Jan 29, 2018

 When considering a change like this I'd personally be quite curious to explore what other language ecosystems are doing to see if we can learn about the pros/cons of various strategies. From some googling I've found: Python rejected modifying paths in favor of using an application manifest on Windows 10 (which I think is rust-lang/rust#11207 for us?). Ruby may not do anything, but that's pretty old, so maybe the situation has changed? Node.js I believe automatically does some form of normalization (although I'm not sure which). Despite this though it looks like there have at least historically been trouble with other tooling. Java may natively support long paths but that post at least doesn't reach a conclusion of how. .NET as @KodrAus mentioned seems to use GetFullPathNameW liberally Perhaps some more research could be done into these various ecosystems to see what they're doing and list out some pros/cons of the various strategies?
Member

Manifests

Pros

• Works on Windows 10.
• No changes to user code needed.
• Even affects C/C++ libraries that are linked in.

Cons

• Does not work on Windows 8.1 and older.
• Requires serious design work around how to handle manifests, especially if we don't want to stop the user from using their own manifest.

GetFullPathNameW

Pros

• Works on all versions of Windows.
• No changes to user code are needed.

Cons

• Requires internal modifications to std.
• Does not automatically affect third party libraries that call system functions themselves.
• Some care needs to be done to ensure that transforming the path ourselves is no different than what win32 would automatically do when converting the path before passing it on to NT. An extensive test suite would help.

Status quo

Pros

• Developers of Rust itself can be lazy and not care about Windows users.

Cons

• Windows users are left by the side of the road with no long path support.
Contributor

KodrAus commented Jan 30, 2018

 Thanks @retep998! I think we should include your summary in some form in the Alternatives section of the RFC. The GetFullPathNameW approach seems like the only immediately viable one to me, since Windows Server 2012 / 8.1 are still in support for a long time, so we should try and support them. I'm feeling like iteration/discussion over an implementation is probably the next step, so perhaps we can keep this RFC intentionally vague on details and move it forward and/or work on a possible implementation in the nursery. What do you all think?
Member

alexcrichton commented Aug 23, 2018

 @rfcbot fcp close This RFC has been open for quite some time and I'd like to propose that we close this RFC. My previous findings seem to indicate that there's not necessarily broad consensus among language standard libraires about whether this is a good idea or not, and I'd personally prefer to err on the side of less "magic" in the standard library. While I think that it's always possible we could support the manifest option more easily in Rust I'd specifically like to propose closing this RFC with the idea of inserting expansion of paths on all system calls in the standard library.

rfcbot commented Aug 23, 2018 • edited by dtolnay

 Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me.

rfcbot commented Sep 5, 2018

 🔔 This is now entering its final comment period, as per the review above. 🔔

Open

rfcbot commented Sep 15, 2018

 The final comment period, with a disposition to close, as per the review above, is now complete. By the power vested in me by Rust, I hereby close this RFC.

rfcbot closed this Sep 15, 2018

to join this conversation on GitHub. Already have an account? Sign in to comment