New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support long path names on all Windows versions #2188

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
10 participants
@DemiMarie

DemiMarie commented Oct 24, 2017

Rendered

@DemiMarie DemiMarie changed the title from Initial RFC commit to Support long path names on all Windows versions Oct 24, 2017

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Oct 25, 2017

Member

Required reading

If you want to be able to provide useful comments on this RFC, it is essential that you actually understand the various forms of paths on Windows. Therefore, please read this article before commenting:

https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html

Thank you.

Member

retep998 commented Oct 25, 2017

Required reading

If you want to be able to provide useful comments on this RFC, it is essential that you actually understand the various forms of paths on Windows. Therefore, please read this article before commenting:

https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html

Thank you.

@chris-morgan

This comment has been minimized.

Show comment
Hide comment
@chris-morgan

chris-morgan Oct 25, 2017

Member

Could we expose the transformation function publicly in std::os::windows? Some people are inevitably going to want to call an NT API directly at some point, it’d be nice if we had a function that said “pass your paths through here before calling the NT API directly, to make sure they’ll work on long paths”.

(On the other hand, I don’t think we expose the UTF-16 conversion function that is also needed, and this isn’t quite so useful without that. I do know that I’ve copied code from the standard library to achieve these things before. But exposing that is definitely out of scope for this RFC.)

Member

chris-morgan commented Oct 25, 2017

Could we expose the transformation function publicly in std::os::windows? Some people are inevitably going to want to call an NT API directly at some point, it’d be nice if we had a function that said “pass your paths through here before calling the NT API directly, to make sure they’ll work on long paths”.

(On the other hand, I don’t think we expose the UTF-16 conversion function that is also needed, and this isn’t quite so useful without that. I do know that I’ve copied code from the standard library to achieve these things before. But exposing that is definitely out of scope for this RFC.)

@gurry

This comment has been minimized.

Show comment
Hide comment
@gurry

gurry Oct 26, 2017

I'm building a P2P filesharing application which would be deemed seriously crippled if it cannot support longer paths. So this is very much a needed fix.

@chris-morgan I, for one, would vote for making the std APIs automatically and seemlessly work with longer paths. Requiring the caller to call extra functions for longer paths would be awkward.

gurry commented Oct 26, 2017

I'm building a P2P filesharing application which would be deemed seriously crippled if it cannot support longer paths. So this is very much a needed fix.

@chris-morgan I, for one, would vote for making the std APIs automatically and seemlessly work with longer paths. Requiring the caller to call extra functions for longer paths would be awkward.

@chris-morgan

This comment has been minimized.

Show comment
Hide comment
@chris-morgan

chris-morgan Oct 26, 2017

Member

@gurry I do not describe a function required for normal use, but rather a low-level function for cases where the user is calling the NT API directly.

Member

chris-morgan commented Oct 26, 2017

@gurry I do not describe a function required for normal use, but rather a low-level function for cases where the user is calling the NT API directly.

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Oct 26, 2017

Member

Calling NT directly is very special because you're passing NT paths, not win32 paths. A path that starts with \\?\ is a win32 path, not an NT path, and win32 transforms that into a path that starts with \??\ before calling an NT api. However, NT paths are not required to start with \??\ and merely need to start with \ (?? is a directory in the object manager).

However I am still interested in these API additions:

  • Path normalization. On Windows this is different from canonicalization. Normalization involves calling GetFullPathName.
  • Turning an absolute path into a root local device path, by adding the \\?\ prefix as necessary if appropriate.
  • Getting rid of the \\?\ prefix and turning the path into a normal absolute path for compatibility with software that breaks on paths starting with \\?\.
Member

retep998 commented Oct 26, 2017

Calling NT directly is very special because you're passing NT paths, not win32 paths. A path that starts with \\?\ is a win32 path, not an NT path, and win32 transforms that into a path that starts with \??\ before calling an NT api. However, NT paths are not required to start with \??\ and merely need to start with \ (?? is a directory in the object manager).

However I am still interested in these API additions:

  • Path normalization. On Windows this is different from canonicalization. Normalization involves calling GetFullPathName.
  • Turning an absolute path into a root local device path, by adding the \\?\ prefix as necessary if appropriate.
  • Getting rid of the \\?\ prefix and turning the path into a normal absolute path for compatibility with software that breaks on paths starting with \\?\.
CoreFX doesn’t use GetFullPathNameW
I thought that CoreFX (.NET Core standard library) used `GetFullPathNameW`, but it doesn’t.
@pitdicker

This comment has been minimized.

Show comment
Hide comment
@pitdicker

pitdicker Oct 27, 2017

Contributor

I once wrote the beginnings of an RFC for this, but never finished it. It goes a different route, doing the conversion ourselves. It is not very difficult, and also works for files that do not (yet) exist in the filesystem. for reference (I did not recently re-read it)

Contributor

pitdicker commented Oct 27, 2017

I once wrote the beginnings of an RFC for this, but never finished it. It goes a different route, doing the conversion ourselves. It is not very difficult, and also works for files that do not (yet) exist in the filesystem. for reference (I did not recently re-read it)

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Nov 6, 2017

Member

It is not very difficult, and also works for files that do not (yet) exist in the filesystem.

GetFullPathName does not touch the filesystem so it works perfectly fine on paths that don't exist.

Member

retep998 commented Nov 6, 2017

It is not very difficult, and also works for files that do not (yet) exist in the filesystem.

GetFullPathName does not touch the filesystem so it works perfectly fine on paths that don't exist.

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Nov 12, 2017

As I am looking at implementation strategies, I am finding that:

We don’t really want bug-for-bug compatibility with Windows.

The Windows behavior:

  • Is poorly documented (or sometimes not documented at all).
  • May change between Windows versions, as has already happened with paths that begin with \??\ (the behavior of which changed between XP and Vista).
  • Error-prone: C:\Users\Lightning\CON opening \??\CON (at the NT level) is not expected, and difficult to work around.
  • Limits our options. Windows supports the equivalent of Unix’s openat(2) via NtCreateFile, but that takes NT paths.

So, I propose that Rust do its own normalization. This will require me to write LOTS of test cases!

DemiMarie commented Nov 12, 2017

As I am looking at implementation strategies, I am finding that:

We don’t really want bug-for-bug compatibility with Windows.

The Windows behavior:

  • Is poorly documented (or sometimes not documented at all).
  • May change between Windows versions, as has already happened with paths that begin with \??\ (the behavior of which changed between XP and Vista).
  • Error-prone: C:\Users\Lightning\CON opening \??\CON (at the NT level) is not expected, and difficult to work around.
  • Limits our options. Windows supports the equivalent of Unix’s openat(2) via NtCreateFile, but that takes NT paths.

So, I propose that Rust do its own normalization. This will require me to write LOTS of test cases!

@ssokolow

This comment has been minimized.

Show comment
Hide comment
@ssokolow

ssokolow Nov 12, 2017

@DemiMarie

When I read that big statement, my initial expectation was that you were going to continue with a list of points in favour of using the Windows normalization function to avoid introducing bugs where Rust's behaviour diverges from the system behaviour. (ie. "We don't want to be responsible for having to match Windows bug-for-bug to avoid problems.")

ssokolow commented Nov 12, 2017

@DemiMarie

When I read that big statement, my initial expectation was that you were going to continue with a list of points in favour of using the Windows normalization function to avoid introducing bugs where Rust's behaviour diverges from the system behaviour. (ie. "We don't want to be responsible for having to match Windows bug-for-bug to avoid problems.")

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Nov 12, 2017

Member

Regardless of what decision we end up taking, we will need a ridiculous number of test cases.

I also recommend first putting the normalization implementation into an external crate and getting it tested in all sorts of zany setups to make sure it behaves as people expect in the most obscure of corner cases.

Member

retep998 commented Nov 12, 2017

Regardless of what decision we end up taking, we will need a ridiculous number of test cases.

I also recommend first putting the normalization implementation into an external crate and getting it tested in all sorts of zany setups to make sure it behaves as people expect in the most obscure of corner cases.

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Nov 13, 2017

@ssokolow I would much rather use the system implementation. However, parts of Windows pathname handling are such security footguns (such as DOS devices) that I really really don’t want Rust programs to inherit them.

Personally, I think that we should act just like the Windows API, except for a certain list of special cases, each of which needs to be documented and justified. The ones that come to mind are:

  1. Long path names should be transparently supported by libstd. This one is almost certainly the least controversial.
  2. DOS devices should not be recognized transparently, although one can always access them using \\.\ or the like. This is for security reasons.
  3. Named pipe names should not be normalized. Likewise.
  4. Trailing whitespace and periods should not be stripped from filenames. This is so that one can access any filename on the system, and so that one can feed the results from (say) listing a directory directly into the path-opening functions.

2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards. 1 is easily accomplished by ensuring that every path passed to CreateFileW et al has already been normalized and prefixed with \\?\. 3 is a matter of checking (before passing to GetFullPathNameW) if the path is that of a named pipe (\\.\pipe\something or \\server\pipe\something) and skipping the GetFullPathNameW call if it is.

DemiMarie commented Nov 13, 2017

@ssokolow I would much rather use the system implementation. However, parts of Windows pathname handling are such security footguns (such as DOS devices) that I really really don’t want Rust programs to inherit them.

Personally, I think that we should act just like the Windows API, except for a certain list of special cases, each of which needs to be documented and justified. The ones that come to mind are:

  1. Long path names should be transparently supported by libstd. This one is almost certainly the least controversial.
  2. DOS devices should not be recognized transparently, although one can always access them using \\.\ or the like. This is for security reasons.
  3. Named pipe names should not be normalized. Likewise.
  4. Trailing whitespace and periods should not be stripped from filenames. This is so that one can access any filename on the system, and so that one can feed the results from (say) listing a directory directly into the path-opening functions.

2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards. 1 is easily accomplished by ensuring that every path passed to CreateFileW et al has already been normalized and prefixed with \\?\. 3 is a matter of checking (before passing to GetFullPathNameW) if the path is that of a named pipe (\\.\pipe\something or \\server\pipe\something) and skipping the GetFullPathNameW call if it is.

@ssokolow

This comment has been minimized.

Show comment
Hide comment
@ssokolow

ssokolow Nov 13, 2017

I certainly agree.

I was more expressing that, to someone like me who hasn't used Windows on anything other than retro-gaming PCs in over a decade and has only started to learn the depth of the rabbit hole more recently, the degree to which Windows pathname handling is broken boggles the mind. The part about trailing periods and spaces didn't even come to mind for me.

(ie. It was more intended as a very sloppy attempt to indicate that more supporting detail for the rationale wouldn't go amiss.)

ssokolow commented Nov 13, 2017

I certainly agree.

I was more expressing that, to someone like me who hasn't used Windows on anything other than retro-gaming PCs in over a decade and has only started to learn the depth of the rabbit hole more recently, the degree to which Windows pathname handling is broken boggles the mind. The part about trailing periods and spaces didn't even come to mind for me.

(ie. It was more intended as a very sloppy attempt to indicate that more supporting detail for the rationale wouldn't go amiss.)

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Nov 13, 2017

Member

2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards.

X:\COM1.blah will still get transformed into \\.\COM1 or \??\COM1, so simply tacking on a suffix won't work.

Have I mentioned the sheer number of test cases we're going to need for this?

Member

retep998 commented Nov 13, 2017

2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards.

X:\COM1.blah will still get transformed into \\.\COM1 or \??\COM1, so simply tacking on a suffix won't work.

Have I mentioned the sheer number of test cases we're going to need for this?

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Nov 13, 2017

DemiMarie commented Nov 13, 2017

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Nov 16, 2017

@retep998 I agree with the huge number of test cases!

Here is the algorithm I am leaning towards right now:

  1. If the path begins with \\?\GLOBALROOT\ or \??\GLOBALROOT\, delete everything up to but not including the last backslash in the prefix. Pass the rest directly to NT.
  2. If the path begins with \\?\ or \??\, replace the prefix with \??\ and pass to NT.
  3. If the first four characters match [\\/][\\/].[\\/]: replace prefix with \??\ and pass to NT (??? should we normalize here ???)
  4. If the first two characters match [\\/][\\/]: TODO
  5. TODO

DemiMarie commented Nov 16, 2017

@retep998 I agree with the huge number of test cases!

Here is the algorithm I am leaning towards right now:

  1. If the path begins with \\?\GLOBALROOT\ or \??\GLOBALROOT\, delete everything up to but not including the last backslash in the prefix. Pass the rest directly to NT.
  2. If the path begins with \\?\ or \??\, replace the prefix with \??\ and pass to NT.
  3. If the first four characters match [\\/][\\/].[\\/]: replace prefix with \??\ and pass to NT (??? should we normalize here ???)
  4. If the first two characters match [\\/][\\/]: TODO
  5. TODO
@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Nov 16, 2017

Member

We don't pass paths directly to NT though. We always call win32 functions so stripping the \\?\GLOBALROOT prefix would result in a path like \foo\bar which win32 interprets as a rooted path relative to the current drive. Also, because \\?\ is the officially documented prefix for verbatim paths, I'd rather normalize to that than \??\.

Member

retep998 commented Nov 16, 2017

We don't pass paths directly to NT though. We always call win32 functions so stripping the \\?\GLOBALROOT prefix would result in a path like \foo\bar which win32 interprets as a rooted path relative to the current drive. Also, because \\?\ is the officially documented prefix for verbatim paths, I'd rather normalize to that than \??\.

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Dec 6, 2017

I have thought more on this and done some testing, and the following seems to work for every case I have come across:

Subroutine canon_with_suffix:

This adds a suffix (such as \rust) containing a \ to the end of the path, passes it to GetFullPathNameW, and finally strips the suffix. The suffix is needed to prevent unwanted transformations.

Subroutine num_slashes:

This is easiest to just show in code:

fn num_slashes(path: &[u8]) -> usize {
    let mut count = 0;
    for i in path {
        match i {
            b'\\' | b'/' => { count += 1; }
            _ => break
        }
    }
    count
}
The algorithm:
if path.len() > 4 {
    match path[..4] {
        br"\\?\" | br"\??\" => return Ok(to_u16(path)),
        _                   => (),
    }
}
match num_slashes(path) {
    0 | 1 => canon_with_suffix(path),
    2     => process_unc(path),
    _     => Err("The path is invalid"),
}

I haven’t finished process_unc, and it’s late, so I will fill that in later.

DemiMarie commented Dec 6, 2017

I have thought more on this and done some testing, and the following seems to work for every case I have come across:

Subroutine canon_with_suffix:

This adds a suffix (such as \rust) containing a \ to the end of the path, passes it to GetFullPathNameW, and finally strips the suffix. The suffix is needed to prevent unwanted transformations.

Subroutine num_slashes:

This is easiest to just show in code:

fn num_slashes(path: &[u8]) -> usize {
    let mut count = 0;
    for i in path {
        match i {
            b'\\' | b'/' => { count += 1; }
            _ => break
        }
    }
    count
}
The algorithm:
if path.len() > 4 {
    match path[..4] {
        br"\\?\" | br"\??\" => return Ok(to_u16(path)),
        _                   => (),
    }
}
match num_slashes(path) {
    0 | 1 => canon_with_suffix(path),
    2     => process_unc(path),
    _     => Err("The path is invalid"),
}

I haven’t finished process_unc, and it’s late, so I will fill that in later.

@KodrAus

This comment has been minimized.

Show comment
Hide comment
@KodrAus

KodrAus Jan 23, 2018

Contributor

Just leaving a comment so I can come back shortly and spend some more time digging in to this!

For reference (mostly my own), looks like CoreFX (.NET's std) does use GetFullPathNameW when expanding short path names during normalisation, which in turn is called when getting the full path name, which in turn is called by pretty much everything that uses paths.

I like @retep998's suggestion of getting an implementation out in the wild and iterating on it there.

Contributor

KodrAus commented Jan 23, 2018

Just leaving a comment so I can come back shortly and spend some more time digging in to this!

For reference (mostly my own), looks like CoreFX (.NET's std) does use GetFullPathNameW when expanding short path names during normalisation, which in turn is called when getting the full path name, which in turn is called by pretty much everything that uses paths.

I like @retep998's suggestion of getting an implementation out in the wild and iterating on it there.

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Jan 29, 2018

Member

When considering a change like this I'd personally be quite curious to explore what other language ecosystems are doing to see if we can learn about the pros/cons of various strategies.

From some googling I've found:

Perhaps some more research could be done into these various ecosystems to see what they're doing and list out some pros/cons of the various strategies?

Member

alexcrichton commented Jan 29, 2018

When considering a change like this I'd personally be quite curious to explore what other language ecosystems are doing to see if we can learn about the pros/cons of various strategies.

From some googling I've found:

Perhaps some more research could be done into these various ecosystems to see what they're doing and list out some pros/cons of the various strategies?

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Jan 29, 2018

Member

Manifests

Pros

  • Works on Windows 10.
  • No changes to user code needed.
  • Even affects C/C++ libraries that are linked in.

Cons

  • Does not work on Windows 8.1 and older.
  • Requires serious design work around how to handle manifests, especially if we don't want to stop the user from using their own manifest.

GetFullPathNameW

Pros

  • Works on all versions of Windows.
  • No changes to user code are needed.

Cons

  • Requires internal modifications to std.
  • Does not automatically affect third party libraries that call system functions themselves.
  • Some care needs to be done to ensure that transforming the path ourselves is no different than what win32 would automatically do when converting the path before passing it on to NT. An extensive test suite would help.

Status quo

Pros

  • Developers of Rust itself can be lazy and not care about Windows users.

Cons

  • Windows users are left by the side of the road with no long path support.
Member

retep998 commented Jan 29, 2018

Manifests

Pros

  • Works on Windows 10.
  • No changes to user code needed.
  • Even affects C/C++ libraries that are linked in.

Cons

  • Does not work on Windows 8.1 and older.
  • Requires serious design work around how to handle manifests, especially if we don't want to stop the user from using their own manifest.

GetFullPathNameW

Pros

  • Works on all versions of Windows.
  • No changes to user code are needed.

Cons

  • Requires internal modifications to std.
  • Does not automatically affect third party libraries that call system functions themselves.
  • Some care needs to be done to ensure that transforming the path ourselves is no different than what win32 would automatically do when converting the path before passing it on to NT. An extensive test suite would help.

Status quo

Pros

  • Developers of Rust itself can be lazy and not care about Windows users.

Cons

  • Windows users are left by the side of the road with no long path support.
@KodrAus

This comment has been minimized.

Show comment
Hide comment
@KodrAus

KodrAus Jan 30, 2018

Contributor

Thanks @retep998! I think we should include your summary in some form in the Alternatives section of the RFC.

The GetFullPathNameW approach seems like the only immediately viable one to me, since Windows Server 2012 / 8.1 are still in support for a long time, so we should try and support them. I'm feeling like iteration/discussion over an implementation is probably the next step, so perhaps we can keep this RFC intentionally vague on details and move it forward and/or work on a possible implementation in the nursery.

What do you all think?

Contributor

KodrAus commented Jan 30, 2018

Thanks @retep998! I think we should include your summary in some form in the Alternatives section of the RFC.

The GetFullPathNameW approach seems like the only immediately viable one to me, since Windows Server 2012 / 8.1 are still in support for a long time, so we should try and support them. I'm feeling like iteration/discussion over an implementation is probably the next step, so perhaps we can keep this RFC intentionally vague on details and move it forward and/or work on a possible implementation in the nursery.

What do you all think?

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Aug 23, 2018

Member

@rfcbot fcp close

This RFC has been open for quite some time and I'd like to propose that we close this RFC. My previous findings seem to indicate that there's not necessarily broad consensus among language standard libraires about whether this is a good idea or not, and I'd personally prefer to err on the side of less "magic" in the standard library.

While I think that it's always possible we could support the manifest option more easily in Rust I'd specifically like to propose closing this RFC with the idea of inserting expansion of paths on all system calls in the standard library.

Member

alexcrichton commented Aug 23, 2018

@rfcbot fcp close

This RFC has been open for quite some time and I'd like to propose that we close this RFC. My previous findings seem to indicate that there's not necessarily broad consensus among language standard libraires about whether this is a good idea or not, and I'd personally prefer to err on the side of less "magic" in the standard library.

While I think that it's always possible we could support the manifest option more easily in Rust I'd specifically like to propose closing this RFC with the idea of inserting expansion of paths on all system calls in the standard library.

@rfcbot

This comment has been minimized.

Show comment
Hide comment
@rfcbot

rfcbot Aug 23, 2018

Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot commented Aug 23, 2018

Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot

This comment has been minimized.

Show comment
Hide comment
@rfcbot

rfcbot Sep 5, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot commented Sep 5, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot

This comment has been minimized.

Show comment
Hide comment
@rfcbot

rfcbot Sep 15, 2018

The final comment period, with a disposition to close, as per the review above, is now complete.

By the power vested in me by Rust, I hereby close this RFC.

rfcbot commented Sep 15, 2018

The final comment period, with a disposition to close, as per the review above, is now complete.

By the power vested in me by Rust, I hereby close this RFC.

@rfcbot rfcbot added closed and removed disposition-close labels Sep 15, 2018

@rfcbot rfcbot closed this Sep 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment