Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: success of canonicalizing r"\" depends on whether set_current_dir has been called #49342

Open
vitiral opened this issue Mar 24, 2018 · 15 comments
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-enhancement Category: An issue proposing an enhancement or a PR with one. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@vitiral
Copy link
Contributor

vitiral commented Mar 24, 2018

Problem: sometimes `Path::new(r"").canonicalize() fails

Note: apparently this isn't necessarily against the spec (a better link to the windows spec would be nice)

In Windows it's relative to what drive your current working directory is at the time. If your current directory is in the C drive then C:\ would be the root. If the current directory is the D drive then D:\ would be the root. There is no absolute root.

Regardless of whether it is against the spec or not, I think basing the "root location" off of the current_dir is a best practice. At the very least, having canonicalize(r"\") fail is unexpected.

Full Report

I hit this problem during one of my tests for path_abs. The way I handle it is thorny (and there is still a TODO hidden in there).

The below test passes on windows (and linux obviously):

    if cfg!(windows) {
        let result = Path::new(r"\").canonicalize();
        assert!(result.is_ok(), "Should work before set_current_dir is called: {:?}", result);
    }
    let tmp = tempdir::TempDir::new("ex").unwrap();
    let tmp = tmp.path();
    let tmp_abs = PathArc::new(&tmp).canonicalize().unwrap();
    env::set_current_dir(&tmp_abs).unwrap();
    if cfg!(windows) {
        let result = Path::new(r"\").canonicalize();
        assert!(result.is_err());
        println!("Got ERR cananonicalizing root: {}", result.unwrap_err());
    }\

The way I handle this in path_abs is the following:

  • If the first component is Root
  • Get the current_dir and use it's root instead.

That code is here (it's messy and tied to resolving the absolute path in general).

Rust Info

Running in Windows Appveyor: https://ci.appveyor.com/project/vitiral/path-abs

  stable installed - rustc 1.24.1 (d3ae9a9e0 2018-02-27)
@retep998
Copy link
Member

Personally I'd argue that we shouldn't try to "fix" the behavior from the OS with regards to canonicalization. It works as intended, which is to open a file or folder and then ask the OS what the canonical path to it is.

What you might actually want is to simply call GetFullPathNameW which has additional benefits in that it is able to handle drive relative paths such as C:foo. Win32 respects the per drive directories that cmd sets, but current_dir does not tell you what they are, unlike GetFullPathNameW which is able to resolve them correctly.

@retep998 retep998 added the O-windows Operating system: Windows label Mar 25, 2018
@vitiral
Copy link
Contributor Author

vitiral commented Mar 25, 2018

Thanks for the advice! I won't argue with what the rust core team feels is the right behavior. I agree that doing workarounds of poor OS design is not ideal, but sometimes that's what you would expect a standard library to do so shrug.

Do you mean C:/foo instead of C:foo?

@retep998
Copy link
Member

retep998 commented Mar 25, 2018

No, I mean C:foo which is relative to the current directory for that drive. If the current directory is C:\bar for example, then C:foo resolves to C:\bar\foo; whereas if the current directory is D:\bar, then you're gonna have a difficult time telling what C:foo resolves to unless you use GetPathNameW.

@XAMPPRocky XAMPPRocky added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 26, 2018
@Screwtapello
Copy link
Contributor

I spent some time trying to reproduce this issue (from the description above and using the code that originally exhibited the problem; in a Win10 VM and using the AppVeyor configuration where the problem was originally observed). In all cases, the test passed and I could not reproduce the problem.

@Screwtapello
Copy link
Contributor

I take it back, I finally figured out what's going on. I wrote a program that, for various paths A and B, would set_current_dir(A) and then canonicalize(B). In the following table, each column represents a possible current directory, each row represents a canonicalizable path, and each cell contains the result of canonicalization, if successful. Cells containing "-" returned an error instead.

\\?\C:\ C:\ \\vboxsvr\Shared
\\?\C:\Users \\?\C:\Users \\?\C:\Users \\?\C:\Users
C:\Users \\?\C:\Users \\?\C:\Users \\?\C:\Users
C:Users - \\?\C:\Users -
\Users - \\?\C:\Users \\?\UNC\vboxsvr\Shared\Users
Users \\?\C:\Users \\?\C:\Users \\?\UNC\vboxsvr\Shared\Users
\ - \\?\C:\ \\?\UNC\vboxsvr\Shared\

From these results, I generalize:

  • No matter what the current directory is, you can always canonicalize an fully absolute path
  • No matter what the current directory is, you can always canonicalize a fully relative path
  • If the current directory uses extended path syntax (\\?\ prefix), partially-relative paths like C:foo or \foo don't work, presumably because extended path syntax is "outside" the current-drive system they depend on

Also, if the current directory is a UNC path (\\server\share prefix), C:foo relative paths don't work but \foo relative paths do; presumably UNC paths are some kind of drive without a letter.

None of this is mentioned in the SetCurrentDirectory docs.

I'm not entirely sure what to suggest here. As @retep998 suggests, we probably shouldn't "fix" the OS behaviour with respect to canonicalization, but it'd be nice to mention it in the docs somewhere and save future generations the headache I've suffered. On the other hand, if it's too obscure to show up in the platform-specific MSDN docs, it's hard to justify mentioning it in the cross-platform Rust docs.

@vitiral
Copy link
Contributor Author

vitiral commented Aug 3, 2018 via email

@steveklabnik
Copy link
Member

Triage; looks like @rust-lang/libs would have to make some sort of decision here.

@Amanieu
Copy link
Member

Amanieu commented Apr 13, 2020

As I am not very familiar with the intricate details of Windows paths, I am inclined to defer to @retep998 for a decision.

@ChrisDenton
Copy link
Member

ChrisDenton commented Apr 13, 2020

I'd note that the Windows NT kernel does not really have the same concept of relative paths. All paths are absolute*. Interpreting relative paths happens in the Win32 user space. The Win32 API will call GetFullPathNameW (or actually the internal equivalent) to get an absolute path. When you use the \\?\ prefix then you're instead telling Win32 to bypass resolving the path and to pass that path verbatim as an NT absolute path.

However, the problem comes when you set the current directory to an NT path (meaning Win32 should bypass resolving path) but use a relative path when opening a file (meaning Win32 should resolve the relative path before opening). I'm pretty sure this is undefined behaviour in the sense that this is not an expected situation. So it could produce weird paths which aren't openable due to not existing.

I'd assume that future updates of Windows 10 could better account for this scenario but it's likely always going to be a weird mix.

For what it's worth, my take is the documentation should be updated to advise against using \\?\ paths for the current directory.

@ChrisDenton
Copy link
Member

An alternative would be for set_current_dir to automatically strip the prefix and document that.

@Screwtapello
Copy link
Contributor

Silently stripping the prefix probably isn't a good idea, since there's a variety of perfectly sensible paths that will break in confusing ways if Win32 is allowed to mess with them (paths longer than 256 characters, for example, or paths containing the segment CON).

I would support updating the set_current_dir() docs to advise against using prefixed paths, or updating the implementation to return an error instead of attempting to set the current path. Anyone using prefixed paths is probably doing it because they want to handle path resolution themselves instead of leaving it up to Win32, so they are unlikely to call set_current_dir() except by accident.

@ChrisDenton
Copy link
Member

ChrisDenton commented Apr 13, 2020

Sorry, I was not suggesting ignoring the prefix in the general case. Only when calling set_current_dir. E.g., something like this (not tested):

fn chdir(p: &path::Path) -> io::Result<()> {
    let p: &OsStr = p.as_ref();
    
    // Skip the `\\?\` verbatim prefix, if any.
    static PREFIX: &[u8] = br"\\?\";
    let bytes = os_str_as_u8_slice(p);
    let prefix_len = if bytes.starts_with(PREFIX) {
        PREFIX.len()
    } else {
        0
    };
    
    let mut p = p.encode_wide().skip(prefix_len).collect::<Vec<_>>();
    p.push(0);
    
    cvt(unsafe { c::SetCurrentDirectoryW(p.as_ptr()) }).map(drop)
}

However, returning an error may well be a better option. Though my hunch is most people who call set_current_dir with a verbatim path are doing so because they called canonicalize on the path beforehand.

@ChrisDenton
Copy link
Member

So to summarise: if the path contains a verbatim prefix set_current_dir should do one of the following:

  • Return an error without calling SetCurrentDirectory.
    • Issue: It's not easy for users to manually remove the prefix.
  • Silently call SetCurrentDirectory without the prefix.
    • Issue: May be unexpected.
  • Call SetCurrentDirectory with the prefix (the current behaviour).
    • Issue: Will return success but may cause issues later.

IMHO, whatever the choice the behaviour should be explicitly documented.

@ChrisDenton
Copy link
Member

The following is my conclusions based on testing in Windows 10 version 1909.

  • When setting the current directory:

    • the maximum path length is ~258, even if you use the \\?\ prefix.
    • using \\?\ prefixed paths will work but may produce seemingly odd results when opening relative file paths.
  • When opening files (including for canonicalization):

    • For best results use an absolute path. The current directory will not affect absolute paths.
    • Failing that, use plain sub paths like file.txt, folder\file.txt or .\file.txt.
    • While ..\file.txt can also be used, it may have inconsistent results with a root folder.
      • If the current directory is an ordinary win32 path it will return the same as .\file.txt
      • If the current directory is \\?\ prefixed it will try (and fail) to open an invalid file name.
    • Don't use drive relative paths like C:file.txt or root relative paths like \file.txt. These don't work well with a \\?\ prefixed current directory.
    • Keep in mind that other code running in the same process can change the current directory. Don't rely on it to stay the same.

I'm not sure whether this is too platform specific for the Rust docs or if something like the above (minus my commentary) would be a beneficial addition. I think the biggest issues is, as in the original report, using canonicalize to set the current directory (therefore using the \\?\ prefix) and then using \ to open a file (or other relative syntax that requires \\?\ aware parsing of the current directory).

That specific issue could also be side-stepped by automatically removing the prefix in the case of drive and UNC paths. We already have std::path::Prefix for detecting the type of path. So it just requires special casing VerbatimDisk and VerbatimUNC (other verbatim paths can't be transformed because they likely don't have a win32 representation but these are unlikely to returned by canonicalize). However, this wouldn't stop code outside of Rust (e.g., from an imported DLL or static library) from changing the current directory to a \\?\ path.

@ChrisDenton
Copy link
Member

I could more succinctly summarise the above as: Relative paths are not \\?\ aware.

@workingjubilee workingjubilee added the A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` label Mar 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-enhancement Category: An issue proposing an enhancement or a PR with one. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants