Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Path.normalize() method #47363

Closed
wants to merge 17 commits into from

Conversation

Projects
None yet
@ThatsGobbles
Copy link

ThatsGobbles commented Jan 11, 2018

As per the discussion here: rust-lang/rfcs#2208

I found myself with a need to normalize file paths. By "normalize", I refer to the lexical cleanup of a path, following a similar algorithm as Python's os.path.normpath() and Go's filepath.Clean(). These in turn follow the methodology outlined in Rob Pike's paper: Lexical File Names in Plan 9.

I was able to write a method that works for my use case, and I felt it could be a helpful addition to stdlib! However, I am very new to both Rust and contributing to OSS projects. Please do let me know if there are any changes that could/should be made in those regards.

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Jan 11, 2018

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@retep998

This comment has been minimized.

Copy link
Member

retep998 commented Jan 11, 2018

I don't see any tests for \\.\ device paths or \foo\bar rooted paths. There's a lot of tests missing from the Windows version.

@ThatsGobbles

This comment has been minimized.

Copy link
Author

ThatsGobbles commented Jan 11, 2018

@retep998: Those are good points, I can add those tests. What would be some other good Windows-specific tests to include?

I must note, I don't have access to a Windows dev box, so it's tricky to run these Windows-specific tests.

@retep998

This comment has been minimized.

Copy link
Member

retep998 commented Jan 11, 2018

At the very least, we should have every single example from the tables on https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Jan 11, 2018

You should also include test cases for filenames containing spaces, tabs, newlines, and all three.

Please use raw strings for filenames containing backslashes, so that the test cases don't need to escape backslashes.

@retep998

This comment has been minimized.

Copy link
Member

retep998 commented Jan 11, 2018

@rust-lang/libs Do we want to use GetFullPathNameW on Windows or do we want to roll our own normalization, and if we do choose the latter, where exactly should we differ from the behavior of GetFullPathNameW?

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Jan 11, 2018

@retep998 If we don't use GetFullPathNameW on Windows, perhaps we should use it as part of the test cases, by calling both it and Path.normalize() on every test case and failing if they differ. (Unless we have some reason to differ, in which case we should have specific test cases for that.)

@ThatsGobbles

This comment has been minimized.

Copy link
Author

ThatsGobbles commented Jan 11, 2018

Does GetFullPathNameW use the underlying filesystem? My goal with this method was to perform purely lexical conversion of a path.

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Jan 11, 2018

@ThatsGobbles According to the MSDN documentation, "This function does not verify that the resulting path and file name are valid, or that they see an existing file on the associated volume."

It does, however, look at the process's current working directory.

@ThatsGobbles

This comment has been minimized.

Copy link
Author

ThatsGobbles commented Jan 12, 2018

I managed to get a Windows 10 VM up and running, and am trying out tests. The normalization does not seem to like the examples in The Definitive Guide to NT and Win32 Paths involving trailing dots and spaces at the end of a path (the DEF. . examples), mixing forward and backslashes in UNC or root local device paths (e.g. \\?\X:/ABC/DEF), or using COM ports. It seems that Path.components() isn't treating these as special cases?

@ThatsGobbles

This comment has been minimized.

Copy link
Author

ThatsGobbles commented Jan 12, 2018

It appears that this Windows shell function does what I'm aiming to do with my function!

Now the question would be, how do I get access to it in my Rust code?

@ThatsGobbles

This comment has been minimized.

Copy link
Author

ThatsGobbles commented Jan 16, 2018

As it turns out, I'm not really sure what I should be doing to get this working. I don't know anything about Windows C FFI in Rust, which is stopping me from using the PathCch methods (not to mention, I'd like for normalize to be a pure function that can't fail, and relying on a Windows API seems like it introduces the possibility of failure).

Are there any guides or reading I can look at to get my feet wet with this?

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Jan 17, 2018

I'm going to defer to @retep998 here.

@carols10cents

This comment has been minimized.

Copy link
Member

carols10cents commented Jan 22, 2018

Ping @retep998! Do you have any advice for @ThatsGobbles ?

@ThatsGobbles

This comment has been minimized.

Copy link
Author

ThatsGobbles commented Feb 6, 2018

Hello all,

I concur with the name change: clean is far more clear as to what the method is trying to do!

As for tests, I've been incorporating the examples from the linked page, but I'm running into issues with Windows paths. It seems that Path.components() trips up some of the edge cases, and since clean (formerly known as normalize) depends heavily on components, I'm not seeing correct results. Perhaps I'm missing something, either code- or understanding-wise?

/// ```
#[unstable(feature = "path normalize", issue = "47402")]
pub fn normalize(&self) -> PathBuf {
#[unstable(feature = "path cleaning", issue = "47402")]

This comment has been minimized.

@kennytm

kennytm Feb 6, 2018

Member

Feature names usually don't contain spaces.

@kennytm

kennytm approved these changes Feb 6, 2018

tc!("/../../", "/");
} else {
// Drive-absolute paths.
tc!(r#"X:\ABC\DEF"#, r#"X:\ABC\DEF"#);

This comment has been minimized.

@kennytm

kennytm Feb 6, 2018

Member

Nit pick. Unless your path contains a " you could just lose the #s

tc!(r"X:\ABC\DEF", r"X:\ABC\DEF");

I think this is much less noisy.

@BatmanAoD

This comment has been minimized.

Copy link

BatmanAoD commented Feb 13, 2018

Ping from triage! @kennytm and @joshtriplett , it looks like the author needs some help dealing with Windows paths. Can either of you recommend someone who might be able to help? Or should we tag a team? I've also asked on IRC.

@kennytm

This comment has been minimized.

Copy link
Member

kennytm commented Feb 13, 2018

@ThatsGobbles Could you list the test cases which tripped up components()?

@ArazAbishov

This comment has been minimized.

Copy link
Member

ArazAbishov commented Feb 22, 2018

@ThatsGobbles Ping from the release team :) Did you have a chance to take a look into what @kennytm proposed to do?

@ThatsGobbles

This comment has been minimized.

Copy link
Author

ThatsGobbles commented Feb 25, 2018

Hi @kennytm and @ArazAbishov, apologies for the slow response, I found myself with little free time this week! Here's the current set of problem cases that I've discovered:

X:\ABC\DEF. .
X:DEF. .
\ABC\DEF. .
ABC\DEF. .
\\server\share\ABC. .
//server/share/ABC/DEF
\\.\X:\ABC\DEF. .
\\.\X:/ABC/DEF
\\.\X:\ABC\..\..\C:\
COM1
X:\COM1
X:COM1
valid\COM1
X:\notvalid\COM1
X:\COM1.blah
X:\COM1:blah
X:\COM1  .blah
\\.\X:\COM1
\\abc\xyz\COM1
\\?\X:\ABC\DEF. .
\\?\X:/ABC/DEF
\??\X:\ABC\DEF
\??\X:\
\??\X:
\??\X:\COM1
\??\X:\ABC\DEF. .
\??\X:/ABC/DEF
\??\X:\ABC\..\XYZ
\??\X:\ABC\..\..\..

All of the cases ending in '. .' failed due to those not getting normalized away. Most of the UNC paths did not trip up .components() per say, but when re-joining, there would be an unexpected trailing slash.

@pietroalbini

This comment has been minimized.

Copy link
Member

pietroalbini commented Mar 5, 2018

Ping from the release team! @kennytm the author seems to be still stuck, but replied to your request for information. Can you (or @retep998) help them out?

@kennytm kennytm self-assigned this Mar 6, 2018

@kennytm

This comment has been minimized.

Copy link
Member

kennytm commented Mar 10, 2018

@ThatsGobbles Hi sorry for long delay. I'm not totally familiar with Windows path, so perhaps @retep998 can give some more mentoring. (The failing test cases are in the source file already)

  1. Cases ending with . ..

    I think this components() should still return "DEF. ." as is, meaning the removal of the dots and spaces should be done inside the clean() function. You may want to add a sys::path::clean_file_name function which will remove . and from the last path component.

    // Pseudo-code. OsStr::trim_right does not exist, implement it yourself.
    #[cfg(windows)]
    pub fn clean_file_name(f: &OsStr) -> &OsStr {
        f.trim_right(&['.', ' '])
    }
    #[cfg(not(windows))]
    pub fn clean_file_name(f: &OsStr) -> &OsStr {
        f
    }

    You also need to add a test to ensure trailing dots and slashes in the middle are not trimmed out.

    tc!("X:\ABC...\DEF...", "X:\ABC...\DEF");
    tc!("X:\ABC...\DEF...\", "X:\ABC...\DEF...\");

    The components() iterator will not recognize the trailing slash. You might need to manually check self.inner.ends_with(MAIN_SEPARATOR).

  2. Prefix with / i.e. //server/share/ABC/DEF.

    The Project Zero blog suggests replacing all / with \ as the first step, which we will not be doing due to immutability 🙂. However, we may modify sys::path::parse_prefix to recognize / as the same as \.

    We need to add an argument, fn parse_prefix(path: &OsStr, is_cleaning: bool) -> Option<Prefix> to restrict the new behavior within the clean() function. This means you have to construct the Components instance yourself, instead of calling the public self.components()

  3. UNC with middle /

    Instead of always pushing item.as_os_str(), you may need to special-case it for UNC prefixes, so that the prefix \\server\share would be pushed separately as two components. This allows us to construct a \\server\share even if the source was \\server/share.

  4. COM1.

    You'll need to hard-code the reserved filenames.

    #[cfg(windows)]
    fn clean_reserved_path(filename: &OsStr) -> Option<PathBuf> {
        if let Some(capture) = filename.capture(r"^(COM[0-9¹²³]|…)(?:[:. ]*[:.].*)?$") {
           Some(r"\\.\" + capture.get(1))
        } else {
           None
        }
    }
    #[cfg(not(windows))]
    fn clean_reserved_path(_: &OsStr) -> Option<PathBuf> { 
        None
    }

    In your clean(), if you find the prefix is Drive Absolute, Drive Relative or Relative, you need to call clean_reserved_path() on the filename and return it directly if it is Some.

    match prefix {
        None | Some(Disk(_)) => {
            if let Some(cleaned) = sys::path::clean_reserved_path(file_name) {
                return cleaned;
            }
        }
        _ => {}
    }
  5. Trailing slashes

    I think normalizing \\foo into \\foo\ and \\foo\bar into \\foo\bar\ is OK.

@shepmaster

This comment has been minimized.

Copy link
Member

shepmaster commented Mar 18, 2018

Ping from triage, @ThatsGobbles ! You've got some feedback from @kennytm; will you be able to address it in the near future?

@pietroalbini

This comment has been minimized.

Copy link
Member

pietroalbini commented Mar 26, 2018

Thank you so much for this PR @ThatsGobbles! Unfortunately, we haven't heard from you in a while, so I'm closing this to keep the list of PRs clean.

If you have time to work on this again please reopen the pull request though! We will be happy to review the new changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.