New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

This is the ideal FileType on Windows. You may not like it, but this is what peak performance looks like. #47956

Merged
merged 6 commits into from Feb 17, 2018

Conversation

Projects
None yet
@retep998
Member

retep998 commented Feb 2, 2018

Theoretically this would fix #46484

The current iteration of this PR should not cause existing code to break, but instead merely improves handling around reparse points. Specifically...

  • Reparse points are considered to be symbolic links if they have the name surrogate bit set. Name surrogates are reparse points that effectively act like symbolic links, redirecting you to a different directory/file. By checking for this bit instead of specific tags, we become much more general in our handling of reparse points, including those added by third parties.
  • If something is a reparse point but does not have the name surrogate bit set, then we ignore the fact that it is a reparse point because it is actually a file or directory directly there, despite having additional handling by drivers due to the reparse point.
  • For everything which is not a symbolic link (including non-surrogate reparse points) we report whether it is a directory or a file based on the presence of the directory attribute bit.
  • Notably this still preserves invariant that when is_symlink returns true, both is_dir and is_file will return false. The potential for breakage was far too high.
  • Adds an unstable FileTypeExt to allow users to determine whether a symbolic link is a directory or a file, since FileType by design is incapable of reporting this information.
@rust-highfive

This comment has been minimized.

Show comment
Hide comment
@rust-highfive

rust-highfive Feb 2, 2018

Collaborator

r? @Mark-Simulacrum

(rust_highfive has picked a reviewer for you, use r? to override)

Collaborator

rust-highfive commented Feb 2, 2018

r? @Mark-Simulacrum

(rust_highfive has picked a reviewer for you, use r? to override)

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 2, 2018

Member

cc @roblourens Could you test this to determine whether this actually fixes it?

Member

retep998 commented Feb 2, 2018

cc @roblourens Could you test this to determine whether this actually fixes it?

Show outdated Hide outdated src/libstd/sys/windows/fs.rs
@@ -38,8 +38,9 @@ pub struct FileAttr {
}
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
pub enum FileType {
Dir, File, SymlinkFile, SymlinkDir, ReparsePoint, MountPoint,
pub stuct FileType {

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Feb 2, 2018

Member

struct

retep998 added some commits Feb 2, 2018

Rewrite remove_dir_all to be correct
The fact that this had to be rewritten does not bode well

@retep998 retep998 changed the title from This is what FileType on Windows should ideally be. to This is what peak FileType on Windows looks like Feb 2, 2018

@retep998 retep998 changed the title from This is what peak FileType on Windows looks like to This is the ideal FileType on Windows. You may not like it, but this is what peak performance looks like. Feb 2, 2018

@Mark-Simulacrum

This comment has been minimized.

Show comment
Hide comment
@Mark-Simulacrum

Mark-Simulacrum Feb 2, 2018

Member

r? @BurntSushi (some random libs team member)

Member

Mark-Simulacrum commented Feb 2, 2018

r? @BurntSushi (some random libs team member)

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 2, 2018

Member

Closed by accident because of github silliness.

Member

retep998 commented Feb 2, 2018

Closed by accident because of github silliness.

@retep998 retep998 reopened this Feb 2, 2018

BurntSushi added a commit to BurntSushi/ripgrep that referenced this pull request Feb 2, 2018

windows: fix OneDrive traversals
This commit fixes a bug on Windows where directory traversals were
completely broken when attempting to scan OneDrive directories that use
the "file on demand" strategy.

The specific problem was that Rust's standard library treats OneDrive
directories as reparse points instead of directories, which causes
methods like `FileType::is_file` and `FileType::is_dir` to always return
false, even when retrieved via methods like `metadata` that purport to
follow symbolic links.

We fix this by peppering our code with checks on the underlying file
attributes exposed by Windows. We consider an entry a directory if and
only if the directory bit is set on the attributes. We are careful to
make sure that the code remains the same on non-Windows platforms.

Note that we also bump the dependency on `walkdir`, which contains a
similar fix for its traversals.

This bug is recorded upstream:
rust-lang/rust#46484

Upstream also has a pending PR:
rust-lang/rust#47956

Fixes #705
@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 2, 2018

Member

cc @rust-lang/libs

This PR fixes a pretty bad bug that impacts any program on Windows that interacts with the file system and OneDrive directories. This was originally reported to me via the VS code folks on ripgrep: BurntSushi/ripgrep#705 --- Basically, the issue here is that OneDrive directories are reported as reparse points, and std currently does not interpret those as directories (or files) even when instructed to follow symbolic links. You can see an example in @roblourens's bug report here: #46484

I managed to work around this bug, but the work-around is gnarly. You basically need to replace all is_file/is_dir calls with a custom Windows specific wrapper. You can see the fix in walkdir here and the fix in ripgrep here.

With all that said, I am by no means a Windows person, so I don't actually know whether this PR is correct. But, it is consistent with my own fix to the problem (which was helpfully informed by @retep998 and @roblourens).

Member

BurntSushi commented Feb 2, 2018

cc @rust-lang/libs

This PR fixes a pretty bad bug that impacts any program on Windows that interacts with the file system and OneDrive directories. This was originally reported to me via the VS code folks on ripgrep: BurntSushi/ripgrep#705 --- Basically, the issue here is that OneDrive directories are reported as reparse points, and std currently does not interpret those as directories (or files) even when instructed to follow symbolic links. You can see an example in @roblourens's bug report here: #46484

I managed to work around this bug, but the work-around is gnarly. You basically need to replace all is_file/is_dir calls with a custom Windows specific wrapper. You can see the fix in walkdir here and the fix in ripgrep here.

With all that said, I am by no means a Windows person, so I don't actually know whether this PR is correct. But, it is consistent with my own fix to the problem (which was helpfully informed by @retep998 and @roblourens).

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 2, 2018

Member

I'm running short on time, but I'm now doubting this solution. I think @retep998 mentioned this to me on IRC, but the problem here is that this change will cause any entry that is a directory to return true for is_dir regardless of whether it is a symbolic link or not. This kind of breaks the various contracts we have setup in std::fs where we often distinguish between following symbolic links (e.g., fs::metadata) and not following symbolic links (e.g., fs::symlink_metadata).

This logic change, for example, ended up requiring changes like this in walkdir where I had assumed that is_dir being true meant that is_symlink would be false: BurntSushi/walkdir@2863f28

Member

BurntSushi commented Feb 2, 2018

I'm running short on time, but I'm now doubting this solution. I think @retep998 mentioned this to me on IRC, but the problem here is that this change will cause any entry that is a directory to return true for is_dir regardless of whether it is a symbolic link or not. This kind of breaks the various contracts we have setup in std::fs where we often distinguish between following symbolic links (e.g., fs::metadata) and not following symbolic links (e.g., fs::symlink_metadata).

This logic change, for example, ended up requiring changes like this in walkdir where I had assumed that is_dir being true meant that is_symlink would be false: BurntSushi/walkdir@2863f28

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 2, 2018

Member

The problem is a fundamental difference in how Windows and Unix handle these sorts of attributes. On Windows whether a given path is a file or a directory is a completely separate question from whether it is a reparse point, whereas on Unix files directories and symbolic links are all mutually exclusive. Normally this wouldn't matter so much because you'd typically only have to deal with something being a reparse point or symbolic link when you specifically ask for symlink_metadata instead of metadata, but as it turns out sometimes metadata itself will be told by the OS that something is a reparse point. This leads to a conundrum where the programmer is unable to determine whether the path is a directory or a file because is_dir ignores that when it is a reparse point. However to fix this would require FileType to report that a path is a file or directory even when it is also a reparse point which violates the current assumptions of users of the API in that being a file or directory is mutually exclusive with being a symbolic link/reparse point.

Member

retep998 commented Feb 2, 2018

The problem is a fundamental difference in how Windows and Unix handle these sorts of attributes. On Windows whether a given path is a file or a directory is a completely separate question from whether it is a reparse point, whereas on Unix files directories and symbolic links are all mutually exclusive. Normally this wouldn't matter so much because you'd typically only have to deal with something being a reparse point or symbolic link when you specifically ask for symlink_metadata instead of metadata, but as it turns out sometimes metadata itself will be told by the OS that something is a reparse point. This leads to a conundrum where the programmer is unable to determine whether the path is a directory or a file because is_dir ignores that when it is a reparse point. However to fix this would require FileType to report that a path is a file or directory even when it is also a reparse point which violates the current assumptions of users of the API in that being a file or directory is mutually exclusive with being a symbolic link/reparse point.

@pitdicker

This comment has been minimized.

Show comment
Hide comment
@pitdicker

pitdicker Feb 2, 2018

Contributor

I like how matching the attributes is folded into the functions here.

But I don't agree this is the best solution, as it pretty heavily changes the existing behavior. Yes, it matches better with the Windows model of things, but makes it harder to write something cross-platform.

What do you think about only changing the final match condition, so that it reports a reparse point as a directory or file if the reparse point is not recognized as a symlink or directory junction? I think that would also solve the bug with OneDrive, and not be a breaking change.

Contributor

pitdicker commented Feb 2, 2018

I like how matching the attributes is folded into the functions here.

But I don't agree this is the best solution, as it pretty heavily changes the existing behavior. Yes, it matches better with the Windows model of things, but makes it harder to write something cross-platform.

What do you think about only changing the final match condition, so that it reports a reparse point as a directory or file if the reparse point is not recognized as a symlink or directory junction? I think that would also solve the bug with OneDrive, and not be a breaking change.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 2, 2018

Member

@pitdicker Interesting. I also believe that would solve the OneDrive bug.

Member

BurntSushi commented Feb 2, 2018

@pitdicker Interesting. I also believe that would solve the OneDrive bug.

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 2, 2018

Member

What if there is a reparse point which does act like a symbolic link, but is not recognized as a symlink or directory junction? What would happen if someone called remove_dir_all on that? It would see the reparse point as just a directory and not a symlink, causing it to recursively delete the contents instead of just deleting the reparse point itself, which would be absolutely horrible!

I think the best we could do is change how we construct FileType based on whether we are getting metadata or link_metadata. For the former we would ignore if something is a reparse point and just report whether it is a directory or a file, but for the latter we would report all reparse points as symlinks.

Member

retep998 commented Feb 2, 2018

What if there is a reparse point which does act like a symbolic link, but is not recognized as a symlink or directory junction? What would happen if someone called remove_dir_all on that? It would see the reparse point as just a directory and not a symlink, causing it to recursively delete the contents instead of just deleting the reparse point itself, which would be absolutely horrible!

I think the best we could do is change how we construct FileType based on whether we are getting metadata or link_metadata. For the former we would ignore if something is a reparse point and just report whether it is a directory or a file, but for the latter we would report all reparse points as symlinks.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 2, 2018

Member

@retep998 So I still feel like we're missing something here. If a reparse point is considered a symlink, and OneDrive directories are reparse points, then that means we're treating OneDrive directories as a symlinks. But that doesn't seem right to me? They aren't necessarily symlinked to something else on the file system. For example, if I were removing a OneDrive directory, I would expect it to recursively descend and remove it where as I wouldn't expect that behavior with symlinks (unless the routine was specifically instructed to follow symlinks).

What else are reparse points used for on Windows?

Member

BurntSushi commented Feb 2, 2018

@retep998 So I still feel like we're missing something here. If a reparse point is considered a symlink, and OneDrive directories are reparse points, then that means we're treating OneDrive directories as a symlinks. But that doesn't seem right to me? They aren't necessarily symlinked to something else on the file system. For example, if I were removing a OneDrive directory, I would expect it to recursively descend and remove it where as I wouldn't expect that behavior with symlinks (unless the routine was specifically instructed to follow symlinks).

What else are reparse points used for on Windows?

@mattico

This comment has been minimized.

Show comment
Hide comment
@mattico

mattico Feb 3, 2018

Contributor

(This is the first time I've read about this but I believe this summary is correct)

@BurntSushi
When NTFS attempts to open a file or directory with a reparse point it looks up a driver based on its reparse tag, and that driver handles the request. These drivers can do anything and can be registered by user programs, so a reparse point doesn't have any defined semantics per se.

The good news is that the surrogate bit in the reparse tag specifies if "the file or directory represents another named entity in the system". That sounds like symlink behavior to me!

I made a program to test this behavior. Here are the results on my computer with Files On-Demand enabled:

PATH REPARSE TAG IsReparseTagNameSurrogate
D:\symlink 10100000000000000000000000001100 true
D:\symlink_dir 10100000000000000000000000001100 true
D:\junction 10100000000000000000000000000011 true
D:\mountpoint 10100000000000000000000000000011 true
D:\OneDrive 10010000000000000111000000011010 false
D:\hardlink hard links do not use reparse points and they don't act like symlinks

So the flag works as I'd expect. Therefore, I'd argue the correct way to implement is_symlink is using this flag:

fn IsReparseTagNameSurrogate(reparse_tag: ULONG) -> bool {
    (reparse_tag & 0x20000000) == 0x20000000
}
pub fn is_symlink(&self) -> bool {
    self.is_reparse_point() && IsReparseTagNameSurrogate(self.reparse_tag)
}

Edit:
Funny thing is, none of the code I found while researching does this correctly. Even Microsoft did it wrong and has an issue open to fix it.

Contributor

mattico commented Feb 3, 2018

(This is the first time I've read about this but I believe this summary is correct)

@BurntSushi
When NTFS attempts to open a file or directory with a reparse point it looks up a driver based on its reparse tag, and that driver handles the request. These drivers can do anything and can be registered by user programs, so a reparse point doesn't have any defined semantics per se.

The good news is that the surrogate bit in the reparse tag specifies if "the file or directory represents another named entity in the system". That sounds like symlink behavior to me!

I made a program to test this behavior. Here are the results on my computer with Files On-Demand enabled:

PATH REPARSE TAG IsReparseTagNameSurrogate
D:\symlink 10100000000000000000000000001100 true
D:\symlink_dir 10100000000000000000000000001100 true
D:\junction 10100000000000000000000000000011 true
D:\mountpoint 10100000000000000000000000000011 true
D:\OneDrive 10010000000000000111000000011010 false
D:\hardlink hard links do not use reparse points and they don't act like symlinks

So the flag works as I'd expect. Therefore, I'd argue the correct way to implement is_symlink is using this flag:

fn IsReparseTagNameSurrogate(reparse_tag: ULONG) -> bool {
    (reparse_tag & 0x20000000) == 0x20000000
}
pub fn is_symlink(&self) -> bool {
    self.is_reparse_point() && IsReparseTagNameSurrogate(self.reparse_tag)
}

Edit:
Funny thing is, none of the code I found while researching does this correctly. Even Microsoft did it wrong and has an issue open to fix it.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 3, 2018

Member

@mattico Nice find! If that is indeed the case then perhaps that is the secret sauce we need to handle reparse points. @roblourens @retep998 What do you think?

Member

BurntSushi commented Feb 3, 2018

@mattico Nice find! If that is indeed the case then perhaps that is the secret sauce we need to handle reparse points. @roblourens @retep998 What do you think?

@pitdicker

This comment has been minimized.

Show comment
Hide comment
@pitdicker

pitdicker Feb 3, 2018

Contributor

@mattico Great!

"Name surrogate bit. If this bit is set, the file or directory represents another named entity in the system." That is not something I directly understand... But I checked this against the list of known reparse tags from Microsoft, and it seems to work as we need (https://github.com/retep998/winapi-rs/blob/d8290fc406c8e390101dd225f7ec506848fc86ce/src/um/winnt.rs#L3943)

Contributor

pitdicker commented Feb 3, 2018

@mattico Great!

"Name surrogate bit. If this bit is set, the file or directory represents another named entity in the system." That is not something I directly understand... But I checked this against the list of known reparse tags from Microsoft, and it seems to work as we need (https://github.com/retep998/winapi-rs/blob/d8290fc406c8e390101dd225f7ec506848fc86ce/src/um/winnt.rs#L3943)

Go back to files directories and symlinks being mutually exclusive
Be smarter about what a symlink is however
@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 3, 2018

Member

The current public api for FileType does not allow you to tell whether a symlink is a directory or a file, while remove_dir_all depends on that information. Might we want to add a FileTypeExt for Windows which exposes this information?

Member

retep998 commented Feb 3, 2018

The current public api for FileType does not allow you to tell whether a symlink is a directory or a file, while remove_dir_all depends on that information. Might we want to add a FileTypeExt for Windows which exposes this information?

@mattico mattico referenced this pull request in syncthing/syncthing Feb 3, 2018

Closed

Information about OneDrive reparse point problems #4728

@roblourens

This comment has been minimized.

Show comment
Hide comment
@roblourens

roblourens Feb 5, 2018

This sounds great to me, my only concern is whether is_symlink is too broad? It looks like it will return true for a mounted folder. They have similar behavior to symlinks, but I don't know whether they should be considered symlinks or not.

It looks like you could make a more specific check against the reparse tag: https://msdn.microsoft.com/en-us/library/windows/desktop/aa363940(v=vs.85).aspx to check for IO_REPARSE_TAG_SYMLINK vs IO_REPARSE_TAG_MOUNT_POINT

This sounds great to me, my only concern is whether is_symlink is too broad? It looks like it will return true for a mounted folder. They have similar behavior to symlinks, but I don't know whether they should be considered symlinks or not.

It looks like you could make a more specific check against the reparse tag: https://msdn.microsoft.com/en-us/library/windows/desktop/aa363940(v=vs.85).aspx to check for IO_REPARSE_TAG_SYMLINK vs IO_REPARSE_TAG_MOUNT_POINT

@roblourens

This comment has been minimized.

Show comment
Hide comment
@roblourens

roblourens Feb 5, 2018

Also if we should be able to differentiate between a symlinked file and directory, seems like that be available cross platform, not just for windows.

Also if we should be able to differentiate between a symlinked file and directory, seems like that be available cross platform, not just for windows.

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 5, 2018

Member

This sounds great to me, my only concern is whether is_symlink is too broad? It looks like it will return true for a mounted folder. They have similar behavior to symlinks, but I don't know whether they should be considered symlinks or not.

The existing behavior is for mount points to be considered symlinks, and is fully intentional. If you want to delete a directory, for example, you need to know whether it is a symlink so you can tell whether you need to delete the contents first or just delete the directory itself. Anything which behaves like a symlink in that regard (which mount points definitely do) should be categorized as a symlink, and the way to tell whether a reparse point acts like a symlink is to call IsReparseTagNameSurrogate.

Also if we should be able to differentiate between a symlinked file and directory, seems like that be available cross platform, not just for windows.

On Linux this information does not exist. When you get the attributes of a file to tell whether it is a file or a directory or a symlink, those options are mutually exclusive. Therefore making a cross platform API does not make any sense.

Member

retep998 commented Feb 5, 2018

This sounds great to me, my only concern is whether is_symlink is too broad? It looks like it will return true for a mounted folder. They have similar behavior to symlinks, but I don't know whether they should be considered symlinks or not.

The existing behavior is for mount points to be considered symlinks, and is fully intentional. If you want to delete a directory, for example, you need to know whether it is a symlink so you can tell whether you need to delete the contents first or just delete the directory itself. Anything which behaves like a symlink in that regard (which mount points definitely do) should be categorized as a symlink, and the way to tell whether a reparse point acts like a symlink is to call IsReparseTagNameSurrogate.

Also if we should be able to differentiate between a symlinked file and directory, seems like that be available cross platform, not just for windows.

On Linux this information does not exist. When you get the attributes of a file to tell whether it is a file or a directory or a symlink, those options are mutually exclusive. Therefore making a cross platform API does not make any sense.

@mattico

This comment has been minimized.

Show comment
Hide comment
@mattico

mattico Feb 5, 2018

Contributor

@roblourens I disagree that mount points need to be handled differently than symlinks.

Mount Points are basically identical to Junctions, except that mount points can point into volumes that aren't mounted with a drive letter. The documentation indicates that they were added to aid users who are using more volumes than can be assigned to 26 drive letters. I'd argue, therefore, that the purpose of Mount Points is not to be "a junction except that applications will pretend it is a regular directory", but to be "a junction, but I ran out of drive letters".

Second, I believe that IsReparseTagNameSurrogate is the way that Windows applications are supposed to determine what to do with reparse tags, in general. It's what del /s uses to determine if it should recurse into a reparse tag to delete, for example. Matching specific reparse tags opens up a can of worms where new reparse tags don't work as expected (that's the whole reason we're having this discussion!) For example: if Microsoft adds "mount point except it works for ReFs volumes", whoops need to update is_symlink again. If third-party software adds a mount-point-like reparse tag, that would also unexpectedly work differently than regular mount points.

(anyway, I'm not super concerned about this because NTFS mount points are rarely used so as long as it's documented... meh)


Also if we should be able to differentiate between a symlinked file and directory, seems like that be available cross platform, not just for windows.

Doesn't std::fs::FileType work for this? type.is_dir() && type.is_symlink(), or does lstat never set S_IFDIR when S_IFLNK is true? I notice that the current windows implementation doesn't work like this, but I'm a bit surprised about that.

Note that, despite being pub, is_symlink_dir is not exposed to users since they cannot access the sys modules. I'd agree that there should be a std::os::windows::fs::FileTypeExt, so users who care can tell the difference between a directory symbolic link and a junction or whatever.

Contributor

mattico commented Feb 5, 2018

@roblourens I disagree that mount points need to be handled differently than symlinks.

Mount Points are basically identical to Junctions, except that mount points can point into volumes that aren't mounted with a drive letter. The documentation indicates that they were added to aid users who are using more volumes than can be assigned to 26 drive letters. I'd argue, therefore, that the purpose of Mount Points is not to be "a junction except that applications will pretend it is a regular directory", but to be "a junction, but I ran out of drive letters".

Second, I believe that IsReparseTagNameSurrogate is the way that Windows applications are supposed to determine what to do with reparse tags, in general. It's what del /s uses to determine if it should recurse into a reparse tag to delete, for example. Matching specific reparse tags opens up a can of worms where new reparse tags don't work as expected (that's the whole reason we're having this discussion!) For example: if Microsoft adds "mount point except it works for ReFs volumes", whoops need to update is_symlink again. If third-party software adds a mount-point-like reparse tag, that would also unexpectedly work differently than regular mount points.

(anyway, I'm not super concerned about this because NTFS mount points are rarely used so as long as it's documented... meh)


Also if we should be able to differentiate between a symlinked file and directory, seems like that be available cross platform, not just for windows.

Doesn't std::fs::FileType work for this? type.is_dir() && type.is_symlink(), or does lstat never set S_IFDIR when S_IFLNK is true? I notice that the current windows implementation doesn't work like this, but I'm a bit surprised about that.

Note that, despite being pub, is_symlink_dir is not exposed to users since they cannot access the sys modules. I'd agree that there should be a std::os::windows::fs::FileTypeExt, so users who care can tell the difference between a directory symbolic link and a junction or whatever.

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 5, 2018

Member

Doesn't std::fs::FileType work for this? type.is_dir() && type.is_symlink(), or does lstat never set S_IFDIR when S_IFLNK is true?

According to posix, to get the file type, you can mask st_mode with S_IFMT and then compare that for equality with the file type constants. This is an inherently mutually exclusive operation therefore anything that uses posix style lstat is unable to tell whether a symlink is a file or a directory,

In addition, such posixy platforms have a single function for creating symlinks. There is no distinction between a directory symlink and a file symlink in the API (unlike Windows where CreateSymbolicLink can take SYMBOLIC_LINK_FLAG_DIRECTORY), nor does the programmer ever have to care about there being a distinction because deleting a symlink is done the same way regardless of what it points to (unlike Windows where you need to know the difference to know whether to call RemoveDirectory or DeleteFile).

Member

retep998 commented Feb 5, 2018

Doesn't std::fs::FileType work for this? type.is_dir() && type.is_symlink(), or does lstat never set S_IFDIR when S_IFLNK is true?

According to posix, to get the file type, you can mask st_mode with S_IFMT and then compare that for equality with the file type constants. This is an inherently mutually exclusive operation therefore anything that uses posix style lstat is unable to tell whether a symlink is a file or a directory,

In addition, such posixy platforms have a single function for creating symlinks. There is no distinction between a directory symlink and a file symlink in the API (unlike Windows where CreateSymbolicLink can take SYMBOLIC_LINK_FLAG_DIRECTORY), nor does the programmer ever have to care about there being a distinction because deleting a symlink is done the same way regardless of what it points to (unlike Windows where you need to know the difference to know whether to call RemoveDirectory or DeleteFile).

@shepmaster

This comment has been minimized.

Show comment
Hide comment
@shepmaster

shepmaster Feb 9, 2018

Member

Ping from triage, @BurntSushi !

Member

shepmaster commented Feb 9, 2018

Ping from triage, @BurntSushi !

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 11, 2018

Member

@rust-lang/libs I'd like to nominate this PR for merging, since I think it is in a good place.

Allow me to briefly summarize this PR. An issue was reported recently on Windows where is_dir returned false when it should have returned true. In particular, this was on a directory in a OneDrive mount on Windows. Windows records this as a reparse point, but FileType on Windows won't interpret it as a directory. The key here is that reparse points aren't necessarily symbolic links, but as @mattico points out, it is possible to tell whether a reparse point is a symlink or not.

So what does this PR do? It effectively rewrites FileType detection logic on Windows to be consistent with the aforementioned facts. In particular, it changes things like directory detection to correspond to, "has the directory bit set, and isn't a symlink," which seems correct to me. The specific behavior this changes is that is_dir will now return true in more cases than it previously did, but it should never return false where it previously returned true. While this is a change in behavior, I see this as an important bug fix. In particular, working around this problem with current Rust is particularly gruesome. You can see the requisite patches to both walkdir and ripgrep. If this PR is merged, then I believe I can drop both of those patches and everyone will be happy again.

The reason why this issue is coming up now is because it seems like there was a relatively recent change to cause OneDrive's "files on demand" feature to use reparse points. You can see that others are also grappling with a similar issue:

In light of the above I think we are on very solid ground when we say that the existing behavior is incorrect and should be fixed. Applications and libraries can work around it (like I have), but it is a monstrous pain. :-)

Member

BurntSushi commented Feb 11, 2018

@rust-lang/libs I'd like to nominate this PR for merging, since I think it is in a good place.

Allow me to briefly summarize this PR. An issue was reported recently on Windows where is_dir returned false when it should have returned true. In particular, this was on a directory in a OneDrive mount on Windows. Windows records this as a reparse point, but FileType on Windows won't interpret it as a directory. The key here is that reparse points aren't necessarily symbolic links, but as @mattico points out, it is possible to tell whether a reparse point is a symlink or not.

So what does this PR do? It effectively rewrites FileType detection logic on Windows to be consistent with the aforementioned facts. In particular, it changes things like directory detection to correspond to, "has the directory bit set, and isn't a symlink," which seems correct to me. The specific behavior this changes is that is_dir will now return true in more cases than it previously did, but it should never return false where it previously returned true. While this is a change in behavior, I see this as an important bug fix. In particular, working around this problem with current Rust is particularly gruesome. You can see the requisite patches to both walkdir and ripgrep. If this PR is merged, then I believe I can drop both of those patches and everyone will be happy again.

The reason why this issue is coming up now is because it seems like there was a relatively recent change to cause OneDrive's "files on demand" feature to use reparse points. You can see that others are also grappling with a similar issue:

In light of the above I think we are on very solid ground when we say that the existing behavior is incorrect and should be fixed. Applications and libraries can work around it (like I have), but it is a monstrous pain. :-)

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Feb 11, 2018

Member

Thanks for the extensive report @BurntSushi! Out of curiosity, do you have an example of situations that don't want this sort of logic? Are there examples programs which will break as a result of this change, for example relying on the fact that is_dir is returning false?

I'm inclined to merge given the fallout here, but would just like to understand the potential fallout here!

Member

alexcrichton commented Feb 11, 2018

Thanks for the extensive report @BurntSushi! Out of curiosity, do you have an example of situations that don't want this sort of logic? Are there examples programs which will break as a result of this change, for example relying on the fact that is_dir is returning false?

I'm inclined to merge given the fallout here, but would just like to understand the potential fallout here!

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 11, 2018

Member

@alexcrichton I actually can't think of any case (and haven't explicitly come across one), but I think someone more familiar with Windows should probably answer that. cc @roblourens @retep998 @mattico

Member

BurntSushi commented Feb 11, 2018

@alexcrichton I actually can't think of any case (and haven't explicitly come across one), but I think someone more familiar with Windows should probably answer that. cc @roblourens @retep998 @mattico

@mattico

This comment has been minimized.

Show comment
Hide comment
@mattico

mattico Feb 11, 2018

Contributor

(I'm going to equate "use of a non-surrogate reparse point" with "the OneDrive folder" for expediency)

I expect these functions are mostly used for recursive directory traversal. Users may have noticed that the OneDrive folder is special and isn't affected by these methods. I can imagine some cases where this behavior could be relied upon:

  • A user wants {backup, disk usage accounting} software to ignore the OneDrive folder since it's already backed up or since the files may not be stored locally on the computer.
  • A user notices that the OneDrive folder doesn't get deleted by their recursive delete function and starts to rely on this behavior. (fs::remove_dir_all currently stops with an error once it gets to the OneDrive folder, but I suppose an outside implementation may keep going)

The deletion scenario is slightly scary because it opens the door to data loss. I don't think it's an issue in practice because:

  1. OneDrive files are "in the cloud" so you only lose files which haven't finished syncing (and some bandwidth/time).
  2. The OneDrive folder is stored in the user's home directory (by default), so I doubt it would be caught up in a remove_dir_all unless you're deliberately or accidentally deleting all of a user's data.
  3. Users expect the OneDrive folder to behave like a normal directory until it doesn't. If they're recursively deleting they must have been willing to delete the OneDrive folder, at least by accident.
Contributor

mattico commented Feb 11, 2018

(I'm going to equate "use of a non-surrogate reparse point" with "the OneDrive folder" for expediency)

I expect these functions are mostly used for recursive directory traversal. Users may have noticed that the OneDrive folder is special and isn't affected by these methods. I can imagine some cases where this behavior could be relied upon:

  • A user wants {backup, disk usage accounting} software to ignore the OneDrive folder since it's already backed up or since the files may not be stored locally on the computer.
  • A user notices that the OneDrive folder doesn't get deleted by their recursive delete function and starts to rely on this behavior. (fs::remove_dir_all currently stops with an error once it gets to the OneDrive folder, but I suppose an outside implementation may keep going)

The deletion scenario is slightly scary because it opens the door to data loss. I don't think it's an issue in practice because:

  1. OneDrive files are "in the cloud" so you only lose files which haven't finished syncing (and some bandwidth/time).
  2. The OneDrive folder is stored in the user's home directory (by default), so I doubt it would be caught up in a remove_dir_all unless you're deliberately or accidentally deleting all of a user's data.
  3. Users expect the OneDrive folder to behave like a normal directory until it doesn't. If they're recursively deleting they must have been willing to delete the OneDrive folder, at least by accident.
@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 11, 2018

Member

I added an unstable FileTypeExt which allows users to finally determine whether a symbolic link is also a directory or a file. FileType will always return false for both is_dir and is_file when is_symlink is true, yet users need to know this critical piece of information. The fact that remove_dir_all needs this information should be reason enough to expose it.

This might need an FCP to be added, along with a tracking issue.

Member

retep998 commented Feb 11, 2018

I added an unstable FileTypeExt which allows users to finally determine whether a symbolic link is also a directory or a file. FileType will always return false for both is_dir and is_file when is_symlink is true, yet users need to know this critical piece of information. The fact that remove_dir_all needs this information should be reason enough to expose it.

This might need an FCP to be added, along with a tracking issue.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 11, 2018

Member

@retep998 If there is any controversy around FileTypeExt (although it seems reasonable to me, and it is an unstable API), then it might be wise to split that out into a separate PR. :-)

Member

BurntSushi commented Feb 11, 2018

@retep998 If there is any controversy around FileTypeExt (although it seems reasonable to me, and it is an unstable API), then it might be wise to split that out into a separate PR. :-)

@retep998

This comment has been minimized.

Show comment
Hide comment
@retep998

retep998 Feb 11, 2018

Member

A user wants {backup, disk usage accounting} software to ignore the OneDrive folder since it's already backed up or since the files may not be stored locally on the computer.

In this case I argue software should be paying attention to attributes like FILE_ATTRIBUTE_OFFLINE or FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS or FILE_ATTRIBUTE_RECALL_ON_OPEN.

Member

retep998 commented Feb 11, 2018

A user wants {backup, disk usage accounting} software to ignore the OneDrive folder since it's already backed up or since the files may not be stored locally on the computer.

In this case I argue software should be paying attention to attributes like FILE_ATTRIBUTE_OFFLINE or FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS or FILE_ATTRIBUTE_RECALL_ON_OPEN.

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Feb 12, 2018

Member

Ok thanks for the info @mattico! This seems fine to land by me

Member

alexcrichton commented Feb 12, 2018

Ok thanks for the info @mattico! This seems fine to land by me

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 12, 2018

Member

@alexcrichton I think an unstable API was added to this since I last commented. Should we get tickboxes from the std team to land it? (I think the new unstable API is probably uncontroversial and is a Windows specific API.)

Member

BurntSushi commented Feb 12, 2018

@alexcrichton I think an unstable API was added to this since I last commented. Should we get tickboxes from the std team to land it? (I think the new unstable API is probably uncontroversial and is a Windows specific API.)

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Feb 12, 2018

Member

@BurntSushi nah unstable APIs can land at any time, I think this may want to hold off on waiting for beta to branch (to help give it maximal time to bake on nightly), but other than that I think this can be r+'d whenever.

Member

alexcrichton commented Feb 12, 2018

@BurntSushi nah unstable APIs can land at any time, I think this may want to hold off on waiting for beta to branch (to help give it maximal time to bake on nightly), but other than that I think this can be r+'d whenever.

@kennytm kennytm added relnotes and removed relnotes labels Feb 12, 2018

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
Member

BurntSushi commented Feb 16, 2018

@bors r+

@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Feb 16, 2018

Contributor

📌 Commit 9269e83 has been approved by BurntSushi

Contributor

bors commented Feb 16, 2018

📌 Commit 9269e83 has been approved by BurntSushi

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Feb 16, 2018

Member

Thanks again @retep998! 🎉

Member

BurntSushi commented Feb 16, 2018

Thanks again @retep998! 🎉

@ssylvan ssylvan referenced this pull request in gohugoio/hugo Feb 16, 2018

Open

Support OneDrive by properly handling symlinks #4079

@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Feb 17, 2018

Contributor

⌛️ Testing commit 9269e83 with merge b298607...

Contributor

bors commented Feb 17, 2018

⌛️ Testing commit 9269e83 with merge b298607...

bors added a commit that referenced this pull request Feb 17, 2018

Auto merge of #47956 - retep998:is-nibbles, r=BurntSushi
This is the ideal FileType on Windows. You may not like it, but this is what peak performance looks like.

Theoretically this would fix #46484

The current iteration of this PR should not cause existing code to break, but instead merely improves handling around reparse points. Specifically...

* Reparse points are considered to be symbolic links if they have the name surrogate bit set. Name surrogates are reparse points that effectively act like symbolic links, redirecting you to a different directory/file. By checking for this bit instead of specific tags, we become much more general in our handling of reparse points, including those added by third parties.
* If something is a reparse point but does not have the name surrogate bit set, then we ignore the fact that it is a reparse point because it is actually a file or directory directly there, despite having additional handling by drivers due to the reparse point.
* For everything which is not a symbolic link (including non-surrogate reparse points) we report whether it is a directory or a file based on the presence of the directory attribute bit.
* Notably this still preserves invariant that when `is_symlink` returns `true`, both `is_dir` and `is_file` will return `false`. The potential for breakage was far too high.
* Adds an unstable `FileTypeExt` to allow users to determine whether a symbolic link is a directory or a file, since `FileType` by design is incapable of reporting this information.
@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Feb 17, 2018

Contributor

☀️ Test successful - status-appveyor, status-travis
Approved by: BurntSushi
Pushing b298607 to master...

Contributor

bors commented Feb 17, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: BurntSushi
Pushing b298607 to master...

@bors bors merged commit 9269e83 into rust-lang:master Feb 17, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment