Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security fix for remove_dir_all may have introduced a regression how filesystem loops are handled #93129

Open
the8472 opened this issue Jan 20, 2022 · 12 comments
Labels
A-io Area: std::io, std::fs, std::net and std::path C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@the8472
Copy link
Member

the8472 commented Jan 20, 2022

Normally traversing very deep directory trees (e.g. created by bind mounts or FUSE) will result in ENAMETOOLONG or ELOOP.
The fix in #93110, #93112 and #93111 switches to using -at syscalls and keeping file descriptors for each child directory that's being descended into. But this way we loose loop detection (as far as I can tell), which means the process will run out of stack space or file descriptors instead of returning a proper error.

#88731 was already open which does contain proper loop handling.

@the8472 the8472 added the C-bug Category: This is a bug. label Jan 20, 2022
@hkratz
Copy link
Contributor

hkratz commented Jan 20, 2022

As I wrote in the PR in my opinion there should be no loops. Bind mounts on Linux and directory hardlinks (available for Macos only) are not allowed to loop and we are not descending into symlinked dirs because we open them with openat(..., O_NOFOLLOW).

But of course running out of file descriptors is currently a problem. I have a follow-up draft branch ready which fixes that differently from #88731 (going up using openat(dirf, "..", O_NOFOLLOW) + inode comparison going back up because using iterated openat() turned out to be to slow for very deep directories. With this PR I can delete one million nested dirs in 16 seconds just like "rm -r".

@hkratz
Copy link
Contributor

hkratz commented Jan 20, 2022

I would love to merge that with your work somehow so we don't run into conflicts or one of us incorporates the work of the other. Either way is fine for me.

I will open the draft PR tomorrow so you can have a look and then we can decide on how to proceed.

@the8472
Copy link
Member Author

the8472 commented Jan 20, 2022

But of course running out of file descriptors is currently a problem. I have a follow-up draft branch ready which fixes that differently from #88731 (going up using openat(dirf, "..", O_NOFOLLOW) + inode comparison going back up because using iterated openat() turned out to be to slow for very deep directories.

My approach is keeping a fixed-size stack of some directory fds open so it go back up and recurse into siblings most of the time and only has to re-walk path occasionally. Using .. + a fallback does seem less complex though.
But I also use a loop instead of recursion to avoid stack overflows which means I have to mutably update some stuff which does make the code more gnarly.

@the8472
Copy link
Member Author

the8472 commented Jan 20, 2022

I have tried to reproduce loops with bind mounts and overlayfs and it looks I cannot. That leaves the possibility of FUSE or maybe mounted network filesystems containing them, which should make this an even smaller edge-case.

@hkratz
Copy link
Contributor

hkratz commented Jan 20, 2022

I am using an iterative version too (@cuviper did the first conversion). I added the fixed deque inspired by your work and then replaced the iterated openat() for reopening fds that have fallen out of the cache when going back up with openat(dirf, "..", O_NOFOLLOW). Because with the iterated openat() it took much, much longer in the adversarial case of just very deeply (1<<20) nested directories. It is essentially quadratic in fs operations in the depth with a fixed fd cache because going back up every size-of-cache times you have to refill the cache.

@hkratz
Copy link
Contributor

hkratz commented Jan 20, 2022

A HashSet with visited inodes or (device, inode) pairs could easily be added for cycle detection.

@the8472
Copy link
Member Author

the8472 commented Jan 20, 2022

I assume there's still a fallback in case the .. case fails? That'll need deep path handling. But it sounds like your followup PR should cover most issues already that I had in mind and it also applies to more platforms. I can implement the long path handling on top of that instead.

@hkratz
Copy link
Contributor

hkratz commented Jan 20, 2022

Currently there is no fallback for going back up via .., though I still have the code working in a previous commit and could incorporate it. In what scenarios do you expect that to fail? Bind mounts?

@the8472
Copy link
Member Author

the8472 commented Jan 20, 2022

A parent directory getting moved while we're somewhere deeper in the tree, then it won't match the original parent.

@hkratz
Copy link
Contributor

hkratz commented Jan 20, 2022

Ah OK. If the inode comparison fails my current implementation returns an error, just as with many other concurrent modification races (added files/directories at the right time cause ENOTEMPTY going up, deleted files/directories cause ENOENT on openat() or unlinkat()).

@the8472
Copy link
Member Author

the8472 commented Jan 20, 2022

Good point, if it gets moved then unlinking would fail anyway after we go back up. Then a fallback isn't necessary.

@cuviper
Copy link
Member

cuviper commented Jan 20, 2022

which means the process will run out of stack space or file descriptors instead of returning a proper error.

Running out of file descriptors should result in a proper error with EMFILE. Running out of stack is unfortunately a crash/abort, but we found that with default process limits, FDs should run out first.

@ChrisDenton ChrisDenton added the needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. label Jul 16, 2023
@fmease fmease added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-io Area: std::io, std::fs, std::net and std::path and removed needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. labels Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: std::io, std::fs, std::net and std::path C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants