os: RemoveAll fails to remove deeply nested directory tree, triggers OOM Killer #47390
Comments
Profile of os.RemoveAll() with this patch: patch
After 1K iterations of removeAllFrom, i.e. for 1000 directories deep, around 10MB of allocations were used. This increased in proportion to the depth of directory tree so that one million iterations used a thousand times more memory, several GB. On a system with 8GB of RAM, OOM Killer was therefore triggered well before the removal of the directory tree could be completed. In comparison, coreutils’ |
Here's a standalone test case (go test rm_test.go -cpuprofile=cpu.prof -memprofile=mem.prof):
|
Interestingly, that test case takes time O(n²) on my Mac, but all the time is in syscall (calling into Mac libc), which suggests that perhaps Mac libc implements openat/unlinkat by reconstructing the original paths instead of actually using the fds directly. Oops.
On my Linux workstation, getconf PATH_MAX says 4096, but every file system I've tried rejects mkdir after 31 directory elements of length 255, or just under 8kB. If I change 255 to 1, then I can make more directories: 1<<11 works fine. I'm not sure which sysctl or other setting needs to be frobbed to get access to million-element path names.
Change https://golang.org/cl/337449 mentions this issue: |
As for the actual bug, the implementation of os.RemoveAll already uses openat(2) and unlinkat(2), so there's no concern with the long paths directly (at least on Linux). The problem appears to be that the implementation keeps the parent directory open while it removes the children, and each open directory has a directory-entry-reading buffer in its dirInfo.buf, of size 8kB. If you are removing a tree a million entries deep with an 8kB buffer per depth level, that's going to be 8 GB. Presumably the kernel also has some state related to each open directory. If that was 1kB then you're still looking at 1 GB of kernel state for a million open file descriptors.
An obvious answer is to close the parent while removing its children. But you need the parent's fd for the openat of each child. Closing the fd between children would (recursively) require rewalking the entire path to get the fd for the next child, which leads to exactly the quadratic behavior that openat(2)/unlinkat(2) were introduced to, well, remove.
We can't make the per-directory buffer appreciably smaller, because we get lost directory entries if it is too small - see #24015. It does look like we can adjust directory reading to drop the buffers between reads most of the time, though. I've sent CL 337449 to do that.
Even that CL is not perfect, though: each level of the walk is holding all the strings from that directory read, which could be up to 8 kB or so of data. So you could still provoke 8 GB of memory usage, though not in the test case. I don't see any way to avoid having those strings, since we need to read at least 5760 bytes per #24015, and we can't throw away all but the first entry that is returned. Those strings are already in memory, so dropping the directory-reading buffers is at least a factor of two improvement in that worst case. Of course, then there is the problem of a two-million-deep directory tree.
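The memory figures in this comment are plain multiplication of depth by per-level cost. A small sketch of that arithmetic follows; `estimateBytes` is a hypothetical helper, and the 1 kB kernel figure is the comment's own guess rather than a measured value.

```go
package main

import "fmt"

// estimateBytes returns the memory held by a walk that keeps perLevel bytes
// alive for each of depth nested directories.
func estimateBytes(depth, perLevel int64) int64 {
	return depth * perLevel
}

func main() {
	const depth = 1_000_000 // a tree one million directories deep

	// 8 kB directory-reading buffer per open directory (dirInfo.buf).
	buf := estimateBytes(depth, 8<<10)
	// Guessed 1 kB of kernel state per open file descriptor.
	kern := estimateBytes(depth, 1<<10)

	fmt.Printf("read buffers: %.1f GB\n", float64(buf)/1e9)
	fmt.Printf("kernel state: %.1f GB\n", float64(kern)/1e9)
}
```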
I tried to look at what coreutils rm does, and the answer appears to be “use some library that is not in the coreutils repo,” so I stopped there. Generally speaking, cleaning up an arbitrarily deep directory tree is going to be arbitrarily expensive. I'm not convinced this is a battle that the implementation of os.RemoveAll can win, although the memory factor of at least 2 (more in common cases) is probably worth claiming. We could submit my change for Go 1.18, but I think it's a bit too risky, with hardly any real benefit, for Go 1.17. |
The GNU rm program uses the fts library that is part of gnulib. The source code is at https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/fts.c;h=e6603f40e76fe86707ddf600ab7f818e3f63ccc1;hb=HEAD. The code is pretty complicated. As far as I can tell, as invoked by the rm command, it keeps a small cache of parent directories open as it descends. If the cache fills up, it closes and later reopens the parent directory for use with |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
create.go:
remove.go:
Create a very deep directory tree (~2.5 million subdirectories)
Attempt to remove them with os.RemoveAll()
Observe exhaustion of system memory, causing oom-killer to kill the process. Directory tree still remains:
What did you expect to see?
remove.go successfully removes the target deep directory tree without exhausting system memory
What did you see instead?
remove.go causes a sharp increase in memory usage, causing OOM killer to kill remove.go before it can complete