-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
find: infinite loop when performing large amount of inplace file operations on btrfs #306
Comments
On Devuan Barista: apt-get install btrfs-progs Completed just fine? (Do you have a test that DOESN'T include a random number generator in hope it reproduces? I'm assuming this is a weird corner case in btrfs hashing behavior, possibly my 4.19.0-16-amd64 kernel from devuan hasn't introduced the btrfs-specific behavior yet? I can install arch in a vm, but "didn't see it" could once again mean openssl rand produced different output...) |
P.S. Last time I saw a filesystem-specific behavior like this, the problem was actually the block size of the underling readdir/getdents system call differing between the two environments. One granularity was triggering the filesystem bug, the other wasn't. It's not that gnu and busybox(?) had different algorithmic behavior, they were just fetching directory entry batches into different buffer sizes... |
Thanks for the response! Make sure find is chained with inplace file operations...
p.s. the btrfs .img was stored on ext4 partition in my test The issue is irrelevant to random file names, openssl was just used to generate unique filenames. I was actually considered this as a btrfs bug, but I'm not sure and I don't know how to describe it when reporting to upstream since I'm not aware of the internal implementation of |
On 12/20/21 10:16 AM, MoetaYuko wrote:
Thanks for the response!
Make sure find is chained with inplace file operations...
Your example is reproducible on my Arch PC (5.15.10 kernel) with the following
modifications:
1. increase 500 to 2000
2. add a simple inplace operation such as sed: |~/toybox-x86_64 find sub -type
f -print0 | xargs -0 -n1 sed -i s/a/b/|. The one-line perl script from Linux
build system can be adopted as well.
If you're modifying the directory while traversing it possibly btrfs is
re-hashing the contentents in a way that causes the ongoing directory traversal
to lose its place?
p.s. the btrfs .img was stored on ext4 partition in my test
Completed again for me. I'm guessing kernel version is involved. Lemme try it on
a 5.15 kernel...
The issue is irrelevant to random file names, openssl was just used to generate
unique filenames.
I was actually considered this as a btrfs bug, but I'm not sure and I don't know
how to describe it when reporting to upstream since I'm not aware of the
internal implementation of |find|
The internal implementation of toybox find is 730 lines of C:
https://github.com/landley/toybox/blob/master/toys/posix/find.c
Plus 210 lines of directory tree plumbing:
https://github.com/landley/toybox/blob/master/lib/dirtree.c
And the dirtree stuff largely boils down to:
dirtree(name, callback) {
struct dirent *entry;
struct DIR *dir;
struct stat;
dir = opendir(name);
while ((entry = readdir(dir))) {
stat(entry->d_name, &st)
callback(...)
if (S_ISDIR(st.st_mode)) dirtree(entry->d_name, callback);
}
}
The find callback is just figuring out what to do with a given entry. Neither of
these has the ability to "back up", it can only recurse endlessly if readdir()
keeps returning more entries. (Or in theory if recursing into crosslinked
subdirectories, but you don't even have any subbdirectories here.)
|
Just keep increasing the count. 2000 completed on another machine with Debian 11 and 5.10 kernel, but reproduced after increasing to 5000 |
I built a mkroot system and documented it here: Then did: mount /dev/sda /mnt && cd /mnt It completed. I went from 1000 to 7000 by thousands, then 14000, then 25000, then 50000... Is there a kernel .config thing that's needed? I note btrfs has a number of kernel options, all I've switched on is its Anterior Cruciate Ligament support... |
Weird...Reproduced on 3 different machines w/ distinct OS |
not reproducible, but appears to be kernel bug rather than toybox issue. |
I have the same bug!
I tried different variants, for example: # works fine, but has limit on shell command
find $cpio_dir -type f -exec perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;' {} +
# redirect to sort -z works fine too
find $cpio_dir -type f -print0 | sort -z |
xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;'
# stuck in infinity loop
find $cpio_dir -type f -exec perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;' {} \;
# remove parallel 8 threads, stuck in infinity loop
find $cpio_dir -type f -print0 |
xargs -0 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;' So there are 2 cases with exec and pipeline have the same issue and in-place cause problem in toybox find version. # use sed with in-place and dummy pattern, stuck in infinity loop
find $cpio_dir -type f -print0 |
xargs -0 -P8 -n1 sed -i 's/__DUMMY__//g;' |
I configure my PC environment and got 100% bug all the time:
If folder exceed 988 files - |
Hmmm, reproduces under Ubuntu 20.04 with btrfs you say... |
On the host I did: sudo debootstrap chimaera newroot http://pkgmaster.devuan.org/merged And in two different tabs, I ran: (I note the ubuntu 20.04.4 ISO I had lying around has kernel "5.13.0-30-generic", not .14 or 1.5.) When xfce finished booting in its window I clicked the "try" button to make the installer go away and opened a terminal: sudo /bin/bash And it completed? I also did ./toybox find /mnt/newroot -type f -exec echo {} ; | wc -l which said 7362 and completed much faster. I take it the btrfs bug wasn't present in the 5.13 kernel and got introduced more recently? You didn't say what directory you ran your perl invocation in? It's possible that if something is writing to the btrfs directory, the hashing goes janky and the directory traversal cursor loses its place and backs up, and if the invocation keeps modifying the directory as it's traversing the traversal never ends? That would obviously be a filesystem bug, but might explain what's going on? (And why some things you shell out to have that problem, but other things don't. Find itself is behaving the same either way, you say in your bug report the behavior varies based on what child processes find launches, which... isn't find's fault?) |
Hi @landley , it doesn't matter perl or sed, it cause error if util is doing in-place insertion. I made a simple script #!/bin/bash
set -e
cd /work
rm -fr tmp && mkdir tmp && cd tmp
for i in {1..2000}; do touch $i; done
# simple dump
echo '#!/bin/bash' > inplace.sh
echo 'echo FILE = $*' >> inplace.sh
echo 'sed -i "s/__DUMMY_ABC_//g" $*' >> inplace.sh
chmod a+x inplace.sh
/path/to/toybox find . -type f -exec ./inplace.sh {} \; Could according |
Ok, I found even easy command just move file from X to tmp and back from tmp into X echo '#!/bin/bash' > inplace.sh
echo 'mv -v "$*" tmp' >> inplace.sh
echo 'mv -v tmp "$*"' >> inplace.sh
chmod a+x inplace.sh |
I checked GNU findutil source and I found the special property named as munged_filename and it stores original_filename, later to check visits. |
Hi @landley could you please re-open issue? I reproduced the same bug on Arch Linux with kernel I made the test script to reproduce this issue #!/bin/bash
set -e
CMD_FIND=0
CMD_MOUNT=0
TMPDIR="$(mktemp -d)"
echo "Using temporary dir = $TMPDIR"
function cleanup() {
set +e
[[ $CMD_FIND -ne 0 ]] && while [ ! -z "$(jobs -p)" ]; do kill $(jobs -p); sleep 1; done
[[ $CMD_MOUNT -ne 0 ]] && cd "$TMPDIR" && sudo umount "$TMPDIR"/mount
cd / && rm -fr "$TMPDIR"
}
trap cleanup EXIT INT HUP
mkdir -p "$TMPDIR" && cd "$TMPDIR"
git clone https://github.com/landley/toybox
cd toybox && ./configure && make defconfig toybox && cd ..
truncate -s 128MB btrfs.img
mkfs.btrfs btrfs.img
mkdir -p mount
sudo mount -o loop btrfs.img mount
CMD_MOUNT=1
sudo chmod -R o+rwx mount
cd mount
for i in {1..2000}; do touch $i; done
echo '#!/bin/bash' > ../inplace.sh
echo 'mv -v "$*" tmp' >> ../inplace.sh
echo 'mv -v tmp "$*"' >> ../inplace.sh
chmod a+x ../inplace.sh
timeout --kill-after=60 60 ../toybox/toybox find . -type f -exec ../inplace.sh {} \; && echo FINISH &
CMD_FIND=1
wait |
So you've confirmed that when btrfs re-hashes a directory, existing directory traversal loses its place. Let's see what git log fs/btrfs in the kernel source has to say... that there are 2154 commits to that directory since 5.13. (I used --no-merges even!) Not the world's most stable filesystem, is it? (fs/fat has 42 over the same period.) Hmmm, possibly btrfs fixed this bug in torvalds/linux@8bb6898da627 although the "monotically increasing counter" mentioned in torvalds/linux@04fc7d5123f2 sounds like "iterate over directory, delete and recreate each item you find, the new item is appended live to the end of your search results even though it didn't exist when you started the readdir(), so this NEVER ENDS" may be a design flaw instead of an implementation flaw. (Commit torvalds/linux@723df2bcc9e1 lets me know this is complicated in btrfs...) Right, could you try running this in a directory full of files on your btrfs and see if it ever completes?
Because it completed just fine on my ext4 partition... |
Hi @landley Yes! It gets stuck in an infinite loop in all cases: direct FS on SSD or IMG FILE mounted as loop (ssd, hdd, tmpfs) |
Not true. The commits were merged in 6.2 and 6.1, but the issue persists in 6.4.3 |
Posix doesn't forbid it ("undefined") but Linux handled this reliably until btrfs. (I very vaguely recall a bugfix about this back under 2.3 development, which is why I knew Linux did get this right because at one point they cared.) Alas, in their old age the Linux developers have stopped caring about a bunch of things they used to in their old age. I can add a workaround option to find to read the contents of the directory into memory and then iterate over the memory version (internally it's the DIRTREE_BREADTH flag), but it changes the directory traversal order (all files and directories at this level, then descend into the subdirectories afterwards), and there's no flag for it in the debian version's man page I can find... |
By the way, the easy workaround is probably something like (untested): find . -print0 > file.idx I could do buffering inside find to workaround btrfs, but you can just do the buffering yourself. (Still might be worth sending the test case to the btrfs devs to get them to explicitly say "we don't care what this breaks, it's on you to work around our behavior" on record and all. I've done some head scratching about "maybe -newer would... no, rename doesn't change the timestamp" and so on, but a directory traversal in btrfs is never guaranteed to end, and that's kinda creepy. I wonder what kind of denial of service attacks you could build around that in other programs...) |
@landley, thanks a lot for your investigation! I'm really frustrated by this issue with btrfs, because Google is using and supplying built toybox as default host toolchain to build Android Kernel for other vendors like NXP and Qualcomm too. I see a lot of benefits in shipping already built binaries as it eliminates a lot of side effects, unlike when toybox was built for the host from supplied sources from external/toybox from AOSP itself, by the way toybox source in AOSP are older than the built binary version from Google (0.8.7 source vs 0.8.9 binary) I will try to create a ticket about it on kernel.org to update gen_kheaders.sh and add |
I made a ticket on kernel.org - https://bugzilla.kernel.org/show_bug.cgi?id=217681 |
by definition, that ain't true... see https://android.googlesource.com/platform/external/toybox/+/refs/heads/main/toys.h#142 (but, more likely "don't forget to |
Sorry, but no one use master branch as release delivery from Google
even android-s branches have the latest 0.8.6 |
sure (ignoring the fact that S is two years old), but S also has the source to 0.8.6. you can't take the source from one branch and the binary from another and complain they don't match... of course they don't! |
Let me know what the kernel guys say. If the answer is "no", I can add a cache-and-release workaround to the toybox find source. I want this to work for everybody. It seems pretty clearly like a filesystem bug, but if you're doing a build on that filesystem it needs to work. "Not my bug" doesn't mean I get to ignore it, because it exists and has to work. I'm wondering if some sort of inode cacheing is sufficient? If I see the same dev+inode a second time, it's stuttering and I can stop... Except will other filesystems do some sort of "return files in alphabetical order, you renamed bcd to xyz so it shows up a second time but there's a zzz after it we haven't seen yet. Which means I don't include the entry with the same dev/inode I've already seen, but I don't get to stop there. The problem is if I keep reading forever there's no guarantee it terminates on btrfs, and any stop before EOF is basically a heuristic? The cache-and-release approach means we won't perturb it, but we can't guarantee nobody ELSE does (hence denial of service). And if the directory is changing due to other processes editing the filesystem, endless updates aside: the larger the delay between the readdir() and acting upon it, the more likely the entry might have disappeared out from under us. That's a more classic race condition having a wider window to operate on, which is why I didn't want to change the default behavior, and instead add a new option. Maybe -breadth or some such? |
Closed #448 as a dupe of this, and the new action there is I poked the btrfs mailing list about it, ala https://www.spinics.net/lists/linux-btrfs/msg138634.html If they recommend a workaround I can add it to find, but nothing I come up with on my own seems reliable in avoiding denial-of-service? (But maybe just "good enough" is the best you can hope with on btrfs? Hope nobody else using the filesystem acts maliciously?) |
Filipe Manana submitted a kernel fix https://lore.kernel.org/linux-btrfs/c9ceb0e15d92d0634600603b38965d9b6d986b6d.1691923900.git.fdmanana@suse.com/ and I tested it against 6.4 (it applied with offset and worked fine). Alas spinics.net is giving me an "ERR_ADDRESS_UNREACHABLE" right now but if it works for you, checking the replies to the above link will probably find it, including my mkroot test procedure. When a kernel comes out with this in it, remind me to close this. |
Hi @landley, well done! Thanks a lot for your help and effort!
Of course, I will wait a kernel release. |
Merged into the btrfs maintainer's tree, we just missed the -rc6 pull but there might be an -rc7 pull before the release: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/log/?h=for-next (No, I don't know why it's in the log twice.) |
Nice to have this fix in mainline but if it doesn't go into LTS kernels, then this issue should be open until the latest LTS in the list will be 6.4 😅 |
The kernel fix has been merged for 6.5 torvalds/linux@9b378f6 and backported to 6.4 https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=5951039da4a87f2759f928fa921f9d166213e763 |
Hi @landley, good news, there is backported fix for kernel 5.15 -> https://lore.kernel.org/linux-btrfs/cover.1706183427.git.fdmanana@suse.com/ |
I've been cc'd on the emails about it. :) |
Background
I was building the Linux kernel in a shell environment (in fact the AOSP build system) where most utils are provided by toybox rather than GNU ones. The build infinitely stuck at a postprocessing step: https://github.com/torvalds/linux/blob/a7904a538933c525096ca2ccde1e60d0ee62c08e/kernel/gen_kheaders.sh#L80
After some investigation, I found that it's relevant to toybox's
find
utils.MWE
Reproducible on a
btrfs
partition, but worked fine on ext4 in my experiments.Replacing
./toybox-x86_64 find
with GNU findutils fixed the issue.Environment
The text was updated successfully, but these errors were encountered: