find: infinite loop when performing large amount of inplace file operations on btrfs #306

moetayuko · 2021-12-20T05:59:29Z

Background

I was building the Linux kernel in a shell environment (in fact the AOSP build system) where most utils are provided by toybox rather than GNU ones. The build infinitely stuck at a postprocessing step: https://github.com/torvalds/linux/blob/a7904a538933c525096ca2ccde1e60d0ee62c08e/kernel/gen_kheaders.sh#L80
After some investigation, I found that it's relevant to toybox's find utils.

MWE

Reproducible on a btrfs partition, but worked fine on ext4 in my experiments.

mkdir testdir && cd testdir
# only happens when the directory contains a large amount of files
# 500 was sufficient to repro on my Arch PC, but need to increase to 3000 on another Debian server
for i in {1..500}
    touch $(openssl rand -hex 12)  # generate empty test files with random name
cd ..
./toybox-x86_64 find testdir -type f -print0 | xargs -0 -n1 sed -i s/a/b/  # it hangs forever after this

Replacing ./toybox-x86_64 find with GNU findutils fixed the issue.

Environment

Toybox 0.8.6 downloaded from http://landley.net/toybox/bin
Arch Linux with 5.15.10-zen1-1-zen kernel (500 files)
Debian 11 with 5.10.0-9-amd64 kernel (3000 files)

The text was updated successfully, but these errors were encountered:

landley · 2021-12-20T14:51:34Z

On Devuan Barista:

apt-get install btrfs-progs
truncate -s 1g btrfs.img
mkfs.btrfs btrfs.img
mkdir sub
mount -t btrfs btrfs.img sub
cd sub
for i in {1..500}; do touch xxxxxxxx.$i; done
cd ..
toybox find sub -type f -print0

Completed just fine? (Do you have a test that DOESN'T include a random number generator in hope it reproduces? I'm assuming this is a weird corner case in btrfs hashing behavior, possibly my 4.19.0-16-amd64 kernel from devuan hasn't introduced the btrfs-specific behavior yet? I can install arch in a vm, but "didn't see it" could once again mean openssl rand produced different output...)

landley · 2021-12-20T14:56:46Z

P.S. Last time I saw a filesystem-specific behavior like this, the problem was actually the block size of the underling readdir/getdents system call differing between the two environments. One granularity was triggering the filesystem bug, the other wasn't. It's not that gnu and busybox(?) had different algorithmic behavior, they were just fetching directory entry batches into different buffer sizes...

moetayuko · 2021-12-20T16:16:47Z

Thanks for the response!

Make sure find is chained with inplace file operations...
Your example is reproducible on my Arch PC (5.15.10 kernel) with the following modifications:

increase 500 to 2000
add a simple inplace operation such as sed: ~/toybox-x86_64 find sub -type f -print0 | xargs -0 -n1 sed -i s/a/b/. The one-line perl script from Linux build system can be adopted as well.

p.s. the btrfs .img was stored on ext4 partition in my test

The issue is irrelevant to random file names, openssl was just used to generate unique filenames.

I was actually considered this as a btrfs bug, but I'm not sure and I don't know how to describe it when reporting to upstream since I'm not aware of the internal implementation of find

landley · 2021-12-21T12:16:33Z

On 12/20/21 10:16 AM, MoetaYuko wrote: Thanks for the response! Make sure find is chained with inplace file operations... Your example is reproducible on my Arch PC (5.15.10 kernel) with the following modifications: 1. increase 500 to 2000 2. add a simple inplace operation such as sed: |~/toybox-x86_64 find sub -type f -print0 | xargs -0 -n1 sed -i s/a/b/|. The one-line perl script from Linux build system can be adopted as well.

If you're modifying the directory while traversing it possibly btrfs is re-hashing the contentents in a way that causes the ongoing directory traversal to lose its place?

p.s. the btrfs .img was stored on ext4 partition in my test

Completed again for me. I'm guessing kernel version is involved. Lemme try it on a 5.15 kernel...

The issue is irrelevant to random file names, openssl was just used to generate unique filenames. I was actually considered this as a btrfs bug, but I'm not sure and I don't know how to describe it when reporting to upstream since I'm not aware of the internal implementation of |find|

The internal implementation of toybox find is 730 lines of C: https://github.com/landley/toybox/blob/master/toys/posix/find.c Plus 210 lines of directory tree plumbing: https://github.com/landley/toybox/blob/master/lib/dirtree.c And the dirtree stuff largely boils down to: dirtree(name, callback) { struct dirent *entry; struct DIR *dir; struct stat; dir = opendir(name); while ((entry = readdir(dir))) { stat(entry->d_name, &st) callback(...) if (S_ISDIR(st.st_mode)) dirtree(entry->d_name, callback); } } The find callback is just figuring out what to do with a given entry. Neither of these has the ability to "back up", it can only recurse endlessly if readdir() keeps returning more entries. (Or in theory if recursing into crosslinked subdirectories, but you don't even have any subbdirectories here.)

moetayuko · 2021-12-21T12:34:18Z

Completed again for me. I'm guessing kernel version is involved. Lemme try it on a 5.15 kernel...

Just keep increasing the count. 2000 completed on another machine with Debian 11 and 5.10 kernel, but reproduced after increasing to 5000

landley · 2021-12-21T17:33:16Z

I built a mkroot system and documented it here:
http://lists.landley.net/pipermail/toybox-landley.net/2021-December/012756.html

Then did:

mount /dev/sda /mnt && cd /mnt
echo -e '#!/bin/sh\necho -n "$1 "\nsed s/a/b/ $1' > /funky && chmod +x /funky
rm -rf sub && mkdir -p sub && touch sub/{1..50000} && find sub | xargs -n1 /funky && echo

It completed. I went from 1000 to 7000 by thousands, then 14000, then 25000, then 50000...

Is there a kernel .config thing that's needed? I note btrfs has a number of kernel options, all I've switched on is its Anterior Cruciate Ligament support...

moetayuko · 2021-12-22T04:40:02Z

Weird...Reproduced on 3 different machines w/ distinct OS
config for Arch w/ 5.15 kernel: http://fars.ee/2aMv
config for Ubuntu 20.04 w/ 5.11 kernel: http://fars.ee/-XeR
config for Debian 11 w/ 5.10 kernel: https://termbin.com/k89ov

enh-google · 2022-07-25T19:46:22Z

not reproducible, but appears to be kernel bug rather than toybox issue.

bmx666 · 2023-07-14T21:12:56Z

I have the same bug!

Ubuntu 20.04, kernel 5.14 or kernel 5.15
SSD with btrfs mounted to /work
Docker container based on Ubuntu 20.04 that binds /work/aosp as ~/working/aosp

I tried different variants, for example:

# works fine, but has limit on shell command
find $cpio_dir -type f -exec perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;' {} +

# redirect to sort -z works fine too
find $cpio_dir -type f -print0 | sort -z |
	xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;'

# stuck in infinity loop
find $cpio_dir -type f -exec perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;' {} \;

# remove parallel 8 threads, stuck in infinity loop
find $cpio_dir -type f -print0 |
	xargs -0 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;'

So there are 2 cases with exec and pipeline have the same issue and in-place cause problem in toybox find version.

# use sed with in-place and dummy pattern, stuck in infinity loop
find $cpio_dir -type f -print0 |
	xargs -0 -P8 -n1 sed -i 's/__DUMMY__//g;'

bmx666 · 2023-07-15T19:41:14Z

I configure my PC environment and got 100% bug all the time:

OS - Ubuntu 20.04, kernel 5.15.0-76-generic x86_64
SSD - INTEL SSDPEKNW020T8
/etc/fstab - UUID=XXX /work btrfs defaults,autodefrag 0 2
output of mount command - /dev/nvme0n1p1 on /work type btrfs (rw,relatime,ssd,space_cache,autodefrag,subvolid=5,subvol=/)

If folder exceed 988 files - toybox find command does two iterations and then exit with success, but if folder contain more then 1384 - then it stuck in infinity loop.

landley · 2023-07-17T10:37:17Z

Hmmm, reproduces under Ubuntu 20.04 with btrfs you say...

landley · 2023-07-17T11:27:27Z

On the host I did:

sudo debootstrap chimaera newroot http://pkgmaster.devuan.org/merged
sudo tar cvpJf newroot.tgz newroot
truncate -s 8g blah.img

And in two different tabs, I ran:
toybox netcat -s 127.0.0.1 -p 8888 -L toybox httpd .
kvm -m 1536 -cdrom xubuntu-20.04.4-desktop-amd64.iso -drive format=raw,file=blah.img -boot d

(I note the ubuntu 20.04.4 ISO I had lying around has kernel "5.13.0-30-generic", not .14 or 1.5.)

When xfce finished booting in its window I clicked the "try" button to make the installer go away and opened a terminal:

sudo /bin/bash
mkfs.btrfs /dev/?da
mount /dev/?da /mnt
cd /mnt
wget http://10.0.2.2:8888/newroot.tgz
tar xpvf newroot.tgz
apt install git build-essential
git clone https://github.com/landley/toybox
cd toybox
make defconfig toybox
./toybox find /mnt/newroot -type f -exec perl...

And it completed? I also did ./toybox find /mnt/newroot -type f -exec echo {} ; | wc -l which said 7362 and completed much faster.

I take it the btrfs bug wasn't present in the 5.13 kernel and got introduced more recently?

You didn't say what directory you ran your perl invocation in? It's possible that if something is writing to the btrfs directory, the hashing goes janky and the directory traversal cursor loses its place and backs up, and if the invocation keeps modifying the directory as it's traversing the traversal never ends? That would obviously be a filesystem bug, but might explain what's going on? (And why some things you shell out to have that problem, but other things don't. Find itself is behaving the same either way, you say in your bug report the behavior varies based on what child processes find launches, which... isn't find's fault?)

bmx666 · 2023-07-17T11:56:22Z

Hi @landley , it doesn't matter perl or sed, it cause error if util is doing in-place insertion.

I made a simple script

#!/bin/bash
set -e

cd /work
rm -fr tmp && mkdir tmp && cd tmp

for i in {1..2000}; do touch $i; done

# simple dump
echo '#!/bin/bash' > inplace.sh
echo 'echo FILE = $*' >> inplace.sh
echo 'sed -i "s/__DUMMY_ABC_//g" $*' >> inplace.sh
chmod a+x inplace.sh

/path/to/toybox find . -type f -exec ./inplace.sh {} \;

Could toybox find use temporary in-place file or it re-check somehow modified time or attributes of file because it was renamed and toybox decides to execute it again as new file?

according strace sed -i 's/a/b/g' /work/tmp/file it creates each time random file name like /work/tmp/sedc6yQuR then sed renames /work/tmp/sedc6yQuR into /work/tmp/file

bmx666 · 2023-07-17T12:15:01Z

Ok, I found even easy command just move file from X to tmp and back from tmp into X

echo '#!/bin/bash' > inplace.sh
echo 'mv -v "$*" tmp' >> inplace.sh
echo 'mv -v tmp "$*"' >> inplace.sh
chmod a+x inplace.sh

bmx666 · 2023-07-17T12:27:52Z

I checked GNU findutil source and I found the special property named as munged_filename and it stores original_filename, later to check visits.

bmx666 · 2023-07-18T08:46:25Z

Hi @landley could you please re-open issue?

I reproduced the same bug on Arch Linux with kernel 6.1.38-2-lts and 6.4.3-arch1-2

I made the test script to reproduce this issue

#!/bin/bash
set -e

CMD_FIND=0
CMD_MOUNT=0

TMPDIR="$(mktemp -d)"
echo "Using temporary dir = $TMPDIR"

function cleanup() {
  set +e
  [[ $CMD_FIND -ne 0 ]] && while [ ! -z "$(jobs -p)" ]; do kill $(jobs -p); sleep 1; done
  [[ $CMD_MOUNT -ne 0 ]] && cd "$TMPDIR" && sudo umount "$TMPDIR"/mount
  cd / && rm -fr "$TMPDIR"
}

trap cleanup EXIT INT HUP

mkdir -p "$TMPDIR" && cd "$TMPDIR"

git clone https://github.com/landley/toybox
cd toybox && ./configure && make defconfig toybox && cd ..

truncate -s 128MB btrfs.img
mkfs.btrfs btrfs.img

mkdir -p mount
sudo mount -o loop btrfs.img mount
CMD_MOUNT=1
sudo chmod -R o+rwx mount
cd mount

for i in {1..2000}; do touch $i; done

echo '#!/bin/bash' > ../inplace.sh
echo 'mv -v "$*" tmp' >> ../inplace.sh
echo 'mv -v tmp "$*"' >> ../inplace.sh
chmod a+x ../inplace.sh

timeout --kill-after=60 60 ../toybox/toybox find . -type f -exec ../inplace.sh {} \; && echo FINISH &
CMD_FIND=1
wait

landley · 2023-07-18T09:41:23Z

So you've confirmed that when btrfs re-hashes a directory, existing directory traversal loses its place.

Let's see what git log fs/btrfs in the kernel source has to say... that there are 2154 commits to that directory since 5.13. (I used --no-merges even!) Not the world's most stable filesystem, is it? (fs/fat has 42 over the same period.)

Hmmm, possibly btrfs fixed this bug in torvalds/linux@8bb6898da627 although the "monotically increasing counter" mentioned in torvalds/linux@04fc7d5123f2 sounds like "iterate over directory, delete and recreate each item you find, the new item is appended live to the end of your search results even though it didn't exist when you started the readdir(), so this NEVER ENDS" may be a design flaw instead of an implementation flaw. (Commit torvalds/linux@723df2bcc9e1 lets me know this is complicated in btrfs...)

Right, could you try running this in a directory full of files on your btrfs and see if it ever completes?

#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  DIR *dir = opendir(".");
  struct dirent *dd;

  while ((dd = readdir(dir))) {
    printf("%s\n", dd->d_name);
    rename(dd->d_name, "TEMPFILE");
    rename("TEMPFILE", dd->d_name);
  }
  closedir(dir);
}

Because it completed just fine on my ext4 partition...

bmx666 · 2023-07-18T10:03:44Z

Hi @landley

Yes! It gets stuck in an infinite loop in all cases: direct FS on SSD or IMG FILE mounted as loop (ssd, hdd, tmpfs)

moetayuko · 2023-07-18T11:26:18Z

Hmmm, possibly btrfs fixed this bug in torvalds/linux@8bb6898da627 although the "monotically increasing counter" mentioned in torvalds/linux@04fc7d5123f2 sounds like "iterate over directory, delete and recreate each item you find, the new item is appended live to the end of your search results even though it didn't exist when you started the readdir(), so this NEVER ENDS" may be a design flaw instead of an implementation flaw.

Not true. The commits were merged in 6.2 and 6.1, but the issue persists in 6.4.3

landley · 2023-07-18T19:19:19Z

Posix doesn't forbid it ("undefined") but Linux handled this reliably until btrfs. (I very vaguely recall a bugfix about this back under 2.3 development, which is why I knew Linux did get this right because at one point they cared.)

Alas, in their old age the Linux developers have stopped caring about a bunch of things they used to in their old age. I can add a workaround option to find to read the contents of the directory into memory and then iterate over the memory version (internally it's the DIRTREE_BREADTH flag), but it changes the directory traversal order (all files and directories at this level, then descend into the subdirectories afterwards), and there's no flag for it in the debian version's man page I can find...

landley · 2023-07-18T19:32:23Z

By the way, the easy workaround is probably something like (untested):

find . -print0 > file.idx
xargs -0 -n1 bash -c ' mv "$1" tmp; mv tmp "$1" ' < file.idx
rm file.idx

I could do buffering inside find to workaround btrfs, but you can just do the buffering yourself.

(Still might be worth sending the test case to the btrfs devs to get them to explicitly say "we don't care what this breaks, it's on you to work around our behavior" on record and all. I've done some head scratching about "maybe -newer would... no, rename doesn't change the timestamp" and so on, but a directory traversal in btrfs is never guaranteed to end, and that's kinda creepy. I wonder what kind of denial of service attacks you could build around that in other programs...)

bmx666 · 2023-07-18T20:37:31Z

@landley, thanks a lot for your investigation! I'm really frustrated by this issue with btrfs, because Google is using and supplying built toybox as default host toolchain to build Android Kernel for other vendors like NXP and Qualcomm too. I see a lot of benefits in shipping already built binaries as it eliminates a lot of side effects, unlike when toybox was built for the host from supplied sources from external/toybox from AOSP itself, by the way toybox source in AOSP are older than the built binary version from Google (0.8.7 source vs 0.8.9 binary)

I will try to create a ticket about it on kernel.org to update gen_kheaders.sh and add sort -z after find to avoid btrfs side effect, Also I will try to create ticket on Google Tracker about it and maybe Qualcomm.

bmx666 · 2023-07-18T21:04:52Z

I made a ticket on kernel.org - https://bugzilla.kernel.org/show_bug.cgi?id=217681
and on Google Bug Tracker - https://issuetracker.google.com/issues/291804253

enh-google · 2023-07-18T21:10:01Z

by the way toybox source in AOSP are older than the built binary version from Google (0.8.7 source vs 0.8.9 binary)

by definition, that ain't true... see https://android.googlesource.com/platform/external/toybox/+/refs/heads/main/toys.h#142 (but, more likely "don't forget to repo sync your AOSP tree!").

bmx666 · 2023-07-18T21:22:45Z

by definition, that ain't true... see https://android.googlesource.com/platform/external/toybox/+/refs/heads/main/toys.h#142 (but, more likely "don't forget to repo sync your AOSP tree!").

Sorry, but no one use master branch as release delivery from Google

for br in $(git branch -a | grep -- 'origin/android13'); do git show ${br}:toys.h | grep TOYBOX_VERSION; done | sort -V -u

#define TOYBOX_VERSION "0.8.6"TOYBOX_VENDOR

even android-s branches have the latest 0.8.6

enh-google · 2023-07-18T21:28:41Z

even android-s branches have the latest 0.8.6

sure (ignoring the fact that S is two years old), but S also has the source to 0.8.6. you can't take the source from one branch and the binary from another and complain they don't match... of course they don't!

landley · 2023-07-18T22:54:11Z

Let me know what the kernel guys say. If the answer is "no", I can add a cache-and-release workaround to the toybox find source.

I want this to work for everybody. It seems pretty clearly like a filesystem bug, but if you're doing a build on that filesystem it needs to work. "Not my bug" doesn't mean I get to ignore it, because it exists and has to work.

I'm wondering if some sort of inode cacheing is sufficient? If I see the same dev+inode a second time, it's stuttering and I can stop... Except will other filesystems do some sort of "return files in alphabetical order, you renamed bcd to xyz so it shows up a second time but there's a zzz after it we haven't seen yet. Which means I don't include the entry with the same dev/inode I've already seen, but I don't get to stop there.

The problem is if I keep reading forever there's no guarantee it terminates on btrfs, and any stop before EOF is basically a heuristic? The cache-and-release approach means we won't perturb it, but we can't guarantee nobody ELSE does (hence denial of service). And if the directory is changing due to other processes editing the filesystem, endless updates aside: the larger the delay between the readdir() and acting upon it, the more likely the entry might have disappeared out from under us. That's a more classic race condition having a wider window to operate on, which is why I didn't want to change the default behavior, and instead add a new option. Maybe -breadth or some such?

landley · 2023-08-12T23:35:40Z

Closed #448 as a dupe of this, and the new action there is I poked the btrfs mailing list about it, ala https://www.spinics.net/lists/linux-btrfs/msg138634.html

If they recommend a workaround I can add it to find, but nothing I come up with on my own seems reliable in avoiding denial-of-service? (But maybe just "good enough" is the best you can hope with on btrfs? Hope nobody else using the filesystem acts maliciously?)

landley · 2023-08-13T16:47:18Z

Filipe Manana submitted a kernel fix https://lore.kernel.org/linux-btrfs/c9ceb0e15d92d0634600603b38965d9b6d986b6d.1691923900.git.fdmanana@suse.com/ and I tested it against 6.4 (it applied with offset and worked fine).

Alas spinics.net is giving me an "ERR_ADDRESS_UNREACHABLE" right now but if it works for you, checking the replies to the above link will probably find it, including my mkroot test procedure.

When a kernel comes out with this in it, remind me to close this.

bmx666 · 2023-08-13T18:07:36Z

Hi @landley, well done! Thanks a lot for your help and effort!

When a kernel comes out with this in it, remind me to close this.

Of course, I will wait a kernel release.

landley · 2023-08-14T21:55:21Z

Merged into the btrfs maintainer's tree, we just missed the -rc6 pull but there might be an -rc7 pull before the release: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/log/?h=for-next

(No, I don't know why it's in the log twice.)

bmx666 · 2023-08-15T09:36:37Z

Nice to have this fix in mainline but if it doesn't go into LTS kernels, then this issue should be open until the latest LTS in the list will be 6.4 😅

landley · 2023-08-21T18:34:33Z

The kernel fix has been merged for 6.5 torvalds/linux@9b378f6 and backported to 6.4 https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=5951039da4a87f2759f928fa921f9d166213e763

bmx666 · 2024-01-25T23:16:01Z

Hi @landley, good news, there is backported fix for kernel 5.15 -> https://lore.kernel.org/linux-btrfs/cover.1706183427.git.fdmanana@suse.com/
I hope that Ubuntu 20.04 and Ubuntu 22.04 will receive these patches soon and this issue will remain as a part of history 😄

landley · 2024-01-26T10:58:58Z

I've been cc'd on the emails about it. :)

enh-google closed this as not planned Won't fix, can't repro, duplicate, stale Jul 25, 2022

landley reopened this Jul 18, 2023

enh-google mentioned this issue Aug 11, 2023

find on btrfs goes into infinite loop when path is long #448

Closed

landley closed this as completed Aug 21, 2023

find: infinite loop when performing large amount of inplace file operations on btrfs #306

find: infinite loop when performing large amount of inplace file operations on btrfs #306

Comments

moetayuko commented Dec 20, 2021 • edited Loading

Background

MWE

Environment

landley commented Dec 20, 2021

landley commented Dec 20, 2021

moetayuko commented Dec 20, 2021

landley commented Dec 21, 2021 via email

moetayuko commented Dec 21, 2021 • edited Loading

landley commented Dec 21, 2021

moetayuko commented Dec 22, 2021

enh-google commented Jul 25, 2022

bmx666 commented Jul 14, 2023 • edited Loading

bmx666 commented Jul 15, 2023

landley commented Jul 17, 2023

landley commented Jul 17, 2023

bmx666 commented Jul 17, 2023 • edited Loading

bmx666 commented Jul 17, 2023

bmx666 commented Jul 17, 2023

bmx666 commented Jul 18, 2023 • edited Loading

landley commented Jul 18, 2023

bmx666 commented Jul 18, 2023 • edited Loading

moetayuko commented Jul 18, 2023

landley commented Jul 18, 2023

landley commented Jul 18, 2023

bmx666 commented Jul 18, 2023

bmx666 commented Jul 18, 2023 • edited Loading

enh-google commented Jul 18, 2023

bmx666 commented Jul 18, 2023 • edited Loading

enh-google commented Jul 18, 2023

landley commented Jul 18, 2023

landley commented Aug 12, 2023

landley commented Aug 13, 2023

bmx666 commented Aug 13, 2023 • edited Loading

landley commented Aug 14, 2023

bmx666 commented Aug 15, 2023

landley commented Aug 21, 2023

bmx666 commented Jan 25, 2024

landley commented Jan 26, 2024

moetayuko commented Dec 20, 2021 •

edited

Loading

moetayuko commented Dec 21, 2021 •

edited

Loading

bmx666 commented Jul 14, 2023 •

edited

Loading

bmx666 commented Jul 17, 2023 •

edited

Loading

bmx666 commented Jul 18, 2023 •

edited

Loading

bmx666 commented Jul 18, 2023 •

edited

Loading

bmx666 commented Jul 18, 2023 •

edited

Loading

bmx666 commented Jul 18, 2023 •

edited

Loading

bmx666 commented Aug 13, 2023 •

edited

Loading