fd can get stuck when run on the whole FS from the root #288

Porkepix · 2018-04-23T15:21:13Z

The command I ran is just fd foobar /. It gets stuck and never ends. Sadly, I was unable to work out what's causing it.
I can investigate further if you have ideas about what else I could check.

Below is the partition setup:

# lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda              8:0    0 223,6G  0 disk  
├─sda1           8:1    0   512M  0 part  /boot/efi
├─sda2           8:2    0   200M  0 part  
│ └─cryptboot  254:3    0   198M  0 crypt /boot
└─sda3           8:3    0 222,9G  0 part  
  └─lvm        254:0    0 222,9G  0 crypt 
    ├─vg0-swap 254:1    0     8G  0 lvm   [SWAP]
    └─vg0-root 254:2    0 214,9G  0 lvm   /

EDIT: The issue reproduces every time on this Arch Linux setup. I couldn't reproduce it on another Arch Linux machine with an unencrypted system.

sharkdp · 2018-04-23T15:58:15Z

I can reproduce this on my PC. This seems to be a permission problem with files in /proc and /sys.

Could you try to run

fd -E /proc -E /sys foobar /

where -E (--exclude) excludes these two folders?

(Note: you might have to exclude some other mount points as well)

Porkepix · 2018-04-23T16:13:40Z

Indeed this does work. Strange that my personal server doesn't have the same issue though.

Porkepix · 2018-04-23T16:16:05Z

Oh, and the issue was present even when running fd as root. So I'm not sure about the permission idea.

sharkdp · 2018-04-23T16:18:12Z

> Oh, and the issue was present even when running fd as root. So I'm not sure about the permission idea.

I see, interesting. Something else must be going on inside /proc and/or /sys. If anybody has an idea on how to fix this, I'd be glad for any hints.

Porkepix · 2018-04-23T16:21:58Z

Excluding only /proc is enough, so the problem comes from something inside /proc.
With 343 items inside /proc, that's a lot of things to test though.

Porkepix · 2018-04-23T16:26:08Z

Even more interesting.
fd -E "/proc/[0-9]*" foobar / is succeeding.
fd -E "/proc/[0-9]*" foobar /proc is failing.

sharkdp · 2018-04-23T17:04:24Z

Wait, this also seems to change from time to time. Right now, fd foobar / works perfectly fine on my machine(?!?).

sharkdp · 2018-04-23T18:05:22Z

I suspect this could be caused by infinite recursion within /proc (see here or here).

Quote:

Avoiding symbolic links may not be sufficient for avoiding "infinite" recursion in /proc. On the bright side, "infinite" is bounded by PATH_MAX...

Unfortunately, I cannot reproduce the problem at the moment. Could you try

fd --max-depth 50 foobar /

?

sharkdp · 2018-05-03T06:17:51Z

I'm going to close this (for now), as there is no further feedback. Feel free to comment here if this should be re-opened.

The workaround for now is to explicitly exclude /proc via

fd -E /proc ...

Porkepix · 2018-05-03T06:53:19Z

Sorry, I didn't know you were waiting for input: the request was added in an edit, so I wasn't notified, and the original comment didn't ask for input. Please post a new comment in such situations.

I tested what you asked for: it didn't solve the issue, so I guess this isn't infinite recursion.
And while it hangs like that, the CPU runs every single core at 100%.

I guess we could reopen it then?

EDIT: Even though I opened the issue, I can't reopen it myself.

Porkepix · 2018-05-03T07:09:46Z

By the way, this was on Linux computers.
I'm currently running it on an old Mac. I guess it's going to take a long time (old HDD in bad shape, no SSD), so I'll let it run, but I suspect that when I'm back this evening the computer will be frozen and out of memory: it has been running for about 10 minutes, CPU is only around 10%, but memory constantly increases. It started low and is already at 140 MB.

EDIT: CPU usage dropped to 1.5-2%, but it still continues.

EDIT2: This might not be the bug on the Mac: 30 minutes later, it found a file with "foobar" in the name (!) at the 9th level of depth. So I'll see this evening whether it finished correctly; the HDD is rather slow, and fd is probably limited by disk I/O rather than by CPU as on the other, SSD-based computers.
However, is it normal for memory to increase constantly over time?

Porkepix · 2018-05-03T17:51:50Z

So, not reproduced on the Mac; it just took quite some time.

sharkdp · 2018-05-03T20:46:15Z

Thank you for investigating.

To summarize (correct me if I'm wrong):

fd can get stuck in an infinite (or very long?) loop when searching /.
This seems to be caused by something weird going on in /proc.
The problem is not 100% reproducible (I'm pretty sure I saw this on my Linux machine, but I can not reproduce it now - nor on three other Linux machines).
The problem can be prevented by excluding /proc or /proc/[0-9]*.

Porkepix · 2018-05-04T05:52:03Z

Well, that's pretty much it, except that I found a weird case in comment #288 (comment).
It looks like excluding /proc/[0-9]* and searching in / makes it work, but excluding the same thing and searching in /proc makes it fail… which is pure nonsense to me. I was trying to find whether there was a culprit file and stopped when I discovered this, because it didn't make any sense.

The exact same computer I ran the tests on yesterday morning now succeeds on fd foobar /… I'll try to test the other one that had the issue when I get to work.

Porkepix · 2018-05-04T08:21:24Z

Yeah, this seems really random. The laptop, which didn't hang 2 hours ago, now does it again, and the desktop doesn't anymore…
On the laptop it spawns 10 threads, and currently 4 of them are each running a core at 100%. Last week I saw all my threads at 100%.

Do you have any idea how to debug that, to see where it hangs?

And this is somewhat off-topic, but per comment #288 (comment), is it normal that on the Mac memory increased during the whole search?

sharkdp · 2018-05-04T11:40:59Z

> Do you have any idea how to debug that, to see where it hangs?

Maybe fd . /?

> And this is somewhat off-topic, but per comment #288 (comment), is it normal that on the Mac memory increased during the whole search?

I'm not sure, I will look into it.

Porkepix · 2018-05-04T12:30:57Z

Okay, so running fd . / on the desktop computer reliably fails at /proc/11364/task/11364/net.
That process is a Firefox one: clement 11364 0.0 0.0 0 0 ? Z 10:22 0:00 [firefox] <defunct>

The laptop's case is more complicated, however. It seems to stop at a random file at some point. I can share some of them here if you think that's useful, but I'm not sure it is. Maybe running with a single thread would help. I don't know why one computer is affected while the other isn't, though.

Porkepix · 2018-05-04T12:32:56Z

Okay, actually I just RTFM'd a bit and used -j 1.

Got stuck on /proc/19848/net

ps aux gives this:
clement 19848 0.0 0.0 0 0 ? Z 08:46 0:00 [pingsender] <defunct>

Porkepix · 2018-05-04T12:37:08Z

Oh, and surprisingly, using -j 1 still shows two processes. Both use 100% of the CPU.

One last thing that I can't explain: sometimes when the bug triggers, it "breaks" the terminal I use, Terminology, i.e. the display is still refreshed, but I can't type anymore, even in newly opened windows. It only happened once or twice, and was fixed when I pkill'd the process from elsewhere.

EDIT: When it happens, Ctrl+C doesn't do the trick at first, but it seems to work if you wait a long time and try again.

sharkdp · 2018-05-10T09:08:35Z

Thank you very much for your investigation!

It looks like you are onto something with the <defunct> processes! I can now reliably reproduce this by creating a zombie process on purpose:

Run fd foobar /proc => everything is fine
Copy the code from https://stackoverflow.com/a/25228579/704831 into a file called zombie.c
Compile it: gcc -o zombie zombie.c.
Create the zombie process: ./zombie.
Run ps -ef | grep defunct in a new terminal. It should show [zombie] <defunct>.
Run fd foobar /proc while zombie is still running. It will hang.
Stop zombie and run fd foobar /proc again => everything is fine.

Experiment 2:

Run fd foobar / => everything ok
Create the zombie process: ./zombie.
Get the PID of the defunct process via ps -ef | grep defunct
Call fd foobar / -E /proc/<PID> => everything ok
Call fd foobar / => it hangs
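
For readers who would rather not compile the linked C snippet, the zombie can probably be created from Rust as well. The following is only a minimal sketch, not the Stack Overflow code referenced above: it assumes Linux with /proc mounted and a `true` binary on the PATH, and `parse_state` is a helper name made up for this illustration. It spawns a child that exits immediately, deliberately skips `wait()`, and reads the state field from /proc/<PID>/stat, which should be `Z` for a zombie.

```rust
use std::fs;
use std::io;
use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

/// Extract the state character from the contents of /proc/<PID>/stat.
/// The layout is "pid (comm) state ..."; comm may itself contain spaces
/// or parentheses, so we look for the LAST ')' before reading the state.
fn parse_state(stat: &str) -> Option<char> {
    let after_comm = &stat[stat.rfind(')')? + 1..];
    after_comm.split_whitespace().next()?.chars().next()
}

fn main() -> io::Result<()> {
    // Assumption: `true` exists on the PATH and exits immediately.
    let mut child = Command::new("true").spawn()?;
    let pid = child.id();

    // Give the child time to exit. Because we have not called wait(),
    // the kernel keeps it around as a zombie (<defunct>).
    sleep(Duration::from_millis(200));

    match fs::read_to_string(format!("/proc/{}/stat", pid)) {
        Ok(stat) => println!("pid {} state {:?}", pid, parse_state(&stat)),
        Err(e) => println!("could not read /proc/{}/stat: {}", pid, e),
    }

    println!("The zombie stays until you press Enter; run fd in another terminal.");
    io::stdin().read_line(&mut String::new())?;

    // Reap the child, which removes the zombie.
    child.wait()?;
    Ok(())
}
```

While the program waits for Enter, the two experiments above can be repeated with its child's PID instead of the one from ./zombie.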

sharkdp · 2018-05-10T17:41:05Z

(see the linked ticket in ripgrep for some further debugging)

sharkdp · 2018-05-10T18:29:38Z

This might be a bug in the ReadDir iterator in Rust's standard library. The issue has been reported here: rust-lang/rust#50619

sharkdp · 2018-07-02T17:09:14Z

My pull request, which fixes a bug in Rust's standard library, has been merged (rust-lang/rust#50630). Now we have to wait for the next Rust release in order to fix this bug in fd.

sharkdp · 2018-08-03T18:20:17Z

Actually, we have to wait for Rust 1.29. I forgot about the beta stage. This bug is fixed when compiling fd with the current rustc 1.29.0-beta.1.

This upgrades the minimum required version of Rust to 1.29 in order to fix #288.

See also:
- Rust compiler bug ticket: rust-lang/rust#50619
- Rust compiler PR with the fix: rust-lang/rust#50630

closes #288

sharkdp · 2018-09-18T19:09:20Z

This is finally fixed! ✨

sharkdp · 2018-10-27T16:48:35Z

Fix released in fd-7.2.0.

monkeyt00l · 2022-12-11T13:08:37Z

This issue seems to still exist on version 8.6.0

Running "cd /proc && fd teststring" makes fd freeze and use up all CPU cores (I cancelled the operation after a minute).

tavianator · 2022-12-11T17:02:24Z

Interesting, I can't reproduce that locally. fd -L hangs, but that's somewhat expected since /proc/self/root is a symlink to /. Do you possibly have an alias like fd=fd -L?

Otherwise, does it still reproduce with fd -j1? Can you paste some of the output of cd /proc && strace -f fd teststring once it hangs?

monkeyt00l · 2022-12-11T18:34:21Z

Running 'fd -j1' lists files until it stops at 24677/map_files/
24677 was the process ID of Firefox (/usr/lib/firefox/firefox -contentproc)

I suspect this may have to do with the sandboxing features in Firefox, since it also uses seccomp and blocks ptrace.
Other sandboxed apps behave similarly, and for any process that uses seccomp I usually need root privileges to do something like 'lsof -p '

However, this one seems to be unreliable to reproduce, since I could not reproduce it on a clean test install.

The strace log shows a sequence of entries that repeats endlessly: https://gist.github.com/monkeyt00l/30a2bdcd3544db3fbc896acc934bbc30

The pids in the log also seem to belong to firefox content processes

tmccombs · 2022-12-12T02:43:01Z

Hmm, from looking at the strace output I wonder if this is related to #1186 . It seems like in both cases fd is getting stuck in an infinite loop due to an error (in this case, a permission error).

tavianator · 2022-12-12T03:05:28Z

I hope so! EACCES is much easier to debug than EIO :)

tavianator · 2022-12-12T20:17:44Z

I can reproduce this now:

tavianator@graphene$ (sleep 1& (sleep 2 && fd . /proc/${!}/net --show-errors)& exec /bin/sleep 3)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
[fd error]: Invalid argument (os error 22)
...

That command creates a zombie process (the sleep 1) by replacing the shell with a command that won't wait() for its children (the exec /bin/sleep 3). In the meantime, we wait for the zombie to die and then run fd in its /proc/<PID>/net directory. For a zombie process, the open() will succeed but readdir() will fail with EINVAL. This is key to triggering the error.
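
The distinction between the open step and the read step can be observed directly from Rust, because `std::fs::read_dir` returns one `Result` for opening the directory and then a separate `Result` for every entry. The sketch below is an illustration only, not code from fd or ignore; `Probe` and `probe_dir` are names invented here. It stops at the first per-entry error instead of polling the iterator again, so it should terminate regardless of how the underlying iterator behaves after an error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of probing a directory in two separate steps (hypothetical type).
#[derive(Debug)]
enum Probe {
    /// Opening the directory failed.
    OpenFailed(io::ErrorKind),
    /// Opening worked, but reading an entry failed
    /// (for example os error 22, EINVAL, as in the output above).
    ReadFailed { entries_before_error: usize, os_error: Option<i32> },
    /// Every entry was read without error.
    Complete { entries: usize },
}

fn probe_dir(path: &Path) -> Probe {
    // Step 1: open the directory.
    let iter = match fs::read_dir(path) {
        Ok(iter) => iter,
        Err(e) => return Probe::OpenFailed(e.kind()),
    };

    // Step 2: read entries, giving up at the first error.
    let mut entries = 0;
    for result in iter {
        match result {
            Ok(_) => entries += 1,
            Err(e) => {
                return Probe::ReadFailed {
                    entries_before_error: entries,
                    os_error: e.raw_os_error(),
                }
            }
        }
    }
    Probe::Complete { entries }
}

fn main() {
    // Usage: pass a directory such as /proc/<PID>/net as the first argument.
    let arg = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    println!("{:?}", probe_dir(Path::new(&arg)));
}
```

Pointed at the /proc/<PID>/net directory of a zombie, this would be expected to report a read failure rather than an open failure if the description above holds on the machine in question.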

Those with a long memory might remember the bug rust-lang/rust#50619, which @sharkdp filed and then fixed as a result of this bug.

Unfortunately, some silly programmer named @tavianator reintroduced the bug in rust-lang/rust#92778. Or to be a little more charitable, the original fix only applied to some platforms, of which Linux used to be one. But now Linux uses a different ReadDir implementation that is better in many ways but regressed this bug. Oops!

I guess I'll fix it in Rust, unless someone beats me to it.

tavianator · 2022-12-12T22:18:00Z

Here's the fix: rust-lang/rust#105638

sharkdp · 2022-12-15T20:42:09Z

So if I understand correctly, your PR landed in Rust 1.60. Which is precisely our MSRV right now 😄. So there's currently no way to fix this bug by compiling with an older version of rustc, unless we backport fd to 1.27 <= MSRV < 1.60. Which might not be a big deal maybe. And then we set the MaximumSRV to 1.59 for a while?

tmccombs · 2022-12-15T20:58:28Z

clap 4.0 has an MSRV of 1.60, so we'd probably have to downgrade clap to 3.x again if we did that.

tavianator · 2022-12-15T21:03:51Z

Alternatively we can work around it in ignore by adding

if result.is_err() {
    break;
}

here: https://github.com/BurntSushi/ripgrep/blob/515f120b5c2c7984c8dfa8bafeda42916457b0ba/crates/ignore/src/walk.rs#L1497-L1507
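
To see why such a guard is enough to end the hang, here is a small simulation. It is a sketch under stated assumptions, not the code from ignore's walk.rs: `FailingStream` is a made-up stand-in for a directory stream that keeps returning an error (os error 22) forever once its entries are exhausted, and `walk_until_error` is a hypothetical consumer that stops at the first error, in the spirit of the `break` above.

```rust
use std::io;

/// Hypothetical stand-in for a directory stream whose read step
/// fails on every call after `ok_left` successful entries.
struct FailingStream {
    ok_left: usize,
}

impl Iterator for FailingStream {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.ok_left > 0 {
            self.ok_left -= 1;
            Some(Ok(format!("entry{}", self.ok_left)))
        } else {
            // Never returns None: a consumer that ignores errors
            // and keeps polling would loop forever.
            Some(Err(io::Error::from_raw_os_error(22)))
        }
    }
}

/// Collect entries and stop at the first error instead of polling again.
fn walk_until_error<I>(stream: I) -> (Vec<String>, Option<io::Error>)
where
    I: Iterator<Item = io::Result<String>>,
{
    let mut names = Vec::new();
    for result in stream {
        match result {
            Ok(name) => names.push(name),
            // The guard: leave the loop as soon as reading fails.
            Err(e) => return (names, Some(e)),
        }
    }
    (names, None)
}

fn main() {
    let (names, err) = walk_until_error(FailingStream { ok_left: 2 });
    println!("entries: {:?}", names);
    if let Some(e) = err {
        println!("stopped on: {}", e);
    }
}
```

The trade-off is that any entries after the failing one in the same directory are skipped, which seems acceptable when the alternative is never terminating.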

tavianator · 2022-12-28T18:14:16Z

Did that: BurntSushi/ripgrep#2378

sharkdp added bug help wanted labels Apr 23, 2018

sharkdp closed this as completed May 3, 2018

sharkdp reopened this May 3, 2018

sharkdp mentioned this issue May 10, 2018

Zombie processes cause rg --files to hang in /proc BurntSushi/ripgrep#916

Closed

sharkdp removed the help wanted label May 19, 2018

sharkdp mentioned this issue Sep 17, 2018

Upgrade minimum reqiured version of Rust to 1.29 #325

Merged

sharkdp closed this as completed in #325 Sep 18, 2018

sharkdp mentioned this issue Oct 26, 2018

fd hangs after accepting entry in fuzz finder #353

Closed

friday mentioned this issue Dec 31, 2018

Heavy resource usage with "--follow" flag and /sys #378

Closed

monkeyt00l mentioned this issue Dec 11, 2022

Searching in / and /proc freezes the application Canop/broot#639

Open

gabydd mentioned this issue Aug 5, 2023

transition to nucleo for fuzzy matching helix-editor/helix#7814

Merged
