Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heavy resource usage with "--follow" flag and /sys #378

Closed
friday opened this issue Dec 31, 2018 · 8 comments
Closed

Heavy resource usage with "--follow" flag and /sys #378

friday opened this issue Dec 31, 2018 · 8 comments

Comments

@friday
Copy link

friday commented Dec 31, 2018

If I run fd --follow something / (on Linux) fd will use 100% of my CPU and slowly allocate more and more ram until the system runs out and stops responding (unless I kill fd).

Killing fd after a few minutes frees a ton of ram:

fd resource usage 2

Other details

  • fd --follow --hidden something ~ causes it too, if the user has installed Wine, since that creates a symlinks to / from .wine/dosdevices/z:.
  • Without --follow it only takes a second
  • If I exclude /sys (but not /proc) it completes in ~10 seconds.
  • With --max-depth 10 it takes ~3:40 minutes (821,92s user 1006,11s system 819% cpu 3:42,93 total). Ram allocation peaks about halfway, and then starts freeing up ram.
    • Adding --threads 1 (on a system with 8 vcores) oddly makes it run significantly faster 😮 ~2:30m (78,85s user 99,34s system 119% cpu 2:29,18 total). It peaks at about the same amount of RAM though.

I'm using fd 7.2.0 and /proc isn't the problem, so this isn't #288, though the symptoms are nearly identical. Maybe the very large number of symlinks in /sys is the problem (symlinks to folders containing symlinks to folders to ...), causing an even much larger number of full paths that the ignore crate needs to store? Maybe this warrants gigabytes of ram allocated?

I'm not sure if this is to be considered a bug or just an unfortunate combination, but since the outcome is quite serious I thought it deserved it's own issue.

@sharkdp
Copy link
Owner

sharkdp commented Dec 31, 2018

Thank you very much for the detailed bug report. I can reproduce this.

Maybe the very large number of symlinks in /sys is be the problem (symlinks to folders containing symlinks to folders to ...),

I am pretty sure this is the case. It's not just an explosion of the number of symlinks. There are loops, too. find reports these:

> cd /sys
> find -L -name '*something*'
[...]
find: File system loop detected; ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/subsystem/devices/target2:0:0/2:0:0:0/block/sda/device’ is part of the same file system loop as ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/subsystem/devices/target2:0:0/2:0:0:0’.
[...]

Interestingly, find also seems to have something like a maximum number of redirections through symlinks ("Too many levels of symbolic links"):

find: ‘./class/thermal/cooling_device1/device/physical_node/node0/cpu7/firmware_node/subsystem/devices/PNP0C14:00/physical_node/driver/module/holders/ideapad_laptop/drivers/platform:ideapad_acpi/VPC2004:00/rfkill/rfkill0/subsystem/rfkill2/device/device/leds/phy0-led/subsystem/input25::kana/device/device/driver/0003:046D:C318.0004/subsystem/devices/0003:09DA:9090.0001/input/input22/subsystem/input14/device/controlC0/subsystem/pcmC1D0c/device/device/driver/0000:00:03.0/subsystem/devices/0000:00:1f.2/ata1/host0/scsi_host/host0/subsystem/host5/device/target5:0:0/5:0:0:0/block/sdb/device’: Too many levels of symbolic links

Still, even with these checks in place, find -L -name '*something*' also does not seem to finish, when run from /sys.

In principle (as I understand it), loops should be detected by the ignore crate, but we should double-check if we handle this correctly in fd. In a small experiment with the following structure, fd did not end up in an infinite loop:

▶ tree
.
└── bar
    └── foo -> ..

2 directories, 0 files

So something must be different inside /sys. Maybe it is related to the fact that these files do not really exist, but are created by the kernel "on the fly".

@tavianator
Copy link
Collaborator

Interestingly, find also seems to have something like a maximum number of redirections through symlinks ("Too many levels of symbolic links"):

This is just ELOOP coming back from the kernel. path_resolution(7) says the limit on modern kernels is 40.

So something must be different inside /sys. Maybe it is related to the fact that these files do not really exist, but are created by the kernel "on the fly".

I don't believe there is much special about /sys except for symlinks that point back up the tree very high, and high fanout. Cycle detection in find only triggers when the directory it's examining is the same as one of its parents. Theoretically this is enough to avoid going infinite, but we can still get combinatorial blowup in the number of paths. Imagine this directory structure:

root/i/j/link -> ../../(i + 1)%10

so root/1/2/link points to root/2 etc. That's only 211 files, but it's well over 100,000,000,000 cycle-free traversals, since you only get a loop after you follow 10 links (root/A/B/link/C/link/.../K/link is a loop, but nothing before it is).

/sys seems to have a similar thing going on with traversals like

/sys/class/thermal/cooling_device13/device/physical_node/driver/cpu21/firmware_node/subsystem/devices/device:6b/physical_node/subsystem/devices/0000:00:17.0/firmware_node/device:b2/physical_node/ata_device/dev4.0/subsystem/dev3.0/device/firmware_node/subsystem

@tavianator
Copy link
Collaborator

Something that might be nice is a "visit files only once" mode, that remembers which files (by identity, i.e. st_dev/st_ino pair on *nix) have already been seen.

@friday
Copy link
Author

friday commented Jan 7, 2019

There's some discussion about that in #318

@sharkdp
Copy link
Owner

sharkdp commented Apr 15, 2020

I have been thinking about this for some time.

I currently don't want to implement a "visit files only once" mode, as it would require proper synchronization across threads with a global set of visited device/inode pairs (on Unix. No idea about Windows). This would (1) increase code complexity and (2) might have a performance impact, as the global set has to be managed on the receiver side (in a single thread).

However, if someone wants to experiment with this, I have an implementation of this exact concept in https://github.com/sharkdp/diskus/

A much simpler (but also more "hacky") solution would be the following: Let's assume that it is very unlikely that someone wants to find something within /sys without explicitly using /sys as a search path. If this assumption is correct, we could ignore the /sys directory by default IF --follow is used AND /sys is not explicitly listed as a search root. Similar to what rg -uuu does for binary files, we could use the third occurence of -u/--unrestricted to lift that ban of /sys and include it in this case.

@sharkdp
Copy link
Owner

sharkdp commented May 25, 2020

It looks like this has been resolved in v7.5.0 for some (yet unknown) reason:

❯ /usr/bin/time -f "%M" fd-v7.4.0 --follow --max-depth 8 non-existing /sys
741792

❯ /usr/bin/time -f "%M" fd-v7.5.0 --follow --max-depth 8 non-existing /sys
7600

Timing is still the same though:

Command Mean [s] Min [s] Max [s] Relative
fd-v7.4.0 --follow --max-depth 7 non-existing /sys 23.867 ± 0.341 23.603 24.252 1.03 ± 0.03
fd-v7.5.0 --follow --max-depth 7 non-existing /sys 23.077 ± 0.462 22.556 23.438 1.00

@friday
Copy link
Author

friday commented May 25, 2020

Since it's no longer hogging up all my ram and causing the system to crash when I run the command I consider this fixed 🙂

@sharkdp
Copy link
Owner

sharkdp commented May 25, 2020

I actually looked into this a bit and it seems to be another instance of an unexpected performance regression improvement!

If I compile fd v7.4.0 with Rust 1.35, I get this:

❯ /usr/bin/time -f "%M" fd --follow --max-depth 6 non-existing /sys
62104

If I compile it with Rust 1.36 or newer, I get this:

❯ /usr/bin/time -f "%M" fd --follow --max-depth 6 non-existing /sys
7232

Rust 1.36 introduced some changes concerning memory management, so this seems plausible at least (https://blog.rust-lang.org/2019/07/04/Rust-1.36.0.html). But I have no idea what exactly caused the previous memory consumption.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants