Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make directory-only wildcards (cd completions) faster on slow filesystems by keeping dir information #9220

Merged
merged 3 commits into from Sep 21, 2022

Conversation

faho
Copy link
Member

@faho faho commented Sep 19, 2022

Description

This uses wreaddir_resolving, which tries to use the dirent d_type
field if it exists. In that way, it can skip the stat to determine
if the given file is a directory.

This allows cd completions to skip stat in most cases:

strace -Ce newfstatat fish --no-config -c 'complete -C"cd /tmp/completion_test/"' >/dev/null

prints before:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100,00    0,002627           2      1033         4 newfstatat

after:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100,00    0,000054           1        31         3 newfstatat

for a directory with 1000 subdirectories.

(just fish --no-config -c exit does 26 newfstatat)

This should improve the situation with slow filesystems like fuse or
network fsen.

In case we have no d_type, we use stat, which would yield about the
same results.

The worst case is that we need directories and descriptions or the
"executable" flag (which we don't currently check for cd, if I read
this right?). I don't think that case exists or is all that common?

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.rst

@faho faho added the performance Purely performance-related enhancement without any changes in black box output label Sep 19, 2022
@faho faho added this to the fish-future milestone Sep 19, 2022
@faho faho changed the title Keep dir information Make directory-only wildcards (cd completions) faster on slow filesystems by keeping dir information Sep 19, 2022
Copy link
Member

@ridiculousfish ridiculousfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Nice idea and perf improvement.

Curious why this is disabled under WSL?

One way to take this even further: the is_dir value is reliable, so if we have directories_only && !is_dir then we can return false immediately. This would require a three-way value (is dir, not dir, unknown) but is probably worth it since it would save additional stat calls for files which are not directories.

edit alternatively since if we have directories_only we always have a known is_dir, we only need an ordinary boolean here; we would just leave a comment about when the value is reliable.

@faho
Copy link
Member Author

faho commented Sep 20, 2022

the is_dir value is reliable,

unfortunately it's not in all cases. i tried throwing non-dirs out early and it ran afoul of one of the logical PWD tests.

we would have to absolutize that more eagerly for this to work, and I'm not sure that's worth it.

In fact I'm not convinced there isn't already a logical $PWD bug here ("what if ../foo is a directory physically but isn't logically"). I'll have to see if I can find it.

Curious why this is disabled under WSL?

There's a check later for it, and the early-out checks simply mimic things we do anything special for later.

Edit: This is the check I'm talking about:

fish-shell/src/wildcard.cpp

Lines 477 to 480 in 3d8f98c

if (is_windows_subsystem_for_linux() &&
string_suffixes_string_case_insensitive(L".dll", filename)) {
return false;
}

This makes it so globs for completions (this is only called if we're completing) never match ".dll". That's probably too much - it should be gated by looking for executables. If that's in, we can reuse this fast path on WSL.

@faho
Copy link
Member Author

faho commented Sep 20, 2022

Ah, got it. Two issues:

  1. expand_last_segment passes an empty base_dir to wreaddir_resolving, which makes it use "/" as the base dir
  2. it doesn't normalize the actual path.

fstatat won't work here because we need the normalized path for logical $PWD.

We could use path_normalize_for_cd, but that is too ornery with what it accepts - leading and trailing slashes, and if you get it wrong it assert()s out.

@faho
Copy link
Member Author

faho commented Sep 20, 2022

Okay, benchmarks:

In a directory with 1000 directories and 1000 files, running complete -C'cd ' yields:

version number of syscalls time relative
skipping early 2215 10.3ms 1.0x
not skipping early 3215 13.8ms 1.35x
lots of stat 4213 16.2ms 1.58x

The syscall count for the skipping early version does a total of 5 "getdents64", 31 "newfstatat" and 18 "openat" calls. That's 5 newfstatat and 2 openat more than a pure fish --no-config -c exit - including function files like cd.fish or source.fish.

Of the 2215 syscalls, 2000 are write() - we might want to have a look at that because it "write()"s the newline separately, which seems wasteful?

This could end up trying to `stat()` a file in /, like "/glassdoor",
if the dir_path was empty.
This was overzealous and didn't allow anything named ".dll" in any
file completions.

This allows us to now add the cd completion fast path for WSL
This uses wreaddir_resolving, which tries to use the dirent d_type
field if it exists. In that way, it can skip the `stat` to determine
if the given file is a directory.

This allows `cd` completions to skip stat in most cases:

```fish
strace -Ce newfstatat fish --no-config -c 'complete -C"cd /tmp/completion_test/"' >/dev/null
```

prints before:
```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100,00    0,002627           2      1033         4 newfstatat
```

after:

```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100,00    0,000054           1        31         3 newfstatat
```

for a directory with 1000 subdirectories.

(just `fish --no-config -c exit` does 26 newfstatat)

This should improve the situation with slow filesystems like fuse or
network fsen.

In case we have no d_type, we use `stat`, which would yield about the
same results.

The worst case is that we need directories *and* descriptions or the
"executable" flag (which we don't currently check for cd, if I read
this right?).
@faho faho merged commit 6a93d58 into fish-shell:master Sep 21, 2022
@faho faho deleted the keep-dir-information branch September 21, 2022 17:49
@faho faho modified the milestones: fish-future, fish 3.6.0 Sep 21, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance Purely performance-related enhancement without any changes in black box output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants