Skip to content

wildcard: Rationalize file/command completions #10052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 14, 2023

Conversation

faho
Copy link
Member

@faho faho commented Oct 10, 2023

Description

Alternative to #10050. NOTE The rust version is not up-to-date, it's #10050. I based it on that, I will of course adjust that before merging if we do decide that this is an acceptable approach.

This keeps the entry_t as long as possible, and asks it, so especially
on systems with working d_type we can get by without a single stat in
most cases.

Then it guts file_get_desc, because that is only used for command
completions - we have been disabling file descriptions for years,
and so this is never called there.

That means we have no need to print descriptions about e.g. broken symlinks, because those are not executable.

Put together, what this means is that we, in most cases, only do
an access(2) call instead of a stat, because that might be checking
more permissions.

So we have the following constellations:

  • If we have d_type:
    • We need a stat() for every symlink to get the type (e.g. dir or regular)
      (this is for most symlinks, if we want to know if it's a dir or executable)
    • We need an access() for every file for executables
  • If we do not have d_type:
    • We need a stat() for every file
    • We need an lstat() for every file if we do descriptions
      (i.e. just for command completion)
    • We need an access() for every file for executables

As opposed to the current way, where every file gets one lstat whether
with d_type or not, and an additional stat() for links, and an
access. And this is on top of the stat() that the entry might already have done - even without d_type, we save that.

So we go from two syscalls to one for executables, while improving accuracy compared to #10050.

Numbers

All acquired via strace -c -e %file fish --no-config -c $thething, counting "file" syscalls like access and the stat family. This is an up-to-date linux system on ext4, with d_type.

complete -C "ls ~/complete_test/", where complete_test is a directory with 10k directories and 10k files:

complete -C "" (command completion), with 2870 files in $PATH, 2865 of which are executable files and 407 are links:

I will have to see if I can get numbers for without d_type. The gains there are that we can reuse the stat from is_dir() if we are completing directories specifically, and that we can skip the lstat() for non-executables.

TODOs:

  • [N/A] Changes to fish usage are reflected in user documentation/manpages.
  • [N/A] Tests have been added for regressions fixed
  • [N/A] User-visible changes noted in CHANGELOG.rst

faho added 3 commits October 10, 2023 19:20
We no longer add descriptions for normal file completions, so this was
only ever reached if this was a command completion, and then it was
only added if the file wasn't a regular file... in which case it can't
be an executable.

So this was dead.
This gives us the full information, not just "no" or "maybe"
This keeps the entry_t as long as possible, and asks it, so especially
on systems with working d_type we can get by without a single stat in
most cases.

Then it guts file_get_desc, because that is only used for command
completions - we have been disabling file descriptions for *years*,
and so this is never called there.

That means we have no need to print descriptions about e.g. broken symlinks, because those are not executable.

Put together, what this means is that we, in most cases, only do
an *access(2)* call instead of a stat, because that might be checking
more permissions.

So we have the following constellations:

- If we have d_type:
  - We need a stat() for every _symlink_ to get the type (e.g. dir or regular)
    (this is for most symlinks, if we want to know if it's a dir or executable)
  - We need an access() for every file for executables
- If we do not have d_type:
  - We need a stat() for every file
  - We need an lstat() for every file if we do descriptions
    (i.e. just for command completion)
  - We need an access() for every file for executables

As opposed to the current way, where every file gets one lstat whether
with d_type or not, and an additional stat() for links, *and* an
access.

So we go from two syscalls to one for executables.
@faho faho added the performance Purely performance-related enhancement without any changes in black box output label Oct 10, 2023
@faho faho added this to the fish 3.7.0 milestone Oct 10, 2023
@faho faho requested a review from ridiculousfish October 10, 2023 17:51
Copy link
Member

@ridiculousfish ridiculousfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic. I love it. This seems like the minimal possible syscalls for command completions.

/// \param definitely_executable Whether we know that it is executable, or don't know
static const wchar_t *file_get_desc(const wcstring &filename, bool is_dir,
bool is_link, bool definitely_executable) {
if (is_link) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole function is so much nicer now!

@mqudsi mqudsi self-requested a review October 11, 2023 17:36
This saves quite a few checks if e.g. System32 is in $PATH (which it
is if you inherit windows paths, IIRC).

Note: Our WSL check currently fails for WSL2, where this would
be *more* important because of how abysmal the filesystem performance
on that is.
@@ -379,6 +379,13 @@ fn wildcard_test_flags_then_complete(
return false;
}

if executables_only
&& is_windows_subsystem_for_linux()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@mqudsi this should really also be a thing on WSL2)

Copy link
Contributor

@mqudsi mqudsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice work, much cleaner.

I agree on the WSL2 bit; I can open a PR with the unification of WSL1 and WSL2 stuff. Maybe there's nothing that it won't hurt to do under WSL2.

@faho faho merged commit 0f8bcb0 into fish-shell:master Oct 14, 2023
@faho faho deleted the gut-wildcard-descriptions branch October 14, 2023 06:45
faho added a commit that referenced this pull request Oct 14, 2023
* wildcard: Remove file size from the description

We no longer add descriptions for normal file completions, so this was
only ever reached if this was a command completion, and then it was
only added if the file wasn't a regular file... in which case it can't
be an executable.

So this was dead.

* Make possible_link() a maybe

This gives us the full information, not just "no" or "maybe"

* wildcard: Rationalize file/command completions

This keeps the entry_t as long as possible, and asks it, so especially
on systems with working d_type we can get by without a single stat in
most cases.

Then it guts file_get_desc, because that is only used for command
completions - we have been disabling file descriptions for *years*,
and so this is never called there.

That means we have no need to print descriptions about e.g. broken symlinks, because those are not executable.

Put together, what this means is that we, in most cases, only do
an *access(2)* call instead of a stat, because that might be checking
more permissions.

So we have the following constellations:

- If we have d_type:
  - We need a stat() for every _symlink_ to get the type (e.g. dir or regular)
    (this is for most symlinks, if we want to know if it's a dir or executable)
  - We need an access() for every file for executables
- If we do not have d_type:
  - We need a stat() for every file
  - We need an lstat() for every file if we do descriptions
    (i.e. just for command completion)
  - We need an access() for every file for executables

As opposed to the current way, where every file gets one lstat whether
with d_type or not, and an additional stat() for links, *and* an
access.

So we go from two syscalls to one for executables.

* Some more comments

* rust link option

* rust remove size

* rust accessovaganza

* Check for .dll first for WSL

This saves quite a few checks if e.g. System32 is in $PATH (which it
is if you inherit windows paths, IIRC).

Note: Our WSL check currently fails for WSL2, where this would
be *more* important because of how abysmal the filesystem performance
on that is.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance Purely performance-related enhancement without any changes in black box output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants