Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Builtin path, v2 #8958

Merged
merged 63 commits into from
May 29, 2022
Merged

Builtin path, v2 #8958

merged 63 commits into from
May 29, 2022

Conversation

faho
Copy link
Member

@faho faho commented May 14, 2022

Description

This adds a path builtin to deal with paths.

It offers the following subcommands:

  • filter to go through a list of paths and only print the ones that pass some filter - exist, are a directory, have read permission, ...
  • is as a shortcut for filter -q to only return true if one of the paths passed the filter
  • basename, dirname and extension to print certain parts of the path
  • change-extension to change the extension to a different one (as a string operation)
  • normalize and resolve to canonicalize the paths in various flavors
  • sort to sort paths, also only using the basename or dirname as a key

The definition of "extension" here was carefully considered and should line up with how extensions are actually used - ~/.bashrc doesn't have an extension, but ~/.conf.d does (".d").

These subcommands all compose well - they can read from arguments or stdin (like string), they can use null-delimited input or output (input is autodetected - if a NULL happens in the first PATH_MAX bytes it switches automatically).

It is both a failglob exception (so like set if a glob passed to it fails it just doesn't get any arguments for it instead of triggering an error), and passes output to command substitution buffers explicitly split (like string split0) so newlines are easy to handle.

This PR is a rerun of #8265 because that was very long and hard to follow.

I intend to merge this for 3.5.0 in two weeks or so.

Fixes issue #7658

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.rst

There is one thing I'm not entirely sure about that would be awkward to change later:

  • filter currently follows symlinks, so any filter applies to what the link points to.
    This isn't unreasonable, but it might be possible that turning it around and requiring an option to follow is the better choice. Or we leave it as-is and maybe add an option to ignore symlinks?

@faho faho added this to the fish 3.5.0 milestone May 14, 2022
@krobelus
Copy link
Member

BTW here is a creative strategy for cleaning up an ugly sequence of commits but that's okay, I can look at the sum of the changes.

doc_src/cmds/path.rst Outdated Show resolved Hide resolved
src/builtins/path.cpp Outdated Show resolved Hide resolved
doc_src/cmds/path.rst Outdated Show resolved Hide resolved
src/builtins/path.cpp Outdated Show resolved Hide resolved
src/builtins/path.cpp Outdated Show resolved Hide resolved
doc_src/cmds/path.rst Outdated Show resolved Hide resolved
path sort GENERAL_OPTIONS [(-v | --invert)] \
[-u | --unique] [--key=basename|dirname|path] [([PATH...]

GENERAL_OPTIONS := [(-z | --null-in)] [(-Z | --null-out)] [(-q | --quiet)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to add null switches to other commands as well? I don't see path and string being all that different in this regard.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should add them to string as well, yes.

The main reason this is in path first is because it's necessary for path to be complete - NUL is the only byte you can use to separate paths unambiguously, and so it exists in a lot of commands (find -print0). Whereas string already passes through NULs from stdin, and has string split0 that you can use to work with NUL-delimited strings by passing them as arguments.

Unfortunately string won't be able to use the PATH_MAX trick, because strings are arbitrary.

doc_src/cmds/path.rst Outdated Show resolved Hide resolved
share/completions/path.fish Outdated Show resolved Hide resolved
/// This constructs the wgetopt() short options string based on which arguments are valid for the
/// subcommand. We have to do this because many short flags have multiple meanings and may or may
/// not require an argument depending on the meaning.
static wcstring construct_short_opts(options_t *opts) { //!OCLINT(high npath complexity)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop this OCLINT comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if oclint is still a thing. I couldn't get it to run last time I tried.

If it's not, we should remove all of these.

src/builtins/path.cpp Outdated Show resolved Hide resolved
@krobelus
Copy link
Member

filter currently follows symlinks, so any filter applies to what the link points to.

Right, if /bin is a symlink then both path filter --type=dir /bin and path filter --type=linke /bin match.
This is a bit weird but it's probably the correct behavior in most situations, so it sounds ok.

@krobelus
Copy link
Member

I took a tired look at everything, looks fine

faho and others added 23 commits May 29, 2022 17:48
This adds a "path" builtin that can handle paths.

Implemented so far:

- "path filter PATHS", filters paths according to existence and optionally type and permissions
- "path base" and "path dir", run basename and dirname, respectively
- "path extension PATHS", prints the extension, if any
- "path strip-extension", prints the path without the extension
- "path normalize PATHS", normalizes paths - removing "/./" components
- and such.
- "path real", does realpath - i.e. normalizing *and* link resolution.

Some of these - base, dir, {strip-,}extension and normalize operate on the paths only as strings, so they handle nonexistent paths. filter and real ignore any nonexistent paths.

All output is split explicitly, so paths with newlines in them are
handled correctly. Alternatively, all subcommands have a "--null-input"/"-z" and "--null-output"/"-Z" option to handle null-terminated input and create null-terminated output. So

    find . -print0 | path base -z

prints the basename of all files in the current directory,
recursively.

With "-Z" it also prints it null-separated.

(if stdout is going to a command substitution, we probably want to
skip this)

All subcommands also have a "-q"/"--quiet" flag that tells them to skip output. They return true "when something happened". For match/filter that's when a file passed, for "base"/"dir"/"extension"/"strip-extension" that's when something about the path *changed*.

Filtering
---------

`filter` supports all the file*types* `test` has - "dir", "file", "link", "block"..., as well as the permissions - "read", "write", "exec" and things like "suid".

It is missing the tty check and the check for the file being non-empty. The former is best done via `isatty`, the latter I don't think I've ever seen used.

There currently is no way to only get "real" files, i.e. ignore links pointing to files.

Examples
--------

> path real /bin///sh
/usr/bin/bash

> path extension foo.mp4
mp4

> path extension ~/.config
  (nothing, because ".config" isn't an extension.)
Like `grep -v`/`string match -v`.
Co-authored-by: Johannes Altmanninger <aclopte@gmail.com>
This is theoretically sound, because a path can only be PATH_MAX - 1
bytes long, so at least the PATH_MAXest byte needs to be a NULL.

The one case this could break is when something has a NULL-output mode
but doesn't bother printing the NULL for only one path, and that path
contains a newline. So we leave --null-in there, to force it on.
These were officially called "--null-input", but I just used
"--null-in" everywhere, which worked because getopt allows unambiguous abbreviations.

But since *I* couldn't keep it straight and the "put" is just
superfluous, let's remove it.
This is needed because you might feasibly give e.g. `path filter`
globs to further match, and they might already present no results.
It's also well-handled since path simply does nothing if given no paths.
These are short flags for "--perm=read" and "--type=link" and such.

Not every type or permission has a shorthand - we don't want "-s" for
"suid". So just the big three each get one.
This replaces `test -e` and such.
No idea why this mentioned string so much.
C++ is a silly language.
I'm not sure if this is the actual proper syntax to describe this, but
it sure is a heck of a lot more readable.
"dir" sounds like it asks "is it a directory".
This allows replacing the extension, e.g.

    > path change-extension mp4 foo.wmv
    foo.mp4
faho added 20 commits May 29, 2022 17:48
Because we now count the extension including the ".", we print an
empty entry.

This makes e.g.

```fish
set -l base (path change-extension '' $somefile)
set -l ext (path extension $somefile)
echo $base$ext
```

reconstruct the filename, and makes it easier to deal with files with
no extension.
This allows e.g. sorting first by dirname and then by basename.
This is now added to the two commands that definitely deal with
relative paths.

It doesn't work for e.g. `path basename`, because after removing the
dirname prepending a "./" doesn't refer to the same file, and the
basename is also expected to not contain any slashes.
This would still remove non-existent paths, which isn't a strict
inversion and contradicts the docs.

Currently, to only allow paths that exist but don't pass a type check,
you'd have to filter twice:

path filter -Z foo bar | path filter -vfz

If a shortcut for this becomes necessary we can add it later.
it is the american standard code for information, after all
Using getcwd is naughty here because we want to separate these things
in future.
clang-tidy explains this is better. I hate C++.
More sorty, less generic.
oh no this made no sense given that it was *prepending* to `rest`.
@faho faho merged commit 4612343 into fish-shell:master May 29, 2022
@faho
Copy link
Member Author

faho commented May 29, 2022

It is done. Merged.

fitrh added a commit to fitrh/effishient that referenced this pull request May 30, 2022
@faho faho deleted the builtin-path branch June 10, 2022 06:52
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants