Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A better test builtin: is #6124

Open
zanchey opened this issue Sep 19, 2019 · 24 comments
Open

A better test builtin: is #6124

zanchey opened this issue Sep 19, 2019 · 24 comments

Comments

@zanchey
Copy link
Member

zanchey commented Sep 19, 2019

test sucks (#1337, #5186, #5947, #6114). Plus I always have to look up the manpage because I can never remember what the options do. Its one saving grace is its compliance with POSIX, so rather than enhancing it beyond that, can we do better? Bash/zsh has [[, which is mostly backward-compatible, but I think we should take a different approach.

I propose a new builtin, which I will call is for now.

Some examples:

is file $somefilename; or touch $somefilename

is exists /etc/debian_release; or rm -rf /

if is sgid /usr/bin/screen ...

if is 1 greaterthan 2 ... # or gt for shorthand?

if is 1 = 1 ...

if is $FISH_VERSION versionlessthan 3.0.0
    echo go fishing!
end

Replaces #6065. Should include #3589.

Absolutely must not include combining operators.

Should not include operations available elsewhere (eg isatty, string, contains); I'd be tempted to elide string operations entirely in favour of string match --entire and string length, though perhaps some shorthand could be added.

One problem then is that is is then numeric comparisons and file operations, which are not at all related; perhaps it could be broken out into isnum and ispath?

Name open to debate (though is doesn't collide with any binaries in Debian, at least).

@zanchey zanchey added this to the fish-future milestone Sep 19, 2019
@faho
Copy link
Member

faho commented Sep 19, 2019

We need to be careful to not redo tests argument parsing bogosities.

You've already left out the combiners, and we should also leave out the no-operator form (test somestring being the same as test -n somestring is the single worst bit about test).

I've had some ideas over the years, including always requiring three arguments:

  • is a = b tests string equality
  • No test -n or test -z, instead you'll have to use is a = '' and not is a = ''

But that causes problems with e.g. unquoted variables - what would is $a = '' do if $a is list-empty?

It could error out, or since there is a valid operator it could assume the other argument is empty. Erroring seems a bit excessive.

Or we could require operators first, like you've done here.

Then there's the question of what happens if an argument looks like an operator. If we used tests -eq etc numeric operators (unfortunately the intuitive < et al would require more than a builtin), we could get something like is $a -eq 4, and then $a could start with a "-" like -4, or it could even be -eq.

I don't think that's a huge problem - if you forget to add an operator and the variable looks like one you're gonna have a bad time.

A bigger issue is what happens with lists. If we allow is $a = $b and either $a or $b has more than a single element it'll fail. It would technically be possible to find the = and compare the arguments before with the arguments after, but that has a whole bunch of issues, including that the lists themselves can contain =. But anything else would require special syntax again (this is also the main impetus behind [[).

Name open to debate (though is doesn't collide with any binaries in Debian, at least).

Another possible name: check.


Also there's a backwards-compatibility question: Do we plan to ever remove test? We technically could, given that there's a conforming test command on the system, but that would slow down old scripts running on new fish, and our test has been extended with float support.

@zanchey
Copy link
Member Author

zanchey commented Sep 19, 2019

leave out the no-operator form

Definitely!

Or we could require operators first, like you've done here.

Note I only did that for the path (pathis? ispath?) operations; operator first for numeric things in Polish notation style will cause more problem than it solves, I think.

Another possible name: check.

Couple of those already: https://packages.debian.org/search?suite=buster&arch=any&mode=exactfilename&searchon=contents&keywords=bin/check (though not a showstopper).

I don't think we'd remove our test builtin.

@ridiculousfish
Copy link
Member

The more I think about this the more I like it. It's a good thing to have a place to put arbitrary predicates, it doesn't need to be consistent.

If we do this, let's dodge the empty expansion problem with strict "verb subject object", never infix:

if is greater $x 100

where if a parameter is missing it always fails. There's nothing to parse so no parsing pitfalls.

(Well maybe we can support is not)

@zanchey
Copy link
Member Author

zanchey commented Sep 23, 2019

My problem with that is I can never remember which way round the subject and object go. Is it greater 1 2? or greater 2 1? It might be possible to help to a degree by implementing only equal, greater and greater-or-equal, and requiring inverted arguments/not for less than (that way the greater argument is always on one side, regardless of operation).

@mqudsi
Copy link
Contributor

mqudsi commented Oct 7, 2019

My dream shell would imbue if with expression parsing abilities, along the lines of math but even more so, and parse that as a single expression payload rather than going through the typical parser, freeing the universal mathematical operators like < and > where there's no ambiguity as to the order of expressions or arguments, and let you directly evaluate conditionals, along the lines of if $status < 7 or if $status is foo (rather than requiring the is to always come first)... but I think that is probably too fundamental of a change to ever happen.

@zanchey
Copy link
Member Author

zanchey commented Oct 12, 2019

I was thinking about an optional "than" argument:

if is greater 2 [than] 1

but I don't really know that it helps, and mangles the English grammar further.

@equwal
Copy link

equwal commented Nov 26, 2019

I was thinking about an optional "than" argument:

if is greater 2 [than] 1

but I don't really know that it helps, and mangles the English grammar further.

I don't think than is helpful. This reminds me of the debate between Common Lisp's loop and iterate (hint: superfluous pseudo-COBOL pseudo-English is not helpful).

If keeping a polish prefix syntax, it is much better just to stick to it. Here is a table of proposed comparators with examples.

is = 1 1 # common numerical operations
is < 1 1
is <= 1 1
is >= 1 1
is eq str1 str2 # common string operations, perl style
is neq str1 str2
is lt str1 str2
is or a b c d... # short circuit
is and a b c d...

It might be sane to wrap these things. If so, we have a lisp interpreter (Greenspun's tenth rule).

is {and {> 1 0} {< 0 1}} # true

Just some ideas.

@mikelward
Copy link
Contributor

mikelward commented Jun 26, 2020

Other comments allude to this, but let me also name drop issue #2037: test -n $unsetvar as something I'd love this to address.

@faho
Copy link
Member

faho commented Jul 19, 2020

I've been thinking about this some more, and I have a design that looks okay for most things, while being implementable and maintainable.

The idea is to avoid the worst test misfeatures (the zero-argument form being the absolute worst, so we just always demand a command) and to make something that can handle lists sensibly so you don't always have to quote.

And for some tests that's simple:

  • is file FILES... would just return 0 if all the given FILES (or any of them?) exist (same for the other file operations)
  • is same STRINGS... would return 0 if all the given STRINGS are the same
  • is empty STRINGS... would return 0 if all the given STRINGS are empty (or no strings have been given)

For these operations that should all be intuitively understandable. I'm leaning towards always testing ALL because otherwise you'd want to have the arguments that passed or failed (which of the given files exists?)

The one thing that seems weird with this approach is greater/less, because the logical form of that is:

  • is ascending NUMBERS... returns 0 if every number is greater or equal to the last
  • is strict-ascending NUMBERS... returns 0 if every number is greater than the last

And that works, but is that something people would reach for?

Also note that is same has a failure case: When you do is same $foo $bar, and $bar consists of more same elements, e.g.:

set foo 1 1 1
set bar 1
is same $foo $bar

I don't imagine this to be a problem in practice, and it's only really fixable by having syntax to distinguish between the two lists before expansion. Even if you only allowed two arguments you could have an empty $foo and a bar of two same elements.

This just shows that is same compares strings, not lists. A list-comparison builtin would also be possible - is samevar foo bar.

@krobelus
Copy link
Member

krobelus commented Jul 19, 2020

is same should silently return 1 if an argument is missing so we can write is same $USER root without quotes.
This makes for another "edge case" where is same $undefined '' is false which I think is okay.

Sounds good to check all elements in a list - technically "all" implies truth if no elements are given but we can simply allow that only for is empty.

Substituting test -n "$foo" with is not empty $foo and test x"$foo" = xbar with is same $foo bar would replace the majority of our uses of test. I think those two are the most important ones to get right.

is ascending NUMBERS...

I think we can still use greater-or-equalless-or-equal with the same semantics, I expect most uses to pass only two numbers.

Prototype (without lifting binary operations to lists yet):

set -l is_subcommands empty not same \
file exists directory \
greater-than less-than \
greater-or-equal less-or-equal \
newer-than older-than

complete -c is -n 'is = 1 (count (commandline -opc))' -xa (string join ' ' $is_subcommands)
complete -c is -n '__fish_seen_subcommand_from not !' -xa '(
	set -l cmd (commandline -opc) (commandline -ct)
	complete -C"is $cmd[3..-1]"
)'

function is -d 'Perform tests on arguments'
    set -l argc (count $argv)
    set -q argv[1]; or echo 'is: please specify a command' && return 1
    set -l subcommands_with_one_argument file exists directory
    set -l subcommands_with_two_arguments \
    greater-than less-than greater-or-equal less-or-equal \
    newer-than older-than

    switch $argv[1]
        case $subcommands_with_one_argument
            if and test $argc -ne 2
                echo "is $argv[1]: need exactly one argument"
                return 1
            end

        case $subcommands_with_two_arguments
            if test $argc -ne 3
                echo "is $argv[1]: need exactly two arguments"
                return 1
            end
    end

    switch $argv[1]
        # n-ary
        case empty
            not string length -q $argv[2..-1]
        case not !
            not is $argv[2..-1]
        case same =
            set -q argv[3] && test x$argv[2] = x$argv[3]
        # unary
        case file
            test -f $argv[2]
        case exists
            test -e $argv[2]
        case directory
            test -d $argv[2]
        # binary
        case greater-than
            test $argv[2] -gt $argv[3]
        case greater-or-equal
            test $argv[2] -ge $argv[3]
        case less-than
            test $argv[2] -lt $argv[3]
        case less-or-equal
            test $argv[2] -le $argv[3]
        case older-than newer-than # by mtime
            set -l mtimes (stat -c '%Y' $argv[2 3])
            set -l op less-than
            if is same $argv[1] newer-than
                set op greater-than
            end
            is $op $mtimes
        case '*'
            echo "is: unknown command: $argv[1]" && return 1
    end
end

versionlessthan would be nice to have, but it's hard to get right. We could use the semver binary but IIRC that one has some surprising results when not following SemVer.

@faho
Copy link
Member

faho commented Jul 19, 2020

This makes for another "edge case" where is same $undefined '' is false which I think is okay.

That seems okay to me. The empty string is not the empty list.

I think we can still use greater-or-equal with the same semantics, I expect most uses to pass only two numbers.

To be clear: ascending is true here:

is ascending 1 2 3 4 5

If you called that "greater-or-equal" it would look like

is greater-or-equal 1 2 3 4 5

I would actually assume it's more intuitive to have it the other way around:

is less-or-equal 1 2 3 4 5

Because this way you can just slot in a "less-or-equal" comparison between any two numbers in order, so is less-or-equal 1 2 3 4 5 checks 1 <= 2, 2 <= 3, 3 <= 4 and 4 <= 5.

Substituting test -n "$foo" with is not empty $foo and test x"$foo" = xbar with is same $foo bar would replace the majority of our uses of test. I think those two are the most important ones to get right.

Those are the ones with the most annoying behavior, yes.

versionlessthan would be nice to have, but it's hard to get right. We could use the semver binary but IIRC that one has some surprising results when not following SemVer.

The "definitive" version comparison function is probably rpmvercmp, used by rpm and Archlinux's pacman/ALPM. It's LGPL, so we could use it. (this is also what arch ships as its vercmp helper utility)

I'm not saying it's perfect - version comparison is surprisingly hard, with all the gunk people add in there! But it would be a known algorithm that would agree with other, widely used, tools, which is worth a lot.

@krobelus
Copy link
Member

Of course I'd mix up greater and less 🤦

The "definitive" version comparison function is probably rpmvercmp

Sounds good, we can add this at some point (or implement it in fish script if someone is feeling adventurous).

@faho
Copy link
Member

faho commented Jul 19, 2020

Alternative behavior for greater et al: Check against the first argument.

E.g. is greater 5 6 7 6 would be true, because 6, 7 and 6 are all greater than 5.

@faho
Copy link
Member

faho commented Jul 19, 2020

Okay, so I've gone a slightly different way and written the documentation first.

.. _cmd-is:

is - check conditions on files, numbers and text
================================================

Synopsis
--------

::

    is empty STRINGS...
    is notempty STRINGS...
    is same STRINGS...
    is path PATHS...
    is file PATHS...
    is directory PATHS...
    is link PATHS...
    is readable PATHS...
    is writable PATHS...
    is executable PATHS...
    is equal NUMBERS...
    is greater NUMBERS...
    is greater-equal NUMBERS...
    is less NUMBERS...
    is less-equal NUMBERS...

Description
-----------

``is`` checks if conditions are true. It operates on lists of arguments. The first argument is the command.

A missing or unknown command is an error.

All commands except ``empty`` return false if no argument was passed.

Unlike ``test``, ``is`` does not support logical operators and lacks some negated forms. Use :ref:`combiners <combiners>` and multiple calls to ``is`` instead.

String commands
---------------

- ``empty`` checks if all given strings are empty. This returns true if no string was given.
- ``notempty`` checks if all given strings are notempty.
- ``same`` checks if all the given strings are the same.

To check if any string is not empty, negate the ``is`` instead: ``not is empty "" "" "foo"`` will be true.

To check if a string has one of a number of given values, use :ref:`contains <cmd-contains>`: ``contains -- foo bar foo baz``.

File and directory commands
---------------------------

- ``path`` checks if all given paths exist.
- ``file`` checks if all given paths are regular files.
- ``directory`` checks if all given paths are directories.
- ``link`` checks if all given paths are symbolic links.
- ``readable`` checks if all given paths are readable.
- ``writable`` checks if all given paths are writable.
- ``executable`` checks if all given paths are executable.

Number commands
---------------

``is`` can compare floating point numbers. The radix character is ``.``, not dependent on locale.

- ``equal`` checks if all given numbers are numerically equal - unlike ``same``, this ignores leading zeroes and trailing zeroes after a "." and can handle hexadecimal numbers with ``0x``.
- ``greater`` checks if every number is greater than the next, i.e. the numbers are in descending order.
- ``less`` checks if every number is smaller than the next, i.e. they are in ascending order.
- ``greater-equal``, like ``greater``, but also true if the numbers are equal.
- ``less-equal``, like ``less``, but also true if the numbers are equal.

Examples
--------

From __fish_complete_docutils::

  if is same $cmd rst2html5
      complete -x -c $cmd -l table-style -a "borderless booktabs align-left align-center align-right colwidths-auto" -d "Specify table style"

From fish_vi_cursor::
  
  # With test - this:
  # - needs to test if $KONSOLE_VERSION is non-empty before checking if it's greater
  # - needs to quote $KONSOLE_VERSION because `test -n` is *true*
  # - still errors out if $KONSOLE_VERSION is non-numeric
  not test -n "$KONSOLE_VERSION" -a "$KONSOLE_VERSION" -ge 200400

  # With is - no error when the argument is not numeric or missing.
  not is greater $KONSOLE_VERSION 200400

From __fish_complete_man::

  if test -z "$section" -o "$section" = 1

  if is empty $section; or is same $section 1


  if test -z "$token" -a "$section" != "[^)]*"

  if is empty $token; and not is same $section "[^)]*"

__fish_md5::

  # Note: This either needs to be quoted or needs to have been checked beforehand.
  if test $argv[1] = -s

  if is same $argv[1] -s

  if is = $argv[1] -s

Ideas
-----

- Version comparison using ``vercmp``
- ``is true`` - check if a value is "truthy" - number greater than 0, a string like "ON" or "true".
- ``is number`` - check if the value is a number.
- ``--any``, before the command, to return true if any value is true.
- Remove ``notempty``? Add ``notequal``?
- Other names for numeric commands? ``=``?
- Allow the test option naming, possibly as an alternative? "lt"/"le"/"gt"/"ge"?
- ``is prefix``, checking if the first argument is prefix of all the others? (same for suffix etc)

Unimplemented bits:

Some operators for files that nobody really uses much
-----------------------------------------------------

- ``-b FILE`` returns true if ``FILE`` is a block device.

- ``-c FILE`` returns true if ``FILE`` is a character device.

- ``-g FILE`` returns true if ``FILE`` has the set-group-ID bit set.

- ``-G FILE`` returns true if ``FILE`` exists and has the same group ID as the current user.

- ``-k FILE`` returns true if ``FILE`` has the sticky bit set. If the OS does not support the concept it returns false. See https://en.wikipedia.org/wiki/Sticky_bit.

- ``-O FILE`` returns true if ``FILE`` exists and is owned by the current user.

- ``-p FILE`` returns true if ``FILE`` is a named pipe.

- ``-s FILE`` returns true if the size of ``FILE`` is greater than zero.

- ``-S FILE`` returns true if ``FILE`` is a socket.

- ``-t FD`` returns true if the file descriptor ``FD`` is a terminal (TTY).

- ``-u FILE`` returns true if ``FILE`` has the set-user-ID bit set.

Operators to compare and examine numbers
----------------------------------------

- ``NUM1 -ne NUM2`` returns true if ``NUM1`` and ``NUM2`` are not numerically equal.

Operators to combine expressions
--------------------------------

- ``COND1 -a COND2`` returns true if both ``COND1`` and ``COND2`` are true.

- ``COND1 -o COND2`` returns true if either ``COND1`` or ``COND2`` are true.

Expressions can be inverted using the ``!`` operator:

- ``! EXPRESSION`` returns true if ``EXPRESSION`` is false, and false if ``EXPRESSION`` is true.

Expressions can be grouped using parentheses.

- ``( EXPRESSION )`` returns the value of ``EXPRESSION``.

 Note that parentheses will usually require escaping with ``\(`` to avoid being interpreted as a command substitution.

@krobelus
Copy link
Member

krobelus commented Jul 26, 2020

This looks almost done!

Regarding other ideas in this thread, it would surely be doable to change
the grammar to support infix expressions for is but that's probably not worth
the increase of complexity of language and implementation/tooling. With
verb-subject-object order, is is consistent and fairly unambiguous.

We already have something that feels quite natural for both the single-object
case and also when there is a list of objects. It's important that
the list case does not complicate the documentation for the simple case.

  • --any, before the command, to return true if any value is true.
  • Remove notempty? Add notequal?

I'd say remove notempty. For most practical uses it is too similar to not is empty.

For use cases like is notempty $foo $bar, which is not equivalent to
not is empty $foo $bar, we should consider a more general alternative,
that allows the same for other checks.

We could use is not, making these two equivalent:

	is not file $foo $bar
	not is file $foo && not is file $bar

The Python-esque is not sure is nice!

We can also go with verbs that better fit into the list context:

	is [all] empty $foo $bar
	is any   empty $foo $bar
	is none  empty $foo $bar

Of course, these could be switches like --any and --none.

To keep the natural flow for the more important single-object case, I'd lean towards not and any.
Possibly adding none as alias to not.

  • Other names for numeric commands? =?

I believe that is same will be used much more often than the numeric is equal -
in many cases the former can be used for numbers just fine.
Therefore I'd prefer = to behave like same if we use it.
I'd like to add a standalone function = that is an alias to is same.

(Pushing that further, ? could be an alias for not is empty)

  • Allow the test option naming, possibly as an alternative? "lt"/"le"/"gt"/"ge"?
    work
    Probably a good idea, especially for contexts where tab-completion is not available.

Alternative behavior for greater et al: Check against the first argument.

At some point I wasn't sure whether is greater x y z means
is greater x y && is greater x z (it doesn't), or if it means
is greater x y && is greater y z (it does).
Not sure if we should do anything about this - we could still have
descending to have both, or include a switch like --scan or --fold
but that's surely overshooting our goals... I think it is best to keep
greater.

@faho
Copy link
Member

faho commented Aug 8, 2020

Okay, I've implemented my proposal and played around with it a bit.

I think it's about the best we could get if we 1. require lists to work, 2. require it to be an ordinary builtin.

Unfortunately, I'm not sold that it would actually help anyone. Sure, is empty is easier to use than test -n, but I don't see people getting to grips with is greater. I'm not even convinced that people would actually try it because it looks so verbose and it's not available in older fish versions.

So, I'm coming around again to that other design where lists fail:

is x OPERATOR y. OPERATOR is basically what test uses without the weird -: is 5 gt 2, is foo = foo.

The trick is that if it ever doesn't get exactly three arguments the test fails. No error message, just a falsy return. The exception being is = which is true - both sides are empty. Probably also is = '' and is '' = to make is = $foo the equivalent of test -z $foo.

The file and such operations are basically like the other design - is file /path/to/file. They require exactly two arguments.

As above, no logical operations, use &&.

This is a good deal less elegant in terms of doing a ton of checks at once - it can't be used to check if a list is sorted. But it is much more intuitive with the infix operators and should still fix the main problems with test - the zero-argument form and the multi-argument confusion. Instead anything weird just causes it to fail, which is most likely what you want anyway.

This has some very obscure failure cases - set foo gt =; set bar; is $foo gt $bar. That's probably acceptable.

Anything else basically requires special syntax.

@hugomg
Copy link
Contributor

hugomg commented Aug 8, 2020

I think that maybe we should split this problem into two problems

  1. Some of the flags of the test command are hard to remember
  2. It is easy to shoot ourselves in the foot if the arguments to test are an unquoted list, or if they contain special strings such as -a and -o.

For the first problem, I really love the suggestion of introducing a new is command, with predicates such as is file and is directory. That is much more readable than the traditional test -f or test -d, and it also generalizes nicely to multiple files in situations like is writable *.txt.

However, I think that the second problem will still be present in any attempt to replace test with another command. Even if we design a new command that is saner than test, at the end of the day it will still do unexpected things if one of the arguments is an undefined variable, a list, or if one of the variables contains a "keyword" such as -a or -ge.

For example, in the proposal from three weeks ago, if we call is greater $x 17 with x being an undefined variable then it expands to to is greater 17, which silently succeeds. Similarly, if we call is equal $x $y where both are undefined variables then it returns false, because all is operators return false if they return an empty list. At the end of the day, the only workaround for these problems is to remember to always quote all the variables, which is one of the things that we wanted to avoid in the first place.

The proposal from today of having the comparison command fail unless it receives exactly three arguments helps a bit, but it is just a band-aid that doesn't totally fix the problem of the command doing unexpected things if the number of arguments are not what is expected.


If we were allowed to make a backwards incompatible change, my dream proposal would be to replace the test command with a similar test syntax, kind of like what Bash did with [[. This way, we would still be able to use familiar notation like test $x = $y or test -n $x, except that now it wouldn't blow in our face if we left x or y unquoted.

On top of that, we could also add an is file, is directory, etc as more readable alternatives to test -f, test -d and so on.

@faho
Copy link
Member

faho commented Aug 9, 2020

However, I think that the second problem will still be present in any attempt to replace test with another command.

It's possible to make the problem much smaller.

For example, in the proposal from three weeks ago, if we call is greater $x 17 with x being an undefined variable then it expands to to is greater 17, which silently succeeds.

In the implementation I've written It doesn't - the binary operators fail if they only get one operand.

It only succeeds if you do is greater $x $y and x has at least two elements that happen to be sorted and y has no elements. Which, to a first approximation, doesn't happen.

The proposal from today of having the comparison command fail unless it receives exactly three arguments helps a bit

I've noted the problem and it is "if the list includes something that perfectly mirrors the other side and operator and includes an =".

Which... yeah, sure, quote it if that can happen in your particular case. In most cases it can't, because the shape of the operands is reasonably fixed.

Compare tests problem: test -n $var returns true if $var is empty. That happens constantly, because many things turn out to be empty.

@hugomg
Copy link
Contributor

hugomg commented Aug 9, 2020

It's possible to make the problem much smaller.

That is fair. Getting rid of the one-operand mode and and getting rid of combining operators like -o and -a helps a lot to mitigate the parsing issues.

However, I think the biggest argument in favor of treating test as syntax is that it just makes test do what people always expected it should do. This is why I think it is useful to split the problems with the test command into two kinds of problems -- the first being that test is ugly and the second being that test is full of footguns. The ugliness problem can be cleanly solved by a new command: anyone that thinks that test -d is ugly will be happy to be able to use is directory instead. However, the footgun problem is harder to solve with a new command. As long as the old test still exists, people will continue to shoot themselves in the foot with it.

For the matter, I actually kind of like the existing test command. I think that infix comparison operators are easier to read than prefix comparison operators and I also like the [ ] syntax. I just wish that it didn't interact so poorly with unquoted variables, although I'll confess that maybe that is just me asking for a faster horse instead of a car :)

@faho
Copy link
Member

faho commented Aug 9, 2020

As long as the old test still exists, people will continue to shoot themselves in the foot with it.

Lemme head you off right there. The old test will continue to exist. Even if we removed the builtin, the command would still be on your system (and the only change would be missing float support and slower performance).

Replacing test with something incompatible of the same name is a bad idea. Even just removing the one-argument mode (which would fix test -n) is enough of a compatibility break that we probably don't want to do it.

@hugomg
Copy link
Contributor

hugomg commented Aug 9, 2020

Even just removing the one-argument mode (which would fix test -n) is enough of a compatibility break that we probably don't want to do it.

The idea would be to replace the regular test command by a special syntactic form that behaves almost exactly the same. Sort of like [[ in bash, in an alternate universe where instead of adding [[ they had replaced [ with something that behaves like [[. The main difference compared to the old test command is that arguments would be parsed before variable or command substitution. For example, test -n $x would always be parsed as a two-argument test, regardless of how many arguments $x expands into.

I suspect that very few existing Fish programs would behave differently if we replaced the builtin test command with a special syntactic form. I did a quick grep in the fish scripts that ship together with fish and in that cursory search I couldn't find any examples where test would break if it became a special syntax along the lines of [[. But I did find plenty of places where the arguments to test are unquoted, including at least two instances of the specially problematic test -n $x.

Replacing test with something incompatible of the same name is a bad idea.

Backwards compatibility for old scripts would probably be pretty good. The bigger problem would be backwards compatibility for new scripts. Old scripts that use quotes in test -n "$x" should continue to work fine if test became a special syntax. However, if new scripts start writing test -n $x because the quotes are no longer necessary in the newer versions of Fish, then those scripts could behave weirdly on older versions of Fish.

I think the pros of the change might still outweigh the cons, but I'll admit that I could be wrong about this.

Even if we removed the builtin, the command would still be on your system (and the only change would be missing float support and slower performance).

This is actually one of the reasons why I'm attracted to the idea of trying to fix test by treating it specially instead of treating it like a regular command. If we get rid of test or if we try to encourage users to use a different command, the old test command will still be there, inviting people to misuse it. However, if we replace the test command with a better-behaved syntactic form that is also called test, then people would need to explicitly use /usr/bin/test or command test if they want the foot gun.

@alisraza
Copy link

From the view of someone who does computer programming as a side-hobby and does not have formal training, I have a few suggestions about the implementation of the is command.

One consideration is the possible ambiguity arising from Polish notation, as mentioned earlier in this issue. Leaving aside arrays for the moment, my understanding of the current implementation is as follows. Given is greater $x $y, one should read this as, "Is it greater than $x, this variable $y?", as opposed to "Is $x greater than $y". This agrees with the current use of contains, where contains cat $animals may be read as "Does it contain the string cat, this array $animals?"

Assuming the preceding is true, I wonder if it might be helpful to consider making the verb-object-subject more explicit in regard to the names of the sub-commands to is. For example, although more verbose and perhaps not as elegant, should is greater-than be preferable to is greater?

I have a few suggestions along these lines, focusing on readability (and therefore, to some extent, agreement with English language syntax). The proposals are in descending order of degree of change from the current suggested implementation, i.e., Proposal A would be perhaps the most ideal but also the most annoying to implement.

Proposal A:

Instead of operator overloading for is, create two separate built-ins, is and are, with the latter specific to arrays. This would be more explicit and more readable. Further, the are built-in would have two subcommands, any and all.

Current:

is empty STRINGS...

Proposed:

is empty STRING

are any empty STRING STRING [STRING...]

are all empty STRING STRING [STRING...]

Current:

is notempty STRINGS...

Proposed:

is non-empty STRING

are any non-empty STRING STRING [STRING...]

are all non-empty STRING STRING [STRING...]

Alternate:

There is no separate implementation. Use not syntax.

not is empty STRING

not are all empty STRING STRING [STRING...]

not are any empty STRING STRING [STRING...]

Pros: One less thing to implement. More consistent with pattern of using not to invert the other conditionals.

Cons: The logic of all and any is inverted, i.e., not are any empty is logically equivalent to are all non-empty, with the latter being more readable.

Current:

is same STRINGS...

Proposed:

is same STRING STRING

are any same STRING STRING [STRING...]

are all same STRING STRING [STRING...]

Alternate:

is same-as STRING_OBJ STRING_SUBJ

are any same STRING STRING [STRING...]

are all same STRING STRING [STRING...]

Pros: Usage of verb-object-subject is reinforced and consistent with other conditionals that accept two arguments.

Cons: An additional syntax variation with not much value-added, given that the object-subject order does not matter when testing if two strings are the same.

Current:

is path PATHS...

Proposed:

is path PATH

are any paths PATH PATH [PATH...]

are all paths PATH PATH [PATH...]

Current:

is file PATHS...

Proposed:

is file PATH

are any files PATH PATH [PATH...]

are all files PATH PATH [PATH...]

Current:

is directory PATHS...

Proposed:

is dir PATH

are any dirs PATH PATH [PATH...]

are all dirs PATH PATH [PATH...]

Alternate:

is directory PATH

are any directories PATH PATH [PATH...]

are all directories PATH PATH [PATH...]

Pros: Avoids unnecessary abbreviations, which is more consistent with the general approach here.
Cons: Variation is more complex than simply adding s for plural; may be unnecessarily confusing for those less comfortable with English.

Current:

is link PATHS...

Proposed:

is link PATH

are any links PATH PATH [PATH...]

are all links PATH PATH [PATH...]

Current:

is readable PATHS...

Proposed:

is readable PATH

are any readable PATH PATH [PATH...]

are all readable PATH PATH [PATH...]

Current:

is writable PATHS...

Proposed:

is writable PATH

are any writable PATH PATH [PATH...]

are all writable PATH PATH [PATH...]

Current:

is executable PATHS...

Proposed:

is executable PATH

are any executable PATH PATH [PATH...]

are all executable PATH PATH [PATH...]

Current:

is equal NUMBERS...

Proposed:

is equal NUMBER NUMBER

are any equal NUMBER NUMBER [NUMBER...]

are all equal NUMBER NUMBER [NUMBER...]

Alternate:

is equal-to NUMBER_OBJ NUMBER_SUBJ

are any equal NUMBER NUMBER [NUMBER...]

are all equal NUMBER NUMBER [NUMBER...]

Pros and Cons: Please see comments for is same-as above.

Current:

is greater NUMBERS...

Proposed:

is greater-than NUMBER_OBJ NUMBER_SUBJ

are any descending (Not Implemented)

are all descending --unique NUMBER NUMBER [NUMBER...]

Pros: helps resolve the ambiguity of is greater $x $y, which is intended to be read as "is greater (than) $x, $y?" but can easily be read as "is greater $x (than) $y?" Also, is greater for an array is intended to be read as "is (each) greater (than the next)", i.e., descending, but may be read as "is (each successively) greater (than the prior), i.e., ascending.

Current:

is greater-equal NUMBERS...

Proposed:

is at-least NUMBER_OBJ NUMBER_SUBJ

are any descending (Not Implemented)

are all descending NUMBER NUMBER [NUMBER...]

Current:

is less NUMBERS...

Proposed:

is less-than NUMBER_OBJ NUMBER_SUBJ

are any ascending (Not Implemented)

are all ascending --unique NUMBER NUMBER [NUMBER...]

Current:

is less-equal NUMBERS...

Proposed:

is at-most NUMBER_OBJ NUMBER_SUBJ

are any ascending (Not Implemented)

are all ascending --unique NUMBER NUMBER [NUMBER...]

For the following, I believe there is no current implementation, although many of these have been discussed earlier in this thread.

Proposed:

is command COMMAND

are any commands COMMAND COMMAND [COMMAND...] (Not Implemented)

are all commands COMMAND COMMAND [COMMAND...] (Not Implemented)

Proposed:

is unique STRING (Not Implemented)

are any unique STRING STRING [STRING...] (Not Implemented)

are all unique STRING STRING [STRING...]

Alternate:
Remove all given implied by unique.

are unique STRING STRING [STRING...]

Cons: Redundant with not are any same STRING STRING [STRING...]

Proposed:
Test for whether a variable is both set and not an empty string.

is variable VARNAME

Pros: More readable than set --query VARNAME and set -q VARNAME.
Cons: Redundant. Less clear than just using both is set VARNAME and is non-empty $VARNAME

Proposed:
Test for whether two files have identical content.

is identical-content --sha256 FILE FILE

Alternate:
is equal-hashsum --sha256 FILE FILE

Regarding the "Ideas" section from @faho's earlier post:

  • I appreciate and agree with is true.
  • Same sentiment with is number (although would consider is numeric over is (a) number ). Would also consider adding is integer (although in that case is number would be preferable to is numeric for consistency.)
  • As you can see from the proposal, I certainly favor the idea of --any. In addition to the possible subcommand any versus argument --any, I would consider the value of having the explicitness of a required argument of either --any or --all.
  • I appreciate is prefix and perhaps would consider is prefixed-by.
  • See notempty for my thoughts about why keeping it might be worthwhile.
  • I am not particularly excited about using = for numeric commands, as I am not sure if >= or <= would require escaping redirection and I am not sure about the value-added of something such as is >= $x $y. I feel similarly about the test option naming scheme lt/le/gt/ge, particularly as the test built-in would remain an option.

Proposal B:

The following changes from Proposal A:

  • Make any and all arguments rather than subcommands, i.e.:

    are --any empty STRING STRING [STRING...]

    are --all empty STRING STRING [STRING...]

Proposal C:

The following changes from Proposal B:

  • Implement only a single built-in, are, designed for arrays. The cases with one or two arguments would be covered by the same command using the same syntax, i.e.:

is greater-than NUMBER_OBJ NUMBER_SUBJ (Not Implemented)

are all descending --unique NUMBER NUMBER [NUMBER...]

Pros: Easier to implement. Simpler usage pattern.

Cons: For some comparisons, such as those with two numbers, syntax such as is at-least 65 age may be preferable and/or more comprehensible than are all ascending 65 age.

Proposal D

The following changes from Proposal B:

  • Implement only a single built-in, but keep the name is rather than switching to are.

is greater-than NUMBER_OBJ NUMBER_SUBJ

are all descending --unique NUMBER NUMBER [NUMBER...] (Not Implemented)

Pros: Easier to implement. Simpler usage pattern.

Cons: May be less readable for some conditionals when using arrays; please see the comment under is greater in Proposal A for details.

Cons may be partially addressed by the following:

Sub-Proposal:
is descending --unique NUMBER NUMBER [NUMBER...]

Alternate:
is decreasing --unique NUMBER NUMBER [NUMBER...]

Apologies for the long-winded post.

@faho
Copy link
Member

faho commented Jan 14, 2021

Apologies for the long-winded post.

Oh, no worries, I see you've given it some thought!

Given is greater $x $y, one should read this as, "Is it greater than $x, this variable $y?", as opposed to "Is $x greater than $y".

It isn't. It's exactly the other way around. The idea is it returns the same thing as if the operator was between the arguments -

is greater a b c
# is true if a > b > c

I had written a longer answer and accidentally lost it, so let me summarize my criticisms with your proposals:

  • Some of these are pure renaming - that's okay, but not really a big material difference. Especially with greater-than I don't see what the than brings to the table.
  • The are/is dichotomy seems to be not worth all that much - one can be implemented in terms of the other with a for-loop here or there, and I don't think there's much to be gained from being able to check just one path. instead having two slightly different commands seems confusing, and requires us dipping into the global namespace twice for very common words.
  • I don't like having to specify --any or --all - that'll just annoy users who try is same foo bar and are told they have to pick. Pick one as a default and stick with it. The any and all subcommand idea seems to be basically the same thing as the options.
  • is command COMMAND is just command -q COMMAND
  • Some of your additional commands are neat ideas, but I'd leave them for later - that's the cool thing about subcommands, we can add more. Unfortunately is identical-content --sha256 FILE FILE requires a sha256 implementation in fish, and I'm not sure we want to take that on for such a small feature - sha256sum exists, you can just compare its output

Pros: More readable than set --query VARNAME and set -q VARNAME.

Yeah, that's... just your opinion. "More readable" is one of those empty phrases that's super easy to throw around. set -q is well-known, established and works well. Providing yet another way of doing it doesn't seem too useful.

I feel similarly about the test option naming scheme lt/le/gt/ge, particularly as the test built-in would remain an option.

We really really really want to move people away from test. test is probably the single cause of the majority of bugs in fish script today. It has foot guns that really should be eliminated with a better replacement. We do not want people using test just because they prefer the spelling.

Which means finding a carrot for the replacement, something that is awesome about it that makes you want to use it, and I'm not sure we've got it yet.

@alisraza
Copy link

alisraza commented Jan 14, 2021

It isn't. It's exactly the other way around. The idea is it returns the same thing as if the operator was between the arguments -

I'm sorry, I should have double-checked; I conflated Polish notation with verb-object-subject, but I can see above that @ridiculousfish said "verb subject object". In part, I confused myself because I was assuming consistency with contains, which I believe is verb-object-subject.

I had written a longer answer and accidentally lost it, so let me summarize my criticisms with your proposals:

I'm sorry to hear that! I know that can be incredibly frustrating.

* Some of these are pure renaming - that's _okay_, but not really a big material difference.

I agree that many of my suggestions are centered around renaming, although my intent was to underscore the importance of word-choice as it seems that you are moving closer toward implementation. I did not mean to imply we should cater to my personal preferences! Rather, I was hoping we could consider word-choice and start to develop a consensus based on the goals of minimal ambiguity and maximum consistency. My hope is to avoid replacing a large foot-gun with a small foot-pistol, so-to-speak.

Especially with greater-than I don't see what the than brings to the table.

Absolutely, I agree. This suggestion stems from my mistaken assumption that verb-object-subject order was being used, so the than was intended to indicate that the following word was the object. Given your clarification, I would actually argue that greater-than is worse than greater.

* The `are`/`is` dichotomy seems to be not worth all that much - one can be implemented in terms of the other with a `for`-loop here or there, and I don't think there's much to be gained from being able to check _just one_ path. instead having two slightly different commands seems confusing, and requires us dipping into the global namespace _twice_ for very common words.

Upon reflection, you've convinced me about not "dipping into the global namespace twice for common words." I think it may still be worth thinking about is versus are, but only in terms of ambiguity/consistency as mentioned above, i.e., is the following:

if are ascending $x $y $z

easier to understand, less ambiguous, etc., than:

if is greater $x $y $z
* I don't like having to specify `--any` or `--all` - that'll just annoy users who try `is same foo bar` and are told they have to pick. Pick one as a default and stick with it. The `any` and `all` subcommand idea seems to be basically the same thing as the options.

Okay, I understand the reasoning behind --all as an implied default, and I agree using sub-commands would just be syntax sugar. I am fairly ambivalent about this one.

* `is command COMMAND` is just `command -q COMMAND`

Sure, but when I first started learning fish, I thought of command as something to use when explicitly wanting to run an external command while avoiding aliases and built-ins. I don't feel strongly about this one, either, but the argument here would be discoverability.

* Some of your additional commands are neat ideas, but I'd leave them for later - that's the cool thing about subcommands, we can add more.

Sure, that's completely understandable.

Unfortunately is identical-content --sha256 FILE FILE requires a sha256 implementation in fish, and I'm not sure we want to take that on for such a small feature - sha256sum exists, you can just compare its output

I didn't realize it would require a sha256 implementation in fish; agree that it would not be worth it.

Pros: More readable than set --query VARNAME and set -q VARNAME.
Yeah, that's... just your opinion. "More readable" is one of those empty phrases that's super easy to throw around. set -q is well-known, established and works well. Providing yet another way of doing it doesn't seem too useful.

Was this a The Big Lebowski reference? Anyway, I'm not sure that I agree with it being an entirely "empty" phrase, but perhaps "more readable" was a bad choice of words. I suppose in part it depends on your intended audience, but I can see why from your point of view set -q seems well-known and perhaps even obvious. On the other hand, I do agree it would violate "the law of orthogonality" without a compelling reason for doing so.

I feel similarly about the test option naming scheme lt/le/gt/ge, particularly as the test built-in would remain an option.

We really really really want to move people away from test. test is probably the single cause of the majority of bugs in fish script today. It has foot guns that really should be eliminated with a better replacement. We do not want people using test just because they prefer the spelling.

I hear you.

Which means finding a carrot for the replacement, something that is awesome about it that makes you want to use it, and I'm not sure we've got it yet.

Seems like you're getting there!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants