Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let '&' only separate as the first char of a word #7991

Merged
merged 2 commits into from
Jul 23, 2021

Conversation

faho
Copy link
Member

@faho faho commented May 11, 2021

This is a quick PR as a conversation starter.

Personally, I've always hated how & has this exalted position in the syntax - it terminates commands (so thing & otherthing is two commands! you don't need to use thing &; otherthing) and it's even interpreted in the middle of words!

thing& and echo foo& both trigger backgrounding,

This is quite awkward in combination with URLs:

curl https://example.com/thing?foo=bar&duran=duran

complains about duran=duran being invalid syntax (because it's set duran duran, duh)!

So this makes it so & is only treated specially at the start of a word.

Inside words, the & will simply be used as a regular old char.

It's technically a compatibility break both with old fish and with
posix shells, but as it's always possible to just add a space I do not
consider that to be important.

&& and &> are also impacted - currently both

echo bar&>foo

and

echo bar&&echo foo

are allowed and do a &> redirection or && conjunction,
respectively. I consider that awful behavior to begin with, and doubt anyone is using it on purpose.

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.rst

@kopischke
Copy link

kopischke commented May 12, 2021

&& and &> are also impacted - currently both

echo bar&>foo

and

echo bar&&echo foo

are allowed and do a &> redirection or && conjunction,
respectively. I consider that awful behavior to begin with, and doubt anyone is using it on purpose.

Just one consistency note on collateral effects of this proposed change: currently, all redirections and boolean conjunctions work without a word separator. The change would special case &, but both

echo bar>foo

and

false||echo foo

would keep on being valid.

Now, as you might have noticed, I am not a fan of accumulating special cases, because they make it progressively harder to explain and to memorise a language’s logic: “all statements separators work with or without surrounding word separators”, i.e. the current behaviour, is easy to explain and to apply. “Some do, some don’t, keep in mind which are which” is … far less so. Which is not to say the proposed change should not happen. However, it might be worth considering either circumscribing it strictly to the backgrounding operator (but that would mean reworking the parser, because it currently only looks ahead one character, unless I am mistaken) or extending it to all forms in affected areas, i.e. to all conjunctions, redirections and pipes (&| would be affected by the proposed change too, wouldn’t it?).

@faho
Copy link
Member Author

faho commented May 12, 2021

Just one consistency note on collateral effects of this proposed change: currently, all redirections and boolean conjunctions work without a word separator.

To be honest I consider that horrible as well, but restricting this change to the boolean & and | symbols, whatever their use, would be enough.

So there's two ways of going about this:

  • A minimalist way, just for the background operator - echo foo& would print foo&, echo foo&&echo foo would print "foo" twice
  • A maximalist way, for & and | - echo foo&&foo would print foo&&foo, echo foo|echo foo would print foo|echo foo (and similar for || and &> and &| and...)

To be honest, every single style guide ever will tell you it's a horrible idea to not add a space before the pipe or the &&. fish_indent will add them. Not having spaces is horrible style and hard to read, and unintuitive. These are conceptually entirely separate things - in echo foo& the & has nothing at all to do with the foo.

So I would be okay with the maximal solution, but the minimal one would solve 98% of the problem with much less of a compatibility break (especially given that fish's backgrounding is currently rather limited and hence won't be used much).

(tbh that backgrounding made it into fish in that form was a mistake to begin with)

@krobelus
Copy link
Member

An even more minimalist one: keep treating & as separator, except when it's surrounded by two non-separating characters.
This would avoid invalidating all those sleep 1& in my history.

I agree that keeping the language definition simple is important but in this case we can improve a real use case, which might be more important? Third party implementations like highlighters will need updating but it's such a rare case that it hardly matters.
So far we had only (?) []{} as characters that are special depending on context, which causes problems once in a while.

@kopischke
Copy link

… or, maybe, even more minimalist: do not treat & as a backgrounding operator if it is followed by a non separating character. That would still misinterpret a dangling ampersand in a URL, but then, this is invalid (or rather: useless) in the URL too, so that should be one rare and easy to pinpoint error.

And I agree with @faho the “no need for spaces” parsing of separators is really ugly – I was pretty surprised it worked that way, because I never wrote code like that before exploring edge cases for my syntax plugin. However, among unspaced forms, sticking the ampersand right after the command does seem to be the idiom most often found ITW. A minimalist approach might be the best way to reconcile both URL handling usability and existing code.

@zanchey
Copy link
Member

zanchey commented May 16, 2021

I've been doing a bunch of command-line REST API banging recently and the behaviour of & is irritating me a lot, so I approve of a change to the behaviour; &... seems like a reasonable way of doing it, though &|/&;/etc. should probably continue to work.

@ridiculousfish
Copy link
Member

I like this fix! & constantly breaks URLs and this will prevent many such cases. I don't have strong opinions on echo foo&&echo bar, I don't personally use it but I imagine others might, but I'm up for trying it and seeing if anyone notices.

@kopischke
Copy link

kopischke commented May 21, 2021

I like this fix! & constantly break breaks URLs and this will prevent many such cases. I don't have strong opinions on echo foo&&echo bar, I don't personally use it but I imagine others might, but I'm up for trying it and seeing if anyone notices.

I think everyone on this thread agrees changing & handling to accommodate URLs is useful; the question is how invasive this change should be.

Even if we say “whatever” to my personal bugbear, the inconsistency resulting from having ls|cat - be valid, but ls&|cat - be invalid, it’s not just &&: the cmd arg& form of backgrounding is found ITW (see, for instance, @krobeluspost; heck, a quick rg run on the fish repo reveals instances of the pattern in one of fish’s own tests). And while I think breaking cmd1&cmd2 is fine (that really is abominably quirky), the proposed fix would also break the more reasonable pattern cmd1 arg&; cmd2 everywhere.

I think a compromise – breaking as few user scripts as possible while improving URL handling – is possible by changing the behaviour of the & operator less broadly (see @krobelus’ and my suggestions for ideas).

@faho
Copy link
Member Author

faho commented May 25, 2021

Yeah the inconsistency between && and || is unacceptable. We can have different kinds of operators act differently no problem (and do - $ is interpreted in double-quotes, * isn't), but the same level of operator should work the same.

To be honest I'm tempted to go with:

  • A minimalist solution just about & backgrounding
  • Tied to a feature flag
  • With a corresponding deprecation warning - we can have a special FLOG category for this that you can disable once you've seen the warnings. This would also be useful for qmark-noglob and regex-easyesc (and would have been great for stderr-nocaret)

@krobelus krobelus removed the RFC label Jun 29, 2021
@krobelus krobelus added this to the fish 3.4.0 milestone Jun 29, 2021
krobelus added a commit to faho/fish-shell that referenced this pull request Jun 29, 2021
@krobelus
Copy link
Member

krobelus commented Jun 29, 2021

I've pushed a combination of the suggestions from @kopischke and @faho.
The second commit adds the deprecation warning; that one could probably be improved a bit (maybe include a stack trace)

echo foo&&bar still has the old meaning, I think that's better for now because almost all operators behave like this.
So this is just a targeted fix for URLs.

(Only docs+changelog are missing)

krobelus added a commit to krobelus/fish-shell that referenced this pull request Jul 3, 2021
@krobelus
Copy link
Member

krobelus commented Jul 3, 2021

I've tried with the new debug category; assuming a user has seen the warning and wants to turn it off they would need to do pass a flag, which will also print some extra lines.

$ fish --debug="-deprecated"
Debug enabled for category: debug
Debug enabled for category: error
Debug enabled for category: warning
Debug enabled for category: warning-path

The output can be fixed, but it also seems inconvenient to have to add --debug=-foo everywhere.

So what about using the feature-flags mechanism instead (easily settable via $fish_features)?
We can simply not show any deprecation warnings if a feature is disabled explicitly.

It probably doesn't matter much for & - I don't expect users to want to disable that, but the mechanism is useful.

For qmark-noglob we can print a warning if a string like foo? successfully expands to something like foo1.

For regex-easyesc it seems harder to do because we don't know how many backslashes the user wants in the output.
Maybe there is a good heuristic.

tok_is_string_character(pstree->src.at(next_char), false, none())) {
FLOGF(deprecated,
_(L"future versions of fish will not treat '&' as background operator when "
L"followed by a word character. See 'status features'."));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if this could also print a backtrace, so you can find the use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I forgot to push that - it's just a matter of adding parser->stack_trace().c_str(). Anyway, leaving this for the next.

@@ -237,7 +239,8 @@ tok_t tokenizer_t::read_string() {
}
break;
}
} else if (mode == tok_modes::regular_text && !tok_is_string_character(c, is_first)) {
} else if (mode == tok_modes::regular_text &&
!tok_is_string_character(c, is_first, this->token_cursor[1])) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is token_cursor[1] required to exist here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, in the worst case it's the terminating L'\0' which is not a word character, so it echo foo& still backgrounds.

tests/checks/features-ampersand-nobg1.fish Outdated Show resolved Hide resolved
/// first character. Hash (#) starts a comment if it's the first character in a token; otherwise it
/// is considered a string character. See issue #953.
static bool tok_is_string_character(wchar_t c, bool is_first) {
bool tok_is_string_character(wchar_t c, bool is_first, maybe_t<wchar_t> next) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this name already exists, but above we're talking about a "word" character, which I like more. (okay for later cleanup!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm here I was probably wrong to use "word character", because chars like "$/ are included here.

The fish lexer knows 8 different token kinds (see token_type_t); and basically everything that is not a separator is a "string".

I think I'll just change word -> token in the user-facing parts.

src/flog.h Outdated
@@ -53,6 +53,8 @@ class category_list_t {
category_t warning{L"warning", L"Warnings (on by default)", true};
category_t warning_path{
L"warning-path", L"Warnings about unusable paths for config/history (on by default)", true};
category_t deprecated{L"deprecated", L"Warnings about deprecated features (on by default)",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should already start this as a subcategory - deprecated-ampersand or so - so you can disable just that, without disabling all the deprecation warnings we might want to add.

@faho
Copy link
Member Author

faho commented Jul 8, 2021

I've tried with the new debug category; assuming a user has seen the warning and wants to turn it off they would need to do pass a flag, which will also print some extra lines.

Yeah, at least the default-enabled categories should not be printed. Or maybe remove that entirely, the user did request it explicitly. Only warn about non-matching ones. Or maybe just print ones that matched globs.

The output can be fixed, but it also seems inconvenient to have to add --debug=-foo everywhere.

I'd probably allow $FISH_DEBUG to be set with this instead. If we manage to make that settable after startup, as a list (or with comma separators for importing from the environment - or also with spaces for importing when it has been set as a list).

So you could add

set -ga FISH_DEBUG -deprecation-ampersand-in-words

or similar to config.fish to turn it off. The issue here is that it's tricky to not clobber the variable when you do want to debug something.

The alternative is to add a subcommand to status - status disable-warning deprecation-ampersand-in-words, or status disable-log or status add-log -deprecation-'*'.

For qmark-noglob we can print a warning if a string like foo? successfully expands to something like foo1.

It doesn't even have to expand successfully. I'd do it like this:

If qmark-noglob is not set (and the warning isn't turned off), whenever we encounter a question mark glob, print the deprecation warning (with an explicit note on how to turn it off!) and a stacktrace.

For regex-easyesc it seems harder to do because we don't know how many backslashes the user wants in the output.

Yeah, that one we should still need some tool to find it. The main issue with these deprecations, and why that one was even added as a feature flag (I think it's rarely used) is that they are tough to find. I can't even make a good 90%-regex like with stderr-nocaret (grep '\^[&/]' finds most uses, with few false-positives where it's quoted already).


Anyway, since it seems like there's a few things to be hashed out about the warning, what I'd do now is to merge this without the warning, and then figure out a good deprecation system later.

@krobelus
Copy link
Member

I'd do now is to merge this without the warning, and then figure out a good deprecation system later.

Ok, I'll do that. I guess we could also start with warnings that cannot be turned off? Not sure.

@krobelus
Copy link
Member

krobelus commented Jul 10, 2021

The commits without the warning are ready, I plan to merge one of these days.

Maybe there's a better name - I used a short one to fit the existing output of status features

stderr-nocaret  on      3.0     ^ no longer redirects stderr
qmark-noglob    on      3.0     ? no longer globs
regex-easyesc   on      3.1     string replace -r needs fewer \'s
ampersand-nobg  on      3.4     & only backgrounds if followed by a separating character

Maybe ampersand-no-bg-in-token?

This is opt-in through a new feature flag "ampersand-nobg-in-token".

When this flag and "qmark-noglob" are enabled, this command no longer
needs quoting:

	curl https://example.com/thing?foo=bar&duran=duran

Compared to the previous approach e1570a4 ("Let '&' only separate as
the first char of a word"), this has some advantages:

1. "&&" and "&>" are no longer affected. They are still special, even
   if used between tokens without spaces, like "echo bar&>foo".
   Maybe this is not really *better*, but it avoids risking to annoy
   users by breaking the old variant.

2. "&" is still special if at the end of a token, like in "sleep 1&".

Word movement is not affected by the semantics change, so Alt-F and
friends still stop at every "&".
So it can handle syntax changes that call for different formatting.
@krobelus krobelus merged commit a2b3005 into fish-shell:master Jul 23, 2021
@faho faho deleted the background-first branch July 29, 2021 15:57
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants