Let '&' only separate as the first char of a word #7991
Just one consistency note on collateral effects of this proposed change: currently, all redirections and boolean conjunctions work without a word separator. The change would special case echo bar>foo and false||echo foo would keep on being valid.
Now, as you might have noticed, I am not a fan of accumulating special cases, because they make it progressively harder to explain and to memorise a language’s logic: “all statement separators work with or without surrounding word separators”, i.e. the current behaviour, is easy to explain and to apply. “Some do, some don’t, keep in mind which are which” is … far less so.
Which is not to say the proposed change should not happen. However, it might be worth considering either circumscribing it strictly to the backgrounding operator (but that would mean reworking the parser, because it currently only looks ahead one character, unless I am mistaken) or extending it to all forms in affected areas, i.e. to all conjunctions, redirections and pipes (
To be honest I consider that horrible as well, but restricting this change to the boolean So there's two ways of going about this:
To be honest, every single style guide ever will tell you it's a horrible idea to not add a space before the pipe or the So I would be okay with the maximal solution, but the minimal one would solve 98% of the problem with much less of a compatibility break (especially given that fish's backgrounding is currently rather limited and hence won't be used much). (tbh that backgrounding made it into fish in that form was a mistake to begin with)
An even more minimalist one: keep treating I agree that keeping the language definition simple is important but in this case we can improve a real use case, which might be more important? Third party implementations like highlighters will need updating but it's such a rare case that it hardly matters.
… or, maybe, even more minimalist: do not treat And I agree with @faho the “no need for spaces” parsing of separators is really ugly – I was pretty surprised it worked that way, because I never wrote code like that before exploring edge cases for my syntax plugin. However, among unspaced forms, sticking the ampersand right after the command does seem to be the idiom most often found ITW. A minimalist approach might be the best way to reconcile both URL handling usability and existing code.
I've been doing a bunch of command-line REST API banging recently and the behaviour of
I like this fix! & constantly breaks URLs and this will prevent many such cases. I don't have strong opinions on
I think everyone on this thread agrees changing Even if we say “whatever” to my personal bugbear, the inconsistency resulting from having I think a compromise – breaking as few user scripts as possible while improving URL handling – is possible by changing the behaviour of the
Yeah the inconsistency between To be honest I'm tempted to go with:
e1570a4 to d669fc3
I've pushed a combination of the suggestions from @kopischke and @faho.
(Only docs+changelog are missing)
I've tried with the new debug category; assuming a user has seen the warning and wants to turn it off, they would need to pass a flag, which will also print some extra lines.
The output can be fixed, but it also seems inconvenient to have to add So what about using the feature-flags mechanism instead (easily settable via It probably doesn't matter much for For For |
src/parse_execution.cpp
        tok_is_string_character(pstree->src.at(next_char), false, none())) {
        FLOGF(deprecated,
              _(L"future versions of fish will not treat '&' as background operator when "
                L"followed by a word character. See 'status features'."));
It would be great if this could also print a backtrace, so you can find the use.
Ah I forgot to push that - it's just a matter of adding parser->stack_trace().c_str(). Anyway, leaving this for the next.
@@ -237,7 +239,8 @@ tok_t tokenizer_t::read_string() {
                 }
                 break;
             }
-        } else if (mode == tok_modes::regular_text && !tok_is_string_character(c, is_first)) {
+        } else if (mode == tok_modes::regular_text &&
+                   !tok_is_string_character(c, is_first, this->token_cursor[1])) {
Is token_cursor[1] required to exist here?
yeah, in the worst case it's the terminating L'\0', which is not a word character, so echo foo& still backgrounds.
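The point about the terminator can be illustrated with a small sketch. It is not the fish tokenizer: continues_token and ampersand_ends_token are hypothetical names, and the set of characters that end a token is a simplifying assumption. It only shows why reading one character past the '&' is safe on a NUL-terminated buffer.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical stand-in for the tokenizer's "can this character continue a token?" check.
// Assumption: the terminating NUL, whitespace and a few operators end a token.
static bool continues_token(wchar_t c) {
    return c != L'\0' && std::wcschr(L" \t\r\n;|&<>()", c) == nullptr;
}

// True if the '&' at `cursor` ends the current token.
// `cursor` must point into a NUL-terminated buffer, so cursor[1] is at worst the terminator.
static bool ampersand_ends_token(const wchar_t *cursor) {
    assert(*cursor == L'&');
    return !continues_token(cursor[1]);
}
```

With this sketch, the '&' in echo foo& is followed by the terminator and ends the token, while the '&' in foo&bar is followed by a token character and does not.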
src/tokenizer.cpp
 /// first character. Hash (#) starts a comment if it's the first character in a token; otherwise it
 /// is considered a string character. See issue #953.
-static bool tok_is_string_character(wchar_t c, bool is_first) {
+bool tok_is_string_character(wchar_t c, bool is_first, maybe_t<wchar_t> next) {
I realize this name already exists, but above we're talking about a "word" character, which I like more. (okay for later cleanup!)
Hmm here I was probably wrong to use "word character", because chars like "$/ are included here.
The fish lexer knows 8 different token kinds (see token_type_t); and basically everything that is not a separator is a "string".
I think I'll just change word -> token in the user-facing parts.
src/flog.h
@@ -53,6 +53,8 @@ class category_list_t {
     category_t warning{L"warning", L"Warnings (on by default)", true};
     category_t warning_path{
         L"warning-path", L"Warnings about unusable paths for config/history (on by default)", true};
+    category_t deprecated{L"deprecated", L"Warnings about deprecated features (on by default)",
I think we should already start this as a subcategory - deprecated-ampersand or so - so you can disable just that, without disabling all the deprecation warnings we might want to add.
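To make the subcategory idea concrete, here is a rough sketch of per-category toggling. It does not use fish's flog API: category_map and apply_category_spec are hypothetical, the category name deprecated-qmark is invented for illustration, and the convention that a leading '-' disables a category is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical category table: name -> enabled. The names are illustrative only.
using category_map = std::map<std::string, bool>;

// Apply a comma-separated spec such as "-deprecated-ampersand".
// Assumption: a leading '-' disables the named category, anything else enables it.
// Unknown names are ignored in this sketch.
static void apply_category_spec(category_map &cats, const std::string &spec) {
    std::istringstream in(spec);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (item.empty()) continue;
        const bool enable = item[0] != '-';
        const std::string name = enable ? item : item.substr(1);
        auto it = cats.find(name);
        if (it != cats.end()) it->second = enable;
    }
}
```

Applying "-deprecated-ampersand" to a table holding warning, deprecated-ampersand and deprecated-qmark turns off only the ampersand warning, which is the behaviour a single shared deprecated category could not offer.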
Yeah, at least the default-enabled categories should not be printed. Or maybe remove that entirely, the user did request it explicitly. Only warn about non-matching ones. Or maybe just print ones that matched globs.
I'd probably allow $FISH_DEBUG to be set with this instead. If we manage to make that settable after startup, as a list (or with comma separators for importing from the environment - or also with spaces for importing when it has been set as a list). So you could add set -ga FISH_DEBUG -deprecation-ampersand-in-words or similar to config.fish to turn it off. The issue here is that it's tricky to not clobber the variable when you do want to debug something. The alternative is to add a subcommand to
It doesn't even have to expand successfully. I'd do it like this: If qmark-noglob is not set (and the warning isn't turned off), whenever we encounter a question mark glob, print the deprecation warning (with an explicit note on how to turn it off!) and a stacktrace.
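The rule proposed here can be sketched as a small predicate. This is only an illustration under stated assumptions: has_unquoted_qmark and should_warn_about_qmark are hypothetical helpers, and quoting is modelled very roughly (single quotes, double quotes and backslash escapes only), unlike fish's real parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `token` contains a '?' that is neither backslash-escaped nor quoted.
// Assumption: only single quotes, double quotes and backslash escapes are modelled.
static bool has_unquoted_qmark(const std::wstring &token) {
    wchar_t quote = L'\0';  // active quote character, or NUL when unquoted
    for (std::size_t i = 0; i < token.size(); i++) {
        const wchar_t c = token[i];
        if (c == L'\\') {
            i++;  // skip the escaped character
        } else if (quote != L'\0') {
            if (c == quote) quote = L'\0';
        } else if (c == L'\'' || c == L'"') {
            quote = c;
        } else if (c == L'?') {
            return true;
        }
    }
    return false;
}

// Warn only while the feature flag is off and the warning has not been disabled.
// The token does not need to expand successfully for the warning to fire.
static bool should_warn_about_qmark(const std::wstring &token, bool qmark_noglob,
                                    bool warning_enabled) {
    return !qmark_noglob && warning_enabled && has_unquoted_qmark(token);
}
```

A caller would print the deprecation message, the note on how to turn it off, and the stack trace whenever the predicate returns true.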
Yeah, that one we should still need some tool to find it. The main issue with these deprecations, and why that one was even added as a feature flag (I think it's rarely used) is that they are tough to find. I can't even make a good 90%-regex like with stderr-nocaret ( Anyway, since it seems like there's a few things to be hashed out about the warning, what I'd do now is to merge this without the warning, and then figure out a good deprecation system later.
Ok, I'll do that. I guess we could also start with warnings that cannot be turned off? Not sure.
The commits without the warning are ready, I plan to merge one of these days. Maybe there's a better name - I used a short one to fit the existing output of
Maybe |
d669fc3 to 07fe5ea
This is opt-in through a new feature flag "ampersand-nobg-in-token". When this flag and "qmark-noglob" are enabled, this command no longer needs quoting:
curl https://example.com/thing?foo=bar&duran=duran
Compared to the previous approach e1570a4 ("Let '&' only separate as the first char of a word"), this has some advantages:
1. "&&" and "&>" are no longer affected. They are still special, even if used between tokens without spaces, like "echo bar&>foo". Maybe this is not really *better*, but it avoids the risk of annoying users by breaking the old variant.
2. "&" is still special if at the end of a token, like in "sleep 1&".
Word movement is not affected by the semantics change, so Alt-F and friends still stop at every "&".
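The rules in this commit message can be summarised in a short sketch. It is not the actual tokenizer change: amp_kind, is_token_char and classify_ampersand are hypothetical names, the separator set is a simplification, and quoting and escaping are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class amp_kind { background, and_conjunction, redirect_both, literal };

// Assumption: a simplified set of characters that end a token; fish's real rules are richer.
static bool is_token_char(wchar_t c) {
    if (c == L'\0') return false;
    static const std::wstring separators = L" \t\r\n;|&<>()";
    return separators.find(c) == std::wstring::npos;
}

// Classify the '&' at src[i] with the flag enabled.
// Call this on the first '&' of a run; quoting and escaping are not modelled.
static amp_kind classify_ampersand(const std::wstring &src, std::size_t i) {
    assert(i < src.size() && src[i] == L'&');
    const wchar_t prev = (i > 0) ? src[i - 1] : L'\0';
    const wchar_t next = (i + 1 < src.size()) ? src[i + 1] : L'\0';
    // Inside a token and followed by a token character: plain text.
    if (is_token_char(prev) && is_token_char(next)) return amp_kind::literal;
    if (next == L'&') return amp_kind::and_conjunction;  // "&&" stays special
    if (next == L'>') return amp_kind::redirect_both;    // "&>" stays special
    return amp_kind::background;                         // e.g. "sleep 1&"
}
```

Under these assumptions the '&' in the curl URL is classified as literal, the one in "sleep 1&" as background, and the ones in "echo bar&>foo" and "false&&echo foo" keep their redirection and conjunction meanings.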
So it can handle syntax changes that call for different formatting.
07fe5ea to eb3388d
This is a quick PR as a conversation starter.
Personally, I've always hated how & has this exalted position in the syntax - it terminates commands (so thing & otherthing is two commands! you don't need to use thing &; otherthing) and it's even interpreted in the middle of words! thing& and echo foo& both trigger backgrounding.
This is quite awkward in combination with URLs:
curl https://example.com/thing?foo=bar&duran=duran
complains about duran=duran being invalid syntax (because it's set duran duran, duh)!
So this makes it so & is only treated specially at the start of a word.
Inside words, the & will simply be used as a regular old char.
It's technically a compatibility break both with old fish and with POSIX shells, but as it's always possible to just add a space I do not consider that to be important.
&& and &> are also impacted - currently both are allowed without a space and do a &> redirection or && conjunction, respectively. I consider that awful behavior to begin with, and doubt anyone is using it on purpose.
TODOs: