Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malformed history kills fish #7497

Closed
jos128 opened this issue Nov 22, 2020 · 10 comments
Closed

Malformed history kills fish #7497

jos128 opened this issue Nov 22, 2020 · 10 comments
Labels
bug Something that's not working as intended
Milestone

Comments

@jos128
Copy link

jos128 commented Nov 22, 2020

Trying to discover multi-line command blocks to use with time, I discovered a bug instead:

fish@host $> sh -c 'env HOME=$(mktemp -d) fish'
1. Enter malformed command "block", containing a variable not set, or not set to danger yet...
debug_fish@host $> time {
                   random choice $words
                   random choice $words
                   }
fish: The expanded command was empty.
time {
random choice $words
random choice $words
}
     ^
  • At this point, bringing it up again with 🠅, parsing seems fine, nothing breaks...
2. Setting the variable as a somewhat long list (e.g. set words a b c, won't break fish)
debug_fish@host $> set words (cat /usr/share/dict/words)
debug_fish@host $>
3. Recalling the command history until the malformed command appears
  • 🠅, 🠅
  • Get malformed history entry, wait a few seconds or whatever...
debug_fish@host$> time {
                  random choice $words
                  random choice $words
                  }Killed
fish@host $>
  • The few seconds before fish breaks, the syntax highlighting is all over the place, which to me suggests syntax parsing gets lost at expanding those long lists.
  • It's kind of nasty, because with the delay you may very well skip over the malformed command many entries ago, getting fish broken somewhere unrelated.
  • Reproducing with fish -d 4 only shows the shell died by SIGKILL.

That's all I got <3


  • Fish version 3.1.2
  • Ubuntu 20.10, Wayland, Gnome terminal
@jos128
Copy link
Author

jos128 commented Nov 23, 2020

Because I stumbled upon this organically, my attempt reducing the reproduction apparently made things a little bit too complicated. Again... XD

It's not limited to recalling history, but just typing the last }, if $words expands to something rather huge. Actually, it can be reduced (duh..) to typing this:

fish@host $> set var (cat /usr/share/dict/words)
fish@host $> time {$var;$var}

(one $var seems barely survivable here...)

So it works the other way too. The mentioned delay just allows for hitting enter and writing this parser trap into history, where it can be re-triggered, if the variable gets set.

@faho faho added the bug Something that's not working as intended label Nov 23, 2020
@faho faho added this to the fish 3.2.0 milestone Nov 23, 2020
@mqudsi
Copy link
Contributor

mqudsi commented Nov 25, 2020

Is that the OOM killer kicking in? We've had similar reports and discussed just limiting dynamic substitution output at the prompt.

@jos128
Copy link
Author

jos128 commented Nov 25, 2020

Does the syntax highlighter (I assume that's what's exploding here) really need to expand those variables at all? Can't this be done in a symbolic way?

As a naive user, I wouldn't mind actually overloading the shell on execution by accident, much less if the shell aborts gracefully and notifies me about hitting OOM. However, just typing something should never ever hit any limit or even use up significant resources IMO.

@krobelus
Copy link
Member

Does the syntax highlighter (I assume that's what's exploding here) really need to expand those variables at all? Can't this be done in a symbolic way?

It does because the variables are in command position. $PWD/doesnotexist is highlighted red, unless the file exists and is executable.

@jos128
Copy link
Author

jos128 commented Nov 25, 2020

Ah, alright, because of time. Is there a use case where $var needs to be evaluated in command position beyond $var[1]? I mean you only ever run one command, one path there, no?

Experimenting, it seems fish isn't caring about anything else but $var[1] anyway, so why would this cause fish to crash on extensive $vars? E.g. I cannot run fish as $PATH/fish, or call fish as set var / usr / bin / fish, or even set var ' ' /usr/bin/fish.

This does not break fish:

fish@host $> set var (cat /usr/share/dict/words)
fish@host $> time {$var[1];$var[1]}

It appears, evaluating only ever $var[1] at command position would be an easy fix for this problem.

However, apparently fish also crashes when the variable is not in command position (as I understand it): echo {$var $var} (or separated by ; or ,) breaks fish, but neither echo {$var$var}, nor echo $var $var does the trick.

@ridiculousfish
Copy link
Member

$var in command position expands to a command and arguments, e.g.:

> set cmd echo abc ; $cmd
abc

we should have some sort of protection against very large expansions.

@krobelus
Copy link
Member

Trying to discover multi-line command blocks to use with time

time begin; ...; end should work

It appears, evaluating only ever $var[1] at command position would be an easy fix for this problem.

Yes, that's clever, hopefully we can do that without adding too much complexity. I'll try.
Related: #7495 (comment).

What makes this blow up so fast is the fact that cartesian product expansion happens inside

> set 5 (seq 5)
> echo { $5.$5 }
1.1 2.1 3.1 4.1 5.1 1.2 2.2 3.2 4.2 5.2 1.3 2.3 3.3 4.3 5.3 1.4 2.4 3.4 4.4 5.4 1.5 2.5 3.5 4.5 5.5
>

Curiously, the same happens when there is a space between the variables like echo { $5 $5 }. Spaces inside brace expansions don't separate words.

So when typing that command, fish tries to compute autosuggestions for any of the list elements.
This makes a ton of sense for a commandline like ls /{bin,sbin}/fish.
But with a variable of 1000 entries, the expanded version will have a million entries. I believe fish runs wildcard_expand_string for them all which results in a million stat(2) calls.

@jos128
Copy link
Author

jos128 commented Nov 27, 2020

@ridiculousfish Thanks, that's actually a rather obvious concept, I didn't think of. Though, I think the syntax checker should still stop expansion after the first list entry, verifying a command, as there is visually no way to tell the error position apart, anyway, and it's constructed enough, to only fail at execution and still be user friendly, IMO.

@krobelus Thank you, for the tip. I figured out the command block syntax myself, in the end (via RTFM). However, I think my experimentation above isn't completely unreasonable, I think, in terms of discovering features for a new user.

The syntax parsing of the cartesian product only breaks fish, if written in braces, which is why I think there might be something else broken there (may explain the curious case of , too). E.g. writing echo $var.$var is totally fine. Personally, I think the syntax check should absolutely not expand a cartesian product apart from very trivial cases, as... well, something something Turing XD

Tho, executing the echo line breaks fish still, which I think is an issue in itself, as the load is easily estimated in advance and/or could be handled procedurally. Checking the size of $var won't introduce significant overhead, would it? Now, memory explodes before a single line is printed. I feel like, although $words$words is very much quite the load, it's still something a programming environment should be able to handle, or if there is some implicit data complexity underneath, abort in advance.

Also, could this be an integer overflow? As 100.000^2 exceeds 2^32.

(I would love to help with the code, instead of having opinions here, but C++ and a project fish's size is beyond my scope. Thank you all, for being this involved and active <3)

Edit: Sorry, I somehow missed the last paragraph, which makes some of my reasoning seem obsolete, or rather ignorant. Still, my point about the cartesian product stands, It think.

@krobelus
Copy link
Member

krobelus commented Dec 23, 2020

Fixed by a8080e8, see also #7407 (comment)

Tho, executing the echo line breaks fish still, which I think is an issue in itself

This is fixed by 594a6a3, it looks like this:

$ echo $var$var
fish: Expansion produced too many results

I would love to help with the code, instead of having opinions here, but C++ and a project fish's size is beyond my scope.

The source should be easy to read - if it's not we should change that. Finding out how to make a change is usually not that hard,
plus you can always ask. (Edit: that sounds weird, of course it can get really complex..)
The hard part is knowing what to do ;)

@jos128
Copy link
Author

jos128 commented Dec 23, 2020

Thank you for fixing this. <3

@krobelus, I have no doubt the code is of high quality and readable. However, a project this size to me has the main problem of overcoming the complexity of the code base itself. I think experienced developers tend to forget what that looks like at the beginning.

For example, if I wanted to understand how fish's parser works, where would I start? What would it take?

To me it looks like src/parser.cpp would be a good start. But then what? The code doesn't mean anything to me. I don't know what I am looking at. Don't get me wrong, I don't expect a book on parsers, but I think beginners would really greatly benefit from a short summery of what each file does and some broad introduction to the core principles at hand, e.g. parser idiom, terms and keywords for further exploration; answering the question "What is needed to understand this code?". This is generally the main hurt for stepping out of tutorial hell, I think, and the reason more people aren't contributing. Of course, this applies to pretty much any OSS project and I totally see how this is a major overhead to those actually doing some work, easy to see why it's there and why it's there.

Off-topic: I am actually thinking about this from time to time. The opensource ecosystem would be much more approachable, if we had a thorough "How to even start?" documented in detail for some exemplary projects. I imagine a service where users could share notes on repository exploration, or experienced developers could explain their take on a code base of a remarkable project (as a tribute? a love letter?). Maybe a repository snapshot on the left (this doesn't need to be the latest state; future retakes could be pointed out as valuable "crime scene" analysis, too), with a wiki-like article on every file and hierarchy level on the right; references to code sections within the file or the global scope; links to fundamental algorithms, conventions or idioms.

I assume here most projects aren't really that dissimilar in the end, but all of them are very different from where you were left off learning the basics. With much conceptual bloat from modularization, testing, building, git and documentation. And this entry barrier is actually even selecting for somewhat unrelated qualities, not inherently linked to the potential quality of contribution. That is, of course, if you don't make a circular argument for the suggested problem at hand.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something that's not working as intended
Projects
None yet
Development

No branches or pull requests

5 participants