Example of slowness compared to Bash (while or test problem?) #2007
I would expect them to take equal amounts of total time, except that fish will fully buffer the output (#1396), so you won't see anything until the command is complete. Does the fish version take more total time, or is it just more time before the first output is printed?
Thanks for the quick response. I'm not noticing the buffering you mentioned--output from Fish seems constant and gradual--but it seems that fish is simply taking a lot longer, both in real time and in CPU time, and is using about 10 times as much memory as Bash.
It's not a big deal really, because I still write most scripts in Bash. The situation here was that I was writing some functions to quickly find directories with find/locate, Percol, and Fish, and I ended up having to pipe through Bash instead of using Fish functions because Fish was so much slower (about 3 times slower wall-time here). (Well, I actually ended up piping through Python scripts, because that was even faster, but the point here is the difference between Bash and Fish.) :) Thanks.
Thanks, that's very useful. Looking at a sample, we're spending something like 25% of our time in `pthread_sigmask`.
Ha ha ha, that's because …
Hmm, it looks like reading multiple bytes is not trivial. Challenge accepted, though. I was thinking: read N bytes at a time, process up to the newline. I wonder how Bash does this.
Actually, Bash is reading one character at a time as well, since this is a pipe (and it can't actually do buffered IO). @ridiculousfish, something else must be going on.
I'm sure you're right, probably many things are going on :) But this is definitely one of them. For every byte read, fish blocks and then unblocks signals; see `read_blocked`. That `pthread_sigmask` call is consuming 25% of the time.
We're only blocking `SIGCHLD`.
Honestly I don't understand why any signals need to be blocked there, which is why I've been afraid to touch it! Regarding SIGCHLD specifically, blocking it may have been necessary in the past, but the SIGCHLD handler is now completely trivial, so blocking shouldn't be necessary anymore. I'm OK with replacing that call to …
I want to revisit this one.
Does fish have the capability of reading multiple characters at a time? Will it in the future? If …
When I profiled it, most of the time was spent not in the one-byte buffer size but in manipulating the signal mask for every byte read. I think this can be greatly simplified.
Is this also why pasting is slow?
No, this is specific to the non-interactive case. I have never found fish to be slow in pasting. Can you share your environment? Maybe #2215?
My environment is: …
Maybe the cause is a slow computer, but I don't find pasting slow when using …
I've found another example of Fish being very slow and CPU-intensive compared to Bash:

```
➤ for i in one two three four five
      # Populate some buckets
      dmesg | head -n50 | ./bucket.sh $i
  end
➤ time ./bucket.sh -DVl
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 3284maxresident)k
0inputs+8outputs (0major+1494minor)pagefaults 0swaps
➤ time ./bucket.fish -DVl # List buckets
0.24user 0.10system 0:00.39elapsed 89%CPU (0avgtext+0avgdata 20536maxresident)k
0inputs+488outputs (0major+47543minor)pagefaults 0swaps
```

This is using Bucket, a simple script that stores and retrieves text in files in a directory. There's a Bash version and a Fish version, and they do exactly the same thing, nearly a verbatim translation from Bash to Fish. But the Fish version is very slow compared to the Bash version.
@alphapapa: There are a few possible bottlenecks in that script, among them …
I'd suggest you run it with …
Hey faho, thanks for the quick response. Yeah, I use …. I actually just switched from using a glob to …. I guess I haven't been keeping up though, because I wasn't even aware of the Fish profiling. I'll have to check it out. Thanks.
The idea was that you're running it once per loop, while for bash you're doing it once - O(n) vs O(1).
The standard place to look for bottlenecks in shell scripting is any time the script calls an outside utility. Now, `ls` isn't too slow (and it's more that parsing `ls` output isn't a great habit to get into - though it's less bad with fish than with bash, fish will still fail on newlines in filenames).
Yeah, the profile will show what's really going on.
Yeah, that's the way …
Well, this is what seems interesting to me. For example, in Bash it is sometimes faster to use built-in string substitutions and arrays; but for data beyond a certain size (though I'm not sure where the line is), it's faster to pipe to external utilities like ….

So I guess my question would be: is Fish significantly less efficient at that than Bash is? Also, compared to Bash, is there a lot of overhead when calling shell functions? I'm guessing the answers may be, to some extent, yes. So then my question would be: will Fish ever be as optimized for scripting as Bash (or zsh, dash, etc.)? Or should I expect to use Fish scripting only for very simple, one-off tasks?
(My neck hair is standing up for some reason)
I haven't seen any of that - at least not significantly. Anything I write tends to work fine - maybe my style tends to work better.
I think I speak for all of us here when I say that fish should be completely suitable for scripting - anything else is a bug.
Retargeting to after 2.3.
I don't think we should compare fish with bash.
@pickfire Why? :) Since bash is the most common shell, shouldn't fish consider it its primary competition?
@alphapapa Nope, …
Haha, so what you mean is, let's make fish the fastest shell. :)
@alphapapa, the recent fish seems to have a slowdown, weird. It is even slower than bash now. More info in #2776.
I love a good performance problem so I'm taking ownership of this. The first step is eliminating irrelevancies (e.g., the …). Running … five times yields an average elapsed time of 120.63 seconds, with 48.04 in user mode and 71.39 in sys mode. The equivalent test using bash (installed via Homebrew, not the macOS version) yields an average elapsed time of 13.41 seconds, with 10.79 user and 2.54 sys. So the fish-to-bash ratio is 9.0 for elapsed time, 4.5 for user mode, and 28.1 for sys mode. Obviously the main issue is the amount of time fish spends executing syscalls. The signal manipulation identified by @ridiculousfish is a prime candidate for the slowdown and a good place to start improving.

P.S. I'm only going to address the performance discrepancy in the original comment. It's important that there be one problem per issue, to minimize confusion and ensure we know when the issue can be closed. If there are other scenarios that show egregious differences in performance when comparing fish to a competing shell, please open a new issue.
Awesome to have someone looking at this, especially krader! TBH I'm unclear on what blocking signals is meant to accomplish here - it was like this from the fish 1.x days but I don't know why. Is it simply to avoid having to retry on EINTR?
There are obviously some problems with signal management. The first thing I noticed in looking into this issue is that …. Also, the …. The core problem is that the …
Yeah, …
The …
I used the macOS …
Notice that the …
I reduced the number of paths to the first 100,000 from …
So my first attempt at improving the performance of the benchmark in this issue was …. Using …, consider these two syscall profiles for the first 60 seconds of run time of ….

Top ten syscalls, by elapsed time, for bash: …

Top ten syscalls, by elapsed time, for my improved fish: …

Notice that in the same interval bash manages to do more than 2x as …
I've implemented chunked reads. In the table below, the results labeled "w/perf improvements" include both the reduced signal manipulation and the chunked read enhancement. That yields a 72% reduction in run time. We're now only 2.4x slower than bash, versus 8.3x slower with the git head version of fish. Not great, but a damn sight better. I'd be satisfied if we were within a factor of 2x.
The instrumented syscall summary for the fish version labeled "w/perf improvements" in my previous comment looks like this: …
TBD is where the …
The …
I have absolutely no idea why the main thread would fork to execute a builtin command. If we got rid of that fork, fish would be reasonably close to bash on this benchmark. Running the benchmark under …
This is another manifestation of issue #1396: fish buffering the output of not just functions but blocks in general. Fixing that is far outside the scope of this issue.
fish probably forks …
Yes, the actual benchmark is doing …
I still don't see why that needs to fork. The relevant comment in src/exec.cpp says we have to fork so that the file will be truncated even if the block produces no output. But that seems rather silly, since the truncation should have been done when the redirection was set up. I'm guessing that if I look at the code more closely I'll see that it is deferring the opening of the file for some reason. In any event, I think that's a problem for another day; specifically, after the code in that module has been refactored to eliminate most of the oclint complexity warnings.

Right now I'd simply like as many people as possible to build and run a fish with this patch, even if just for a few minutes or hours. I'm nervous that the change to how signals are managed might only exhibit problems in the face of extensive use of the …
The shell was doing a lot of signal blocking/unblocking that hurts performance and can be avoided. This reduced the elapsed time for a simple benchmark by 25%. Partial fix for #2007
Refactor `builtin_read()` to split the code that does the actual reading into separate functions. This introduces the `read_in_chunks()` function but in this change it is just a clone of `read_one_char_at_a_time()`. It will be modified to actually read in chunks in the next change. Partial fix for #2007
Provide a more efficient method for reading lines (or null terminated sequences) when the input is seekable. Another partial fix for #2007
I'm going to close this because the two most significant improvements to the scenario discussed in this issue have been merged. Note that one of the improvements is gated on the variable …
One thing I noticed is that ctrl-C is less reliable with …
Kurtis, fantastic work on this, thanks so much. If I have time, I'll see if I can test with the env var set as you described.
This is relatively quick: …
But this is painfully slow: …