New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example of slowness compared to Bash (while or test problem?) #2007

Closed
alphapapa opened this Issue Apr 2, 2015 · 42 comments

Comments

Projects
None yet
6 participants
@alphapapa

alphapapa commented Apr 2, 2015

This is relatively quick:

locate $HOME | bash -c 'while read dir; do [[ -d $dir ]] && echo $dir; done'

But this is painfully slow:

function onlyDirs
    while read dir
        test -d $dir; and echo $dir
    end
end

locate $HOME | onlyDirs
@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Apr 2, 2015

Member

I would expect them to take equal amounts of total time, except that fish will fully buffer the output (#1396) so you won't see anything until the command is complete. Does the fish version take more total time, or is it just more time before the first output is printed?

Member

ridiculousfish commented Apr 2, 2015

I would expect them to take equal amounts of total time, except that fish will fully buffer the output (#1396) so you won't see anything until the command is complete. Does the fish version take more total time, or is it just more time before the first output is printed?

@alphapapa

This comment has been minimized.

Show comment
Hide comment
@alphapapa

alphapapa Apr 2, 2015

Thanks for the quick response.

I'm not noticing the buffering you mentioned--output from Fish seems constant and gradual--but it seems that fish is simply taking a lot longer to do it, both in real time and in CPU time, and is using about 10 times as much memory as Bash.

$cat testfish.fish 
#!/usr/bin/fish
function onlyDirs
        while read dir
                test -d $dir; and echo $dir
        end
end
locate $HOME | onlyDirs

$ cat testbash.sh
#!/bin/bash
locate $HOME | bash -c 'while read dir; do [[ -d $dir ]] && echo $dir; done'

$ time ./testfish.fish >/dev/null
26.77user 16.72system 0:43.35elapsed 100%CPU (0avgtext+0avgdata 37520maxresident)k
0inputs+104outputs (0major+24078minor)pagefaults 0swaps

$ time ./testbash.sh >/dev/null
6.56user 7.50system 0:13.86elapsed 101%CPU (0avgtext+0avgdata 3124maxresident)k
0inputs+0outputs (0major+961minor)pagefaults 0swaps

It's not a big deal really, because I still write most scripts in Bash. The situation here was that I was writing some functions to quickly find directories with find/locate, Percol, and Fish, and I ended up having to pipe through Bash instead of using Fish functions because Fish was so much slower (about 3 times slower wall-time here). (Well, I actually ended up piping through Python scripts, because that was even faster, but the point here is the difference between Bash and Fish. :)

Thanks.

alphapapa commented Apr 2, 2015

Thanks for the quick response.

I'm not noticing the buffering you mentioned--output from Fish seems constant and gradual--but it seems that fish is simply taking a lot longer to do it, both in real time and in CPU time, and is using about 10 times as much memory as Bash.

$cat testfish.fish 
#!/usr/bin/fish
function onlyDirs
        while read dir
                test -d $dir; and echo $dir
        end
end
locate $HOME | onlyDirs

$ cat testbash.sh
#!/bin/bash
locate $HOME | bash -c 'while read dir; do [[ -d $dir ]] && echo $dir; done'

$ time ./testfish.fish >/dev/null
26.77user 16.72system 0:43.35elapsed 100%CPU (0avgtext+0avgdata 37520maxresident)k
0inputs+104outputs (0major+24078minor)pagefaults 0swaps

$ time ./testbash.sh >/dev/null
6.56user 7.50system 0:13.86elapsed 101%CPU (0avgtext+0avgdata 3124maxresident)k
0inputs+0outputs (0major+961minor)pagefaults 0swaps

It's not a big deal really, because I still write most scripts in Bash. The situation here was that I was writing some functions to quickly find directories with find/locate, Percol, and Fish, and I ended up having to pipe through Bash instead of using Fish functions because Fish was so much slower (about 3 times slower wall-time here). (Well, I actually ended up piping through Python scripts, because that was even faster, but the point here is the difference between Bash and Fish. :)

Thanks.

@ridiculousfish ridiculousfish added this to the fish-future milestone Apr 2, 2015

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Apr 2, 2015

Member

Thanks, that's very useful. Looking at a sample, we're spending like 25% of our time in pthread_sigmask - very surprising.

Member

ridiculousfish commented Apr 2, 2015

Thanks, that's very useful. Looking at a sample, we're spending like 25% of our time in pthread_sigmask - very surprising.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Apr 2, 2015

Member

Ha ha ha, that's because builtin_read reads one byte at a time in the non-interactive case. It is that way in fish 1.x too.

Member

ridiculousfish commented Apr 2, 2015

Ha ha ha, that's because builtin_read reads one byte at a time in the non-interactive case. It is that way in fish 1.x too.

@snnw

This comment has been minimized.

Show comment
Hide comment
@snnw

snnw Apr 9, 2015

Contributor

Hmm, it looks like reading multiple bytes is not trivial. Challenge accepted, though.

I was thinking, read N bytes at a time, process up to \n, \0, or eof, then lseek back to the start of the next line. However, we can't seek is stdin is a pipe. Can we just reuse the buffer in that case, and hope nobody notices that we've bitten off more stdin than we can chew? Sounds dangerous...

I wonder how Bash does this.

Contributor

snnw commented Apr 9, 2015

Hmm, it looks like reading multiple bytes is not trivial. Challenge accepted, though.

I was thinking, read N bytes at a time, process up to \n, \0, or eof, then lseek back to the start of the next line. However, we can't seek is stdin is a pipe. Can we just reuse the buffer in that case, and hope nobody notices that we've bitten off more stdin than we can chew? Sounds dangerous...

I wonder how Bash does this.

@snnw

This comment has been minimized.

Show comment
Hide comment
@snnw

snnw Apr 9, 2015

Contributor

Actually, Bash is reading one character at a time as well, since this is a pipe (and it can't actually do buffered IO). @ridiculousfish something else must be going on..

Contributor

snnw commented Apr 9, 2015

Actually, Bash is reading one character at a time as well, since this is a pipe (and it can't actually do buffered IO). @ridiculousfish something else must be going on..

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Apr 9, 2015

Member

I'm sure you're right, probably many things are going on :) But this is definitely one of them. For every byte read, fish blocks and then unblocks signals. See read_blocked. That pthread_sigmask call is consuming 25% of the time.

Member

ridiculousfish commented Apr 9, 2015

I'm sure you're right, probably many things are going on :) But this is definitely one of them. For every byte read, fish blocks and then unblocks signals. See read_blocked. That pthread_sigmask call is consuming 25% of the time.

@snnw

This comment has been minimized.

Show comment
Hide comment
@snnw

snnw Apr 10, 2015

Contributor

We're only blocking SIGCHLD, right? Would it make sense to call pthread_sigmask once per call to builtin_read? Honestly, I don't really understand why SIGCHLD needs to be blocked.

Contributor

snnw commented Apr 10, 2015

We're only blocking SIGCHLD, right? Would it make sense to call pthread_sigmask once per call to builtin_read? Honestly, I don't really understand why SIGCHLD needs to be blocked.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Apr 10, 2015

Member

Honestly I don't understand why any signals need to be blocked there. That's why I've been afraid to touch it!

Regarding SIGCHLD specifically, blocking that may have been necessary in the past, but the SIGCHLD handler is now completely trivial so it shouldn't be necessary to block.

I'm OK with replacing that call to read_blocked, and maybe all such calls, with read_loop (which doesn't block signals) and seeing if anything goes wrong, but not for fish 2.2.

Member

ridiculousfish commented Apr 10, 2015

Honestly I don't understand why any signals need to be blocked there. That's why I've been afraid to touch it!

Regarding SIGCHLD specifically, blocking that may have been necessary in the past, but the SIGCHLD handler is now completely trivial so it shouldn't be necessary to block.

I'm OK with replacing that call to read_blocked, and maybe all such calls, with read_loop (which doesn't block signals) and seeing if anything goes wrong, but not for fish 2.2.

@ridiculousfish ridiculousfish modified the milestones: next-2.x, fish-future Jul 27, 2015

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Jul 27, 2015

Member

I want to revisit this one.

Member

ridiculousfish commented Jul 27, 2015

I want to revisit this one.

@pickfire

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Jul 27, 2015

Contributor

Does fish have the capability of reading multiple character at a time? Will it be in the future?

If fish and bash is reading one character at the same time, why is bash a lot faster than fish?

Contributor

pickfire commented Jul 27, 2015

Does fish have the capability of reading multiple character at a time? Will it be in the future?

If fish and bash is reading one character at the same time, why is bash a lot faster than fish?

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Jul 27, 2015

Member

When I profiled it a lot of it was not the one byte buffer size, but instead manipulating the signal mask for every byte read. I think this can be greatly simplified.

Member

ridiculousfish commented Jul 27, 2015

When I profiled it a lot of it was not the one byte buffer size, but instead manipulating the signal mask for every byte read. I think this can be greatly simplified.

@pickfire

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Aug 1, 2015

Contributor

Is this also why fish is slow on pasting?

Contributor

pickfire commented Aug 1, 2015

Is this also why fish is slow on pasting?

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Aug 1, 2015

Member

No, this is specific to the non-interactive case.

I have never found fish to be slow in pasting. Can you share your environment? Maybe #2215?

Member

ridiculousfish commented Aug 1, 2015

No, this is specific to the non-interactive case.

I have never found fish to be slow in pasting. Can you share your environment? Maybe #2215?

@pickfire

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Aug 1, 2015

Contributor

My environment is:

BROWSER=firefox
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-sfCl1KjoFi,guid=7a2da8d2b1a188ea06813d3755bcbacc
DESKTOP_STARTUP_ID=i3/|usr|local|bin|st/424-4-alarmpi_TIME57211
DISPLAY=:0
EDITOR=vim
GTK_IM_MODULE=fcitx
HOME=/home/ivan
LANG=en_GB.UTF-8
LC_CTYPE=en_GB.UTF-8
LC_MESSAGES=C
LESS=-i -M -r
LESSOPEN=| /usr/bin/lesspipe.sh %s
LESS_TERMCAP_mb=
LESS_TERMCAP_md=
LESS_TERMCAP_me=
LESS_TERMCAP_se=
LESS_TERMCAP_so=
LESS_TERMCAP_ue=
LESS_TERMCAP_us=
LOGNAME=ivan
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
MAIL=/var/spool/mail/ivan
PATH=/opt:/usr/local/bin:/usr/bin
PWD=/home/ivan
QT_IM_MODULE=fcitx
SHELL=/usr/local/bin/fish
SHLVL=4
TERM=st-256color
USER=ivan
VISUAL=vim
WINDOWID=31457281
WINDOWPATH=1
XAUTHORITY=/home/ivan/.Xauthority
XDG_RUNTIME_DIR=/run/user/1000
XDG_SEAT=seat0
XDG_SESSION_ID=c1
XDG_VTNR=1
XMODIFIERS=@im=fcitx
__fish_bin_dir=/usr/local/bin
__fish_datadir=/usr/local/share/fish
__fish_help_dir=/usr/local/share/doc/fish
__fish_sysconfdir=/usr/local/etc/fish
fish_user_paths=/opt/usr/local/bin/usr/bin
http_proxy=http://localhost:8123/

Maybe the cause is using a slow computer, but I don't find pasting slow when using bash.

Contributor

pickfire commented Aug 1, 2015

My environment is:

BROWSER=firefox
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-sfCl1KjoFi,guid=7a2da8d2b1a188ea06813d3755bcbacc
DESKTOP_STARTUP_ID=i3/|usr|local|bin|st/424-4-alarmpi_TIME57211
DISPLAY=:0
EDITOR=vim
GTK_IM_MODULE=fcitx
HOME=/home/ivan
LANG=en_GB.UTF-8
LC_CTYPE=en_GB.UTF-8
LC_MESSAGES=C
LESS=-i -M -r
LESSOPEN=| /usr/bin/lesspipe.sh %s
LESS_TERMCAP_mb=
LESS_TERMCAP_md=
LESS_TERMCAP_me=
LESS_TERMCAP_se=
LESS_TERMCAP_so=
LESS_TERMCAP_ue=
LESS_TERMCAP_us=
LOGNAME=ivan
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
MAIL=/var/spool/mail/ivan
PATH=/opt:/usr/local/bin:/usr/bin
PWD=/home/ivan
QT_IM_MODULE=fcitx
SHELL=/usr/local/bin/fish
SHLVL=4
TERM=st-256color
USER=ivan
VISUAL=vim
WINDOWID=31457281
WINDOWPATH=1
XAUTHORITY=/home/ivan/.Xauthority
XDG_RUNTIME_DIR=/run/user/1000
XDG_SEAT=seat0
XDG_SESSION_ID=c1
XDG_VTNR=1
XMODIFIERS=@im=fcitx
__fish_bin_dir=/usr/local/bin
__fish_datadir=/usr/local/share/fish
__fish_help_dir=/usr/local/share/doc/fish
__fish_sysconfdir=/usr/local/etc/fish
fish_user_paths=/opt/usr/local/bin/usr/bin
http_proxy=http://localhost:8123/

Maybe the cause is using a slow computer, but I don't find pasting slow when using bash.

@alphapapa

This comment has been minimized.

Show comment
Hide comment
@alphapapa

alphapapa Oct 15, 2015

I've found another example of Fish being very slow and CPU-intensive compared to Bash:

for i in one two three four five
      # Populate some buckets
      dmesg | head -n50 | ./bucket.sh $i 
  endtime ./bucket.sh -DVl
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 3284maxresident)k
0inputs+8outputs (0major+1494minor)pagefaults 0swaps

➤ time ./bucket.fish -DVl  # List buckets
0.24user 0.10system 0:00.39elapsed 89%CPU (0avgtext+0avgdata 20536maxresident)k
0inputs+488outputs (0major+47543minor)pagefaults 0swaps

This is using Bucket, a simple script that stores and retrieves text in files in a directory. There's a Bash and a Fish version, and they both do exactly the same thing, nearly a verbatim translation from Bash to Fish. But the Fish version is very slow compared to the Bash version.

alphapapa commented Oct 15, 2015

I've found another example of Fish being very slow and CPU-intensive compared to Bash:

for i in one two three four five
      # Populate some buckets
      dmesg | head -n50 | ./bucket.sh $i 
  endtime ./bucket.sh -DVl
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 3284maxresident)k
0inputs+8outputs (0major+1494minor)pagefaults 0swaps

➤ time ./bucket.fish -DVl  # List buckets
0.24user 0.10system 0:00.39elapsed 89%CPU (0avgtext+0avgdata 20536maxresident)k
0inputs+488outputs (0major+47543minor)pagefaults 0swaps

This is using Bucket, a simple script that stores and retrieves text in files in a directory. There's a Bash and a Fish version, and they both do exactly the same thing, nearly a verbatim translation from Bash to Fish. But the Fish version is very slow compared to the Bash version.

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Oct 15, 2015

Member

@alphapapa: There's a few possible bottlenecks in that script - among them

while set optarg (getopts "a:append d:date D:debug e:edit E:empty g:grep h:help l:list v:verbose V:VERBOSE x:expire" $argv) (line 59 - something you don't do in the bash version). I'd also recommend not relying on ls output in lines 170 and 175 - use globs instead (though you then might need to remove the path - call basename or sed or string to do that, or cd into the path before).

I'd suggest you run it with fish -p fish.prof bucket.fish, and then examine fish.prof with sort -k 1 --numeric-sort fish.prof. Do keep in mind however that fish always reads all the config files, so your config.fish will show up in that.

Member

faho commented Oct 15, 2015

@alphapapa: There's a few possible bottlenecks in that script - among them

while set optarg (getopts "a:append d:date D:debug e:edit E:empty g:grep h:help l:list v:verbose V:VERBOSE x:expire" $argv) (line 59 - something you don't do in the bash version). I'd also recommend not relying on ls output in lines 170 and 175 - use globs instead (though you then might need to remove the path - call basename or sed or string to do that, or cd into the path before).

I'd suggest you run it with fish -p fish.prof bucket.fish, and then examine fish.prof with sort -k 1 --numeric-sort fish.prof. Do keep in mind however that fish always reads all the config files, so your config.fish will show up in that.

@alphapapa

This comment has been minimized.

Show comment
Hide comment
@alphapapa

alphapapa Oct 15, 2015

Hey faho, thanks for the quick response.

Yeah, I use getopt in the Bash version, which is a standard binary utility, so of course it's fast. I found https://github.com/bucaran/_getopts for Fish, which works really well (and I hope will be part of the Fish stdlib someday! Even with docopt support coming, it would be nice to have it distributed with Fish). I figure that it is contributing to the slowness a bit, but even so, all it's doing is processing a small amount of text, so I think it's exposing some slowness in Fish too, rather than being a problem per se.

I actually just switched from using a glob to ls on those lines yesterday, because I added a sort-by-date option. However, I shouldn't think that it would be much of an issue, because there are only 5 files in the directory. If Fish can't handle processing 5 short lines of text... :)

I guess I haven't been keeping up though, because I wasn't even aware of the Fish profiling. I'll have to check it out. Thanks.

alphapapa commented Oct 15, 2015

Hey faho, thanks for the quick response.

Yeah, I use getopt in the Bash version, which is a standard binary utility, so of course it's fast. I found https://github.com/bucaran/_getopts for Fish, which works really well (and I hope will be part of the Fish stdlib someday! Even with docopt support coming, it would be nice to have it distributed with Fish). I figure that it is contributing to the slowness a bit, but even so, all it's doing is processing a small amount of text, so I think it's exposing some slowness in Fish too, rather than being a problem per se.

I actually just switched from using a glob to ls on those lines yesterday, because I added a sort-by-date option. However, I shouldn't think that it would be much of an issue, because there are only 5 files in the directory. If Fish can't handle processing 5 short lines of text... :)

I guess I haven't been keeping up though, because I wasn't even aware of the Fish profiling. I'll have to check it out. Thanks.

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Oct 15, 2015

Member

Yeah, I use getopt in the Bash version, which is a standard binary utility, so of course it's fast. I found https://github.com/bucaran/_getopts for Fish, which works really well (and I hope will be part of the Fish stdlib someday! Even with docopt support coming, it would be nice to have it distributed with Fish). I figure that it is contributing to the slowness a bit, but even so, all it's doing is processing a small amount of text, so I think it's exposing some slowness in Fish too, rather than being a problem per se.

The idea was that you're running it once per loop, while for bash you're doing it once - O(n) vs O(1).

If Fish can't handle processing 5 short lines of text... :)

The standard place to look for bottlenecks in shellscripting is anytime it calls an outside utility. Now, ls isn't too slow (and it's more that parsing ls output isn't a great habit to get into - though it's less bad with fish than with bash, fish will still fail on newlines in filenames).

I guess I haven't been keeping up though, because I wasn't even aware of the Fish profiling. I'll have to check it out. Thanks.

Yeah, the profile will show what's really going on.

Member

faho commented Oct 15, 2015

Yeah, I use getopt in the Bash version, which is a standard binary utility, so of course it's fast. I found https://github.com/bucaran/_getopts for Fish, which works really well (and I hope will be part of the Fish stdlib someday! Even with docopt support coming, it would be nice to have it distributed with Fish). I figure that it is contributing to the slowness a bit, but even so, all it's doing is processing a small amount of text, so I think it's exposing some slowness in Fish too, rather than being a problem per se.

The idea was that you're running it once per loop, while for bash you're doing it once - O(n) vs O(1).

If Fish can't handle processing 5 short lines of text... :)

The standard place to look for bottlenecks in shellscripting is anytime it calls an outside utility. Now, ls isn't too slow (and it's more that parsing ls output isn't a great habit to get into - though it's less bad with fish than with bash, fish will still fail on newlines in filenames).

I guess I haven't been keeping up though, because I wasn't even aware of the Fish profiling. I'll have to check it out. Thanks.

Yeah, the profile will show what's really going on.

@alphapapa

This comment has been minimized.

Show comment
Hide comment
@alphapapa

alphapapa Oct 15, 2015

The idea was that you're running it once per loop, while for bash you're doing it once - O(n) vs O(1).

Yeah, that's the way getopts.fish is written. I wonder why he decided to do it that way instead of using eval and $argv similar to how getopt does. Maybe I'll ask him about changing it. I'll see what the profile shows though.

The standard place to look for bottlenecks in shellscripting is anytime it calls an outside utility. Now, ls isn't too slow (and it's more that parsing ls output isn't a great habit to get into - though it's less bad with fish than with bash, fish will still fail on newlines in filenames).

Well, this is what seems interesting to me. For example, in Bash, sometimes it may be faster to use built-in string substitutions and arrays; but for data beyond a certain size (though I'm not sure where the line is), it's faster to pipe to external utilities like awk, sed, grep, etc. After all, that's one of the principles of shell scripting: connecting the inputs and outputs of purpose-built utilities that are more efficient at certain tasks.

So I guess my question would be, is Fish significantly less efficient at doing that than Bash is? Also, compared to Bash, is there a lot of overhead when calling shell functions? I'm guessing the answers may be, to some extent, yes. So then my question would be, will Fish ever be as optimized as Bash (or zsh or dash, etc) for scripting? Or should I expect to only use Fish scripting for very simple, one-off tasks?

alphapapa commented Oct 15, 2015

The idea was that you're running it once per loop, while for bash you're doing it once - O(n) vs O(1).

Yeah, that's the way getopts.fish is written. I wonder why he decided to do it that way instead of using eval and $argv similar to how getopt does. Maybe I'll ask him about changing it. I'll see what the profile shows though.

The standard place to look for bottlenecks in shellscripting is anytime it calls an outside utility. Now, ls isn't too slow (and it's more that parsing ls output isn't a great habit to get into - though it's less bad with fish than with bash, fish will still fail on newlines in filenames).

Well, this is what seems interesting to me. For example, in Bash, sometimes it may be faster to use built-in string substitutions and arrays; but for data beyond a certain size (though I'm not sure where the line is), it's faster to pipe to external utilities like awk, sed, grep, etc. After all, that's one of the principles of shell scripting: connecting the inputs and outputs of purpose-built utilities that are more efficient at certain tasks.

So I guess my question would be, is Fish significantly less efficient at doing that than Bash is? Also, compared to Bash, is there a lot of overhead when calling shell functions? I'm guessing the answers may be, to some extent, yes. So then my question would be, will Fish ever be as optimized as Bash (or zsh or dash, etc) for scripting? Or should I expect to only use Fish scripting for very simple, one-off tasks?

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Oct 15, 2015

Member

using eval and $argv

(My neck hair is standing up for some reason)

So I guess my question would be, is Fish significantly less efficient at doing that than Bash is? Also, compared to Bash, is there a lot of overhead when calling shell functions?

I haven't seen any of that - at least not significantly. Anything I write tends to work fine - maybe my style tends to work better.

Or should I expect to only use Fish scripting for very simple, one-off tasks?

I think I speak for all of us here when I say that fish should be completely suitable for scripting - anything else is a bug.

Member

faho commented Oct 15, 2015

using eval and $argv

(My neck hair is standing up for some reason)

So I guess my question would be, is Fish significantly less efficient at doing that than Bash is? Also, compared to Bash, is there a lot of overhead when calling shell functions?

I haven't seen any of that - at least not significantly. Anything I write tends to work fine - maybe my style tends to work better.

Or should I expect to only use Fish scripting for very simple, one-off tasks?

I think I speak for all of us here when I say that fish should be completely suitable for scripting - anything else is a bug.

@faho faho modified the milestones: next-2.x, 2.3.0 Mar 6, 2016

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Mar 6, 2016

Member

Retargeting to after 2.3.

Member

faho commented Mar 6, 2016

Retargeting to after 2.3.

@pickfire

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Mar 6, 2016

Contributor

I don't think we should compare fish to bash, why not compare fish speed to mksh while comparing fish features to bash?

Contributor

pickfire commented Mar 6, 2016

I don't think we should compare fish to bash, why not compare fish speed to mksh while comparing fish features to bash?

@alphapapa

This comment has been minimized.

Show comment
Hide comment
@alphapapa

alphapapa Mar 18, 2016

@pickfire Why? :) Since bash is the most common shell, shouldn't fish consider it its primary competition?

alphapapa commented Mar 18, 2016

@pickfire Why? :) Since bash is the most common shell, shouldn't fish consider it its primary competition?

@pickfire

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Mar 18, 2016

Contributor

@alphapapa Nope, fish should compete not with the most common shell for slowness and instead compete with the fastest shell so that it can improve.

Contributor

pickfire commented Mar 18, 2016

@alphapapa Nope, fish should compete not with the most common shell for slowness and instead compete with the fastest shell so that it can improve.

@alphapapa

This comment has been minimized.

Show comment
Hide comment
@alphapapa

alphapapa Mar 21, 2016

Haha, so what you mean is, let's make fish the fastest shell. :)

alphapapa commented Mar 21, 2016

Haha, so what you mean is, let's make fish the fastest shell. :)

@pickfire

This comment has been minimized.

Show comment
Hide comment
@pickfire

pickfire Mar 22, 2016

Contributor

@alphapapa, the recent fish seems to have a slowdown, weird. It is even slow than bash now. More info #2776

Contributor

pickfire commented Mar 22, 2016

@alphapapa, the recent fish seems to have a slowdown, weird. It is even slow than bash now. More info #2776

@krader1961 krader1961 self-assigned this Dec 28, 2016

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Dec 28, 2016

Contributor

I love a good performance problem so I'm taking ownership of this. The first step is eliminating irrelevancies (e.g., the locate command) and making it easy to consistently run a benchmark. To that end I created a data file containing the first 500,000 lines from find $HOME. I then created two scripts, bash and fish, that I can run under commands like time, gprof, and iprofiler (a macOS specific tool).

Running

time ./onlydirs.fish < data > onlydirs.fish.out

five times yields an average elapsed time of 120.63 seconds, user mode of 48.04, and sys mode of 71.39.

The equivalent test using bash (installed via Homebrew, not the macOS version) yields an average elapsed time of 13.41 seconds, user mode of 10.79, and sys mode of 2.54.

So the fish to bash ratio for elapsed time is 9.0, user mode is 4.5, and sys mode is 28.1. Obviously the main issue is the amount of time fish spends executing syscalls. The signal manipulation identified by @ridiculousfish is obviously a prime candidate for the slow down and a good place to start improving.

P.S., I'm only going to address the performance discrepancy in the original comment. It's important that there be one problem per issue to minimize confusion and ensure we know when the issue can be closed. If there are other scenarios that show egregious differences in performance when comparing fish to a competing shell please open a new issue.

Contributor

krader1961 commented Dec 28, 2016

I love a good performance problem so I'm taking ownership of this. The first step is eliminating irrelevancies (e.g., the locate command) and making it easy to consistently run a benchmark. To that end I created a data file containing the first 500,000 lines from find $HOME. I then created two scripts, bash and fish, that I can run under commands like time, gprof, and iprofiler (a macOS specific tool).

Running

time ./onlydirs.fish < data > onlydirs.fish.out

five times yields an average elapsed time of 120.63 seconds, user mode of 48.04, and sys mode of 71.39.

The equivalent test using bash (installed via Homebrew, not the macOS version) yields an average elapsed time of 13.41 seconds, user mode of 10.79, and sys mode of 2.54.

So the fish to bash ratio for elapsed time is 9.0, user mode is 4.5, and sys mode is 28.1. Obviously the main issue is the amount of time fish spends executing syscalls. The signal manipulation identified by @ridiculousfish is obviously a prime candidate for the slow down and a good place to start improving.

P.S., I'm only going to address the performance discrepancy in the original comment. It's important that there be one problem per issue to minimize confusion and ensure we know when the issue can be closed. If there are other scenarios that show egregious differences in performance when comparing fish to a competing shell please open a new issue.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Dec 28, 2016

Member

Awesome to have someone looking at this, especially krader!

TBH I'm unclear on what blocking signals is meant to accomplish here - it was like this from the fish 1.x days but I don't know why. Is it simply to avoid having to retry on EINTR?

Member

ridiculousfish commented Dec 28, 2016

Awesome to have someone looking at this, especially krader!

TBH I'm unclear on what blocking signals is meant to accomplish here - it was like this from the fish 1.x days but I don't know why. Is it simply to avoid having to retry on EINTR?

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Dec 28, 2016

Contributor

There are obviously some problems with signal management. The first thing I noticed in looking into this issue is that exec_job() declares the var sigset_t chldset;, initializes it, then never uses it. The presence of that code suggests that the subsequent signal blocking/unblocking might be intended to defer SIGCHLD signals when they might cause problems.

Also, the setup_child_process() not only unblocks all signals it also calls signal_reset_handlers() which appears to be pointless at first glance.

The core problem is that the exec_job() function is too long and complicated. It doesn't include any comments that would explain why signals need to be blocked but does include a couple of comments that explain why they need to be unblocked at certain points. That function needs a major refactoring.

Contributor

krader1961 commented Dec 28, 2016

There are obviously some problems with signal management. The first thing I noticed in looking into this issue is that exec_job() declares the var sigset_t chldset;, initializes it, then never uses it. The presence of that code suggests that the subsequent signal blocking/unblocking might be intended to defer SIGCHLD signals when they might cause problems.

Also, the setup_child_process() not only unblocks all signals it also calls signal_reset_handlers() which appears to be pointless at first glance.

The core problem is that the exec_job() function is too long and complicated. It doesn't include any comments that would explain why signals need to be blocked but does include a couple of comments that explain why they need to be unblocked at certain points. That function needs a major refactoring.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Dec 28, 2016

Member

Yeah, exec_job is a monster. Used to be even worse :)

Member

ridiculousfish commented Dec 28, 2016

Yeah, exec_job is a monster. Used to be even worse :)

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Dec 31, 2016

Contributor

The signal_block() and signal_unblock() functions are not necessary but are not the source of the performance problem. To be more precise they aren't needed if fish simply ignores tty related signals such as SIGTTOUT regardless of whether the shell is interactive. However, doing so improves the performance an insignificant amount. It appears the real problem is the use of pthread_sigmask() elsewhere in the code. Whomever wrote this code was using a RPG (rocket propelled grenade) to kill a mosquito rather than using a scalpel.

Contributor

krader1961 commented Dec 31, 2016

The signal_block() and signal_unblock() functions are not necessary but are not the source of the performance problem. To be more precise they aren't needed if fish simply ignores tty related signals such as SIGTTOUT regardless of whether the shell is interactive. However, doing so improves the performance an insignificant amount. It appears the real problem is the use of pthread_sigmask() elsewhere in the code. Whomever wrote this code was using a RPG (rocket propelled grenade) to kill a mosquito rather than using a scalpel.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Dec 31, 2016

Contributor

I used the macOS iprofiler command to trace the syscalls of the simple fish script documented in the original comment for 60 seconds. I then used the Instruments app to read the trace and export it to a CSV file that I then massaged with a python program I threw together. Here are the first few lines of the report:

Syscall                           #calls   total time     avg time/call
BSC_read                          266222   478.096122446  0.001795855
BSC_pthread_sigmask               532411   299.499569213  0.000562535
BSC_wait4                            531    79.603438088  0.149912313
BSC_fork                             268    64.290372855  0.239889451
BSC_stat64                          3223    54.243749031  0.016830204
BSC_ioctl                           6287     0.006766654  0.000001076
BSC_mmap                              36     0.000427370  0.000011871
BSC_open                              24     0.000346700  0.000014446
BSC_open_nocancel                     12     0.000258872  0.000021573
BSC_sysctl                             6     0.000152660  0.000025443

Notice that the pthread_sigmask() calls account for approximately 1/3 of the total time. I'm willing to bet that all, or nearly all, of those calls can be eliminated.

Contributor

krader1961 commented Dec 31, 2016

I used the macOS iprofiler command to trace the syscalls of the simple fish script documented in the original comment for 60 seconds. I then used the Instruments app to read the trace and export it to a CSV file that I then massaged with a python program I threw together. Here are the first few lines of the report:

Syscall                           #calls   total time     avg time/call
BSC_read                          266222   478.096122446  0.001795855
BSC_pthread_sigmask               532411   299.499569213  0.000562535
BSC_wait4                            531    79.603438088  0.149912313
BSC_fork                             268    64.290372855  0.239889451
BSC_stat64                          3223    54.243749031  0.016830204
BSC_ioctl                           6287     0.006766654  0.000001076
BSC_mmap                              36     0.000427370  0.000011871
BSC_open                              24     0.000346700  0.000014446
BSC_open_nocancel                     12     0.000258872  0.000021573
BSC_sysctl                             6     0.000152660  0.000025443

Notice that the pthread_sigmask() calls account for approximately 1/3 of the total time. I'm willing to bet that all, or nearly all, of those calls can be eliminated.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 1, 2017

Contributor

I reduced the number of paths to the first 100,000 from find $HOME.
I then timed each shell five times and chose the best elapsed time of
the five runs:

git head:              32.05 real        12.23 user        19.90 sys
w/o pthread_setmask(): 24.13 real         9.90 user        14.32 sys
bash:                   3.84 real         2.20 user         1.63 sys

So my first attempt at improving the performance of the benchmark in this
issue reduces the elapsed time 25% (user time 19%, sys time 28%). This
is still considerably slower than bash as you can see above (now only 6x
slower rather than 8x). But it's a big step in the right direction.

Using iprofile => Instruments.app export as CSV => summarize data shows
that fish is considerably less efficient in its I/O. Specifically, it makes
a huge number of read() calls to requesting only one byte from the file
containing the path names. Bash reads the same data in chunks of 128 bytes.

Consider these two syscall profiles for the first 60 seconds of run time of
each shell. The fish profile is with my pthread_sigmask() elimination.

Top ten syscalls, by elapsed time, for bash:

BSC_stat64                         52677    12.895956498  0.000244812
BSC_lseek                         105203     8.054019910  0.000076557
BSC_ioctl                         105260     1.575582580  0.000014968
BSC_read                           53615     0.106179200  0.000001980
BSC_write_nocancel                  4023     0.028859836  0.000007174
BSC_mmap                              32     0.000377230  0.000011788
BSC_open                              13     0.000340590  0.000026199
BSC_open_nocancel                      6     0.000135880  0.000022647
BSC_read_nocancel                     12     0.000112360  0.000009363
BSC_sysctl                             2     0.000057410  0.000028705

Top ten syscalls, by elapsed time, for my improved fish:

BSC_read                         1907142   652.282603718  0.000342021
BSC_fork                            1980   347.755721559  0.175634203
BSC_wait4                           3960   114.473033227  0.028907332
BSC_stat64                         22195     0.430642407  0.000019403
BSC_exit                               1     0.238440000  0.238440000
BSC_ioctl                          44189     0.042139878  0.000000954
BSC_mmap                              36     0.000425530  0.000011820
BSC_open                              24     0.000424160  0.000017673
BSC_open_nocancel                     12     0.000231060  0.000019255
BSC_sysctl                             6     0.000150490  0.000025082

Notice that in the same interval bash manages to do more than 2x times as
many stat() calls. Fish spends most of its time reading one byte at a
time from stdin while that is inconsequential in the bash profile. And that
appears to be the most important reason for the difference in performance
between the two shells. Obviously the forks done by fish are also a factor
and itself interesting since there shouldn't be any need to call fork()
for this simple script:

while read dir
    test -d $dir; and echo $dir
end
Contributor

krader1961 commented Jan 1, 2017

I reduced the number of paths to the first 100,000 from find $HOME.
I then timed each shell five times and chose the best elapsed time of
the five runs:

git head:              32.05 real        12.23 user        19.90 sys
w/o pthread_setmask(): 24.13 real         9.90 user        14.32 sys
bash:                   3.84 real         2.20 user         1.63 sys

So my first attempt at improving the performance of the benchmark in this
issue reduces the elapsed time 25% (user time 19%, sys time 28%). This
is still considerably slower than bash as you can see above (now only 6x
slower rather than 8x). But it's a big step in the right direction.

Using iprofile => Instruments.app export as CSV => summarize data shows
that fish is considerably less efficient in its I/O. Specifically, it makes
a huge number of read() calls to requesting only one byte from the file
containing the path names. Bash reads the same data in chunks of 128 bytes.

Consider these two syscall profiles for the first 60 seconds of run time of
each shell. The fish profile is with my pthread_sigmask() elimination.

Top ten syscalls, by elapsed time, for bash:

BSC_stat64                         52677    12.895956498  0.000244812
BSC_lseek                         105203     8.054019910  0.000076557
BSC_ioctl                         105260     1.575582580  0.000014968
BSC_read                           53615     0.106179200  0.000001980
BSC_write_nocancel                  4023     0.028859836  0.000007174
BSC_mmap                              32     0.000377230  0.000011788
BSC_open                              13     0.000340590  0.000026199
BSC_open_nocancel                      6     0.000135880  0.000022647
BSC_read_nocancel                     12     0.000112360  0.000009363
BSC_sysctl                             2     0.000057410  0.000028705

Top ten syscalls, by elapsed time, for my improved fish:

BSC_read                         1907142   652.282603718  0.000342021
BSC_fork                            1980   347.755721559  0.175634203
BSC_wait4                           3960   114.473033227  0.028907332
BSC_stat64                         22195     0.430642407  0.000019403
BSC_exit                               1     0.238440000  0.238440000
BSC_ioctl                          44189     0.042139878  0.000000954
BSC_mmap                              36     0.000425530  0.000011820
BSC_open                              24     0.000424160  0.000017673
BSC_open_nocancel                     12     0.000231060  0.000019255
BSC_sysctl                             6     0.000150490  0.000025082

Notice that in the same interval bash manages to do more than 2x times as
many stat() calls. Fish spends most of its time reading one byte at a
time from stdin while that is inconsequential in the bash profile. And that
appears to be the most important reason for the difference in performance
between the two shells. Obviously the forks done by fish are also a factor
and itself interesting since there shouldn't be any need to call fork()
for this simple script:

while read dir
    test -d $dir; and echo $dir
end
@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 2, 2017

Contributor

I've implemented a chunked read implementation. In the table below the results labeled "w/perf improvements" includes both the reduced signal manipulation improvements plus the chunked read enhancement. That yields a 72% reduction in run time. We're now only 2.4x slower than bash versus 8.3x slower with the git head version of fish. Not great but a damn sight better. I'd be satisfied if we were within a factor of 2x.

git head:              32.05 real        12.23 user        19.90 sys
w/o pthread_setmask(): 24.13 real         9.90 user        14.32 sys
w/perf improvements:    9.03 real         6.26 user         2.81 sys
bash:                   3.84 real         2.20 user         1.63 sys
Contributor

krader1961 commented Jan 2, 2017

I've implemented a chunked read implementation. In the table below the results labeled "w/perf improvements" includes both the reduced signal manipulation improvements plus the chunked read enhancement. That yields a 72% reduction in run time. We're now only 2.4x slower than bash versus 8.3x slower with the git head version of fish. Not great but a damn sight better. I'd be satisfied if we were within a factor of 2x.

git head:              32.05 real        12.23 user        19.90 sys
w/o pthread_setmask(): 24.13 real         9.90 user        14.32 sys
w/perf improvements:    9.03 real         6.26 user         2.81 sys
bash:                   3.84 real         2.20 user         1.63 sys

krader1961 added a commit to krader1961/fish-shell that referenced this issue Jan 2, 2017

remove unnecessary signal management
The shell was doing a log of signal blocking/unblocking that hurts
performance and can be avoided. This reduced the elapsed time for a
simple benchmark by 25%.

Partial fix for #2007

krader1961 added a commit to krader1961/fish-shell that referenced this issue Jan 2, 2017

refactor builtin_read
Refactor `builtin_read()` to split the code that does the actual reading
into separate functions. This introduces the `read_in_chunks()` function
but in this change it is just a clone of `read_one_char_at_a_time()`. It
will be modified to actually read in chunks in the next change.

Partial fix for #2007

krader1961 added a commit to krader1961/fish-shell that referenced this issue Jan 2, 2017

implement chunked reads
Provide a more efficient method for reading lines (or null terminated
sequences) when the input is seekable.

Another partial fix for #2007

@krader1961 krader1961 referenced this issue Jan 2, 2017

Closed

Performance #3691

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 2, 2017

Contributor

The instrumented syscall summary for the fish version labeled "w/perf improvements" in my previous comment looks like this:

BSC_read                           93531    13.990190326  0.000149578
BSC_wait4                           1969    12.372290710  0.006283540
BSC_ioctl                         100208    12.253727625  0.000122283
BSC_lseek                          99466     6.365717475  0.000063999
BSC_fork                             982     3.525459389  0.003590081
BSC_stat64                         50180     0.295604777  0.000005891
BSC_mmap                              36     0.000385300  0.000010703
BSC_open                              24     0.000347520  0.000014480
BSC_open_nocancel                     12     0.000231910  0.000019326
BSC_sysctl                             6     0.000156380  0.000026063

TBD is where the fork() calls are coming from given that the benchmark does not require any external commands.

Contributor

krader1961 commented Jan 2, 2017

The instrumented syscall summary for the fish version labeled "w/perf improvements" in my previous comment looks like this:

BSC_read                           93531    13.990190326  0.000149578
BSC_wait4                           1969    12.372290710  0.006283540
BSC_ioctl                         100208    12.253727625  0.000122283
BSC_lseek                          99466     6.365717475  0.000063999
BSC_fork                             982     3.525459389  0.003590081
BSC_stat64                         50180     0.295604777  0.000005891
BSC_mmap                              36     0.000385300  0.000010703
BSC_open                              24     0.000347520  0.000014480
BSC_open_nocancel                     12     0.000231910  0.000019326
BSC_sysctl                             6     0.000156380  0.000026063

TBD is where the fork() calls are coming from given that the benchmark does not require any external commands.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 2, 2017

Contributor

The fork() calls are, according to fish -d2 due to

<2> fish: Fork #1, pid 23676: internal builtin for 'echo'

I have absolutely no idea why the main thread would fork to execute a builtin command. If we got rid of that fork fish would be reasonably close to bash on this benchmark.

Running the benchmark under fish -d5 shows the reason for the fork has to do with whether or not it produces output:

<3> fish: Skipping fork: no output for internal builtin 'read'
<3> fish: Set status of read dir to 0 using short circuit
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 0 (read dir), COMPLETED, NON-INTERACTIVE
<4> fish: Exec job 'test -d $dir' with id 2
<3> fish: Skipping fork: no output for internal builtin 'test'
<3> fish: Set status of test -d $dir to 0 using short circuit
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 0 (test -d $dir), COMPLETED, NON-INTERACTIVE
<4> fish: Exec job 'and echo $dir' with id 2
<2> fish: Fork #10, pid 23966: internal builtin for 'echo'
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 23956 (and echo $dir), UNCOMPLETED, NON-INTERACTIVE

This is another manifestation of issue #1396: fish buffering the output of not just functions but blocks in general. Fixing that is far outside the scope of this issue.

Contributor

krader1961 commented Jan 2, 2017

The fork() calls are, according to fish -d2 due to

<2> fish: Fork #1, pid 23676: internal builtin for 'echo'

I have absolutely no idea why the main thread would fork to execute a builtin command. If we got rid of that fork fish would be reasonably close to bash on this benchmark.

Running the benchmark under fish -d5 shows the reason for the fork has to do with whether or not it produces output:

<3> fish: Skipping fork: no output for internal builtin 'read'
<3> fish: Set status of read dir to 0 using short circuit
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 0 (read dir), COMPLETED, NON-INTERACTIVE
<4> fish: Exec job 'test -d $dir' with id 2
<3> fish: Skipping fork: no output for internal builtin 'test'
<3> fish: Set status of test -d $dir to 0 using short circuit
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 0 (test -d $dir), COMPLETED, NON-INTERACTIVE
<4> fish: Exec job 'and echo $dir' with id 2
<2> fish: Fork #10, pid 23966: internal builtin for 'echo'
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 23956 (and echo $dir), UNCOMPLETED, NON-INTERACTIVE

This is another manifestation of issue #1396: fish buffering the output of not just functions but blocks in general. Fixing that is far outside the scope of this issue.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Jan 3, 2017

Member

fish probably forks echo's output because it's got a redirection. There's some complicated logic here (see redirection_is_to_real_file and its uses) and I suspect it's not actually quite right - there may be fd redirections that make it do the wrong thing. Not forking is hard.

Member

ridiculousfish commented Jan 3, 2017

fish probably forks echo's output because it's got a redirection. There's some complicated logic here (see redirection_is_to_real_file and its uses) and I suspect it's not actually quite right - there may be fd redirections that make it do the wrong thing. Not forking is hard.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 3, 2017

Contributor

Yes, the actual benchmark is doing

while read dir
    test -d $dir; and echo $dir
end < data > onlydirs.fish.out

I still don't see why that needs to fork. The relevant comment in src/exec.cpp says we have to fork so that the file will be truncated even if the block produces no output. But that seems rather silly since the truncation should have been done when the redirection was setup. I'm guessing that if I look at the code more closely I'll see that it is deferring the opening of the file for some reason. In any event I think that's a problem for another day. Specifically, after the code in that module has been refactored to eliminate most of the oclint complexity warnings.

Right now I'd simply like as many people to build and run a fish with this patch as possible. Even if just for a few minutes or hours. I'm nervous that the change to how signals are managed might only exhibit problems in the face of extensive use of the trap command, --on-event functions, or similar mechanisms that I'm unlikely to exercise (much) on my systems.

Contributor

krader1961 commented Jan 3, 2017

Yes, the actual benchmark is doing

while read dir
    test -d $dir; and echo $dir
end < data > onlydirs.fish.out

I still don't see why that needs to fork. The relevant comment in src/exec.cpp says we have to fork so that the file will be truncated even if the block produces no output. But that seems rather silly since the truncation should have been done when the redirection was setup. I'm guessing that if I look at the code more closely I'll see that it is deferring the opening of the file for some reason. In any event I think that's a problem for another day. Specifically, after the code in that module has been refactored to eliminate most of the oclint complexity warnings.

Right now I'd simply like as many people to build and run a fish with this patch as possible. Even if just for a few minutes or hours. I'm nervous that the change to how signals are managed might only exhibit problems in the face of extensive use of the trap command, --on-event functions, or similar mechanisms that I'm unlikely to exercise (much) on my systems.

@faho faho modified the milestones: fish 2.5.0, next-minor Jan 6, 2017

krader1961 added a commit that referenced this issue Jan 15, 2017

remove unnecessary signal management
The shell was doing a log of signal blocking/unblocking that hurts
performance and can be avoided. This reduced the elapsed time for a
simple benchmark by 25%.

Partial fix for #2007

krader1961 added a commit that referenced this issue Jan 15, 2017

refactor builtin_read
Refactor `builtin_read()` to split the code that does the actual reading
into separate functions. This introduces the `read_in_chunks()` function
but in this change it is just a clone of `read_one_char_at_a_time()`. It
will be modified to actually read in chunks in the next change.

Partial fix for #2007

krader1961 added a commit that referenced this issue Jan 15, 2017

implement chunked reads
Provide a more efficient method for reading lines (or null terminated
sequences) when the input is seekable.

Another partial fix for #2007
@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 15, 2017

Contributor

I'm going to close this because the two most significant improvements to the scenario discussed in this issue have been merged. Note that one of the improvements is gated on the variable FISH_NO_SIGNAL_BLOCK being defined so that we can gain confidence it does not introduce any regressions without causing problems for users willing to build from source but not willing to use possibly risky behavior. Hopefully we'll have enough confidence that aspect of the change is correct it can become the default behavior for the 2.6.0 release.

Contributor

krader1961 commented Jan 15, 2017

I'm going to close this because the two most significant improvements to the scenario discussed in this issue have been merged. Note that one of the improvements is gated on the variable FISH_NO_SIGNAL_BLOCK being defined so that we can gain confidence it does not introduce any regressions without causing problems for users willing to build from source but not willing to use possibly risky behavior. Hopefully we'll have enough confidence that aspect of the change is correct it can become the default behavior for the 2.6.0 release.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Jan 21, 2017

Member

One thing I noticed is that ctrl-C is less reliable with FISH_NO_SIGNAL_BLOCK set, for this code: while true ; echo derp > /dev/null; end . It's always been a little flaky but this amplifies it. Time to track it down I guess.

Member

ridiculousfish commented Jan 21, 2017

One thing I noticed is that ctrl-C is less reliable with FISH_NO_SIGNAL_BLOCK set, for this code: while true ; echo derp > /dev/null; end . It's always been a little flaky but this amplifies it. Time to track it down I guess.

@alphapapa

This comment has been minimized.

Show comment
Hide comment
@alphapapa

alphapapa Jan 28, 2017

Kurtis, fantastic work on this, thanks so much. If I have time, I'll see if I can test with the env var set as you described.

alphapapa commented Jan 28, 2017

Kurtis, fantastic work on this, thanks so much. If I have time, I'll see if I can test with the env var set as you described.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment