New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression since 2.3.1 #3793

Closed
frol opened this Issue Jan 30, 2017 · 28 comments

Comments

Projects
None yet
7 participants
@frol

frol commented Jan 30, 2017

  • Have you checked if problem occurs with fish 2.4.0?
  • Tried fish without third-party customizations (check sh -c 'env HOME=$(mktemp -d) fish')?

fish version installed (fish --version):

This performance drop started with 2.4.0, but I have also tried the master branch (6db3721) build today

OS/terminal used: Arch Linux, terminal doesn't make any difference (tried Gnome Terminal, XTerm, and TTY).

It seems that set_color became much slower in Fish 2.4.0. While it is not noticeable to high-end CPUs (yet 100% CPU load can be a good indicator of that something is wrong as typing a command shouldn't take so much of resources), ARM processors and even x86 processors in powersave mode reveal the noticeable lag in typing using a theme, which used to work just fine on Fish 2.3.1.

Steps to reproduce:

  1. Use Raspberry Pi, or switch your CPU into a powersave mode (or lock your CPU frequency at something around 1GHz), or just use a slow CPU
  2. Install bobthefish theme
  3. Just type anything long (e.g. 100 characters) and try to clear this line using backspace or delete keys; holding backspace pressed for 1 second, fish goes on for another 0.5-1 second removing more than you intended.

This bug is also filed against the bobthefish theme repository: oh-my-fish/theme-bobthefish#70. All other themes I tried work fine (yet, they still take 50+% CPU when I simply press and hold backspace key on an empty line), but this theme also works fine with Fish 2.3.1, and given the fact that my experiments ended up in set_color, I think it is worth taking a look here.

/cc @bobthecow @ycardon

@krader1961 krader1961 added this to the fish-future milestone Jan 30, 2017

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 30, 2017

Contributor

Are you running the terminal emulator on the Raspberry Pi or a different system?

While possible it seems really unlikely that the set_color command is at fault.

Contributor

krader1961 commented Jan 30, 2017

Are you running the terminal emulator on the Raspberry Pi or a different system?

While possible it seems really unlikely that the set_color command is at fault.

@frol

This comment has been minimized.

Show comment
Hide comment
@frol

frol Jan 30, 2017

I am experiencing this on my Intel Core i7 powered laptop when I put the CPU into a powersave mode:

$ sudo pstate-frequency -S -p powersave
pstate-frequency version 3.7.2
    pstate::CPU_DRIVER   -> intel_pstate
    pstate::CPU_GOVERNOR -> powersave
    pstate::TURBO        -> 1 [OFF]
    pstate::CPU_MIN      -> 22% [800000KHz]
    pstate::CPU_MAX      -> 23% [805000KHz]

Thus, my system configuration is:

  • Laptop with Intel Core i7 running at 800MHz
  • Arch Linux x64
  • Gnome 3
  • Gnome Terminal or any other terminal including TTY
  • Fish 2.4.0
  • latest bobthefish

However, @ycardon reported (oh-my-fish/theme-bobthefish#66) the same experience on his Raspberry Pi.

frol commented Jan 30, 2017

I am experiencing this on my Intel Core i7 powered laptop when I put the CPU into a powersave mode:

$ sudo pstate-frequency -S -p powersave
pstate-frequency version 3.7.2
    pstate::CPU_DRIVER   -> intel_pstate
    pstate::CPU_GOVERNOR -> powersave
    pstate::TURBO        -> 1 [OFF]
    pstate::CPU_MIN      -> 22% [800000KHz]
    pstate::CPU_MAX      -> 23% [805000KHz]

Thus, my system configuration is:

  • Laptop with Intel Core i7 running at 800MHz
  • Arch Linux x64
  • Gnome 3
  • Gnome Terminal or any other terminal including TTY
  • Fish 2.4.0
  • latest bobthefish

However, @ycardon reported (oh-my-fish/theme-bobthefish#66) the same experience on his Raspberry Pi.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Jan 30, 2017

Member

Thanks for this investigation. How did you determine that the regression is in set_color?

Member

ridiculousfish commented Jan 30, 2017

Thanks for this investigation. How did you determine that the regression is in set_color?

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 30, 2017

Contributor

Using fish 2.3.1 the BobTheFish theme causes fish to consume 1.2% of a cpu while my normal prompt theme uses just 0.8% when backspacing over a long command.

Using fish from git head the BobTheFish theme causes fish to consume 12% of a cpu when backspacing over a long command. My normal prompt theme uses only 3%. Using the macOS sample command shows a much greater number of tparm() calls for the BobTheFish case. The big difference is much more time spent in reader_repaint() which callss_write()which is is the parent of thetparm()` calls. We clearly did something to make repainting the command line much more expensive since 2.3.1.

Contributor

krader1961 commented Jan 30, 2017

Using fish 2.3.1 the BobTheFish theme causes fish to consume 1.2% of a cpu while my normal prompt theme uses just 0.8% when backspacing over a long command.

Using fish from git head the BobTheFish theme causes fish to consume 12% of a cpu when backspacing over a long command. My normal prompt theme uses only 3%. Using the macOS sample command shows a much greater number of tparm() calls for the BobTheFish case. The big difference is much more time spent in reader_repaint() which callss_write()which is is the parent of thetparm()` calls. We clearly did something to make repainting the command line much more expensive since 2.3.1.

@bobthecow

This comment has been minimized.

Show comment
Hide comment
@bobthecow

bobthecow Jan 30, 2017

Any idea what it is about bobthefish that causes so many repaints?

bobthecow commented Jan 30, 2017

Any idea what it is about bobthefish that causes so many repaints?

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 31, 2017

Contributor

Bisected to commit f464704.

Contributor

krader1961 commented Jan 31, 2017

Bisected to commit f464704.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 31, 2017

Contributor

I was dithering on whether to mark this a regression. Performance problems are often the result of adding functionality. Which is the case with this issue and doesn't automatically make the problem a regression in behavior. However, I think the change in behavior in this instance does not justify such a dramatic decrease in performance. Furthermore the original code was itself more expensive than it should have been given that it is called in a hot path for interactive use and performs avoidable computations.

TBD is how best to fix this. Should we simply memoize the code that is causing the slow down? That would necessitate adding code elsewhere to invalidate the cached results when, for example, $TERM changes. Or should we, can we, change the algorithm to be more efficient?

Contributor

krader1961 commented Jan 31, 2017

I was dithering on whether to mark this a regression. Performance problems are often the result of adding functionality. Which is the case with this issue and doesn't automatically make the problem a regression in behavior. However, I think the change in behavior in this instance does not justify such a dramatic decrease in performance. Furthermore the original code was itself more expensive than it should have been given that it is called in a hot path for interactive use and performs avoidable computations.

TBD is how best to fix this. Should we simply memoize the code that is causing the slow down? That would necessitate adding code elsewhere to invalidate the cached results when, for example, $TERM changes. Or should we, can we, change the algorithm to be more efficient?

@krader1961 krader1961 added bug regression and removed enhancement labels Jan 31, 2017

@krader1961 krader1961 modified the milestones: next-minor, fish-future Jan 31, 2017

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Jan 31, 2017

Member

What exactly is causing the slowdown? I don't see anything obvious in that diff that would be responsible for a regression.

Member

ridiculousfish commented Jan 31, 2017

What exactly is causing the slowdown? I don't see anything obvious in that diff that would be responsible for a regression.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 31, 2017

Contributor

What exactly is causing the slowdown?

It's the removal of the std::min() call to limit the number of tests to just the first 16 colors. We're now testing 256 or more sequences.

TBD is whether we should reinstate that limit, memoize the results, or otherwise improve the algorithm.

Contributor

krader1961 commented Jan 31, 2017

What exactly is causing the slowdown?

It's the removal of the std::min() call to limit the number of tests to just the first 16 colors. We're now testing 256 or more sequences.

TBD is whether we should reinstate that limit, memoize the results, or otherwise improve the algorithm.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jan 31, 2017

Contributor

Also, the k < (max_colors - 1) check appears to be incorrect since a less than check against max_colors ensures the index is not invalid. Not to mention the original static_cast<size_t>(max_colors) makes me grind my teeth since the cast should be unnecessary.

Contributor

krader1961 commented Jan 31, 2017

Also, the k < (max_colors - 1) check appears to be incorrect since a less than check against max_colors ensures the index is not invalid. Not to mention the original static_cast<size_t>(max_colors) makes me grind my teeth since the cast should be unnecessary.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Jan 31, 2017

Member

bobthefish.trace.zip

I attached an instruments trace. I'm surprised, we really are spending over half our time in tparm(). I didn't expect that to be so expensive. Shows what I know.

I see now that we have a nested loop. So 256 colors * 4 escape sequences is over a thousand checks, each of which do silly things like call sprintf(). It's easy to believe that that's expensive.

Instrumenting this, this theme is indeed using a CSI-style set_color and exit_attribute_mode, so it's running past those 1024 checks a lot (and failing each of them).

One lazy thing we could do is to move the color checks after the mode checks. I also think this is a fine candidate for memoization as you suggested.

Member

ridiculousfish commented Jan 31, 2017

bobthefish.trace.zip

I attached an instruments trace. I'm surprised, we really are spending over half our time in tparm(). I didn't expect that to be so expensive. Shows what I know.

I see now that we have a nested loop. So 256 colors * 4 escape sequences is over a thousand checks, each of which do silly things like call sprintf(). It's easy to believe that that's expensive.

Instrumenting this, this theme is indeed using a CSI-style set_color and exit_attribute_mode, so it's running past those 1024 checks a lot (and failing each of them).

One lazy thing we could do is to move the color checks after the mode checks. I also think this is a fine candidate for memoization as you suggested.

@bobthecow

This comment has been minimized.

Show comment
Hide comment
@bobthecow

bobthecow Jan 31, 2017

In the meantime, is there something the theme could do to make things easier on reader_repaint?

bobthecow commented Jan 31, 2017

In the meantime, is there something the theme could do to make things easier on reader_repaint?

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Jan 31, 2017

Member

Reducing the number of emitted escapes should help. I see that the theme has a lot of functions that end in set_color normal for example, only to have that color reset by the next function. It might be possible to elide these. fish needs to be much faster here but that may help in the meantime.

Member

ridiculousfish commented Jan 31, 2017

Reducing the number of emitted escapes should help. I see that the theme has a lot of functions that end in set_color normal for example, only to have that color reset by the next function. It might be possible to elide these. fish needs to be much faster here but that may help in the meantime.

bobthecow added a commit to oh-my-fish/theme-bobthefish that referenced this issue Feb 1, 2017

Remove a ton of `set_color normal` calls.
Eases the pain of fish-shell/fish-shell#3793 just a bit.

This _shouldn't_ ever look any different to the end user. I think.
@frol

This comment has been minimized.

Show comment
Hide comment
@frol

frol Feb 1, 2017

@krader1961

Using fish from git head the BobTheFish theme causes fish to consume 12% of a cpu when backspacing over a long command.

Here is a GIF (1MB in size) showing my terminal experience and htop where you can see that fish utilizes up to 250% CPU on adding a new character and up to 100% CPU on deletion.

screencast-fish-performance

Also, I forgot to mention that I have custom keyboard sensitivity, this means that more "keypressed" events get fired than on the default settings and especially than on OS X. You may try my setting: xset r rate 150 60

frol commented Feb 1, 2017

@krader1961

Using fish from git head the BobTheFish theme causes fish to consume 12% of a cpu when backspacing over a long command.

Here is a GIF (1MB in size) showing my terminal experience and htop where you can see that fish utilizes up to 250% CPU on adding a new character and up to 100% CPU on deletion.

screencast-fish-performance

Also, I forgot to mention that I have custom keyboard sensitivity, this means that more "keypressed" events get fired than on the default settings and especially than on OS X. You may try my setting: xset r rate 150 60

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Feb 1, 2017

Member

Should we simply memoize the code that is causing the slow down? That would necessitate adding code elsewhere to invalidate the cached results when, for example, $TERM changes.

Well, when can the cached results become invalid? The terminal isn't going to change out from under fish, so $TERM or terminfo (which should be rare enough that we can just ignore it and tell people to restart fish) otherwise changing seems, from my naive perspective, to be the only way to cause this.

So yeah, this appears to be an obvious candidate for caching.

Member

faho commented Feb 1, 2017

Should we simply memoize the code that is causing the slow down? That would necessitate adding code elsewhere to invalidate the cached results when, for example, $TERM changes.

Well, when can the cached results become invalid? The terminal isn't going to change out from under fish, so $TERM or terminfo (which should be rare enough that we can just ignore it and tell people to restart fish) otherwise changing seems, from my naive perspective, to be the only way to cause this.

So yeah, this appears to be an obvious candidate for caching.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Feb 1, 2017

Contributor

We absolutely have to invalidate the cached data when any terminal related var changes. Fortunately I already added a hook for doing that because it is needed for other reasons; e.g., reinitializing the curses subsystem. See react_to_variable_change() in src/env.cpp. I wasn't actually making the argument we shouldn't memoize that code because of the possible need to invalidate the results. I was simply pointing out that you can't naively assume the cached data will remain valid.

Contributor

krader1961 commented Feb 1, 2017

We absolutely have to invalidate the cached data when any terminal related var changes. Fortunately I already added a hook for doing that because it is needed for other reasons; e.g., reinitializing the curses subsystem. See react_to_variable_change() in src/env.cpp. I wasn't actually making the argument we shouldn't memoize that code because of the possible need to invalidate the results. I was simply pointing out that you can't naively assume the cached data will remain valid.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Feb 1, 2017

Member

A funky idea that avoids needing a cache is to try escapes in MRU fashion. Make a big list of all the escapes we know about. When an escape is hit, splice it to the front of the list, so we try that one firs next time. Prompts tend to reuse the same escapes over and over, so this will mean we end up only testing a handful most of the time.

Member

ridiculousfish commented Feb 1, 2017

A funky idea that avoids needing a cache is to try escapes in MRU fashion. Make a big list of all the escapes we know about. When an escape is hit, splice it to the front of the list, so we try that one firs next time. Prompts tend to reuse the same escapes over and over, so this will mean we end up only testing a handful most of the time.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Feb 2, 2017

Member

Another really obvious and easy thing we ought to do is just cache the length of the prompt. If the output of the prompt doesn't change, then neither will its length. Maybe discard the cache after every command to handle changes in locale, etc.

Member

ridiculousfish commented Feb 2, 2017

Another really obvious and easy thing we ought to do is just cache the length of the prompt. If the output of the prompt doesn't change, then neither will its length. Maybe discard the cache after every command to handle changes in locale, etc.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Feb 2, 2017

Contributor

It is trivial, and probably common, to generate a new prompt having the same length as the previous prompt but different content including different coloring. Perhaps I misunderstood what you proposed, @ridiculousfish, but I don't see how keying on the prompt length will help. Also, a MRU implemented as a list will help a bit but is suboptimal. The list will be constantly updated as different colors and other effects are injected. It won't be a stable structure. Better to just use a map. After all, people aren't generally changing TERM or any other var that would require recomputing the results so generating the cache will generally happen just once per interactive shell.

Contributor

krader1961 commented Feb 2, 2017

It is trivial, and probably common, to generate a new prompt having the same length as the previous prompt but different content including different coloring. Perhaps I misunderstood what you proposed, @ridiculousfish, but I don't see how keying on the prompt length will help. Also, a MRU implemented as a list will help a bit but is suboptimal. The list will be constantly updated as different colors and other effects are injected. It won't be a stable structure. Better to just use a map. After all, people aren't generally changing TERM or any other var that would require recomputing the results so generating the cache will generally happen just once per interactive shell.

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Feb 3, 2017

Member

Sorry, to be clear, we cache the prompt and its length. Obviously if the prompt is different we discard the cache.

The advantage of the MRU approach is that it avoids all invalidation issues. It doesn't remember any lengths, it just changes which escapes it tries first.

Member

ridiculousfish commented Feb 3, 2017

Sorry, to be clear, we cache the prompt and its length. Obviously if the prompt is different we discard the cache.

The advantage of the MRU approach is that it avoids all invalidation issues. It doesn't remember any lengths, it just changes which escapes it tries first.

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Feb 3, 2017

Member

Darn, I thought that commit could have trivial performance implications (and noted optimization may be desirable), but am surprised it was so significant. I didn't think very hard about it.

Member

floam commented Feb 3, 2017

Darn, I thought that commit could have trivial performance implications (and noted optimization may be desirable), but am surprised it was so significant. I didn't think very hard about it.

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Feb 3, 2017

Member

Also, the k < (max_colors - 1) check appears to be incorrect since a less than check against max_colors ensures the index is not invalid.

The color indexes start at 0 and max_colors at 1. Was I brainfarting?

Member

floam commented Feb 3, 2017

Also, the k < (max_colors - 1) check appears to be incorrect since a less than check against max_colors ensures the index is not invalid.

The color indexes start at 0 and max_colors at 1. Was I brainfarting?

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Feb 3, 2017

Member

(Yes, I was.)

Member

floam commented Feb 3, 2017

(Yes, I was.)

krader1961 added a commit to krader1961/fish-shell that referenced this issue Feb 4, 2017

krader1961 added a commit to krader1961/fish-shell that referenced this issue Feb 4, 2017

refactor `escape_code_length()`
Step one in addressing issue #3793

krader1961 added a commit to krader1961/fish-shell that referenced this issue Feb 4, 2017

cache escape sequences we've seen
Cache the escape sequences we've seen when checking for those which
don't take any visual space when writing the prompt or similar strings.
This reduces the cost of determining the true cost of such strings by a
full order of magnitude if they include lots of such escape sequences.

Fixes #3793

krader1961 added a commit to krader1961/fish-shell that referenced this issue Feb 4, 2017

refactor `escape_code_length()`
Step one in addressing issue #3793

krader1961 added a commit to krader1961/fish-shell that referenced this issue Feb 4, 2017

cache escape sequences we've seen
Cache the escape sequences we've seen when checking for those which
don't take any visual space when writing the prompt or similar strings.
This reduces the cost of determining the true cost of such strings by a
full order of magnitude if they include lots of such escape sequences.

Fixes #3793
@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Feb 4, 2017

Contributor

@frol: Please try PR #3811 if you are able to build from source.

Contributor

krader1961 commented Feb 4, 2017

@frol: Please try PR #3811 if you are able to build from source.

@krader1961 krader1961 self-assigned this Feb 4, 2017

@frol

This comment has been minimized.

Show comment
Hide comment
@frol

frol Feb 4, 2017

@krader1961 I will check it later today.

frol commented Feb 4, 2017

@krader1961 I will check it later today.

@frol

This comment has been minimized.

Show comment
Hide comment
@frol

frol Feb 4, 2017

@krader1961 It works like a charm!

frol commented Feb 4, 2017

@krader1961 It works like a charm!

krader1961 added a commit to krader1961/fish-shell that referenced this issue Feb 5, 2017

optimize the `cached_esc_sequences` behavior
Periodically sort the cached escape sequence lengths based on feedback
from cache hits so that we're always checking for the most likely
sequence lengths first.

Another fix for #3793

krader1961 added a commit to krader1961/fish-shell that referenced this issue Feb 6, 2017

cache prompts and escape sequences we've seen
Cache the escape sequences we've seen when checking for those which
don't take any visual space when writing the prompt or similar strings.
This reduces the cost of determining the true cost of such strings by a
full order of magnitude if they include lots of such escape sequences.

Periodically sort the cached escape sequence lengths based on feedback
from cache hits so that we're always checking for the most likely
sequence lengths first.

Also cache the prompt layouts to avoid doing the calculations if the
prompt doesn't change.

Fixes #3793

krader1961 added a commit that referenced this issue Feb 6, 2017

refactor `escape_code_length()`
Step one in addressing issue #3793

@krader1961 krader1961 closed this in c4f2210 Feb 6, 2017

@faho faho added the release notes label Feb 9, 2017

@ricardomatias

This comment has been minimized.

Show comment
Hide comment
@ricardomatias

ricardomatias Sep 18, 2018

My CPU goes to 50%+ when deleting text on version 2.7.1.

OS/terminal used Kitty & Hyper.js

This seems to be the only issue that mentions this. I'm on a mac OS X High Sierra.

ricardomatias commented Sep 18, 2018

My CPU goes to 50%+ when deleting text on version 2.7.1.

OS/terminal used Kitty & Hyper.js

This seems to be the only issue that mentions this. I'm on a mac OS X High Sierra.

@faho

This comment has been minimized.

Show comment
Hide comment
@faho

faho Sep 18, 2018

Member

@ricardomatias: Please open a new issue. This one was about a specific thing that has since been solved.

Member

faho commented Sep 18, 2018

@ricardomatias: Please open a new issue. This one was about a specific thing that has since been solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment