Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression since 2.3.1 #3793

Closed
frol opened this issue Jan 30, 2017 · 28 comments
Closed

Performance regression since 2.3.1 #3793

frol opened this issue Jan 30, 2017 · 28 comments

Comments

@frol
Copy link

@frol frol commented Jan 30, 2017

  • Have you checked if problem occurs with fish 2.4.0?
  • Tried fish without third-party customizations (check sh -c 'env HOME=$(mktemp -d) fish')?

fish version installed (fish --version):

This performance drop started with 2.4.0, but I have also tried the master branch (6db3721) build today

OS/terminal used: Arch Linux, terminal doesn't make any difference (tried Gnome Terminal, XTerm, and TTY).

It seems that set_color became much slower in Fish 2.4.0. While it is not noticeable to high-end CPUs (yet 100% CPU load can be a good indicator of that something is wrong as typing a command shouldn't take so much of resources), ARM processors and even x86 processors in powersave mode reveal the noticeable lag in typing using a theme, which used to work just fine on Fish 2.3.1.

Steps to reproduce:

  1. Use Raspberry Pi, or switch your CPU into a powersave mode (or lock your CPU frequency at something around 1GHz), or just use a slow CPU
  2. Install bobthefish theme
  3. Just type anything long (e.g. 100 characters) and try to clear this line using backspace or delete keys; holding backspace pressed for 1 second, fish goes on for another 0.5-1 second removing more than you intended.

This bug is also filed against the bobthefish theme repository: oh-my-fish/theme-bobthefish#70. All other themes I tried work fine (yet, they still take 50+% CPU when I simply press and hold backspace key on an empty line), but this theme also works fine with Fish 2.3.1, and given the fact that my experiments ended up in set_color, I think it is worth taking a look here.

/cc @bobthecow @ycardon

@krader1961 krader1961 added this to the fish-future milestone Jan 30, 2017
@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Jan 30, 2017

Are you running the terminal emulator on the Raspberry Pi or a different system?

While possible it seems really unlikely that the set_color command is at fault.

@frol
Copy link
Author

@frol frol commented Jan 30, 2017

I am experiencing this on my Intel Core i7 powered laptop when I put the CPU into a powersave mode:

$ sudo pstate-frequency -S -p powersave
pstate-frequency version 3.7.2
    pstate::CPU_DRIVER   -> intel_pstate
    pstate::CPU_GOVERNOR -> powersave
    pstate::TURBO        -> 1 [OFF]
    pstate::CPU_MIN      -> 22% [800000KHz]
    pstate::CPU_MAX      -> 23% [805000KHz]

Thus, my system configuration is:

  • Laptop with Intel Core i7 running at 800MHz
  • Arch Linux x64
  • Gnome 3
  • Gnome Terminal or any other terminal including TTY
  • Fish 2.4.0
  • latest bobthefish

However, @ycardon reported (oh-my-fish/theme-bobthefish#66) the same experience on his Raspberry Pi.

@ridiculousfish
Copy link
Member

@ridiculousfish ridiculousfish commented Jan 30, 2017

Thanks for this investigation. How did you determine that the regression is in set_color?

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Jan 30, 2017

Using fish 2.3.1 the BobTheFish theme causes fish to consume 1.2% of a cpu while my normal prompt theme uses just 0.8% when backspacing over a long command.

Using fish from git head the BobTheFish theme causes fish to consume 12% of a cpu when backspacing over a long command. My normal prompt theme uses only 3%. Using the macOS sample command shows a much greater number of tparm() calls for the BobTheFish case. The big difference is much more time spent in reader_repaint() which calls s_write()which is is the parent of thetparm()` calls. We clearly did something to make repainting the command line much more expensive since 2.3.1.

@bobthecow
Copy link

@bobthecow bobthecow commented Jan 30, 2017

Any idea what it is about bobthefish that causes so many repaints?

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Jan 31, 2017

Bisected to commit f464704.

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Jan 31, 2017

I was dithering on whether to mark this a regression. Performance problems are often the result of adding functionality. Which is the case with this issue and doesn't automatically make the problem a regression in behavior. However, I think the change in behavior in this instance does not justify such a dramatic decrease in performance. Furthermore the original code was itself more expensive than it should have been given that it is called in a hot path for interactive use and performs avoidable computations.

TBD is how best to fix this. Should we simply memoize the code that is causing the slow down? That would necessitate adding code elsewhere to invalidate the cached results when, for example, $TERM changes. Or should we, can we, change the algorithm to be more efficient?

@krader1961 krader1961 added bug regression and removed enhancement labels Jan 31, 2017
@krader1961 krader1961 modified the milestones: next-minor, fish-future Jan 31, 2017
@ridiculousfish
Copy link
Member

@ridiculousfish ridiculousfish commented Jan 31, 2017

What exactly is causing the slowdown? I don't see anything obvious in that diff that would be responsible for a regression.

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Jan 31, 2017

What exactly is causing the slowdown?

It's the removal of the std::min() call to limit the number of tests to just the first 16 colors. We're now testing 256 or more sequences.

TBD is whether we should reinstate that limit, memoize the results, or otherwise improve the algorithm.

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Jan 31, 2017

Also, the k < (max_colors - 1) check appears to be incorrect since a less than check against max_colors ensures the index is not invalid. Not to mention the original static_cast<size_t>(max_colors) makes me grind my teeth since the cast should be unnecessary.

@ridiculousfish
Copy link
Member

@ridiculousfish ridiculousfish commented Jan 31, 2017

bobthefish.trace.zip

I attached an instruments trace. I'm surprised, we really are spending over half our time in tparm(). I didn't expect that to be so expensive. Shows what I know.

I see now that we have a nested loop. So 256 colors * 4 escape sequences is over a thousand checks, each of which do silly things like call sprintf(). It's easy to believe that that's expensive.

Instrumenting this, this theme is indeed using a CSI-style set_color and exit_attribute_mode, so it's running past those 1024 checks a lot (and failing each of them).

One lazy thing we could do is to move the color checks after the mode checks. I also think this is a fine candidate for memoization as you suggested.

@bobthecow
Copy link

@bobthecow bobthecow commented Jan 31, 2017

In the meantime, is there something the theme could do to make things easier on reader_repaint?

@ridiculousfish
Copy link
Member

@ridiculousfish ridiculousfish commented Jan 31, 2017

Reducing the number of emitted escapes should help. I see that the theme has a lot of functions that end in set_color normal for example, only to have that color reset by the next function. It might be possible to elide these. fish needs to be much faster here but that may help in the meantime.

bobthecow added a commit to oh-my-fish/theme-bobthefish that referenced this issue Feb 1, 2017
Eases the pain of fish-shell/fish-shell#3793 just a bit.

This _shouldn't_ ever look any different to the end user. I think.
@frol
Copy link
Author

@frol frol commented Feb 1, 2017

@krader1961

Using fish from git head the BobTheFish theme causes fish to consume 12% of a cpu when backspacing over a long command.

Here is a GIF (1MB in size) showing my terminal experience and htop where you can see that fish utilizes up to 250% CPU on adding a new character and up to 100% CPU on deletion.

screencast-fish-performance

Also, I forgot to mention that I have custom keyboard sensitivity, this means that more "keypressed" events get fired than on the default settings and especially than on OS X. You may try my setting: xset r rate 150 60

@faho
Copy link
Member

@faho faho commented Feb 1, 2017

Should we simply memoize the code that is causing the slow down? That would necessitate adding code elsewhere to invalidate the cached results when, for example, $TERM changes.

Well, when can the cached results become invalid? The terminal isn't going to change out from under fish, so $TERM or terminfo (which should be rare enough that we can just ignore it and tell people to restart fish) otherwise changing seems, from my naive perspective, to be the only way to cause this.

So yeah, this appears to be an obvious candidate for caching.

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Feb 1, 2017

We absolutely have to invalidate the cached data when any terminal related var changes. Fortunately I already added a hook for doing that because it is needed for other reasons; e.g., reinitializing the curses subsystem. See react_to_variable_change() in src/env.cpp. I wasn't actually making the argument we shouldn't memoize that code because of the possible need to invalidate the results. I was simply pointing out that you can't naively assume the cached data will remain valid.

@ridiculousfish
Copy link
Member

@ridiculousfish ridiculousfish commented Feb 1, 2017

A funky idea that avoids needing a cache is to try escapes in MRU fashion. Make a big list of all the escapes we know about. When an escape is hit, splice it to the front of the list, so we try that one firs next time. Prompts tend to reuse the same escapes over and over, so this will mean we end up only testing a handful most of the time.

@ridiculousfish
Copy link
Member

@ridiculousfish ridiculousfish commented Feb 2, 2017

Another really obvious and easy thing we ought to do is just cache the length of the prompt. If the output of the prompt doesn't change, then neither will its length. Maybe discard the cache after every command to handle changes in locale, etc.

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Feb 2, 2017

It is trivial, and probably common, to generate a new prompt having the same length as the previous prompt but different content including different coloring. Perhaps I misunderstood what you proposed, @ridiculousfish, but I don't see how keying on the prompt length will help. Also, a MRU implemented as a list will help a bit but is suboptimal. The list will be constantly updated as different colors and other effects are injected. It won't be a stable structure. Better to just use a map. After all, people aren't generally changing TERM or any other var that would require recomputing the results so generating the cache will generally happen just once per interactive shell.

@ridiculousfish
Copy link
Member

@ridiculousfish ridiculousfish commented Feb 3, 2017

Sorry, to be clear, we cache the prompt and its length. Obviously if the prompt is different we discard the cache.

The advantage of the MRU approach is that it avoids all invalidation issues. It doesn't remember any lengths, it just changes which escapes it tries first.

@floam
Copy link
Member

@floam floam commented Feb 3, 2017

Darn, I thought that commit could have trivial performance implications (and noted optimization may be desirable), but am surprised it was so significant. I didn't think very hard about it.

@floam
Copy link
Member

@floam floam commented Feb 3, 2017

Also, the k < (max_colors - 1) check appears to be incorrect since a less than check against max_colors ensures the index is not invalid.

The color indexes start at 0 and max_colors at 1. Was I brainfarting?

@floam
Copy link
Member

@floam floam commented Feb 3, 2017

(Yes, I was.)

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Feb 4, 2017

@frol: Please try PR #3811 if you are able to build from source.

@krader1961 krader1961 self-assigned this Feb 4, 2017
@frol
Copy link
Author

@frol frol commented Feb 4, 2017

@krader1961 I will check it later today.

@frol
Copy link
Author

@frol frol commented Feb 4, 2017

@krader1961 It works like a charm!

krader1961 added a commit that referenced this issue Feb 6, 2017
Step one in addressing issue #3793
@krader1961 krader1961 closed this in c4f2210 Feb 6, 2017
@faho faho added the release notes label Feb 9, 2017
@ricardomatias

This comment was marked as off-topic.

@faho

This comment was marked as off-topic.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.