Hi all,
I've spent the last few months working on and off on a new escape code parser for kitty that uses vector CPU instructions (AVX2 or SSE 4.2 or their ARM equivalents, whichever is available) to greatly speed up parsing of the input byte stream. Depending on the kind of input data, parsing throughput has improved by between 50% and 400%.
There is also a new benchmarking kitten that can be used to benchmark terminal performance. Here are the results from running kitten __benchmark__ on my system with the kitty master branch and the vt branch (the latter has the new code):
master branch:
Results:
Only ASCII chars : 4.55s @ 43.9 MB/s
Unicode chars : 2.73s @ 64.9 MB/s
CSI codes with few chars : 3.56s @ 28.1 MB/s
Long escape codes : 8.12s @ 96.5 MB/s
Images : 10.06s @ 53.0 MB/s
vt branch with SIMD acceleration:
Results:
Only ASCII chars : 1.73s @ 115.7 MB/s
Unicode chars : 1.78s @ 99.5 MB/s
CSI codes with few chars : 1.76s @ 56.7 MB/s
Long escape codes : 2.39s @ 327.9 MB/s
Images : 1.97s @ 270.8 MB/s
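As a quick sanity check on the numbers above, here is a small Python sketch that turns the two result listings into relative throughput gains. The dictionary and function names are purely illustrative and are not part of kitty or the benchmark kitten.

```python
# Throughput in MB/s, copied from the two result listings above.
master = {
    "Only ASCII chars": 43.9,
    "Unicode chars": 64.9,
    "CSI codes with few chars": 28.1,
    "Long escape codes": 96.5,
    "Images": 53.0,
}
vt = {
    "Only ASCII chars": 115.7,
    "Unicode chars": 99.5,
    "CSI codes with few chars": 56.7,
    "Long escape codes": 327.9,
    "Images": 270.8,
}


def speedup_percent(old: float, new: float) -> float:
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Hypothetical helper mapping: benchmark name -> gain in percent.
gains = {name: speedup_percent(master[name], vt[name]) for name in master}

for name, gain in gains.items():
    print(f"{name:<26}: {gain:6.1f}% faster")
```

With these inputs the smallest gain is for Unicode chars (roughly 53%) and the largest is for Images (roughly 411%), which is in line with the range quoted earlier.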
A table comparing the results from the vt branch with the other terminal emulators I have on my system shows that kitty is much faster than the rest with the new code: https://github.com/kovidgoyal/kitty/blob/vt/docs/performance.rst#throughput
There are of course some downsides. Most importantly, the new code required fairly invasive changes to various parts of kitty, so it's likely some edge cases are broken. In particular, kitty no longer supports input data in non-UTF-8 encodings and also does not support C1 control codes. The latter are supported out of the box only by VTE-based terminals and WezTerm, so they are not widely used.
I'd appreciate it if some of you could build kitty from source and help test things for regressions. Building kitty from source is very easy: you need only C and Go compilers and a single command. Instructions are at: https://sw.kovidgoyal.net/kitty/build/
Please report any issues you find here.
Thanks, and enjoy!
P.S. For the curious, the things that have been sped up are:
- Searching for the start and end of escape codes in the input byte stream
- UTF-8 decoding via SIMD instructions
- base64 decoding via SIMD instructions (base64 is used for various escape codes such as images, copy/paste, etc.)
- SIMD XOR used for securely storing image frame data in a disk cache
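To make the first item more concrete, here is a minimal Python sketch of the general idea: find the next ESC byte with a single bulk search, treat everything before it as plain text, then look for the byte that ends the escape code. This is not kitty's implementation. `bytes.find` stands in for the SIMD search, only CSI sequences are recognised, and the `split_stream` name and chunk labels are hypothetical.

```python
def split_stream(data: bytes):
    """Yield (kind, chunk) pairs from a terminal byte stream.

    Illustrative only: recognises just CSI sequences (ESC '[' ... final byte).
    A real terminal parser handles many more escape code types.
    """
    pos = 0
    n = len(data)
    while pos < n:
        # Bulk search for the next ESC byte (0x1B). In a SIMD parser this
        # is the step that examines many bytes per instruction.
        start = data.find(b"\x1b", pos)
        if start == -1:
            yield ("text", data[pos:])
            return
        if start > pos:
            yield ("text", data[pos:start])
        if start + 1 < n and data[start + 1] == ord("["):
            # A CSI sequence ends at the first byte in the range 0x40..0x7E.
            end = start + 2
            while end < n and not (0x40 <= data[end] <= 0x7E):
                end += 1
            if end == n:
                # Sequence is cut off; a real parser would wait for more input.
                yield ("incomplete", data[start:])
                return
            yield ("csi", data[start:end + 1])
            pos = end + 1
        else:
            # Not CSI: this sketch simply passes the lone ESC byte through.
            yield ("other", data[start:start + 1])
            pos = start + 1


chunks = list(split_stream(b"hi\x1b[31mred\x1b[0m"))
```

Here `chunks` contains the text run `b"hi"`, the CSI sequence `b"\x1b[31m"`, the text run `b"red"` and the CSI sequence `b"\x1b[0m"`, in that order. The long plain-text runs between escape codes are where a vectorised search pays off most, since they can be skipped without inspecting each byte individually.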
Currently I have restricted the SIMD implementations to AVX2 at most, not AVX512, as AVX512 tends to have a major energy/warm-up penalty and prevents running on "economy cores". Something to revisit in the future.