
Double-scrollback system #970

Closed
martinetd opened this issue Sep 14, 2018 · 19 comments

@martinetd
Contributor

Hi, another testing-the-waters proposal; this one is more ambitious code-wise.

Basically, I like having a large buffer for search, but kitty just eats way too much memory for scrollback (for reference, as of today, starting an empty kitty with a 64k-line scrollback consumes 50MB, and filling those 64k lines adds another 300MB; there is no way I can sustain my 45 open terminals with that kind of memory usage).

On the other hand, I don't ever go very far with mod+pgup, so I was thinking a hybrid system could be more efficient:

  • keep a few pages of scrollback as we do currently.
  • copy lines, as they fall out of that scrollback, into a secondary scrollback buffer as plain text, or even better into a compressed buffer (with some fast algorithm like lz4 or zstd) -- see the sketch after this list.
  • scrollback actions like mod+pgup cannot go past the normal buffer, so there is no change there, but the pager scrollback action would get its input from both buffers concatenated.
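
To make this concrete, here is a rough standalone sketch of the kind of secondary buffer I have in mind (all names made up, nothing kitty-specific): a plain byte ring that only ever keeps the last N bytes of evicted text, so old content simply falls off the far end.

```c
/* Sketch only: a fixed-size byte ring the primary scrollback evicts plain-text
 * lines into. Hypothetical names, not kitty code. */
#include <stdlib.h>

typedef struct TextHistory {
    char  *buf;       /* circular storage, sized in bytes (e.g. a few MB) */
    size_t capacity;
    size_t start;     /* offset of the oldest byte */
    size_t used;      /* number of valid bytes */
} TextHistory;

static TextHistory *text_history_new(size_t capacity) {
    TextHistory *th = calloc(1, sizeof(TextHistory));
    if (th) { th->buf = malloc(capacity); th->capacity = th->buf ? capacity : 0; }
    return th;
}

/* Append one evicted line (newline-terminated); the ring only remembers the
 * last `capacity` bytes, so the oldest text is overwritten implicitly. */
static void text_history_append(TextHistory *th, const char *line, size_t len) {
    if (!th->capacity) return;
    for (size_t i = 0; i < len; i++) {
        size_t pos = (th->start + th->used) % th->capacity;
        th->buf[pos] = line[i];
        if (th->used < th->capacity) th->used++;
        else th->start = (th->start + 1) % th->capacity;   /* drop the oldest byte */
    }
}
```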

In my opinion, that secondary buffer should not be processed at all on resizes etc. and should just be kept as plain text; for correctness I would actually suggest folding wrapped lines back into a single line when they are continuations.

Code-wise this isn't as clear to me as the selection code, but I suspect historybuf_push could call line_as_unicode/line_as_ansi just before overwriting the line and push the result to the secondary buffer, controlling that buffer's size at this point; I'm not sure how to decide between unicode and ansi there, though.
On the display side, as_text would just concatenate both, and the line count should work out on its own... The secondary buffer would live happily on the C side only.
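
On that point, the concatenation I mean is just: when piping to the pager, write the secondary (older) text first and then the normal scrollback serialized as it is today. A toy illustration, reusing the TextHistory sketch above (again, hypothetical names, not the real as_text path):

```c
#include <stdio.h>

/* Feed the pager from both buffers, oldest first, so line counts just add up. */
static void write_scrollback_to_pager(const TextHistory *th, FILE *pager,
                                      const char *primary_text, size_t primary_len) {
    for (size_t i = 0; i < th->used; i++)           /* secondary buffer, may wrap around */
        fputc(th->buf[(th->start + i) % th->capacity], pager);
    fwrite(primary_text, 1, primary_len, pager);    /* then the normal scrollback + screen */
}
```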

(And as a bonus, it would become easy to have this secondary buffer backed by a file if someone wants an unlimited buffer like #884 asked for; not that I would do that part.)

Anyway, just like the other issue, I'd like to get an opinion before starting anything.

@kovidgoyal
Owner

First, I am somewhat surprised that kitty memory usage on startup depends on scrollback buffer size. I thought I fixed the bug where it was accidentally pre-allocating the entire scrollback buffer. The expected behavior is that startup memory usage is independent of scrollback_lines; are you saying that's not the case for you, even if you run from current master?

Now coming to the question of storing large amounts of scrollback:

  1. I agree that if it is implemented, it should not be resized with the normal scrollback
  2. I don't know if compressing it is worthwhile; it might just be better to write it to disk straight away. However, scrolling performance is a big priority for kitty, so the implementation of this, with or without compression, would need to have no measurable effect on scrolling performance. That probably means using a separate thread for it.
  3. Regarding how to store it: if it is determined that this secondary buffer is to be used only when piping, then IMO the best way to store it is as ANSI text, that is, plain text with terminal formatting escape codes for colors etc.
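
For concreteness, a line stored that way is just the raw characters with their escape sequences left inline, for example:

```c
/* Illustration only: one scrollback line kept "as ANSI text", escapes and all. */
const char stored_line[] =
    "\x1b[1;31merror:\x1b[0m expected ';' before '}' token\n";
```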

@martinetd
Contributor Author

Regarding memory usage -- sorry, that appears to have been me misreading ps output yesterday; trying again, kitty consumes the same amount on every fresh startup, as expected (50MB), which isn't bad given I use single-instance mode. The big-ish swings on new windows that I think I remember are gone as well, but that might be down to the short life of this new session; I'll open a separate issue after looking closer at what is allocated, if I can reproduce it.
Another odd difference is that the usage with a full buffer looks a bit smaller than yesterday (with 64k lines I'm somewhere between 200 and 250MB, whereas I remember 300 to 350MB)... Well, it's still too much to sustain many big scrollback buffers anyway, so I think such a feature would make sense.

Re compression: I agree it may or may not be worthwhile. Speed-wise I am confident something like lz4 would not slow things down much, but a first version would definitely not compress (for simplicity's sake); it was just an idea for a future enhancement.
I'd personally rather avoid disk, but if we do go to disk for e.g. an unlimited buffer (which could be an option eventually) then compression will actually likely speed things up; and I guess it would require batching for proper performance anyway, at which point compression is almost free.
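
For reference, the lz4 API is tiny; a standalone sketch of what the compression step could look like (not wired into kitty at all, compile with -llz4):

```c
#include <lz4.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const char chunk[] = "line one\nline two\nline three\n";   /* a chunk of scrollback text */
    int src_size = (int)sizeof(chunk);
    int max_dst = LZ4_compressBound(src_size);
    char *compressed = malloc((size_t)max_dst);
    int c_size = LZ4_compress_default(chunk, compressed, src_size, max_dst);
    if (c_size <= 0) { fprintf(stderr, "compression failed\n"); return 1; }

    char roundtrip[sizeof(chunk)];
    int d_size = LZ4_decompress_safe(compressed, roundtrip, c_size, (int)sizeof(roundtrip));
    printf("%d -> %d bytes, roundtrip %s\n", src_size, c_size,
           (d_size == src_size && memcmp(chunk, roundtrip, (size_t)src_size) == 0) ? "ok" : "FAILED");
    free(compressed);
    return 0;
}
```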
I agree scrolling performance is very important for me as well - I'll definitely pay attention to that.

Regarding uses of the secondary buffer, I think it would only make sense for piping into the pager, but I'm not yet familiar with everything; please tell me if you can think of other user-facing uses. (I thought of hints, but that only works on the active screen and would not work with the pager, if I'm reading this right; I don't do much more with my scrollback.)

Anyway, your reaction probably means this is reasonable; I'll play with this a bit over the weekend / next week :)

@kovidgoyal
Owner

The kitty cell structure is, IIRC, on the order of 30 bytes. So assuming a line width of 100 characters, that's ~3KB per line in the scrollback buffer (there is a little bit of non-cell overhead as well, like keeping track of newlines). At 64k lines that works out to roughly 200MB, which matches the numbers you report above.

I cannot think of many uses for the secondary buffer other than searching it, which would be best done in a pager anyway. Feel free to experiment, I am not against the idea in principle. I will, of course, have to see the actual code to make a final decision.

@kovidgoyal
Owner

Closing this issue; we can discuss further if/when you have a PR with an initial implementation. If you do have questions, feel free to post them on this issue and I will help if I can.

@martinetd
Contributor Author

After looking at the code some more: given that an ANSI line is stored as Py_UCS4, I believe compression can help quite a bit memory-wise, since 3 out of every 4 bytes will be zeroes most of the time... But let's get something working first.
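
Just to illustrate the zero-byte point, a standalone toy (assuming a little-endian machine; Py_UCS4 is a 32-bit code unit, stood in for here by uint32_t):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t line[] = {'h', 'e', 'l', 'l', 'o'};             /* ASCII stored as 4-byte code units */
    const unsigned char *bytes = (const unsigned char *)line;
    for (size_t i = 0; i < sizeof(line); i++) printf("%02x ", bytes[i]);
    printf("\n");   /* 68 00 00 00 65 00 00 00 ... : three of every four bytes are zero */
    return 0;
}
```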

I've tied up some loose ends and have something that's starting to work now; a few questions:

  • Do you have anything like a benchmark? Right now I run time cat bigfile with various big outputs, but it's not really great. I'm thinking of writing a test that measures time for a few patterns (long lines, short lines), maybe?
    If so, I might need a hand with the API there; I'm not sure how to append a lot of text -- looking at the screen tests I can insert text with s.draw(), but if I write a linefeed the next line is still considered a continuation and the newline becomes part of the first line, which is not what I would expect; is there a better function to call?

  • I was folding long lines back into a single line in the text buffer so that the text would be invariant to resizes, but that doesn't look good in the pager with long lines (as the following lines that are still in the normal scrollback will be folded manually).
    I'm a bit unsure here: removing wrap markers in the normal part makes the whole thing consistent, but it then gets more complex to compute the position (or we would need to count lines differently); on the other hand, inserting wrap markers in the pager history buffer (what I now call the second scrollback buffer) will be ugly on resize...

  • At this point I'm storing everything in a huge circular buffer preallocated upfront, and I intend the config value to be in MB rather than in lines.
    It would be easy to realloc regularly as the buffer grows (say every 1MB or so), but dealing with lines is difficult:

    • It's easy to keep allocating bigger buffers until the line count is reached for the first time, but once we wrap back to the start of the buffer the line count becomes very difficult to track (we would have to count how many old lines we erase), which would require tracking every new linefeed and dividing line lengths by the terminal width...
    • If we want to maintain a somewhat constant number of lines and the lines that come in are bigger than the old ones, we would need to allocate chunks in a list and insert new chunks as required, which is kind of a pain as well...

    So, is a buffer size in MB acceptable to you? The doc could give some estimate, e.g. 1MB = ~2500 lines at 100 chars per line without compression (quick arithmetic below).
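
The arithmetic behind that estimate, assuming the text is kept as 4-byte code units (Py_UCS4) as mentioned above:

```c
#include <stdio.h>

int main(void) {
    const double mb = 1024.0 * 1024.0;
    const double bytes_per_line = (100 + 1) * 4;    /* 100 chars plus a newline, 4 bytes each */
    printf("~%.0f lines per MB\n", mb / bytes_per_line);   /* ~2600, hence the ~2500 ballpark */
    return 0;
}
```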

@kovidgoyal
Owner

No, I don't have a benchmark specifically for the case of scrolling; it's probably a good idea to add one. You can use s.line_feed() and s.carriage_return() to create hard line breaks.

Regarding wrap markers, I think they should be preserved in the secondary scrollback. The secondary scrollback should just be a dumb byte buffer. When the user triggers a read of the secondary scrollback, it can be re-interpreted for wrapping etc., since I think modern CPUs should be able to do that fast enough to pipe into the pager.
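
The re-interpretation itself is cheap; something along these lines (plain text only here, a real version would also have to skip escape sequences when counting columns) is all the read path needs:

```c
#include <stdio.h>
#include <string.h>

/* Re-wrap a flat text buffer to the current terminal width while writing it out. */
static void write_wrapped(const char *text, size_t len, size_t width, FILE *out) {
    size_t col = 0;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n') { fputc('\n', out); col = 0; continue; }
        if (col == width) { fputc('\n', out); col = 0; }    /* inserted soft break */
        fputc(text[i], out);
        col++;
    }
}

int main(void) {
    const char sample[] = "a fairly long line that gets broken into width-sized pieces\nshort\n";
    write_wrapped(sample, strlen(sample), 20, stdout);
    return 0;
}
```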

I am fine with the secondary buffer size being in MB.

@martinetd
Contributor Author

Yeah, I ended up using s.linefeed()/s.carriage_return(); it works.

I've got a fairly noticeable overhead at this point; calling line_as_ansi() on every line when it is purged appears to have a big impact in profiling.
In line_as_ansi(), a big part of the time is spent in cursor_as_sgr(); we can do a bit better here (like not calling it when nothing changed), but going over all the cells and comparing all the attributes for every single char is still expensive, and there's no real way around that...
Interestingly enough it's more visible in the test than in a real cat, especially with very long lines: I had around +30% in real usage, and +10% after optimizing line_as_ansi(); but in the benchmark I can easily see double the time, so I'll need to look at why.
(I could argue it's time not spent later when looking at the buffer, but that's beside the point if it still slows down scrolling.)
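
For reference, the optimization I mean is just remembering the previous cell's attributes and only emitting an SGR sequence when they change; a toy version with a made-up cell struct (not kitty's):

```c
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t ch; uint8_t fg; uint8_t bold; } Cell;   /* hypothetical cell */

static void line_to_ansi(const Cell *cells, size_t n, FILE *out) {
    uint8_t prev_fg = 0xff, prev_bold = 0xff;          /* force an SGR on the first cell */
    for (size_t i = 0; i < n; i++) {
        if (cells[i].fg != prev_fg || cells[i].bold != prev_bold) {
            fprintf(out, "\x1b[%s3%dm", cells[i].bold ? "1;" : "22;", cells[i].fg);
            prev_fg = cells[i].fg; prev_bold = cells[i].bold;
        }
        fputc((int)cells[i].ch, out);                   /* ASCII only, for the sketch */
    }
}
```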

Now that it's down to 10%, I'm tempted to say it's not worth trying to make this async in another thread, but I can give it a shot if you think it's worth it... I don't think it'll be pretty though :p

Wrapping - hmm, I'm honestly not sure about that. Even if the buffer is just 10MB, we need to process the whole buffer before anything is displayed (because what the user wants to see is at the end of the buffer, and we pipe linearly); I don't think that will have the "instant" feel it has right now.
It's also not trivial to add wrapping after "ansifying" the buffer, because you need to recognize which sequences don't count towards the line length; it's not very difficult, but it means looking at every byte a few times.
I don't think we can opportunistically store the current wrapping in the buffer either, as it'll be a mess to track if the size changes a few times. At this point I'd rather store the wrapping, so we can dump the whole buffer and recompute it on resize; I'll have to see how long that takes, and this can be measured in a test as well.

@martinetd
Contributor Author

Actually, instead of doing it on resize, I guess doing it the first time the scrollback is dumped would be OK -- subsequent looks at that buffer would be fast, and things like interactive resize would be much faster.

@martinetd
Contributor Author

One more style/test question - I originally made the secondary buffer size an option in the global state, but the tests do not use options...
The state defaults in C do not match the Python defaults, so just calling set_options(defaults) lets me do what I want (possibly fiddling with the values), but maybe the tests should have their own config dict and always call set_options in the common code, to get consistent results?

Alternatively, I've tried changing the screen new function to take one more argument like scrollback_lines, but I don't really like it: as the arguments are positional and optional, it would be easy to miss one. I'd rather stick to handling options more properly in the tests.

@martinetd
Contributor Author

Re wrapping: after finally getting it to work (it's only half past four in the morning!), I still think we should let less do the wrapping, all the time.

The reason for this is that, the way we do it, searching with wrapping will not match words that are split across a wrap.
This will cost us tracking a second "screen.scrolled_by" that does not include wrapped lines in the count, but I believe it is worth it (especially since it removes the need for that rewrap function :D).

I'll submit a PR with what I have for now so you can look at it; it's definitely not my proudest piece of code and there are still a few areas that'll need cleanup and more testing, but it's good enough to get some comments.

@maximbaz
Contributor

Speaking of benchmarks, @kovidgoyal you might find vtebench interesting to play with: https://jwilm.io/blog/alacritty-lands-scrollback/

@kovidgoyal
Owner

That benchmark is pretty meaningless for kitty. As far as I can tell, it is the usual "dump lots of text into the emulator and see how fast it swallows it" thing. It can be trivially gamed by simply using a large input buffer in the emulator. kitty, for instance, deliberately sleeps in its input and render loops, so it will never perform as well as other emulators in the "swallow lots of input fast" test. As I have said before, the performance benchmark that matters to me is CPU consumption and latency/responsiveness. See https://sw.kovidgoyal.net/kitty/performance.html

@martinetd
Contributor Author

Interesting, so you're not so much worried about my time cat as about what the CPU usage would look like with something that feeds in a fixed rate of e.g. 10k lines per second?

@kovidgoyal
Owner

kovidgoyal commented Sep 18, 2018

Yes, indeed. Timing tests are a useful first approximation/proxy for CPU consumption, but the benchmark I want to optimize against is CPU consumption at a fixed rate. The vast majority of terminal interactions are not of the form "sink arbitrarily large amounts of data"; they are human interaction or some program generating output, like a log file or a compile job, which is typically finite rate.

@maximbaz
Contributor

Consider including alacritty in the comparison on your performance page; I think it's becoming one of your main competitors now, and people often choose between kitty and alacritty. It would be great to highlight where each terminal shines and what it optimizes for, to help people decide 😉

@kovidgoyal
Owner

kovidgoyal commented Sep 18, 2018

I chose terminals with comparable feature sets for my comparison. As far as I can tell alacritty is not aiming for anything like the feature set of kitty. But sure, it would be interesting to add it...I'm just lazy about setting up a rust toolchain :)

@martinetd
Contributor Author

I've pushed a few more changes to line_as_ansi, so it works on the GPU cells directly instead of copying to the cursor all the time.
I should probably rename cursor_as_sgr since it's not working on cursors anymore... Also, I'm not sure why it wasn't handling bold/dim properly, but I figured I might as well fix that at the same time; please say if you want that as a separate change.

Lastly, I've added a small test program that just outputs random (but predictable) text at a constant rate, and I honestly cannot see much difference in CPU usage; the cut-off (the number of lines per second I can request before the program complains it cannot print fast enough) is also pretty close (around 75k lines per second with default options if I make sure the rest of the laptop is idle, but it breaks as soon as there is other activity).
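
Roughly this shape, if it helps picture it (a minimal standalone sketch, not the actual test program): print a fixed number of pseudo-random lines per second and complain when we fall more than a second behind.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv) {
    long rate = argc > 1 ? atol(argv[1]) : 10000;    /* lines per second */
    srand(42);                                       /* predictable "random" content */
    double start = now_seconds();
    long printed = 0;
    for (;;) {
        double elapsed = now_seconds() - start;
        long due = (long)(elapsed * rate);           /* lines that should be out by now */
        if (due - printed > rate) {                  /* more than one second behind: give up */
            fprintf(stderr, "cannot sustain %ld lines per second\n", rate);
            return 1;
        }
        while (printed < due)
            printf("%08ld %08x filler text to make the line look realistic\n",
                   ++printed, (unsigned)rand());
        fflush(stdout);
        usleep(1000);                                /* ~1ms tick */
    }
}
```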

Anyway, at this point I'm less happy with the code quality than with the performance of the change itself; making sure the style is consistent (I'm honestly a bit confused about the coding style), renaming variables, refactoring, etc. -- I'll wait until you have time to give an opinion on that.

@kovidgoyal
Owner

Sounds good. Happily (or perhaps sadly) I'm not very consistent with code style and, conversely, don't care very much about it, so don't worry about it. When I merge, I will fix any style inconsistencies that particularly bug me; I don't normally ask contributors to do that.

@kovidgoyal
Owner

A separate PR for the bold/dim fix would be appreciated however.
