
Memory usage when editing relatively big files #2454

Closed
andreyorst opened this issue Sep 29, 2018 · 38 comments

@andreyorst
Contributor

commented Sep 29, 2018

So I've tested three editors which I think are comparable: Kakoune, Emacs, and NeoVim, all with default settings. I opened a simple web page whose size is 10M. What I noticed is that both NeoVim and Emacs use pretty much the same amount of RAM no matter what position I am currently viewing, while Kakoune needs a whopping ~340 MB after a jump to the end of the file.

So what I did:
Opened this file: https _habr.com_post_423889_htm.tar.gz;

  • kak (user configuration folder doesn't exist): 55.7M
  • jump with: gj
  • checked memory usage: 338M

The same for NeoVim:

  • nvim -u NORC: 18.2M
  • jump with: G
  • after jump: 18.2M

Emacs:

  • open file: 35.8M
  • jump with: Alt->
  • after jump: 41.5M, but highlighting is broken

nvim starts immediately, and the jump is super quick. Emacs needs some time, comparable to Kakoune, and the jump takes some time too.

I know that Vim and NeoVim don't highlight the whole file the way Kakoune does (maybe I'm wrong, but it is the only plausible reason why memory usage is so high), only a part of it. I don't know how it is implemented in Emacs, but its memory usage stays around the same value as at the moment of opening the file. Opening the file with kak -n is immediate, the jump is quick too, and memory usage stays at 18M regardless of position, but there's no highlighting.

Sometimes I work with huge log files, and I apply syntax highlighting to them. Previously I did this in NeoVim, with my own syntax file. I'd like to stop using NeoVim for that, because I like Kakoune's editing model more, but working with several files of even bigger size is painful due to the RAM limit I have.

Are there any plans to optimize Kakoune's highlighting system, or a way to limit highlighting to the visible part of the file, or to some meaningful regions, so that the parser could highlight correctly without keeping results for the whole file?
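For reproducibility, the memory figures above can be sampled from a second terminal. A minimal sketch, assuming a Linux-style ps that reports RSS in kilobytes and that the editor process is literally named kak (both are assumptions, not facts from this thread):

```shell
# Convert a kilobyte figure (as `ps -o rss=` reports it) to whole megabytes
kb_to_mb() {
  echo $(( $1 / 1024 ))
}

# Print the resident set size of the newest process with the given name, in MB
rss_mb() {
  pid=$(pgrep -n -x "$1") || return 1
  kb_to_mb "$(ps -o rss= -p "$pid")"
}
```

For example, running rss_mb kak after the gj jump should report a value near the 338M quoted above.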

@alexherbo2
Contributor

commented Sep 30, 2018

How about levelling up the ranges highlighter to support paths, the way they work for regions?

declare-option range-specs view_port

define-command view-port-update %{ evaluate-commands -draft %{
  # Select the view-port
  execute-keys gtGbGl
  # Update the option
  set-option window view_port %val(timestamp) "%val(selection_desc)|Default"
}}

hook window NormalIdle '' view-port-update
add-highlighter window/view-port ranges view_port
add-highlighter window/view-port/html ref html
@mawww
Owner

commented Oct 1, 2018

@alexherbo2 I do like the idea of making ranges take a sub-highlighter instead of forcing it to just apply a face; however, it is a bit tangential to the issue here.

@andreyorst I agree this should be fixed; Kakoune's memory usage on this file is simply unacceptable. Regarding performance, it's a bit trickier, as we do need to apply all the region regexes on the whole file to get correct syntax highlighting. I suspect the solution is a combination of improving html.kak and the core C++ implementation to reduce memory usage (and possibly improve performance, although I doubt there are big gains to be had in the C++ implementation without reducing guarantees on highlighting).

@eraserhd
Contributor

commented Oct 1, 2018

@andreyorst
Contributor Author

commented Oct 1, 2018

@mawww What I really don't like about Vim is that viewing a file not from the top but from the middle can cause wrong highlighting, or even no highlighting at all. Example: Vim is upset because there seems to be no starting point for the augroup, so augroup end is treated as invalid syntax:
(screenshot)
But scrolling up by a single line fixes the highlighting:
(screenshot)
There are more examples of such behaviour, but I'm not able to show more.

I would much rather have the smallest highlightable node used (which means it may not fit the viewed part of the file, and may even be the size of the whole file, because not all languages can be parsed in parts AFAIK) than have laggy or disabled highlighting.

@mawww
Owner

commented Oct 4, 2018

@eraserhd The current implementation is pretty performant after the initial parsing, as it only needs to reparse modified lines (which is why region highlighter regexes never match across lines); the problem is that we need to cache all matches.

In the case of html, we cache the positions of all < and >, which is a lot: each match takes 32 bytes (on 64-bit builds). I am testing a local change that compacts that to 20 bytes, and it seems to reduce memory usage to 230M in that case, which is still far too much. I'll continue investigating, but we might have to change html highlighting not to use ranges for <...>.
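The scale involved can be sanity-checked with back-of-the-envelope arithmetic; the match count below is an assumed round figure for a page of this size, not a measurement from this thread:

```shell
# Assumed: ~1,000,000 cached '<'/'>' positions for a 10M HTML page
matches=1000000

# Cache size for the 32-byte matches mentioned above, and for the
# compacted 20- and 16-byte layouts discussed in this issue
for bytes in 32 20 16; do
  echo "$bytes bytes/match: $(( matches * bytes / 1024 / 1024 )) MB"
done
```

Even the 16-byte layout only shrinks this one term from roughly 30 MB to 15 MB, which is consistent with the later finding that struct compaction alone (230M observed) is not enough and that the nested css/javascript caches dominate.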

mawww added a commit that referenced this issue Oct 6, 2018

Reduce memory usage of cached matches for RegionsHighlighter
This adds a limitation that capture matching on regions only works
if the region's start/end/recurse match is less than 65535 bytes long.
With this limitation we can reduce the RegexMatch struct size to 16
bytes instead of 32.

This is still not good enough, but should slightly improve the high
memory usage reported in #2454
@mawww
Owner

commented Oct 6, 2018

Ok, so things are a bit more complex: the reason for the huge memory usage is that we are not only caching all matches for html regions, but also all matches for css and javascript regions on the whole buffer.

As soon as a <script> or <style> block is displayed, we delegate that region's highlighting to the javascript/css highlighter, which itself has regions and so maintains a cache of all matches for those. This is why there is a delay when jumping to the end of the buffer: there is a <script> region there, and none at the buffer start, so it's only when jumping to the end that we cache all matches for the javascript highlighter.

We could improve memory usage here with a smarter data structure where we would only cache matches for some ranges of lines, so that the javascript/css highlighters would not trigger a parsing of the whole buffer, but only of the lines in which a <script> or <style> region is active, which would hopefully greatly reduce memory usage.

We would likely stay above 50 megs, but if we want to be both fast and correct, we have to sacrifice a bit of memory. Vim and Emacs chose to sacrifice correctness in order not to sacrifice memory, which I think was a good choice 20 years ago, but one that is harder to defend nowadays.

mawww added a commit that referenced this issue Oct 9, 2018

maintain a list of valid ranges for region highlighting
This should greatly reduce memory usage by only caching matches
for ranges that needs to be highlighted, in the case where multiple
regions are nested, this means only the topmost region needs to parse
and cache the whole buffer, other regions highlighter will only ensure
the lines for the ranges they are called up are cached.

Fixes #2454
@mawww
Owner

commented Oct 9, 2018

I just pushed a regions-memory-usage branch with a commit that should greatly reduce memory usage on the given file. The code needs a bit more love, but I could not resist teasing it. Please give it a go to see if it solves your use cases.

@andreyorst
Contributor Author

commented Oct 9, 2018

@mawww I just opened that test file (I'm not able to test with my work files, because I'm on vacation) and holy ***t, it is so much better now! 27 MB, and a jump to the end right after loading the file is instant! This is quite an improvement, what can I say!

@andreyorst
Contributor Author

commented Oct 9, 2018

So I've scrolled the file from top to bottom with page down, and noticed that scrolling is kind of laggy; I think we wait for each chunk to be highlighted, so the editor sometimes lags on scrolling. Memory usage was around 75MB after scrolling through the whole document (so I assume everything was cached). Further re-scrolling didn't affect memory usage, which is good, but the scroll lag persisted.

Maybe asynchronous highlighting could make scrolling smooth, but then it would not highlight large files immediately; that's the problem in, say, VSCode.

@mawww
Owner

commented Oct 9, 2018

@andreyorst Yeah, I noticed that delay on scrolling; I will investigate it before this all gets into master. You are correct that after scrolling through the whole file we should have everything cached again (however, for nested regions we will only cache the relevant ranges instead of the full buffer, so it should still be better).

I would like to avoid asynchronous highlighting, as I think it has 2 big problems:

  • It is pretty complex to implement, and requires adding multithreading to Kakoune's code, which has been written assuming single threading.
  • It removes the pressure to keep highlighting code fast, as it no longer impacts editing speed (yes, I consider that a drawback: a modern computer should have no trouble highlighting a huge file synchronously while still providing a decent editing experience. Being forced to write code that fulfills this is a feature).
@andreyorst
Contributor Author

commented Oct 10, 2018

It is pretty complex to implement, and requires adding multithreading to Kakoune's code, which has been written assuming single threading.

Yeah, well, I wrote that because I had just looked into xi, which does highlighting in another thread, and does it blazingly fast, and I wondered whether that could be applied to Kakoune.

mawww added a commit that referenced this issue Oct 10, 2018

Cleanup RegexHighlighter code and drop cache when it becomes too big
The RegexHighlighter range cache can get pretty big in nested
regions use cases, and maintaining it can become pretty costly,
so if it hits a certain size, just drop it.

Should improve performance in #2454
@mawww
Owner

commented Oct 10, 2018

After this commit things should be better, scrolling should be smooth again.

@andreyorst
Contributor Author

commented Oct 10, 2018

@mawww How can I measure startup time? I feel that opening that test file takes much longer compared to the master branch.

I've tried:

time kak -e "quit" /path/to/file

but it exits before the file loads. And:

time kak -E "quit" /path/to/file

throws an error in the debug buffer that it needs a client number, but I don't have one at the time of Kakoune's launch.

@Screwtapello
Contributor

commented Oct 10, 2018

You may be interested in 40a91b1.

@andreyorst
Contributor Author

commented Oct 10, 2018

I wanted to run Kakoune 10 times with an empty file, 10 times with the 9MB file, and the Kakoune from master with the same settings, to see the timing variation. So it would be a simple shell script with 4 loops. I don't think I need the -debug switch for that.
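A minimal sketch of such a script, under a few assumptions: time_runs is a hypothetical helper, the binary paths (kak, ./kak) and file names are placeholders from this thread, date +%s%N requires GNU date, and the exec <c-l>; q sequence (forcing a redraw before quitting, so the timing includes the first display) is the trick used elsewhere in this thread:

```shell
# time_runs N CMD...: run CMD N times and print the mean wall time in ms
time_runs() {
  n=$1; shift
  total=0 i=0
  while [ "$i" -lt "$n" ]; do
    start=$(date +%s%N)              # nanoseconds since epoch (GNU date)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    total=$(( total + (end - start) / 1000000 ))
    i=$(( i + 1 ))
  done
  echo $(( total / n ))
}

# The four loops described above: two binaries x two files
for bin in kak ./kak; do
  command -v "$bin" >/dev/null 2>&1 || continue   # skip missing builds
  for file in empty.txt bigfile.html; do
    echo "$bin $file: $(time_runs 10 "$bin" -n -e 'exec <c-l>; q' "$file") ms"
  done
done
```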

@mawww
Owner

commented Oct 10, 2018

@andreyorst I think you want something like time kak -e 'exec <c-l>; q' my-file. You'll get some overhead from the startup reading and all, in which case you can add -debug profile and add buffer *debug*; write profile before q in the command to get a more detailed profile (you'll want to look at the 'window display update' time in that).

That said, here it seems master is much slower to start than regions-memory-usage (40s vs 8s on a debug build, 4.5s vs 1.3s on a release build).

@mawww
Owner

commented Oct 12, 2018

@andreyorst Can you confirm whether regions-memory-usage does indeed work slower for you? I'd like to merge it into master, but if there is a big performance regression on some configurations, I'd like to investigate first. As said previously, here it looks more like a big performance gain.

@andreyorst
Contributor Author

commented Oct 12, 2018

@mawww I've tested this another way. I used tmux to send input, since -e executes :q<ret> well before the file is loaded. So I did this for kak, which is the system installation of kakoune-git from the AUR, and ./kak, which I compiled from your git repo on the regions-memory-usage branch, since real input is a more realistic situation:

command tmux send-keys -t \>_:1.1 "time ./kak https\ _habr.com_post_423889_htm" enter ":q" enter

And here are the results:

# regions-memory-usage
./kak https\ _habr.com_post_423889_htm  13.47s user 0.23s system 100% cpu 13.674 total
./kak https\ _habr.com_post_423889_htm  13.67s user 0.26s system 100% cpu 13.891 total
./kak https\ _habr.com_post_423889_htm  13.49s user 0.24s system 100% cpu 13.686 total
./kak https\ _habr.com_post_423889_htm  13.88s user 0.29s system 99% cpu 14.176 total

# v2018.09.04-133-gd652ec9c
kak https\ _habr.com_post_423889_htm  2.45s user 0.22s system 101% cpu 2.630 total
kak https\ _habr.com_post_423889_htm  2.36s user 0.25s system 101% cpu 2.571 total
kak https\ _habr.com_post_423889_htm  2.45s user 0.21s system 101% cpu 2.615 total
kak https\ _habr.com_post_423889_htm  2.41s user 0.26s system 101% cpu 2.625 total
kak https\ _habr.com_post_423889_htm  2.35s user 0.24s system 101% cpu 2.551 total
@andreyorst
Contributor Author

commented Oct 12, 2018

I've updated both master and regions branch:

# regions v2018.09.04-140-g4d215578 
time ./kak https\ _habr.com_post_423889_htm -debug profile -e 'exec <c-l>; buffer *debug*; write profile; q'
./kak https\ _habr.com_post_423889_htm -debug    11.88s user 0.20s system 100% cpu 12.043 total
./kak https\ _habr.com_post_423889_htm -debug    11.84s user 0.20s system 100% cpu 12.000 total
./kak https\ _habr.com_post_423889_htm -debug    11.93s user 0.16s system 100% cpu 12.066 total
./kak https\ _habr.com_post_423889_htm -debug    11.80s user 0.18s system 100% cpu 11.946 total
./kak https\ _habr.com_post_423889_htm -debug    12.28s user 0.16s system 100% cpu 12.417 total

# master v2018.09.04-133-gd652ec9c
time kak https\ _habr.com_post_423889_htm -debug profile -e 'exec <c-l>; buffer *debug*; write profile; q'
kak https\ _habr.com_post_423889_htm -debug  -e  1.91s user 0.21s system 102% cpu 2.079 total
kak https\ _habr.com_post_423889_htm -debug  -e  1.93s user 0.18s system 102% cpu 2.063 total
kak https\ _habr.com_post_423889_htm -debug  -e  2.04s user 0.15s system 102% cpu 2.150 total
kak https\ _habr.com_post_423889_htm -debug  -e  1.98s user 0.20s system 101% cpu 2.143 total
kak https\ _habr.com_post_423889_htm -debug  -e  1.96s user 0.16s system 102% cpu 2.076 total
@mawww
Owner

commented Oct 13, 2018

Just to make sure: when you built the regions branch, did you pass 'debug=no' to make (like make debug=no -j8)? If not, you are comparing a debug build on the regions branch to a release build on the master branch.

@andreyorst
Contributor Author

commented Oct 14, 2018

No, but I thought that the master build was a debug build too.

@mawww
Owner

commented Oct 14, 2018

Depends. I see that the master build is in your $PATH, while the regions build is in the current directory; if you installed master through an external build script (brew, or your distro package manager), it is likely a release build.

@andreyorst
Contributor Author

commented Oct 14, 2018

Well, I've installed kakoune-git. Turns out it has debug=no in the PKGBUILD.

@mawww
Owner

commented Oct 14, 2018

Ok, so if you build the regions branch with debug=no, you should see much better performance (I would expect better than on master).

@andreyorst
Contributor Author

commented Oct 14, 2018

OK. If you're saying that you've seen a performance boost compared to master, then I'm not against merging at all.

@mawww mawww closed this in 194a5db Oct 14, 2018

@robertmeta
Contributor

commented Dec 5, 2018

a modern computer should have no trouble highlighting a huge file synchronously while still providing a decent editing experience. Being forced to write code that fulfills this is a feature

@mawww I agree. Would it be useful to put some definitions around those terms, so people understand whether they are being fulfilled? Specifically, for people interested in making changes to Kakoune (or even core scripts?), it would be useful to understand where the line is.

decent editing experience

Xi for example just puts a flat number on it, 16ms (60+fps). Now, it controls painting, but in the case of Kakoune, it could be operation commits in $X ms.

huge file

A demo file in the repo, or a downloadable one, that defines 'huge' in this case would be useful, specifically in relation to that $X ms mentioned above.

modern computer

This is the hardest part of the equation, I think: the constantly moving line of the 'modern computer'. The huge file can be mostly static, and the response time is mostly based on human perceptual needs, but the modern computer bit is hard. Possibly whatever a nano instance on AWS is at a given time, or is there some standard external measure of 'modern computer' that could be used?

@andreyorst
Contributor Author

commented Dec 5, 2018

modern computer

This is the hardest part of the equation, I think: the constantly moving line of the 'modern computer'. The huge file can be mostly static, and the response time is mostly based on human perceptual needs, but the modern computer bit is hard. Possibly whatever a nano instance on AWS is at a given time, or is there some standard external measure of 'modern computer' that could be used?

At university, our software engineering teacher told us that we must never target the modern PC. His point was that most programs we write in our lives will be relatively simple ones, since we usually don't write something as complex as a 'simulation of burning radioactive substances', which actually requires a special CPU architecture, but whatever.

A text editor is a relatively simple program, and Kakoune is a relatively minimalist text editor. In my opinion, a text editor should run well on a microwave. 'Well' is a relative term too, but what I'm trying to say is that if someone wants to measure the performance of some kind of software, it should be measured on the lowest common hardware you can find on the current market. That's why proper algorithms and optimizations matter.

Xi says that everything should work within 16ms, but what PC is meant by this measurement? I can get similar painting speed on my netbook released in late 2008, with a 32-bit Intel Atom and 1G of RAM, by using Vi, but will Xi work faster on that computer? I doubt it. On my phone, the current stable Kakoune release lags more compared to the September release. Vim (with relatively the same plugins) doesn't lag in comparison.

@robertmeta
Contributor

commented Dec 5, 2018

@andreyorst I am less worried about the specifics of the definition; if the decision is made that a 2008 Intel Atom is the benchmark, then so be it. I mentioned a nano instance at AWS as it is accessible, inexpensive ($0.0065 per hour), and very well defined (one ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor). I just want some baseline so people can know whether the performance of an implementation is acceptable.

@mawww
Owner

commented Dec 5, 2018

I think the Xi model is misguided; in a way, it is too technology-centric. I don't think users care about 16ms vs 20ms. What is important is that it 'feels' instant, and that whenever it does not, the user can intuit in advance that it won't.

I tend to work on very powerful computers (currently an 8-core Xeon...), so in order to ensure Kakoune does not get too slow, I run the debug version most of the time (which has the added advantage of letting me investigate bugs more easily).

@andreyorst I am surprised the current master would be slower than the September release. I guess it would be nice to have some performance measurement during testing, so that we can see performance changes and investigate regressions. The problem is having some stable hardware on which to do that, or somewhere that can run regular performance report runs (iterating on, say, the last 100 commits).

It would be nice to define what needs to be tested; off the top of my head:

  • Startup time loading all bundled scripts
  • Startup time not loading anything
  • Loading and highlighting a big (100M) file using complex highlighting (html is a pretty good candidate)
  • Various editing performance (global replace, insert some text in every line, pipe through sort...).

If we had those tests, we could then eventually make them part of real testing, and fail on significant performance reductions.
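The checks listed above could be scripted along the lines of the timing commands used earlier in this thread; a sketch with a hypothetical bench helper, where the file name is a placeholder and GNU date is assumed for millisecond timestamps:

```shell
# bench LABEL CMD...: run CMD once and report its wall time in ms
bench() {
  label=$1; shift
  t0=$(date +%s%N)
  "$@" >/dev/null 2>&1
  t1=$(date +%s%N)
  echo "$label: $(( (t1 - t0) / 1000000 )) ms"
}

bench "startup, all scripts"  kak -e q /dev/null
bench "startup, no scripts"   kak -n -e q /dev/null
bench "highlight big html"    kak -e 'exec <c-l>; q' big.html
# editing benchmarks (global replace, inserting text in every line,
# piping through sort) would follow the same pattern with longer
# exec key sequences
```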

@andreyorst
Contributor Author

commented Dec 6, 2018

@andreyorst I am surprised the current master would be slower than the September release. I guess it would be nice to have some performance measurement during testing, so that we can see performance changes and investigate regressions.

It's not master, but the latest stable, which is v2018.10.27. The previous v2018.09.04 had no lag at all, but the current stable lags in doc files for me. Disabling the whitespace highlighter helps, even though there are no highlighted whitespace characters in the doc files when it's on, since I highlight only tabs and doc files use spaces. I'm not sure whether this can be categorized as an issue.

It would be nice to define what needs to be tested, of the top of my head:

  • Startup time loading all bundled scripts
  • Startup time not loading anything
  • Loading and highlighting a big (100M) file using complex highlighting (html is a pretty good candidate)
  • Various editing performance (global replace, insert some text in every line, pipe through sort...).

If we had those tests, we could then eventually make them part of real testing, and fail on significant performance reductions.

I don't know why it's so important to you to load every single script at startup, even those that are not currently needed; loading scripts for, say, Golang when I, the end user, will never actually work with Go looks like a bad engineering choice to me. But the rest of the points are good ones, I think. Jumping to various positions inside a huge file of deeply nested code could be a good benchmark, if you have a way to measure when the code is fully highlighted. Bundled tools like the lint script could also be tested, since it uses built-in Kakoune features like piping, which should also stay fast, but I think you've covered that with general pipe testing in the last point.

@andreyorst
Contributor Author

commented Dec 6, 2018

Just so you know: I've opened a 182MiB XML file with Kakoune, and the highlighting was perfectly stable. Then I opened the same file with its content on a single line, and it was still readable and highlighted properly. That's great. However, selecting the whole buffer and then filtering it down to 822924 cursors wasn't responsive at all. Pressing d after that hung Kakoune for a very long time. Running kak -n allowed me to change this amount of text with multiple selections, but it was a slide-show experience. Not to blame anyone; I don't know of any editor other than Kakoune that could do such a thing at all.

The memory usage is 1,120 MiB for a 182 MiB file, which is fine, I suppose.

For some reason the file opens in a modified state.

@mawww
Owner

commented Dec 7, 2018

I don't know why it's so important for you to load every single script at startup

It's not really important to me that we load the scripts. What is important to me is that Kakoune is fast to start, and that Kakoune's file loading logic is simple to understand. I think the current system of just loading everything we find in the autoload folder is simple to understand, and I think there is some value in keeping the script sourcing code fast. Loading everything at startup puts pressure on keeping it fast.

In other words, I want Kakoune, in the default configuration, to start fast while still being able to highlight almost any file you throw at it. If we get to a point where the starting speed cannot be made fast enough, we will have to complicate the loading behaviour to keep those two features, but if we can avoid needing that, I'll be happier.

@andreyorst
Contributor Author

commented Dec 7, 2018

The memory usage is 1,120 MiB for a 182 MiB file, which is fine, I suppose.

Or is it?

@mawww
Owner

commented Dec 12, 2018

The memory usage is 1,120 MiB for a 182 MiB file, which is fine, I suppose.

Or is it?

Not really. If I can get my hands on such a file, I'd be interested to investigate.

@andreyorst
Contributor Author

commented Dec 13, 2018

@Clyybber

commented Dec 13, 2018

Opening that file, I get 806M/565M of memory consumption.

@andreyorst
Contributor Author

commented Dec 13, 2018

Me too, but I hit one GiB on the second launch; I don't know why.
(screenshot)

And for some reason the file opens in a modified state, even without a user configuration folder.

@Clyybber

commented Dec 13, 2018

@andreyorst Can't reproduce that behaviour, weird.
