Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geany should check Scintilla status after operations #1569

Open
elextr opened this issue Aug 3, 2017 · 22 comments
Open

Geany should check Scintilla status after operations #1569

elextr opened this issue Aug 3, 2017 · 22 comments
Labels

Comments

@elextr
Copy link
Member

elextr commented Aug 3, 2017

See #1540 Scite gives an error message that Scintilla runs out of memory but Geany just hangs.

All operations should have a SCI_GETSTATUS after them.

@elextr elextr added the bug label Aug 3, 2017
@codebrainz
Copy link
Member

It's not entirely clear that #1540 is indeed an OOM, or that Geany didn't report anything. I would expect at least a pile of CRITICAL debug messages coming from the g_*_if_fail() asserts. I doubt Geany could do much else than that since it's OOM. It would have to keep a parachute buffer to free before taking any action on the OOM condition or something like this.

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

To clarify, a search of Geany source shows that SCI_GETSTATUS is never used, so we have no way of knowing if its OOM, thats what this bug is about, #1540 is just the report that caused the search :)

@codebrainz
Copy link
Member

It's OOM when it freezes or crashes :)

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

How do you know? It might just be another bug :)

(and crashes are mostly plugins :)

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

On a slightly more serious note, getting a memory exhausted status from Scintilla doesn't mean its out of memory, just that it failed to allocate a new 300MB buffer, there still may be 299MB left that would allow Geany to save the small files you have changed but not saved, if only Geany gave you the message.

@codebrainz
Copy link
Member

FWIW, when GLib aborts the process on malloc fail, it writes this to the debug messages (if -v is used):

GLib: .../gmem.c:165: failed to allocate XXX bytes

I just tested opening an 18GB file on my Win7 machine that has 16GB of RAM. This is when opening a file, so presumably when Geany is reading it into a string before it gets to Scintilla.

@AdamDanischewski
Copy link

AdamDanischewski commented Aug 4, 2017

This may be beyond the scope of Geany, but it looks like vi was designed to be efficient for use on a 300 baud modem, so it loads lines on demand. It may be worth looking into the source code, yet is a bit hackish.

"Author of Scintilla here. Scintilla does not use a list of lines. The text is stored in a gap buffer (the substance field in the code shown), like EMACS. Line start positions (added by the InsertLine method) are also stored in a gap buffer but with a 'step' which enables modifications in close proximity to affect few elements."

What is inefficient about this is if you have a giant line, Scintilla will slow right down, because it goes through every character in the block to determine where new lines are. So if you copy & paste from an external source, you could potentially see a slow down. As a code editor though, this should rarely be a problem. Like GtkTextView, Scintilla's Editor interface is lacking key binding overlays to allow users to use Vi or Emacs keybindings out of the box.

https://ecc-comp.blogspot.com/2015/05/a-brief-glance-at-how-5-text-editors.html

From what I've read, the main problem has to do with dynamic line sizes, Geany/Scintilla doesn't know where the next line will end so it has to scan every character to keep its loaded undo buffer. I'm not sure why you guys reported that it consumes twice the size of the file size. It sounds like it shouldn't, but I don't have time to test it at the moment.

Not sure if it would be possible to improve geany to using a static line size setting that eliminates the scanning and possibly the dynamic undo buffer to ensure memory is not allocated if its not necessary.

It would also be nice to warn the user if Geany is in danger of running out of memory. Then a user could commit changes as necessary and reset the undo buffer while not continuing to consume more memory.

@codebrainz
Copy link
Member

codebrainz commented Aug 4, 2017

the main problem has to do with dynamic line sizes

Naw, Scintilla uses a contiguous (gap) buffer to store the data. I believe this is what Emacs and other editors do, as it's a fair trade-off for typical text file operations.

Geany/Scintilla doesn't know where the next line will end so it has to scan every character to keep its loaded undo buffer

It does scan every character to locate line-endings, which it caches in a side table to make other operations much more efficient. This isn't really related to undo buffer.

I'm not sure why you guys reported that it consumes twice the size of the file size.

In addition to the side table for the line offset information, it also keeps a separate gap buffer which holds a byte for the styling information for each character. This is why it's at least twice the size in memory. Also when Geany loads a file, I believe it loads it into a string, sends it to Scintilla, and then frees it, so it uses 2x the total size there too. In addition, internally when Scintilla re-allocates its internal buffers/tables, it most likely causes 2x memory overhead (in addition to the gap buffer's extra capacity), so it can copy data from old memory to new memory. I think this is what @elextr is referring to.

Not sure if it would be possible to improve geany to using a static line size setting

It wouldn't, and it would also break files with long-lines (ex. minified JS or whatever).

possibly the dynamic undo buffer to ensure memory is not allocated if its not necessary.

It would be possible to optimize the undo buffer by storing the delta/diff between the current buffer vs the top of the undo buffer on reload, but it would make a different set of tradeoffs for time vs space which may not be as optimal for typical text file editing needs.

It would also be nice to warn the user if Geany is in danger of running out of memory. Then a user could commit changes as necessary and reset the undo buffer while not continuing to consume more memory.

Agree, but there are many cases where it's not possible. Inside GLib when it tries to allocate memory, if it fails, it aborts the process, which is a fairly reasonable thing to do on decent modern operating systems. It may (or not) be possible to gracefully handle OOM conditions inside Scintilla as this PR is about though, assuming when the Scintilla call returns, it leaves enough memory for Geany to allocate a new dialog window and format a string into it's message.

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

@AdamDanischewski as was said on #1540, Geany's use-case is for editing human program source files, it assumes they are reasonably sized and loads the whole file and keeps style information for the whole file so it doesn't have to be regenerated all the time (hence twice the size). If Geany's use-case was to open the biggest possible file it may be designed differently and would not use Scintilla. That is unlikely to change.

In general its not possible to warn if a program is "running out of memory" before it does (ie an allocate fails) because that can depend on what else is using memory. Its not even possible to report if a program is running out of address space (which is actually what the 32bit windows problem is) because many library functions use memory that Geany has no idea about.

The best we can do is as I posted above, don't crash/hang when an allocate fails (but obviously don't complete the operation that was happening) and let the user try saving their files, it may still work if the allocate that failed was very large and the files they want to save are small.

@codebrainz yeah, Geany does check the error return from g_file_load_contents() and simply does nothing if it fails :)

This issue is however intended to be solely about Scintilla status returns, but I'm sure more issues could be raised for other places return checks are missed.

@codebrainz
Copy link
Member

This issue is however intended to be solely about Scintilla status returns, but I'm sure more issues could be raised for other places return checks are missed.

Yeah, at least there's a chance with Scintilla, and inside GLib, it at least prints a message to the console before crashing.

I just tested with a more reasonably sized file (~500MB) and it does indeed just hang, and the process is only using the typical amount of memory (~10MB), so it surely does sound like a case where Scintilla recovered (and presumably set the status code), and Geany just carried on blindly.

@AdamDanischewski
Copy link

Its not even possible to report if a program is running out of address space (which is actually what the 32bit windows problem is) because many library functions use memory that Geany has no idea about.

@elextr I realize that Geany/Scintilla won't know what the rest of the machine will do, yet what it can do is keep a healthy buffer amount of memory allocated and if it fails to grab more then warn the user. It may take up some more memory as a result but if you make it an option it may be useful for people dealing with large files.

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

I wonder if Scintilla has set the status when it calls notifications, and if it still notifys when allocation fails, would be nice if most cases could be captured by a a test in on_notify but I somehow doubt it.

@codebrainz
Copy link
Member

I wonder if Scintilla has set the status when it calls notifications, and if it still notifys when allocation fails, would be nice if most cases could be captured by a a test in on_notify but I somehow doubt it.

I expect all Scintilla calls set the status when non-zero, whether through notification callbacks or not. Geany allocating memory itself using stdlib or glib will obviously not.

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

@AdamDanischewski

if it fails to grab more then warn the user.

Thats what this issue is for.

To be clear Scintilla does keep a "healthy" amount of memory allocated, but it decreases as you add characters to it until its full then reallocates, and if you do a big copy and paste you can exceed ANY pre-allocation, so there is no point of trying to guess, you always have to handle allocation failure.

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

@codebrainz the concern is that the on_notify will not be called by Scintilla if the insert failed because it couldn't allocate.

And as I said above, non-Scintilla allocates are not this issue.

@codebrainz
Copy link
Member

Until this issue is properly solved, at a bare minimum we could do like in this branch: master...codebrainz:check-sci-status

If anyone in the future cares to make it handle the results/status on a case-by-case basis and make it able to throw up a dialog box or whatever to warn the user, it could be extended from the code in that branch.

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

@codebrainz yeah, something like thats a good start.

@AdamDanischewski
Copy link

To be clear Scintilla does keep a "healthy" amount of memory allocated, but it decreases as you add characters to it until its full then reallocates, and if you do a big copy and paste you can exceed ANY pre-allocation, so there is no point of trying to guess, you always have to handle allocation failure.

I'm talking about keeping a certain memory reserve at all times, that is set to something sensible that could be modified in a setting by the user and if the healthy buffer theshold amount cannot be allocated then the warning happens. This would allow the user to "tidy up" and save the program before any crash/freeze event occurs. Thereafter the save the buffers are wiped out and there is enough memory again. Not simply allocating a healthy buffer initially and then writing to it until it runs all the way out and hope that it can allocate more - it could be dangerous letting things go that far.

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

@AdamDanischewski ok, but there is still no obvious amount to keep in reserve, how much memory is used for things like saving files depends on the size of the file. And since there is an unknown amount of memory allocated in libraries the "reserve" memory would have to be de-allocated to make it available, and then Geany has no control over where it is used.

The best solution is to not crash or hang on memory exhaustion and to try to notify the user to save files. If they save and close small files more memory will progressively become available. If they only have one huuge file open then there is not much to do about it. Also note that the problem currently will only occur on 32 bit builds, which currently means windows, but if someone contributes the build process for 64 bit windows that can be made available.

@AdamDanischewski
Copy link

@elextr You can profile the application for average use cases. Then choose a sensible default setting that can be adjusted by the user if they know their needs better.

@elextr
Copy link
Member Author

elextr commented Aug 4, 2017

@AdamDanischewski well, you can provide a PR that shows it working, but its not the point of this issue.

@AdamDanischewski
Copy link

AdamDanischewski commented Aug 4, 2017

I don't have much time - but if anyone wants to take a crack at finding sensible settings for a healthy memory reserve threshold, to preclude freezes and data loss. I took a quick look via strace, and a strace output parser I found (https://github.com/gmarcais/memtrace), here is what my system looks like for my typical use:

$> strace -e trace=memory geany &> /tmp/mystrace.txt
$> ./memtrace.rb -f /tmp/mystrace.txt 
      max: heap    7675904  (7.32M) mmap   17944616  (17.1M) vm   25620520  (24.4M)

So, for my typical use case having around 50MB should cover what is actually required - but the healthy buffer itself is beyond what is required to run so adding a few more MB (~5MB) should normally be okay. If the user strays outside of the healthy reserve then Geany/Scintilla should try to allocate what will be required + however much more to get back the healthy buffer. If Geany cannot allocate the memory for both then a warning should be thrown telling the user of the situation and allowing the user to take corrective measures (e.g. saving) and preclude data loss.

So to be clear the healthy buffer is not necessarily stopping the user from any operations, it is simply a safety margin to allow for saves and clean shutdown of what Geany is already dealing with before Geany actually runs out of memory.

So a formula for healthy buffer may look something like: Σ(file sizes)*2 + 5MB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants