[LO add-on] Latest nightly ate 24 GB RAM - 2023-03-10 #8012

Closed
marcoagpinto opened this issue Mar 10, 2023 · 76 comments
Labels
office-integration the LibreOffice/OpenOffice add-on

Comments

@marcoagpinto
Member

Hello @FredKruse ,

I tried the latest nightly and my computer's RAM (32 GB) was almost 100% used.

When I noticed that the computer was terribly slow, I opened task manager and LibreOffice was taking 24 GB of the 32 GB of RAM.

I sent you the log file via e-mail.

Could you take a look at it?

Thanks!

danielnaber added the office-integration (the LibreOffice/OpenOffice add-on) label Mar 11, 2023
@FredKruse
Contributor

This error is a mystery to me. Marco sent me a log file stating that the maximum heap space was 7262 MB. The extension limits itself and switches to a fallback mode at approx. 6500 MB, which releases all additional interfaces and deletes the cache. In principle, it is therefore impossible for the extension to allocate 24 GB of memory. I am completely at a loss. @danielnaber do you have any idea what could cause this?
In the log file, I noticed a problem with the internal cache allocation. I'll try to solve that. @marcoagpinto, I need your help with testing. I'll let you know when there's an update.
However, I don't think this will solve the memory problem.

@FredKruse
Contributor

By the way, I couldn't reproduce the bug.

@FredKruse
Contributor

@marcoagpinto I did a bug fix. Please test it tomorrow.

@danielnaber
Member

> The extension limits itself and switches to a fallback mode at approx. 6500 MB

How does it do so? Doesn't this depend on the JVM settings in LibreOffice?

@FredKruse
Contributor

The application tests it with Runtime.getRuntime().maxMemory().

@danielnaber
Member

> The application tests it with Runtime.getRuntime().maxMemory().

And how does it use that value then to limit the memory use? I don't see that in the code.

@FredKruse
Contributor

The methods MultiDocumentsHandler.isEnoughHeapSpace() and MultiDocumentsHandler.testHeapSpace() test whether more than 90% of the heap space is in use and, if so, switch to another mode.
The method OfficeTools.getCurrentHeapRatio() reports how close the extension is to the limit (the intervals between tests get shorter as the heap grows).
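
For readers following along, here is a minimal sketch of a 90% heap check of the kind described above. It is not the LanguageTool code: the class `HeapGuard` and its method bodies are hypothetical, and only the `Runtime` calls are real JDK API. The log header later in this thread reports "max. Heap-Space: 7262 MB, LT Heap Space Limit: 6536 MB", which is consistent with 90% of the maximum heap after rounding.

```java
// Hypothetical illustration only; not taken from the LanguageTool sources.
final class HeapGuard {
    // Assumed threshold: switch to a fallback mode above 90% heap usage.
    static final double LIMIT_RATIO = 0.9;
    private static final long MB = 1024L * 1024L;

    /** Heap limit in MB for a given maximum heap size in MB (rounded). */
    static long limitMb(long maxHeapMb) {
        return Math.round(maxHeapMb * LIMIT_RATIO);
    }

    /** Fraction of the maximum JVM heap that is currently in use. */
    static double currentHeapRatio() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /** True while heap usage is still below the assumed threshold. */
    static boolean isEnoughHeapSpace() {
        return currentHeapRatio() < LIMIT_RATIO;
    }

    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / MB;
        System.out.println("max heap: " + maxMb + " MB, limit: " + limitMb(maxMb) + " MB");
        System.out.println("current ratio: " + currentHeapRatio());
        System.out.println("enough heap space: " + isEnoughHeapSpace());
    }
}
```

Note that a check like this only sees the Java heap. Memory allocated natively by the host process (here LibreOffice) is invisible to `Runtime`, so the process can grow far beyond the heap limit without the check ever firing.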

@danielnaber
Member

> The methods MultiDocumentsHandler.isEnoughHeapSpace() and MultiDocumentsHandler.testHeapSpace() test whether more than 90% of the heap space is in use and, if so, switch to another mode.

I see, but I don't know what the issue could be. Maybe it's not really the JVM that's taking the memory, but LibreOffice?

@marcoagpinto
Member Author

@FredKruse

I tested today's nightly, and it kept on crashing.

The RAM usage also increased to 16 GB in just a few minutes, and I had to close LibreOffice.

I sent the log via e-mail.

LT 1 crash_20230313

LT 2 crash_20230313

LT 3 crash_20230313

@marcoagpinto
Member Author

@FredKruse

Maybe you are using the total RAM size instead of the heap RAM size?

@FredKruse
Contributor

The log file @marcoagpinto sent me shows a bug. It has to be fixed, but that will take a little time. I won't be able to look at it until the end of the week.
But I can't say whether the bugs documented in the log are related to the memory problem.

@FredKruse
Contributor

@marcoagpinto I managed to resolve the problems in the log file after all. Please test it tomorrow and send me the new log file.

@marcoagpinto
Member Author

Thanks, I will test tomorrow night.

@marcoagpinto
Member Author

@FredKruse

I have sent you the log file via e-mail.

The memory bug is still present, and the log shows numerous errors.

Thanks!

@FredKruse
Contributor

The problems that the log file shows are solved.
@danielnaber But that still doesn't solve the memory problem. It might be related to another bug being discussed in the forum (https://forum.languagetool.org/t/lt-und-lo/8798). When highlighting a word in LO and pressing the right mouse button, LO sometimes freezes. I checked this with VisualVM. It shows increased CPU activity (a full core or more), but the sampler is empty (I had enabled it beforehand). Normally, the thread that causes the CPU load is displayed and you can find the cause. I have no idea what else to test. Do you have any advice on how to find the problem?

@danielnaber
Member

> Do you have any advice on how to find the problem?

Once the problem occurs, you can get a stack trace using jstack: https://stackoverflow.com/questions/4876274/kill-3-to-get-java-thread-dump
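
jstack is an external JDK tool that is run against the process ID of the JVM. As an in-process alternative (not what was suggested here, just a sketch for comparison), a similar dump can be produced with the standard `java.lang.management` API. The class name `ThreadDump` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a plain-text dump of all live JVM threads.
final class ThreadDump {
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false/false: skip lock details, which not every JVM supports.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Like jstack, this only covers Java threads. If the CPU load comes from native code inside the host application, the busy thread may not appear in either dump.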

@FredKruse
Contributor

@marcoagpinto: Did the last change solve the problem? Are there any errors reported in the log file?
@danielnaber: The thread dump hasn't clarified the problem. After hours of tests, I think there is a general problem in the interaction between the LO spell checker and the LT extension. The extension adds the contents of the additional LT spelling files to LO as user dictionaries. This seems to be a problem in the cases mentioned above. I would like to make a general change to the extension: the spell checking should be done by LT too. But this will be a major change. It will take some development time and even more testing. I have planned this for LT 6.2. In 6.1 we have to leave it as it is.

@marcoagpinto
Member Author

@FredKruse

I tested it a few days ago, and it only consumed ~7 GB of RAM for my nearly 600-page thesis.

The size was increasing very slowly, but I believe it is fixed, thanks!

I haven't looked at the log file.

@marcoagpinto
Member Author

@FredKruse

I was just checking the ~600-page thesis.

The log seems clean:
LanguageTool_PhD_20230325.log

However, as I scrolled down the document, the RAM usage kept increasing, and when it reached 16 GB I closed LibreOffice.

lt_RAM_20230325

@marcoagpinto
Member Author

@FredKruse

Hello!

Any news regarding this?

Have you checked if you are using the computer RAM in the code instead of the Java RAM?

@FredKruse
Contributor

I've just posted an update of the LT extension. It solves a problem with the integration of the additional LT spelling files into the LO spell checker. In the past, this had triggered unexpected side effects. I'm not sure whether it solves your problem, but please test the new version and give me feedback.

@marcoagpinto
Member Author

@FredKruse

I am very sad… 😢 😢 😢 😢 😢 😢 😢 😢

LT_RAM_20230404

@FredKruse
Contributor

@marcoagpinto I'm very sorry, but I have no idea. There are two strange things:
1.) I can't reproduce the problem with my documents.
2.) LT is written in Java. All memory allocated by the software is allocated inside the Java virtual machine, in the heap space. Usually the heap space is much smaller than 15 GB. My default documents are 300 pages, nearly 100,000 words. Usually, the heap space doesn't grow to more than 1 GB.

Did you try loading your document into LO with LT deactivated? You could open the extension manager, deactivate LT, restart LO and load your document. If the memory grows in the same way, there is a bug in LO and you have to contact the LO developers.

If the error only occurs when LT is installed/enabled, then we need to figure out how to find a solution.

@marcoagpinto
Member Author

@FredKruse

thesis_number_words_20230404

Yes, a week or two ago I removed LT from the extensions and LibreOffice didn't consume much RAM.

I am not sure if this helps, but the thesis has hundreds of figures and dozens of equations and tables.

Thanks!

@marcoagpinto
Member Author

@FredKruse

See the attached docx where I tried to kind of simulate the thesis.

The RAM usage increases up to 8 GB and then slowly starts to decrease.

Does it help?

Thanks!

Thesis_150_figures_20230404.docx

TestThesis_20230404

@marcoagpinto
Member Author

marcoagpinto commented Apr 4, 2023

@FredKruse

My wild guess is that it may be related to the IEEE bibliography.

After I added it to this DOCX, the RAM usage doubled or tripled.

In my thesis, I have over 400 bibliographical entries in IEEE format (dozens of pages of entries).

But it is just a wild guess.

@marcoagpinto
Member Author

Ahhhhh… the spelling mistakes: I had turned on the rule that underlines unknown words in black.

So LanguageTool had to underline all the words I added to standard.dic. I am not sure whether this information is important to you.

There were probably many dozens of words to be underlined in black.

@FredKruse
Contributor

I had to revert to an older version because of some side effects, but I made some other optimizations. I'm afraid I have no ideas left.

@marcoagpinto
Member Author

@FredKruse

I will test it tomorrow night, since the nightly for today doesn't include the changes you have done minutes ago.

Thanks!

@marcoagpinto
Member Author

@FredKruse

Heya, Fred,

I created a blank profile on LibreOffice for the test.

I was running my thesis for around 30–40 minutes and the RAM usage was stable at around 7 GB.

I disabled only two rules and enabled three or four.

Now and then I would scroll down.

I reached a figure and LibreOffice closed (probably a bug in LO).

I reopened the thesis and started scrolling down quickly now and then and the RAM usage started increasing until it reached 13-14 GB and I closed LO. I had it open for another around 40 minutes when this happened.

I looked at the log file and, compared to the previous one, it looks clean, except for a null at the start regarding the cache file:

LT office integration log from Tue Apr 25 20:42:49 BST 2023

LanguageTool 6.2-SNAPSHOT (2023-04-25 16:44:35 +0000, 4ad5344)
OS: Windows 10 10.0 on amd64
LibreOffice 7.5.3.1 (The Document Foundation), en-GB
Java-Version: 1.8.0_371, max. Heap-Space: 7262 MB, LT Heap Space Limit: 6536 MB

CacheIO: getCachePath: cacheFileName == null!
MultiDocumentsHandler: getNumDoc: Document 0 created; docID = 1
Time to generate cache(4): 14971

@FredKruse

Could you give one last try to fix it?

Maybe fixing the null will solve the issue (one can hope)?

Then, we could release a 6.1.1 probably, what do you think, @danielnaber ?

Thanks!

❤️ ❤️ ❤️ ❤️ ❤️ ❤️ ❤️ ❤️ ❤️ ❤️

@marcoagpinto
Member Author

This thesis is being very demanding, for both of us.

@FredKruse
Contributor

Hi Marco, I'm on vacation. I can't work on this issue before the end of next week. I think we should release version 6.1 as it is, because it solves many other bugs and runs very well with "normal" texts. I feel that getting your thesis to run will take a while.

@marcoagpinto
Member Author

@FredKruse

Sure, Fred.

Happy Holidays!

FredKruse added a commit that referenced this issue May 6, 2023
… of SpellChecker service and resolving a bug in text level queue - related to issue #8012
@FredKruse
Contributor

I made some optimizations at a deep level of an LO interface. In my test documents, the check now runs much faster and uses less memory. Please test the new version.

@marcoagpinto
Member Author

I will test tomorrow's nightly.

Thank you for all your hard work.

@marcoagpinto
Member Author

@FredKruse

Heya,

I created a new profile in LibreOffice to start from zero.

When I opened my .odt for the first time, I got:
lt_20230507

Ahhhhh… now I am going to test it.

@marcoagpinto
Member Author

@FredKruse

This is terrible.

The memory usage seemed stable in my thesis at around 4.1 GB.

I have the rule enabled that underlines unknown words in black. After I added one word to standard.dic with a right-click, the memory usage doubled in just a minute or so, and I had to close LibreOffice.

Log:

LT office integration log from Sun May 07 20:16:29 BST 2023

LanguageTool 6.2-SNAPSHOT (2023-05-07 16:44:16 +0000, 525efe4)
OS: Windows 10 10.0 on amd64
LibreOffice 7.5.3.2 (The Document Foundation), en-GB
Java-Version: 1.8.0_371, max. Heap-Space: 7262 MB, LT Heap Space Limit: 6536 MB

java.lang.NullPointerException
	at org.languagetool.openoffice.OfficeDrawTools.isImpressDocument(OfficeDrawTools.java:147)
	at org.languagetool.openoffice.MultiDocumentsHandler.getCurrentDocument(MultiDocumentsHandler.java:275)
	at org.languagetool.openoffice.MultiDocumentsHandler$LtHelper.run(MultiDocumentsHandler.java:2054)

CacheIO: CacheCleanUp: Remove Path from CacheMap: /C:/Users/marco/Desktop/LANGUAGETOOL TESTS/PhD_thesis_marcoagpinto_IST_1Main_V0084unsent.odt
MultiDocumentsHandler: getNumDoc: Document 0 created; docID = 1
Time to generate cache(4): 5212
CacheIO: getCachePath: cacheFileName == null!
MultiDocumentsHandler: removeDoc: Interrupt text level queue for document 1
MultiDocumentsHandler: removeDoc: Interrupt done
Disposed document 1 removed
MultiDocumentsHandler: getNumDoc: Document 0 created; docID = 2
Time to generate cache(4): 9226
Time to generate cache(6): 11091


@FredKruse
Contributor

Hi @marcoagpinto, this update solves the null pointer exception above, but not the memory problem.
I think I have identified the problem. There is bad news: the bug cannot be solved inside LT because the problem is on the LO side. The interfaces between LT and LO allocate memory on both sides. LT is written in Java, so its memory is limited by the heap space of the Java engine, and this limit is not reached. Objects that are no longer needed are freed, and the memory with them. On the LO side, there is also a mechanism to free memory that is no longer used, but it does not seem to work for one particular interface (XFlatParagraph). This interface is definitely needed to run the text level check.
I tried to reduce the use of the interface as far as possible. It doesn't seem to be enough for your document.
But there is also good news: I did a little hack for LO 7.6 that enables LT to get some information which helps to reduce the use of the interfaces. Maybe this solves the problem.

@marcoagpinto
Member Author

@FredKruse

Thanks!

@marcoagpinto
Member Author

@FredKruse

I opened a ticket in LibreOffice's Bugzilla:
https://bugs.documentfoundation.org/show_bug.cgi?id=155232

@mikekaganski

Hi @FredKruse,

could you please provide some kind of minimal reproducer (in Java, or in Basic, or just a list of UNO calls) that triggers the problem in LibreOffice? I assume that some service instance is created, then destroyed on your side, but not on the LibreOffice side? So possibly a simple loop that creates and destroys the service could be such a reproducer.

Either here, or in the bug report that Marco created, would be great. Thank you!

@FredKruse
Contributor

Hi @mikekaganski, I built a dummy proofreader as an LO extension. It does no real proofreading and no marking. It gets all XFlatParagraphs, using the method "getNextPara" to get an initial paragraph and after that "getParaBefore" and "getParaAfter" from XFlatParagraphIterator, and stores them in an ArrayList. The whole procedure runs in a loop. After the whole document is stored as XFlatParagraphs, the list is emptied and the XFlatParagraphs are fetched and stored again. The loop runs 10000 times per paragraph.
You should disable or remove all grammar checkers from your LO installation and install the OXT after that. After restarting LO, load a document containing a few hundred paragraphs.
In my tests the Java heap space doesn't exceed 800 MB while the memory used by LO grows steadily.
Here is the file (the .zip extension has to be changed to .oxt):
StarterProject.zip

@mikekaganski

Thanks @FredKruse.
It seems your extension doesn't include the source code?

@FredKruse
Contributor

Here is the source code:
source.zip

@mikekaganski

Indeed, the iterator has this curious m_aFlatParaList thing:

https://git.libreoffice.org/core/+/master/sw/source/core/inc/unoflatpara.hxx#127

... and each and every instance of flat para gets there, such as in getParaAfter:

https://git.libreoffice.org/core/+/master/sw/source/core/unocore/unoflatpara.cxx#536

Why do we need them...

@FredKruse
Contributor

As far as I can see, an XFlatParagraph is inserted into the list m_aFlatParaList on every operation, but the list is never emptied anywhere. On the other hand, I couldn't find any code that uses that list. So I think it should be removed.
@mikekaganski Do you work on LibreOffice, and could you do the fix?
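
To make the suspected pattern concrete, here is a small self-contained simulation in plain Java. It does not use the UNO API or any LibreOffice code. `FakeFlatParagraph`, `RetainingIterator` and `LeakDemo` are hypothetical stand-ins: the iterator records every paragraph it hands out in an internal list that is never cleared, which is the behaviour described above for m_aFlatParaList, while the client clears its own list on every pass, as the dummy proofreader does.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a flat paragraph object; purely illustrative. */
final class FakeFlatParagraph {
    final int index;
    FakeFlatParagraph(int index) { this.index = index; }
}

/** Hypothetical iterator that remembers every paragraph it ever handed out. */
final class RetainingIterator {
    private final List<FakeFlatParagraph> handedOut = new ArrayList<>();
    private final int paragraphCount;
    private int next = 0;

    RetainingIterator(int paragraphCount) { this.paragraphCount = paragraphCount; }

    FakeFlatParagraph getNextPara() {
        FakeFlatParagraph p = new FakeFlatParagraph(next);
        next = (next + 1) % paragraphCount;
        handedOut.add(p); // never removed: this is what makes memory grow
        return p;
    }

    int retainedCount() { return handedOut.size(); }
}

final class LeakDemo {
    /** Returns { objects held by the client, objects retained by the iterator }. */
    static int[] run(int paragraphs, int passes) {
        RetainingIterator it = new RetainingIterator(paragraphs);
        List<FakeFlatParagraph> clientList = new ArrayList<>();
        for (int pass = 0; pass < passes; pass++) {
            clientList.clear(); // the client drops its references on each pass
            for (int i = 0; i < paragraphs; i++) {
                clientList.add(it.getNextPara());
            }
        }
        return new int[] { clientList.size(), it.retainedCount() };
    }

    public static void main(String[] args) {
        int[] result = run(300, 100);
        System.out.println("client holds " + result[0]
                + " paragraphs, iterator retains " + result[1]);
    }
}
```

With 300 paragraphs and 100 passes the client never holds more than 300 objects, while the iterator keeps all 30000 alive. In the real case the retaining side is native LibreOffice memory, which would explain why the Java heap stays small while the process keeps growing.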

@marcoagpinto
Member Author

@FredKruse @danielnaber

@mikekaganski has committed a patch for LibreOffice 7.6 on Gerrit.

It will be available in 24/48 hours.

Fred, could you in a few days download the latest LibreOffice 7.6 nightly and check if it is fixed and if there is no need any more for a hack in LanguageTool?

Thanks!

@FredKruse
Contributor

I created a workaround based on the discussion above.
@marcoagpinto Please test it.

@marcoagpinto
Member Author

@FredKruse

Sure Fred, I will test it tonight at around 7pm or 8pm.

I will report here the results.

Note that I tested yesterday's nightly, and its RAM usage was huge (and still growing) just from having my thesis open.

@marcoagpinto
Member Author

marcoagpinto commented May 13, 2023

@FredKruse

I believe that the memory issue is resolved.

I opened the thesis with the LanguageTool settings at default (“set to default”), multi-core enabled, and some rules activated manually.

The maximum RAM it consumed out of the 32 GB was 8.4 GB, but when I finished scrolling through the ~600 pages the RAM usage by LibreOffice was only 6,976.6 MB. I spent over an hour with the thesis open.

There is still a “null” in the log:

LT office integration log from Sat May 13 19:03:54 BST 2023

LanguageTool 6.2-SNAPSHOT (2023-05-13 16:44:32 +0000, 6648d0c)
OS: Windows 10 10.0 on amd64
LibreOffice 7.5.3.2 (The Document Foundation), en-GB
Java-Version: 1.8.0_371, max. Heap-Space: 7262 MB, LT Heap Space Limit: 6536 MB

CacheIO: getCachePath: cacheFileName == null!
MultiDocumentsHandler: getNumDoc: Document 0 created; docID = 1
Time to generate cache(4): 15012

One thing I did notice is that adding words to standard.dic with a right-click increased the RAM usage by hundreds of MB or even 1 GB per word (is this still not fixed, or is it normal?). I only added two words to standard.dic, with the unknown words rule enabled and its colour set to black.

After adding the words to standard.dic, the CPU load rose to twenty-something percent for a long time.

But I guess it is good for the regular user if it passed my thesis test 😄 😋 … my thesis isn't an easy document.

Thank you!

@FredKruse
Contributor

I will close this issue as solved. Open a new issue with current versions if necessary.
