This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Investigate Heap Requirements for Smarti #147

Closed
westei opened this issue Nov 20, 2017 · 4 comments


westei commented Nov 20, 2017

When running with -Xmx4g a java.lang.OutOfMemoryError: Java heap space was encountered.

As this is unexpected, we need to further investigate the memory consumption of Smarti. This includes:

  • implementing a stress test utility
  • testing with high loads of conversations of different lengths and random messages
  • testing the memory footprint of different analysis configurations (especially NLP processing components and models)
  • investigating possible memory leaks

NOTE: marking this as an enhancement with the intention to create additional issues based on the investigation results.
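The load generation for such a stress test could be sketched roughly as follows (a minimal sketch only; the function names and all parameters here are hypothetical illustrations, not Smarti's actual test code — each generated conversation would then be sent to a running Smarti instance):

```python
import random
import string


def random_message(max_tokens=60):
    """A pseudo-random message; the token count varies so that long
    sentences (the suspected OOM trigger) are also generated."""
    tokens = random.randint(1, max_tokens)
    return " ".join(
        "".join(random.choices(string.ascii_lowercase, k=random.randint(2, 10)))
        for _ in range(tokens)
    )


def random_conversation(max_messages=50):
    """A conversation consisting of a random number of random messages."""
    return [random_message() for _ in range(random.randint(1, max_messages))]


def generate_load(num_conversations=1000):
    """Lazily yield conversations so the generator itself adds no heap
    pressure on the client side."""
    for _ in range(num_conversations):
        yield random_conversation()
```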

@westei westei added this to the v0.6.1 milestone Nov 20, 2017
@westei westei self-assigned this Nov 20, 2017
@ghost ghost assigned ja-fra Nov 20, 2017
westei added a commit that referenced this issue Nov 22, 2017
* added a configuration that allows configuring the executor service pool size for processing. The default is set to 2, as requested by #145

solves #147

* changed the configuration for the Stanford NLP processing to use the Shift Reduce Parser, as it has a lower memory footprint
* added nlp.stanfordnlp.de.parseMaxlen=40 to prevent parse tree generation for long sentences that could lead to OOM situations

NOTE: These changes depend on bug fixes in redlink-nlp and an update to Stanford NLP 3.8.0
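Taken together, the two properties named in this commit would land in Smarti's configuration roughly like this (a hedged sketch: the shift-reduce model path is assumed from the standard Stanford CoreNLP model distribution and is not stated in this issue):

```properties
# limit parse tree generation to sentences of at most 40 tokens (commit default)
nlp.stanfordnlp.de.parseMaxlen=40
# assumed path of the German Shift Reduce Parser model shipped with Stanford NLP
nlp.stanfordnlp.de.parseModel=edu/stanford/nlp/models/srparser/germanSR.ser.gz
```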

westei commented Nov 22, 2017

The investigation concluded that there are no memory leaks present. Even long processing runs of >1000 conversations with >5000 messages showed no increase in base memory.

The OOM errors could be traced back to messages containing long sentences (or other strings, e.g. ASCII art) that cause the Stanford NLP Parser to require huge amounts of memory.

Several solutions were tested, with the following results:

  1. First and foremost, it is necessary to limit the maximum number of tokens a sentence may have for the parser to process it. This can now be done via the nlp.stanfordnlp.de.parseMaxlen property. The default of 30 is good for a 4g Java heap and the current analysis configuration.
  2. Do NOT use the Factored Parser. While this is the default of Stanford NLP, it is by far the slowest and needs the most memory. For Smarti the default was set to the PCFG Parser (nlp.stanfordnlp.de.parseModel=edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz). The Shift Reduce Parser is an alternative that is even faster and uses less memory.
  3. The number of processing threads is now configurable (as requested by #145, "Smarti should run with only 2 processing threads") and the default is now set to 2 (was 8). For every thread one should reserve about 500m of additional heap (so on an 8-core machine it is recommended to run Smarti with an 8g heap if 8 processing threads are configured).
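The heap sizing rule from point 3 can be sketched as a simple calculation. The 4g baseline for the default of 2 threads comes from this issue, but the exact split between base heap and per-thread share is an assumption made here for illustration:

```python
def recommended_heap_gb(processing_threads, base_gb=4.0,
                        per_thread_gb=0.5, default_threads=2):
    """Heuristic from this issue: 4g suffices for the default of 2
    processing threads; reserve roughly 500m of extra heap for every
    additional processing thread. The base/per-thread split is assumed."""
    extra_threads = max(0, processing_threads - default_threads)
    return base_gb + extra_threads * per_thread_gb
```

For 2 threads this yields 4g, matching the tested -Xmx4g; for 8 threads it yields 7g, which the recommendation above rounds up to 8g, presumably to leave some headroom.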

With this configuration no OOM errors were encountered. Only when processing ASCII art did the PCFG Parser drive the system to its limits, causing a lot of GC overhead. After processing, the system recovered without problems and continued normally.

NOTE: These changes require the newest SNAPSHOT version of redlink-nlp. As this also updates to Stanford NLP 3.8.0 (was 3.6.0), Smarti users will need to update the Stanford NLP jars in the ext folder accordingly (see also the corresponding changes to dist/src/main/resources/plugin-info.txt).


ruKurz commented Dec 8, 2017

At the very least it is cool that we now have knowledge at this depth.


ruKurz commented Dec 15, 2017

@westei Can you please write a short doc on how you ran the stress test for the resource consumption behavior of Smarti?


westei commented Dec 28, 2017

created #179 for the documentation

@westei westei closed this as completed Dec 28, 2017
@ghost ghost removed the ready label Dec 28, 2017