
Frequent OutOfMemoryErrors in Builds #41061

Closed
cbuescher opened this issue Apr 10, 2019 · 13 comments
Labels: :Delivery/Build (Build or test infrastructure), Team:Delivery (Meta label for Delivery team), >test-failure (Triaged test failures from CI)

@cbuescher (Member)

There have been a few instances of builds ending with OutOfMemoryErrors lately.

Some recent ones:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+artifactory/273/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/3057/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=openjdk12,nodes=immutable&&linux&&docker/350/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=zulu8,nodes=immutable&&linux&&docker/126/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob-unix-compatibility/os=ubuntu-14.04&&immutable/99/console

Builds end with something like:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-1-thread-1"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
runbld>>> <<<<<<<<<<<< SCRIPT EXECUTION END <<<<<<<<<<<<
runbld>>> DURATION: 1798455ms
runbld>>> STDOUT: 102612179 bytes
runbld>>> STDERR: 0 bytes
runbld>>> WRAPPED PROCESS: FAILURE (1)
@cbuescher cbuescher added :Delivery/Build Build or test infrastructure >test-failure Triaged test failures from CI labels Apr 10, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@cbuescher cbuescher changed the title OutOfMemoryErrors in Builds Frequent OutOfMemoryErrors in Builds Apr 10, 2019
@cbuescher (Member Author)

And another one: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/3062/console

@cbuescher (Member Author)

This one looks slightly different, but I'll add it here to keep a better overview of things:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+artifactory/283/console

* What went wrong:
Java heap space

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org
runbld>>> <<<<<<<<<<<< SCRIPT EXECUTION END <<<<<<<<<<<<
runbld>>> DURATION: 2055497ms
runbld>>> STDOUT: 106231627 bytes
runbld>>> STDERR: 0 bytes
runbld>>> WRAPPED PROCESS: FAILURE (1)

@alpar-t (Contributor) commented Apr 10, 2019

I suspect that it is the Gradle client that is running out of memory here, not the daemon, because there is no "daemon crashed" message.
The client got a heap of 64M configured via a Gradle upgrade in 530e1e3.
The new test runner might be creating more memory pressure; I will try to increase the client heap.

alpar-t added a commit to alpar-t/elasticsearch that referenced this issue Apr 10, 2019
The Gradle daemon actually running the build does so with a 2GB heap.
This PR changes the heap configuration of the Gradle client process that talks
to the daemon to trigger builds and relay its messages.

Relates to elastic#41061
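
For context, the two heaps involved are configured in different places. Below is a minimal sketch of the kind of change the commit describes; the concrete values and the exact files the PR touched are assumptions, not taken from this thread:

# gradle.properties: heap for the Gradle daemon that actually runs the build
org.gradle.jvmargs=-Xmx2g

# gradlew (wrapper script): heap for the client JVM that launches the build,
# talks to the daemon, and relays its output; Gradle's generated wrapper
# defaults this to 64m, which matches the limit suspected above
DEFAULT_JVM_OPTS='"-Xmx128m" "-Xms64m"'

# the client heap can also be raised per invocation (value is illustrative):
GRADLE_OPTS="-Xmx128m" ./gradlew check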
@alpar-t (Contributor) commented Apr 12, 2019

Looking at some failures after #41031 was merged, it seems no heap dump was produced, which confirms this is a problem on the client.

@jmlrt jmlrt closed this as completed Apr 12, 2019
@danielmitterdorfer (Member)

We see another instance of this in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob-unix-compatibility/os=centos-7&&immutable/113/console

It fails with:

Gradle Test Executor 54 finished executing tests.
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid32651.hprof ...
Heap dump file created [169289336 bytes in 1.669 secs]

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "pool-1-thread-1"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"

Inspecting the resulting heap dump, we see almost all of the memory (124 out of 128 MB) taken up by instances of org.gradle.internal.logging.events.StyledTextOutputEvent, so this seems related to Gradle accumulating log events in the background.
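
The dump itself comes from the standard JVM out-of-memory hooks; roughly the flags below on the test JVM, plus any heap analyzer (Eclipse MAT, VisualVM) to open the .hprof file, are enough to repeat this kind of analysis. The exact flags and the 128m limit are assumptions inferred from the log and dump size above:

# test worker JVM options (illustrative):
-Xmx128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=.

# quick class histogram of a still-running worker, for comparison:
jmap -histo <pid> | head -20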

@danielmitterdorfer (Member)

There is also another instance of this failure in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob-unix-compatibility/os=oraclelinux-6/113/console. The assessment is identical to the failure above, so I'll omit the details.

alpar-t added a commit to alpar-t/elasticsearch that referenced this issue Apr 18, 2019
We are no longer using these dependencies.

Relates to elastic#41061 since the class that seems to be leaking is both part
of Gradle and the logging jar.
@mark-vieira (Contributor)

Certainly the way that test runners ship log output to the client has changed. It's not surprising that Gradle's native daemon-to-client communication creates more garbage. I'm wondering if the recent change to remove --info will also help here. One side effect of the Gradle test runner migration was that we ended up creating a lot more test output.
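
For reference, the verbosity being discussed is controlled by Gradle's standard log levels; a sketch of the before/after follows (the concrete CI change that dropped --info is not shown in this thread):

# previously CI ran builds with extra verbosity, roughly:
./gradlew check --info

# without the flag, Gradle falls back to the default "lifecycle" level,
# which can also be pinned in gradle.properties:
org.gradle.logging.level=lifecycle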

alpar-t added a commit that referenced this issue Apr 19, 2019
We are no longer using these dependencies.

Relates to #41061 since the class that seems to be leaking is both part
of Gradle and the logging jar.
alpar-t added a commit that referenced this issue Apr 19, 2019
We are no longer using these dependencies.

Relates to #41061 since the class that seems to be leaking is both part
of Gradle and the logging jar.
@alpar-t (Contributor) commented May 6, 2019

Looking at the build logs, it appears this stopped happening after the dependency cleanup.

@alpar-t alpar-t closed this as completed May 6, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
We are no longer using these dependencies.

Relates to elastic#41061 since the class that seems to be leaking is both part
of Gradle and the logging jar.
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020