Linux Core Dump with OOM #5470

Closed
pminearo opened this issue Nov 19, 2022 · 6 comments

@pminearo

Describe the issue
We are getting an OOM in one of our environments, and along with it a Linux Core Dump. After running the tests below, we think that when the Metaspace is full and an OOM is thrown, GraalVM also produces a Linux Core Dump.

  1. Initially we had the following Java parameters: -Xmx1024m -XX:MaxMetaspaceSize=256m. This is where both the Linux Core Dump and the OOM occurred.
  2. Changed the Java parameters to -Xmx2048m -XX:MaxMetaspaceSize=256m and got neither. The process took a little longer, but it did finish.
  3. Changed the Java parameters to -Xmx1024m and removed -XX:MaxMetaspaceSize so that it falls back to the default (unlimited). We got the OOM, but not the Linux Core Dump.
  4. Changed the Java parameters to -Xmx2048m, leaving -XX:MaxMetaspaceSize at the default (unlimited). We are still waiting for this test to finish but, based on test 2 above, my suspicion is that neither the OOM nor the Linux Core Dump will occur; there should be plenty of space for the process to finish. Will update this issue if we get some other error.
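
When comparing runs like the four above, it can help to confirm which limits the JVM actually picked up. The following is a minimal sketch, not something from the original report: the class name `MemoryLimitsSketch` and the method `poolLimits` are made up for illustration. It uses the standard `java.lang.management` API to list the maximum size of every memory pool; on HotSpot the class-metadata pool is normally named "Metaspace", and a maximum of -1 means no limit is defined.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the reported application.
final class MemoryLimitsSketch {

    /** Returns the maximum size in bytes of each memory pool; -1 means undefined (no limit). */
    static Map<String, Long> poolLimits() {
        Map<String, Long> limits = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null) { // getUsage() returns null for an invalid pool
                limits.put(pool.getName(), usage.getMax());
            }
        }
        return limits;
    }

    public static void main(String[] args) {
        // Overall heap ceiling, which reflects -Xmx.
        System.out.println("Max heap: " + Runtime.getRuntime().maxMemory() + " bytes");
        // Per-pool ceilings, including the class-metadata pool.
        poolLimits().forEach((name, max) ->
            System.out.println(name + ": " + (max < 0 ? "unlimited" : max + " bytes")));
    }
}
```

Running it with the same flags as each test shows whether the Metaspace pool reports a limit or is left undefined.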

What we suspect is that when the Metaspace is full (and limited) and an OOM is thrown, a Linux Core Dump is also produced. We are not sure whether this is just a CentOS 7 problem or whether it is seen on other Linux distros.

The expected behavior is that only an OOM is thrown, with no Linux Core Dump.

Steps to reproduce the issue
Throw an OutOfMemoryError in the JVM heap while the JVM Metaspace is full at the same time.
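
The report does not include a reproducer, so the following is only a rough sketch of how the Metaspace half of that condition might be approached. It is an assumption, not the reporter's workload (which involved Truffle compilation). The class `MetaspacePressureSketch` and the method `defineProxyClasses` are hypothetical names. The idea is that `java.lang.reflect.Proxy` defines a separate proxy class for each distinct class loader, so creating many loaders and keeping them reachable grows class metadata.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the reported application.
final class MetaspacePressureSketch {

    /**
     * Defines one proxy class per fresh class loader. The returned list keeps
     * the proxies (and therefore their classes and loaders) strongly
     * reachable, so the class metadata cannot be unloaded.
     */
    static List<Object> defineProxyClasses(int count) {
        InvocationHandler noop = (proxy, method, args) -> null;
        ClassLoader parent = MetaspacePressureSketch.class.getClassLoader();
        List<Object> keepAlive = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // A new loader each time forces a new proxy class to be defined.
            ClassLoader loader = new ClassLoader(parent) { };
            keepAlive.add(Proxy.newProxyInstance(
                loader, new Class<?>[] {Runnable.class}, noop));
        }
        return keepAlive;
    }

    public static void main(String[] args) {
        // The default count is arbitrary; pass a larger number to add pressure.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        List<Object> proxies = defineProxyClasses(count);
        System.out.println("Defined " + proxies.size() + " proxy classes");
    }
}
```

Run with a limited -XX:MaxMetaspaceSize and a large count, this only applies Metaspace pressure. Whether that alone is enough to reach the crash is not established anywhere in this thread.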

Describe GraalVM and your environment:
We are running on an AWS EC2 instance that uses the CentOS 7 image.

[HostName] $ hostnamectl
   Static hostname: [XXX]
         Icon name: computer-vm
           Chassis: vm
        Machine ID: [XXXX]
           Boot ID: [XXXX]
    Virtualization: kvm
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-1062.12.1.el7.x86_64
      Architecture: x86-64

[HostName] $ java -Xinternalversion
OpenJDK 64-Bit Server VM (17.0.4+8-jvmci-22.2-b06) for linux-amd64 JRE (17.0.4+8-jvmci-22.2-b06), built on Jul 20 2022 18:51:07 by "buildslave" with gcc 10.3.0

[HostName] $ echo $JAVA_HOME
/opt/java/graalvm-ce-java17-22.2.0

More details

hs_err_pid22374.log

@pminearo pminearo added the bug label Nov 19, 2022
@dougxc
Member

dougxc commented Nov 21, 2022

You can see a bunch of OutOfMemoryErrors in the hs-err log just before the crash:

Event: 4964.473 Thread 0x00007f66400eaee0 Exception <a 'java/lang/OutOfMemoryError'{0x00000000c085a768}> (0x00000000c085a768) 
thrown [src/hotspot/share/runtime/deoptimization.cpp, line 1108]
Event: 4966.362 Thread 0x00007f663407b320 Exception <a 'java/lang/OutOfMemoryError'{0x00000000c085aa08}> (0x00000000c085aa08) 
thrown [src/hotspot/share/gc/shared/memAllocator.cpp, line 136]
Event: 4966.362 Thread 0x00007f663407b320 Exception <a 'java/lang/OutOfMemoryError'{0x00000000c085a768}> (0x00000000c085a768) 
thrown [src/hotspot/share/runtime/deoptimization.cpp, line 1108]
Event: 4970.142 Thread 0x00007f664002eeb0 Exception <a 'java/lang/OutOfMemoryError'{0x00000000c085aa08}> (0x00000000c085aa08) 
thrown [src/hotspot/share/gc/shared/memAllocator.cpp, line 136]

The very last one happens when a Truffle compiler is trying to translate a ResolvedJavaType object from libgraal into HotSpot. Due to limited HotSpot memory, this triggers an OutOfMemoryError. The VM then tries to translate this exception back into libgraal so that it can be propagated further. The VM calls HotSpotJVMCIRuntime.encodeThrowable which in turn calls TranslatedException.encodeThrowable. That sequence is supposed to be robust against all exceptions, falling back to pre-allocated (in the libgraal native image) data structures. I'm yet to figure out why this is not working.
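
To illustrate the general idea of an encoding step that falls back to pre-allocated data, here is a minimal sketch. It is not the actual `HotSpotJVMCIRuntime.encodeThrowable` or `TranslatedException.encodeThrowable` code; the class `RobustEncodeSketch`, its `encode` method, and the fallback message are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a "robust against all exceptions" encoder.
final class RobustEncodeSketch {

    // Allocated once, up front, so the fallback path allocates nothing new.
    static final byte[] FALLBACK =
        "encoding failed: original exception details unavailable"
            .getBytes(StandardCharsets.UTF_8);

    /** Encodes a throwable as UTF-8 text, or returns FALLBACK if encoding itself fails. */
    static byte[] encode(Throwable t) {
        try {
            String text = t.getClass().getName() + ": " + t.getMessage();
            return text.getBytes(StandardCharsets.UTF_8);
        } catch (Throwable secondary) {
            // Includes OutOfMemoryError raised while building the encoding.
            return FALLBACK;
        }
    }
}
```

The point of the design is that the failure path touches only memory that already exists, so a second OutOfMemoryError during encoding cannot escape to the caller.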

BTW, is there any console output at the time of failure? I would expect some kind of exception stack trace.

@dougxc dougxc self-assigned this Nov 21, 2022
@dougxc
Member

dougxc commented Nov 21, 2022

I do see one possible cause for the Linux Core Dump, which is where a ResourceArea buffer is allocated to transfer an encoded exception between the two heaps: https://github.com/graalvm/labs-openjdk-17/blob/f6b18b596fa5acb1ab7efa10e284d106669040a6/src/hotspot/share/jvmci/jvmciEnv.cpp#L318
I don't believe that allocation is limited by MaxMetaspaceSize, so it would only fail if malloc fails.

@pminearo
Author

Here is the only console output I have. I also have the core.XXXX file; let me know if you want it.

ds.out.gz

@tkrodriguez
Member

One interesting part of the output is the OOM during deopt.

Event: 4966.362 Thread 0x00007f663407b320 Exception <a 'java/lang/OutOfMemoryError'{0x00000000c085a768}> (0x00000000c085a768) 
thrown [src/hotspot/share/runtime/deoptimization.cpp, line 1108]

This could cause some strange executions, as the OOM occurs at a place that normally might not be expected to have an OOM. I don't know whether that's a factor or not.
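
To see how many of the logged OutOfMemoryErrors come from each throw site (for example deoptimization versus ordinary allocation), the event lines can be tallied. This is a small sketch that assumes the two-line event layout quoted in this thread; the class `OomEventTally` and the method `countBySite` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; assumes the event layout shown above.
final class OomEventTally {

    // Captures the file and line from "thrown [file, line N]" after an OOM event.
    private static final Pattern EVENT = Pattern.compile(
        "Exception <a 'java/lang/OutOfMemoryError'.*?thrown \\[([^,\\]]+), line (\\d+)\\]",
        Pattern.DOTALL);

    /** Counts OutOfMemoryError events per "file:line" throw site, in first-seen order. */
    static Map<String, Integer> countBySite(String log) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = EVENT.matcher(log);
        while (m.find()) {
            counts.merge(m.group(1) + ":" + m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the path to an hs_err log file as the first argument.
        String log = Files.readString(Path.of(args[0]));
        countBySite(log).forEach((site, n) -> System.out.println(n + "  " + site));
    }
}
```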

@dougxc
Member

dougxc commented Nov 22, 2022

I've opened https://bugs.openjdk.org/browse/JDK-8297431 to increase the robustness of this code path in the JDK.

@dougxc
Member

dougxc commented Jan 23, 2024

This should now have been fixed by https://bugs.openjdk.org/browse/JDK-8297431.

@dougxc dougxc closed this as completed Jan 23, 2024