
master shutdown due to no memory available #314

Closed
whimboo opened this Issue Oct 1, 2013 · 15 comments

5 participants

@whimboo
whimboo commented Oct 1, 2013

Not sure what exactly happened here, but our master shut down due to memory problems:

A fatal error has been detected by the Java Runtime Environment:

Internal Error (instanceKlass.cpp:1534), pid=3617, tid=239283008
Error: ShouldNotReachHere()

JRE version: Java(TM) SE Runtime Environment (7.0_40-b43) (build 1.7.0_40-b43)
Java VM: Java HotSpot(TM) Client VM (24.0-b56 mixed mode linux-x86 )
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:
/data/mozmill-ci/hs_err_pid3617.log

If you would like to submit a bug report, please visit:
http://bugreport.sun.com/bugreport/crash.jsp

./start.sh: line 25: 3617 Aborted (core dumped) java -jar -Xms2g -Xmx2g -XX:MaxPermSize=512M -Xincgc $JENKINS_WAR

@whimboo
whimboo commented Oct 1, 2013

I filed a bug against Java on bugs.sun.com. Its reference ID is 2607110. Sadly, it's not publicly accessible. I will let you know when I get updates.

@whimboo
whimboo commented Oct 21, 2013

We have had two more instances of this crash: Oct 18th and Oct 21st, not even 3 days apart! This is a terrible situation at the moment, given that no one can predict when it will happen again or what specifically causes it. I will have to dig around to see if I can find some help. Here is an excerpt of the crash:

A fatal error has been detected by the Java Runtime Environment:

Internal Error (instanceKlass.cpp:1534), pid=2378, tid=239618880
Error: ShouldNotReachHere()

JRE version: Java(TM) SE Runtime Environment (7.0_45-b18) (build 1.7.0_45-b18)
Java VM: Java HotSpot(TM) Client VM (24.45-b08 mixed mode linux-x86 )
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

If you would like to submit a bug report, please visit:
http://bugreport.sun.com/bugreport/crash.jsp

--------------- T H R E A D ---------------

Current thread (0xb6a86c00): JavaThread "C1 CompilerThread0" daemon [_thread_in_vm, id=2386, stack(0x0e404000,0x0e485000)]

Stack: [0x0e404000,0x0e485000], sp=0x0e483c70, free space=511k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x503a66] VMError::report_and_die()+0x1a6
V [libjvm.so+0x1db48a] report_should_not_reach_here(char const*, int)+0x4a
V [libjvm.so+0x2743be] instanceKlass::remove_dependent_nmethod(nmethod*)+0x3e
V [libjvm.so+0x3df0a4] nmethod::make_not_entrant_or_zombie(unsigned int)+0x394
V [libjvm.so+0x4900fa] NMethodSweeper::process_nmethod(nmethod*)+0x19a
V [libjvm.so+0x49082f] NMethodSweeper::sweep_code_cache()+0x26f
V [libjvm.so+0x490ad0] NMethodSweeper::possibly_sweep()+0x70
V [libjvm.so+0x19d5cd] CompileQueue::get()+0x1d
V [libjvm.so+0x1a19cf] CompileBroker::compiler_thread_loop()+0x12f
V [libjvm.so+0x4c6eb8] compiler_thread_entry(JavaThread*, Thread*)+0x18
V [libjvm.so+0x4ce9e9] JavaThread::thread_main_inner()+0x109
V [libjvm.so+0x4ceb43] JavaThread::run()+0x123
V [libjvm.so+0x401c09] java_start(Thread*)+0x119
C [libpthread.so.0+0x6d4c] start_thread+0xcc
[..]

Heap
def new generation total 59008K, used 19075K [0x148f0000, 0x188f0000, 0x188f0000)
eden space 52480K, 33% used [0x148f0000, 0x15a56050, 0x17c30000)
from space 6528K, 19% used [0x17c30000, 0x17d6ac40, 0x18290000)
to space 6528K, 0% used [0x18290000, 0x18290000, 0x188f0000)
concurrent mark-sweep generation total 2031616K, used 525341K [0x188f0000, 0x948f0000, 0x948f0000)
concurrent-mark-sweep perm gen total 88048K, used 52837K [0x948f0000, 0x99eec000, 0xb48f0000)
[..]

Internal exceptions (10 events):
Event: 255529.279 Thread 0x059ae800 Threw 0x1532ede8 at /HUDSON/workspace/7u-2-build-linux-i586/jdk7u45/229/hotspot/src/share/vm/prims/jni.cpp:717
Event: 255529.283 Thread 0x059ae800 Threw 0x154169f8 at /HUDSON/workspace/7u-2-build-linux-i586/jdk7u45/229/hotspot/src/share/vm/prims/jni.cpp:717
Event: 255529.283 Thread 0x059ae800 Threw 0x15418928 at /HUDSON/workspace/7u-2-build-linux-i586/jdk7u45/229/hotspot/src/share/vm/prims/jni.cpp:717
[..]

VM Arguments:
jvm_args: -Xms2g -Xmx2g -XX:MaxPermSize=512M -Xincgc
java_command: jenkins.war
Launcher Type: SUN_STANDARD

Environment Variables:
PATH=/data/mozmill-ci/jenkins-env/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
SHELL=/bin/bash
DISPLAY=:0.0

Signal Handlers:
SIGSEGV: [libjvm.so+0x504580], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGBUS: [libjvm.so+0x504580], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGFPE: [libjvm.so+0x3fc670], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGPIPE: [libjvm.so+0x3fc670], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGXFSZ: [libjvm.so+0x3fc670], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGILL: [libjvm.so+0x3fc670], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGUSR2: [libjvm.so+0x3fdb60], sa_mask[0]=0x00000000, sa_flags=0x10000004
SIGHUP: [libjvm.so+0x3fecb0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGINT: [libjvm.so+0x3fecb0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGTERM: [libjvm.so+0x3fecb0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
SIGQUIT: [libjvm.so+0x3fecb0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004

--------------- S Y S T E M ---------------

OS:wheezy/sid

uname:Linux 3.2.0-54-generic-pae #82-Ubuntu SMP Tue Sep 10 20:29:22 UTC 2013 i686
libc:glibc 2.15 NPTL 2.15
rlimit: STACK 8192k, CORE 0k, NPROC 32087, NOFILE 4096, AS infinity
load average:0.07 0.12 0.10

/proc/meminfo:
MemTotal: 4121776 kB
MemFree: 251996 kB
Buffers: 288024 kB
Cached: 991060 kB
SwapCached: 244 kB
Active: 2559656 kB
Inactive: 1053132 kB
Active(anon): 1919720 kB
Inactive(anon): 417520 kB
Active(file): 639936 kB
Inactive(file): 635612 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 3280840 kB
HighFree: 76804 kB
LowTotal: 840936 kB
LowFree: 175192 kB
SwapTotal: 1046524 kB
SwapFree: 1044868 kB
Dirty: 120 kB
Writeback: 0 kB
AnonPages: 2333632 kB
Mapped: 44912 kB
Shmem: 3440 kB
Slab: 218044 kB
SReclaimable: 204424 kB
SUnreclaim: 13620 kB
KernelStack: 4800 kB
PageTables: 9700 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3107412 kB
Committed_AS: 4022828 kB
VmallocTotal: 122880 kB
VmallocUsed: 7760 kB
VmallocChunk: 112052 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 10232 kB
DirectMap2M: 903168 kB

CPU:total 1 (1 cores per cpu, 1 threads per core) family 6 model 37 stepping 1, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, aes, tsc, tscinvbit

/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz
stepping : 1
microcode : 0x15
cpu MHz : 3465.681
cache size : 12288 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss nx rdtscp lm constant_tsc up arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat epb dtherm
bogomips : 6931.36
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

Memory: 4k page, physical 4121776k(251996k free), swap 1046524k(1044868k free)

vm_info: Java HotSpot(TM) Client VM (24.45-b08) for linux-x86 JRE (1.7.0_45-b18), built on Oct 8 2013 05:47:36 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

time: Mon Oct 21 00:51:36 2013
elapsed time: 255529 seconds

@whimboo
whimboo commented Oct 21, 2013

@davehunt I would suggest that we enable core dumps as the crash message recommends:

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

What do you think? It might help us get more information. It would take effect with the next restart of Jenkins.
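For context on this suggestion: the core-file size limit is a per-process resource limit that child processes inherit, so it has to be raised in the shell that runs start.sh (or earlier), and it only affects JVMs started afterwards. A minimal Python sketch of the same idea, assuming a hypothetical launcher in place of start.sh; `raise_core_limit`, `launch_jenkins` and the use of `os.execvp` are illustrative only:

```python
import os
import resource


def raise_core_limit() -> tuple[int, int]:
    """Raise the soft core-file size limit to the hard limit.

    Comparable to `ulimit -c unlimited` in start.sh when the hard limit is
    already unlimited. An unprivileged process may raise its soft limit up
    to, but not beyond, its hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def launch_jenkins(war_path: str) -> None:
    """Hypothetical launcher: the JVM inherits the raised limit across exec."""
    raise_core_limit()
    # JVM options taken from the start.sh command line in the crash output.
    args = ["java", "-Xms2g", "-Xmx2g", "-XX:MaxPermSize=512M", "-Xincgc",
            "-jar", war_path]
    os.execvp("java", args)  # replaces this process; does not return


if __name__ == "__main__":
    soft, _hard = raise_core_limit()
    print("core limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
```

The point of the design is ordering: the limit is raised in the parent before `exec`, which is also why the change only applies after a Jenkins restart.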

@davehunt
Mozilla member

Sure, if you think it will help. I wonder if there's a Jenkins issue on file too?

@whimboo
whimboo commented Oct 21, 2013

I filed a Jenkins issue: https://issues.jenkins-ci.org/browse/JENKINS-20156 Let's see if someone can help us.

@whimboo
whimboo commented Oct 21, 2013

Also, ulimit is already set to unlimited. Maybe there was not enough free memory available to save the core dump!
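One thing worth checking: the `rlimit:` line of the Oct 21 crash report above reads `CORE 0k`, which suggests the crashed JVM itself ran with core dumps disabled, whatever the limit in an interactive shell was. On Linux, the limits a running process actually has can be read from `/proc/<pid>/limits`. A small sketch that extracts the core-file row; the helper names are hypothetical, and finding the Jenkins PID is left out:

```python
import re
from typing import Optional, Tuple


def core_limit_from_proc(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits text.

    Values are returned as strings, e.g. ("0", "unlimited").
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max core file size\s+(\S+)\s+(\S+)", line)
        if match:
            return match.group(1), match.group(2)
    return None


def core_limit_of_pid(pid: int) -> Optional[Tuple[str, str]]:
    """Read the effective core-file limit of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return core_limit_from_proc(fh.read())
```

A soft value of `"0"` for the Jenkins PID would mean the JVM cannot write a core file, regardless of what `ulimit -c` reports in a login shell.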

@whimboo
whimboo commented Nov 8, 2013

The crash happened again. It doesn't look like the Java process used more than 4GB, but even so it's overcommitting its memory by a factor of 2! We should really get the monitor plugin live on production. @davehunt, it would be great if you could take care of issue #356, which is blocking us here.
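For reference, the kernel reports overcommit in `/proc/meminfo`: `Committed_AS` is the address space processes have committed, and `CommitLimit` is swap plus `vm.overcommit_ratio` percent of RAM (50 by default). The sketch below uses the numbers from the Oct 21 crash report, not from this Nov 8 crash: 4022828 kB committed against a 3107412 kB limit is a ratio of about 1.29, and the limit works out to exactly swap + 50% of RAM. Note that `CommitLimit` is only enforced when `vm.overcommit_memory` is 2; in the default heuristic mode, exceeding it does not by itself make allocations fail. The helper names are hypothetical:

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style lines ("Key: value kB") into {key: value}."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def overcommit_ratio(meminfo: dict[str, int]) -> float:
    """Committed_AS divided by CommitLimit; above 1.0 means overcommitted."""
    return meminfo["Committed_AS"] / meminfo["CommitLimit"]


def implied_overcommit_percent(meminfo: dict[str, int]) -> float:
    """Back out vm.overcommit_ratio from CommitLimit = swap + RAM * ratio / 100.

    Ignores huge pages, which are 0 in this report.
    """
    return 100 * (meminfo["CommitLimit"] - meminfo["SwapTotal"]) / meminfo["MemTotal"]


# Values copied from the Oct 21 hs_err report in this issue.
SAMPLE = """\
MemTotal: 4121776 kB
SwapTotal: 1046524 kB
CommitLimit: 3107412 kB
Committed_AS: 4022828 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"overcommit ratio: {overcommit_ratio(info):.2f}")  # 1.29
    print(f"implied vm.overcommit_ratio: {implied_overcommit_percent(info):.0f}%")  # 50%
```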

@whimboo
whimboo commented Nov 8, 2013

Even after restarting Java, the memory usage looks weird, as if something is still holding on to that memory:

Mem: 8266924k total, 3965764k used, 4301160k free, 420248k buffers
Swap: 1046524k total, 0k used, 1046524k free, 2680080k cached
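One possible reading of these numbers, hedged since it is a single snapshot: in the procps `top` of that era, the "used" figure on the Mem line includes kernel buffers and the page cache (shown as "cached" on the Swap line), which the kernel reclaims on demand. Subtracting both leaves 865436 kB, roughly 845 MiB, actually held by processes. The parsing helpers below are hypothetical:

```python
import re


def parse_top_kb(line: str) -> dict[str, int]:
    """Parse a classic `top` summary line such as 'Mem: 8266924k total, ...'.

    Returns {label: kilobytes}, e.g. {'total': 8266924, 'used': 3965764, ...}.
    """
    return {label: int(value)
            for value, label in re.findall(r"(\d+)k (\w+)", line)}


def used_by_processes_kb(mem_line: str, swap_line: str) -> int:
    """'used' minus buffers and page cache (older top prints 'cached' on the Swap line)."""
    mem = parse_top_kb(mem_line)
    swap = parse_top_kb(swap_line)
    return mem["used"] - mem["buffers"] - swap["cached"]


MEM = "Mem: 8266924k total, 3965764k used, 4301160k free, 420248k buffers"
SWAP = "Swap: 1046524k total, 0k used, 1046524k free, 2680080k cached"

if __name__ == "__main__":
    print(used_by_processes_kb(MEM, SWAP), "kB")  # 865436 kB
```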

@whimboo
whimboo commented Nov 8, 2013

Dave pointed me to https://wiki.jenkins-ci.org/display/JENKINS/I%27m+getting+OutOfMemoryError

I will tweak the start.sh file on master for now in case we have to restart the box again today.

whimboo added a commit to whimboo/mozmill-ci that referenced this issue on Nov 8, 2013:
Enable heap dump on OOM (#314) 2632e51
@whimboo
whimboo commented Nov 11, 2013

The PR enabling heap dumps on OOM has been merged and is now live on staging. We will merge it to production tomorrow.
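The commit itself is not reproduced in this thread. As a hedged sketch of the kind of change its title describes, the JVM options from the start.sh command line can be extended with `-XX:+HeapDumpOnOutOfMemoryError` and, optionally, `-XX:HeapDumpPath`. The `jvm_command` helper and the dump directory are hypothetical:

```python
import shlex
from typing import List, Optional

# JVM options as they appear in the start.sh command line in this issue.
BASE_OPTS = ["-Xms2g", "-Xmx2g", "-XX:MaxPermSize=512M", "-Xincgc"]


def jvm_command(war_path: str, dump_dir: Optional[str] = None) -> List[str]:
    """Build a java command line that writes a heap dump on OutOfMemoryError.

    HeapDumpPath (a hypothetical directory here) controls where the
    java_pid<pid>.hprof file goes; without it, the working directory is used.
    """
    opts = BASE_OPTS + ["-XX:+HeapDumpOnOutOfMemoryError"]
    if dump_dir is not None:
        opts.append(f"-XX:HeapDumpPath={dump_dir}")
    return ["java", *opts, "-jar", war_path]


if __name__ == "__main__":
    # Print the command as a shell line, e.g. to compare with start.sh.
    print(shlex.join(jvm_command("jenkins.war", "/data/mozmill-ci")))
```

A dump can be as large as the live heap (up to the 2g set by `-Xmx`), so the target directory needs that much free disk space.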

@andreieftimie

We've had another crash.
The log is in /data/mozmill-ci/hs_err_pid26941.log

This time there is no "Error: ShouldNotReachHere()". Instead, we get:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb6dbe38b, pid=26941, tid=239287104

We have the -XX:+HeapDumpOnOutOfMemoryError argument set, but I can't find the heap dump.
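A likely explanation, offered with some caution: `-XX:+HeapDumpOnOutOfMemoryError` only takes effect when the JVM throws an `OutOfMemoryError`. A SIGSEGV is a native crash, for which HotSpot writes an `hs_err_pid<pid>.log` but no heap dump, so there may simply be nothing to find. The sketch below lists both kinds of files using HotSpot's default names; the helper names are hypothetical, and `/data/mozmill-ci` is assumed to be the JVM's working directory because that is where the hs_err logs ended up:

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Default file names HotSpot uses for fatal-error logs and OOM heap dumps.
PATTERNS = {
    "crash_logs": re.compile(r"hs_err_pid(\d+)\.log"),
    "heap_dumps": re.compile(r"java_pid(\d+)\.hprof"),
}


def classify(names: Iterable[str]) -> Dict[str, List[str]]:
    """Sort file names into crash logs and heap dumps; ignore everything else."""
    found: Dict[str, List[str]] = {kind: [] for kind in PATTERNS}
    for name in sorted(names):
        for kind, pattern in PATTERNS.items():
            if pattern.fullmatch(name):
                found[kind].append(name)
    return found


def find_jvm_artifacts(directory: str) -> Dict[str, List[str]]:
    """Classify the files in one directory, e.g. the JVM's working directory."""
    return classify(p.name for p in Path(directory).iterdir() if p.is_file())


def crashes_without_heap_dump(found: Dict[str, List[str]]) -> List[str]:
    """PIDs with an hs_err log but no matching .hprof (e.g. a SIGSEGV crash)."""
    def pid(kind: str, name: str) -> str:
        return PATTERNS[kind].fullmatch(name).group(1)

    dumped = {pid("heap_dumps", n) for n in found["heap_dumps"]}
    return [pid("crash_logs", n) for n in found["crash_logs"]
            if pid("crash_logs", n) not in dumped]
```

Usage: `crashes_without_heap_dump(find_jvm_artifacts("/data/mozmill-ci"))` lists the PIDs of crashes that left only an hs_err log.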

@whimboo
whimboo commented Feb 1, 2014

We haven't seen this issue for more than a month, so I'm fairly sure we were able to fix it! Yay! Let's close it and reopen it if the problem reappears.

@whimboo whimboo closed this Feb 1, 2014
@indrayam
indrayam commented Apr 2, 2015

We came across this error in our Jenkins instance. We were on 1.565 at the time, using Java 1.7, and have upgraded since then. Just curious, though: do you know what might have solved your problem? Also, I notice that you set -Xms and -Xmx to the same value. Is that by design?

@whimboo
whimboo commented May 4, 2015

We never figured out the real cause of that shutdown, but it didn't matter, since we never ran into it again after I did the upgrades. Regarding the memory options, we put them in a long time ago. I can't remember offhand why they have the same value.

@dalvizu
dalvizu commented Dec 14, 2015

> Regarding the memory options we put them in a long time ago. I cannot tell out of my head why those have the same number.

This is common practice: setting both to the same value prevents the JVM heap from resizing itself, as a performance tweak.
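To make this concrete: when the initial heap size (`-Xms`) equals the maximum (`-Xmx`), the heap starts at its full committed size and never needs to grow or shrink. A hypothetical helper that checks this for a `jvm_args` line like the one in the crash report above:

```python
import re
from typing import Optional, Tuple

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a JVM size such as '2g', '512M' or '65536' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if not match:
        raise ValueError(f"not a JVM size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def heap_bounds(jvm_args: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (initial, max) heap sizes in bytes; if a flag repeats, the last one wins."""
    initial = maximum = None
    for arg in jvm_args.split():
        if arg.startswith("-Xms"):
            initial = parse_size(arg[4:])
        elif arg.startswith("-Xmx"):
            maximum = parse_size(arg[4:])
    return initial, maximum


def heap_is_fixed(jvm_args: str) -> bool:
    """True when -Xms and -Xmx are both set and equal, so the heap never resizes."""
    initial, maximum = heap_bounds(jvm_args)
    return initial is not None and initial == maximum


if __name__ == "__main__":
    print(heap_is_fixed("-Xms2g -Xmx2g -XX:MaxPermSize=512M -Xincgc"))  # True
```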
