Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On JDK8, alloc profiling may cause heap corrupt and crash #694

Open
yanglong1010 opened this issue Dec 9, 2022 · 11 comments
Open

On JDK8, alloc profiling may cause heap corrupt and crash #694

yanglong1010 opened this issue Dec 9, 2022 · 11 comments

Comments

@yanglong1010
Copy link
Contributor

yanglong1010 commented Dec 9, 2022

hi, Andrei

One of our applications has been running stably for more than 3 years. Recently, we used async-profiler to continuously trace the cpu and alloc of the application, and found that it would crash every once in a while (it may be 1 hour after startup, or it may be 1 day later, the time is not very sure). If turn off alloc and only turn on cpu tracing, every thing goes well, never crash again.

java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)

OracleJDK hs_err_pid15474.log
OpenJDK hs_err_pid4529.log

OpenJDK hs_err_pid4529.log
RDI=0x00000000877bbf60 is pointing into object: 0x00000000877bbf58
java.util.concurrent.ConcurrentHashMap$Node

  • klass: 'java/util/concurrent/ConcurrentHashMap$Node'
#0  0x00007f8c550e81d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f8c550e98c8 in __GI_abort () at abort.c:90
#2  0x00007f8c549fe109 in os::abort (dump_core=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:1565
#3  0x00007f8c54b97186 in VMError::report_and_die (this=this@entry=0x7f8c52f22920) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/utilities/vmError.cpp:1107
#4  0x00007f8c54a07bbf in JVM_handle_linux_signal (sig=11, info=0x7f8c52f22bb0, ucVoid=0x7f8c52f22a80, abort_if_unrecognized=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
#5  0x00007f8c549fb5a8 in signalHandler (sig=11, info=0x7f8c52f22bb0, uc=0x7f8c52f22a80) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:4538
#6  <signal handler called>
#7  0x00007f8c547a1fe1 in oopDesc::size_given_klass (this=0x877bbf60, klass=0x474ae0140) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/oops/oop.inline.hpp:405
#8  0x00007f8c54796a16 in do_oop_work<unsigned int> (gc_barrier=true, root_scan=false, p=0x9593df14, this=0x7f8b380938f0) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/parNew/parOopClosures.inline.hpp:118
#9  do_oop_nv (p=0x9593df14, this=0x7f8b380938f0) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/parNew/parOopClosures.inline.hpp:139
#10 InstanceKlass::oop_oop_iterate_nv (this=0x100001610, obj=0x9593df08, closure=0x7f8b380938f0) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/oops/instanceKlass.cpp:2352
#11 0x00007f8c54a1e75e in oop_iterate (blk=0x7f8b380938f0, this=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/oops/oop.inline.hpp:734
#12 ParScanThreadState::trim_queues (this=0x7f8b380937f0, max_size=20) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/parNew/parNewGeneration.cpp:175
#13 0x00007f8c54a215e7 in do_oop_work<unsigned int> (root_scan=true, gc_barrier=true, p=0x89890404, this=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/parNew/parOopClosures.inline.hpp:125
#14 ParRootScanWithBarrierTwoGensClosure::do_oop (this=0x7f8b380939a0, p=0x89890404) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/parNew/parNewGeneration.cpp:506
#15 0x00007f8c549da0e9 in do_oop_work<unsigned int> (this=<optimized out>, this=<optimized out>, p=0x89890404) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/memory/genOopClosures.hpp:165
#16 do_oop_nv (p=0x89890404, this=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/memory/genOopClosures.hpp:176
#17 ObjArrayKlass::oop_oop_iterate_nv_m (this=<optimized out>, obj=0x8988a508, closure=0x7f8c52f23970, mr=...) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/oops/objArrayKlass.cpp:557
#18 0x00007f8c545f827b in oop_iterate (mr=..., blk=0x7f8c52f23970, this=0x8988a508) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/oops/oop.inline.hpp:734
#19 FreeListSpace_DCTOC::walk_mem_region_with_cl_par (this=0x7f8b280c1190, mr=..., bottom=0x8988a508, top=0x89890800, cl=0x7f8c52f23970) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp:751
#20 0x00007f8c545f919d in FreeListSpace_DCTOC::walk_mem_region_with_cl (this=<optimized out>, mr=..., bottom=<optimized out>, top=<optimized out>, cl=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp:751
#21 0x00007f8c54ac2798 in Filtering_DCTOC::walk_mem_region (this=<optimized out>, mr=..., bottom=<optimized out>, top=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/memory/space.cpp:226
#22 0x00007f8c54ac21dc in DirtyCardToOopClosure::do_MemRegion (this=0x7f8b280c1190, mr=...) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/memory/space.cpp:171
#23 0x00007f8c5457da06 in ClearNoncleanCardWrapper::do_MemRegion (this=this@entry=0x7f8c52f23b10, mr=...) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/memory/cardTableRS.cpp:212
#24 0x00007f8c54a18fa7 in CardTableModRefBS::process_stride (this=this@entry=0x7f8c4c0202d0, sp=sp@entry=0x7f8c4c022c90, used=..., stride=<optimized out>, n_strides=n_strides@entry=4, cl=cl@entry=0x7f8b380939a0, ct=ct@entry=0x7f8c4c020290, lowest_non_clean=0x7f8c3c14b560,
    lowest_non_clean_base_chunk_index=547815073504, lowest_non_clean_chunk_size=10697) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/parNew/parCardTableModRefBS.cpp:159
#25 0x00007f8c54a1952b in CardTableModRefBS::non_clean_card_iterate_parallel_work (this=0x7f8c4c0202d0, sp=sp@entry=0x7f8c4c022c90, mr=..., cl=cl@entry=0x7f8b380939a0, ct=ct@entry=0x7f8c4c020290, n_threads=<optimized out>)
    at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/parNew/parCardTableModRefBS.cpp:74
#26 0x00007f8c5457d597 in CardTableModRefBS::non_clean_card_iterate_possibly_parallel (this=<optimized out>, sp=sp@entry=0x7f8c4c022c90, mr=..., cl=cl@entry=0x7f8b380939a0, ct=ct@entry=0x7f8c4c020290) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/memory/cardTableModRefBS.cpp:485
#27 0x00007f8c5457d809 in CardTableRS::younger_refs_in_space_iterate (this=0x7f8c4c020290, sp=0x7f8c4c022c90, cl=0x7f8b380939a0) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/memory/cardTableRS.cpp:311
#28 0x00007f8c54634b5e in ConcurrentMarkSweepGeneration::younger_refs_iterate (this=<optimized out>, cl=0x7f8b380939a0) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp:3199
#29 0x00007f8c5473e928 in GenCollectedHeap::gen_process_roots (this=this@entry=0x7f8c4c019c00, level=<optimized out>, younger_gens_as_roots=younger_gens_as_roots@entry=true, activate_scope=activate_scope@entry=false, so=so@entry=GenCollectedHeap::SO_ScavengeCodeCache,
    only_strong_roots=only_strong_roots@entry=false, not_older_gens=not_older_gens@entry=0x7f8b38093948, older_gens=older_gens@entry=0x7f8b380939a0, cld_closure=cld_closure@entry=0x7f8c52f23df0) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/memory/genCollectedHeap.cpp:740
#30 0x00007f8c54a1b88d in ParNewGenTask::work (this=0x7f8c5012a6b0, worker_id=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_implementation/parNew/parNewGeneration.cpp:629
#31 0x00007f8c54bad1ea in GangWorker::loop (this=0x7f8c4c01e800) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/utilities/workgroup.cpp:329
#32 0x00007f8c549fd222 in java_start (thread=0x7f8c4c01e800) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:840
#33 0x00007f8c55896dc5 in start_thread (arg=0x7f8c52f24700) at pthread_create.c:308
#34 0x00007f8c551aa76d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

(gdb) x/16wx 0x9593df08 (an oop)
0x9593df08: 0x00000007 0x00000000 0x200002c2 0x877bbf60
0x9593df18: 0x00000000 0x00000000 0x0000000d 0x00000000
0x9593df28: 0x2000ac29 0x00000000 0xe2a6fb68 0x00000184
0x9593df38: 0x0000000d 0x00000000 0x200002c2 0x935a3288

(gdb) x/16wx (0x200002c2l << 3) java.lang.String
0x100001610: 0x55037430 0x00007f8c 0x00000018 0x00000030 (address of char[])
0x100001620: 0x55ae4100 0x00007f8c 0x00001260 0x00000001
0x100001630: 0x398020a0 0x00007f8c 0x00000ea8 0x00000001
0x100001640: 0x00001610 0x00000001 0x00000000 0x00000000

(gdb) x/64bc 0x00007f8c55ae4100
0x7f8c55ae4100: 16 '\020' 0 '\000' -1 '\377' -1 '\377' -17 '\357' 122 'z' 117 'u' 48 '0'
0x7f8c55ae4108: 106 'j' 97 'a' 118 'v' 97 'a' 47 '/' 108 'l' 97 'a' 110 'n'
0x7f8c55ae4110: 103 'g' 47 '/' 83 'S' 116 't' 114 'r' 105 'i' 110 'n' 103 'g'
0x7f8c55ae4118: 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000'
0x7f8c55ae4120: 16 '\020' 0 '\000' -1 '\377' -1 '\377' -61 '\303' -7 '\371' -39 '\331' 112 'p'
0x7f8c55ae4128: 106 'j' 97 'a' 118 'v' 97 'a' 47 '/' 108 'l' 97 'a' 110 'n'
0x7f8c55ae4130: 103 'g' 47 '/' 84 'T' 104 'h' 114 'r' 101 'e' 97 'a' 100 'd'
0x7f8c55ae4138: 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000'

(gdb) x/32wx 0x877bbf30 (0x877bbf60 should point to char[], but actually points into oop, which means the heap is corrupted)
0x877bbf30: 0x8d846ee0 0x8d846ee0 0x8bb317a0 0x8d326f68
0x877bbf40: 0x8d326f80 0x89c27740 0x89c27740 0x899ff620
0x877bbf50: 0x899ff620 0x00000000 0x88716033 0x00000000
0x877bbf60: 0x2000674c 0x4d15b72b 0x8e95c028 0x877bbf10
0x877bbf70: 0x00000000 0x00000000 0x00000005 0x00000000
0x877bbf80: 0x2012ec84 0x00000080 0x00000080 0x00000080
0x877bbf90: 0x00000080 0x93806ff0 0x00000000 0x93941db8
0x877bbfa0: 0xb3fe8b08 0x00000000 0x88716243 0x00000000

In order to further locate the reason, D-D-H and I add some debug code to OpenJDK. If in send_allocation_outside_tlab_event or send_allocation_in_new_tlab_event, and a safepoint occurs, triggering core dump.
Finally we found that JvmtiEnv::GetStackTrace may enter safepoint, the memory corresponding to the tlab address allocated by the current thread may be allocated to other threads after GC, resulting in Heap corruption.

https://github.com/openjdk/jdk8u.git
commit 04a31b454cd853fb88aafffd411dd113e3f4045f (tag: jdk8u202-b08)

pid31086.bt.log

Thread 264 (Thread 0x7f7d36d75700 (LWP 31772)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f7f631eef8b in os::PlatformEvent::park (this=this@entry=0x7f7d40014c00) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/os/linux/vm/os_linux.cpp:5980
#2  0x00007f7f631a8fa8 in ParkCommon (timo=0, ev=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/mutex.cpp:424
#3  Monitor::ILock (this=0x7f7f5c009530, Self=0x7f7d401a9800) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/mutex.cpp:491
#4  0x00007f7f631a98f6 in lock_without_safepoint_check (Self=0x7f7d401a9800, this=0x7f7f5c009530) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/mutex.cpp:959
#5  Monitor::lock_without_safepoint_check (this=0x7f7f5c009530) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/mutex.cpp:965
#6  0x00007f7f63283e46 in SafepointSynchronize::block (thread=0x7f7d401a9800) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/safepoint.cpp:708
#7  0x00007f7f631a97db in transition_and_fence (to=_thread_in_vm, from=_thread_blocked, thread=0x7f7d401a9800) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/interfaceSupport.hpp:184
#8  trans_and_fence (to=_thread_in_vm, from=_thread_blocked, this=<synthetic pointer>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/interfaceSupport.hpp:232
#9  ~ThreadBlockInVM (this=<synthetic pointer>, __in_chrg=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/interfaceSupport.hpp:314
#10 Monitor::lock (this=this@entry=0x7f7f5c008c30, Self=0x7f7d401a9800) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/mutex.cpp:940
#11 0x00007f7f631a98ab in Monitor::lock (this=this@entry=0x7f7f5c008c30) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/mutex.cpp:949
#12 0x00007f7f62f87585 in MutexLocker (mutex=0x7f7f5c008c30, this=<synthetic pointer>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/runtime/mutexLocker.hpp:185
#13 InstanceKlass::get_jmethod_id (ik_h=..., method_h=...) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/oops/instanceKlass.cpp:1778
#14 0x00007f7f63075b9c in jmethod_id (this=0x7f7eb4ebfa88) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/oops/method.hpp:776
#15 JvmtiEnvBase::get_stack_trace (this=<optimized out>, java_thread=java_thread@entry=0x7f7d401a9800, start_depth=<optimized out>, max_count=max_count@entry=2048, frame_buffer=frame_buffer@entry=0x7f7ef404e930, count_ptr=count_ptr@entry=0x7f7d36d72730, this=<optimized out>) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/prims/jvmtiEnvBase.cpp:874
#16 0x00007f7f6306be14 in JvmtiEnv::GetStackTrace (this=0x0, java_thread=0x7f7d401a9800, start_depth=0, max_frame_count=2048, frame_buffer=0x7f7ef404e930, count_ptr=0x7f7d36d72730) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/prims/jvmtiEnv.cpp:1303
#17 0x00007f7f1a796c5b in Profiler::recordSample(void*, unsigned long long, int, Event*) () from /tmp/libasyncProfiler7868072734751652305.so
#18 0x00007f7f1a797b4b in AllocTracer::recordAllocation(void*, int, unsigned long, unsigned long, unsigned long) () from /tmp/libasyncProfiler7868072734751652305.so
#19 <signal handler called>
#20 CollectedHeap::allocate_from_tlab_slow (klass=..., klass@entry=..., thread=thread@entry=0x7f7d401a9800, size=size@entry=4) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_interface/collectedHeap.cpp:300
#21 0x00007f7f62f83a9e in allocate_from_tlab (size=4, thread=0x7f7d401a9800, klass=...) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:236
#22 common_mem_allocate_noinit (__the_thread__=0x7f7d401a9800, size=4, klass=...) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:131
#23 common_mem_allocate_init (__the_thread__=0x7f7d401a9800, size=4, klass=...) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:223
#24 obj_allocate (__the_thread__=0x7f7d401a9800, size=4, klass=...) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:251
#25 InstanceKlass::allocate_instance (this=this@entry=0x100033a60, __the_thread__=__the_thread__@entry=0x7f7d401a9800) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/oops/instanceKlass.cpp:1150
#26 0x00007f7f63280575 in OptoRuntime::new_instance_C (klass=0x100033a60, thread=0x7f7d401a9800) at /home/yibo.yl/openjdk/jdk8u/hotspot/src/share/vm/opto/runtime.cpp:245

A possible fix is to use getJavaTraceAsync.

@apangin
Copy link
Collaborator

apangin commented Dec 9, 2022

Thank you for the thorough analysis. Unfortunately, getJavaTraceAsync is not a fix - it produces completely wrong stack traces when called at this point. I have an idea how to fix this with an external stak walker, but this it not simple at all.

@zdyj3170101136
Copy link

感谢您的全面分析。不幸的是,getJavaTraceAsync这不是一个解决方案 - 此时调用时它会产生完全错误的堆栈跟踪。我有一个想法如何用外部 stak walker 来解决这个问题,但这一点也不简单。

is this fixed in async profiler for now ?

@zdyj3170101136
Copy link

@yanglong1010
do you fixed this problem?

@yanglong1010
Copy link
Contributor Author

@yanglong1010 do you fixed this problem?

you might try --cstack vm

@zdyj3170101136
Copy link

@yanglong1010你解决这个问题了吗?

你可以尝试--cstack vm

build failed:

https://github.com/async-profiler/async-profiler/issues/847

@zdyj3170101136
Copy link

@yanglong1010你解决这个问题了吗?

你可以尝试--cstack vm

this is an experimental option, could not use in prod.

so seems for now i should disable allocs in prod.

[INFO] cstack=vm is an experimental option, use with care

@yanglong1010 yanglong1010 changed the title On JDK8, alloc tracing may cause heap corrupt and crash On JDK8, alloc profiling may cause heap corrupt and crash Nov 28, 2023
@zdyj3170101136
Copy link

how do you fixed this in java 8?

or just disable alloc profiling in java8?

@zdyj3170101136
Copy link

is this just happen in java8?

@zdyj3170101136
Copy link

@yanglong1010

@yanglong1010
Copy link
Contributor Author

@zdyj3170101136 Let me take some time to investigate and maybe I can answer you next week.

@yanglong1010
Copy link
Contributor Author

Hi, 杨杰

Denghui Dong and I checked the code of jdk11u, confirm async-profiler's alloc profiling on Java 11 and above is safe.

On Java 8, async-profiler works by modifying CPU instructions of JDK internal allocation related functions, but this approach may cause a block of memory to be allocated by two threads (steps: Thread A allocated a block of memory, during execution of async-profiler/profiler.cpp:getJavaTraceInternal) Garbage Collector reclaimed the memory, Thread B allocated the same block of memory. Thread A and Thread B both write to the same memory, then heap is corrupted).

On Java 11, async-profiler works by using standard JVMTI, there is an internal mechanism to ensure that the corresponding memory will not be reclaimed by GC and will not be reallocated to other threads.

For Java 8, our internal version disabled alloc profiling. I noticed JDK-8173361 fixed the code. It seems that the heap corruption caused on versions before 8u352 will not happen on Java 8u352 and above, but I have not actually verified it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants