On JDK 8, alloc profiling may cause heap corruption and a crash #694
Comments
Thank you for the thorough analysis. Unfortunately,
Is this fixed in async-profiler now?
@yanglong1010
You might try --cstack vm
Build failed:
This is an experimental option that cannot be used in production, so it seems that for now I should disable alloc profiling in production.
How did you fix this on Java 8? Or did you just disable alloc profiling on Java 8?
Does this happen only on Java 8?
@zdyj3170101136 Let me take some time to investigate and maybe I can answer you next week.
Hi, 杨杰. Denghui Dong and I checked the code of [...]. On Java 11, async-profiler works through standard JVMTI, which has an internal mechanism to ensure that the corresponding memory is not reclaimed by GC and not reallocated to other threads. For Java 8, our internal version disabled alloc profiling. I noticed that JDK-8173361 fixed the code. It seems that the heap corruption seen on versions before 8u352 will not happen on Java 8u352 and above, but I have not actually verified this.
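To make the "disable alloc profiling on older Java 8" workaround concrete, here is a minimal sketch of a version gate. The class and method names are hypothetical, and the 8u352 threshold is taken from the comment above, where it is explicitly unverified, so treat it as an assumption rather than a confirmed boundary.

```java
// Hypothetical helper: decide whether to skip alloc profiling based on the
// value of the "java.version" system property (e.g. "1.8.0_202").
final class AllocProfilingGate {
    // ASSUMPTION: 8u352 is the first unaffected update, as suggested (but not
    // verified) in the comment above.
    static final int FIRST_UNAFFECTED_UPDATE = 352;

    // Returns the update number of a Java 8 version string, 0 if it has no
    // update suffix, or -1 if the string is not a Java 8 version.
    static int java8Update(String version) {
        if (!version.startsWith("1.8.0")) {
            return -1;
        }
        int underscore = version.indexOf('_');
        if (underscore < 0) {
            return 0;
        }
        int end = underscore + 1;
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        if (end == underscore + 1) {
            return 0; // "1.8.0_" with no digits: treat as the base release
        }
        return Integer.parseInt(version.substring(underscore + 1, end));
    }

    // True when the version is Java 8 and older than the assumed threshold.
    static boolean shouldDisableAlloc(String version) {
        int update = java8Update(version);
        return update >= 0 && update < FIRST_UNAFFECTED_UPDATE;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + " -> disable alloc profiling: "
                + shouldDisableAlloc(version));
    }
}
```

With the version reported in this issue, `shouldDisableAlloc("1.8.0_202")` returns true, so a launcher script could fall back to CPU-only profiling.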
Hi Andrei,
One of our applications had been running stably for more than 3 years. Recently, we used async-profiler to continuously profile the application's CPU and allocations, and found that it crashes every once in a while (sometimes 1 hour after startup, sometimes 1 day later; the timing is not predictable). If we turn off alloc and keep only CPU profiling, everything works and the application never crashes.
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
OracleJDK hs_err_pid15474.log
OpenJDK hs_err_pid4529.log
RDI=0x00000000877bbf60 is pointing into object: 0x00000000877bbf58
java.util.concurrent.ConcurrentHashMap$Node
(gdb) x/16wx 0x9593df08 (an oop)
0x9593df08: 0x00000007 0x00000000 0x200002c2 0x877bbf60
0x9593df18: 0x00000000 0x00000000 0x0000000d 0x00000000
0x9593df28: 0x2000ac29 0x00000000 0xe2a6fb68 0x00000184
0x9593df38: 0x0000000d 0x00000000 0x200002c2 0x935a3288
(gdb) x/16wx (0x200002c2l << 3) java.lang.String
0x100001610: 0x55037430 0x00007f8c 0x00000018 0x00000030 (address of char[])
0x100001620: 0x55ae4100 0x00007f8c 0x00001260 0x00000001
0x100001630: 0x398020a0 0x00007f8c 0x00000ea8 0x00000001
0x100001640: 0x00001610 0x00000001 0x00000000 0x00000000
(gdb) x/64bc 0x00007f8c55ae4100
0x7f8c55ae4100: 16 '\020' 0 '\000' -1 '\377' -1 '\377' -17 '\357' 122 'z' 117 'u' 48 '0'
0x7f8c55ae4108: 106 'j' 97 'a' 118 'v' 97 'a' 47 '/' 108 'l' 97 'a' 110 'n'
0x7f8c55ae4110: 103 'g' 47 '/' 83 'S' 116 't' 114 'r' 105 'i' 110 'n' 103 'g'
0x7f8c55ae4118: 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000'
0x7f8c55ae4120: 16 '\020' 0 '\000' -1 '\377' -1 '\377' -61 '\303' -7 '\371' -39 '\331' 112 'p'
0x7f8c55ae4128: 106 'j' 97 'a' 118 'v' 97 'a' 47 '/' 108 'l' 97 'a' 110 'n'
0x7f8c55ae4130: 103 'g' 47 '/' 84 'T' 104 'h' 114 'r' 101 'e' 97 'a' 100 'd'
0x7f8c55ae4138: 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000'
(gdb) x/32wx 0x877bbf30 (0x877bbf60 should point to char[], but actually points into oop, which means the heap is corrupted)
0x877bbf30: 0x8d846ee0 0x8d846ee0 0x8bb317a0 0x8d326f68
0x877bbf40: 0x8d326f80 0x89c27740 0x89c27740 0x899ff620
0x877bbf50: 0x899ff620 0x00000000 0x88716033 0x00000000
0x877bbf60: 0x2000674c 0x4d15b72b 0x8e95c028 0x877bbf10
0x877bbf70: 0x00000000 0x00000000 0x00000005 0x00000000
0x877bbf80: 0x2012ec84 0x00000080 0x00000080 0x00000080
0x877bbf90: 0x00000080 0x93806ff0 0x00000000 0x93941db8
0x877bbfa0: 0xb3fe8b08 0x00000000 0x88716243 0x00000000
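The address arithmetic in the gdb session can be replayed in a few lines. This is only a sketch: the class and method names are made up for illustration, and the zero base and shift of 3 are assumptions inferred from the expression `(0x200002c2l << 3)` used in the dump.

```java
// Illustrative only: reproduces the pointer arithmetic from the gdb session.
final class NarrowKlass {
    // Decode a 32-bit compressed class pointer.
    // ASSUMPTION: base 0 and shift 3, matching (0x200002c2l << 3) in the dump.
    static long decode(long narrow, long base, int shift) {
        return base + ((narrow & 0xFFFFFFFFL) << shift);
    }

    // Offset of an address relative to the start of the object containing it.
    static long interiorOffset(long address, long objectStart) {
        return address - objectStart;
    }

    public static void main(String[] args) {
        // Klass address that gdb was asked to dump with x/16wx.
        System.out.println("0x" + Long.toHexString(decode(0x200002c2L, 0L, 3)));
        // RDI points this many bytes into the ConcurrentHashMap$Node object.
        System.out.println(interiorOffset(0x877bbf60L, 0x877bbf58L));
    }
}
```

The first line prints `0x100001610`, the address inspected in the dump, and the second prints `8`: the reference that should lead to a char[] lands 8 bytes inside another object.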
To narrow down the cause, D-D-H and I added debug code to OpenJDK that triggers a core dump if a safepoint occurs while a thread is inside send_allocation_outside_tlab_event or send_allocation_in_new_tlab_event.
We finally found that JvmtiEnv::GetStackTrace may enter a safepoint. If a GC runs at that safepoint, the memory at the TLAB address just allocated by the current thread may be handed to another thread afterwards, resulting in heap corruption.
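The sequence described above can be illustrated with a deliberately simplified, single-threaded toy model. Nothing here is HotSpot code: the int array standing in for the heap, the bump pointer, and all names are assumptions made for illustration. It only shows why a safepoint between "memory obtained" and "object initialized" is dangerous.

```java
import java.util.Arrays;

// Toy model (not HotSpot): each array slot stands for one heap word.
final class TlabReuseSketch {
    static final int[] heap = new int[8];
    static int top = 0; // bump pointer of the toy allocation area

    // Hand out the next free slot, like a bump-pointer allocation.
    static int allocate() {
        return top++;
    }

    // Stand-in for a GC at a safepoint: the slot handed to thread A is not yet
    // a published object, so nothing keeps it alive and the space is reused.
    static void safepointGc() {
        top = 0;
    }

    // Returns true if thread B's object ends up overwritten by thread A.
    static boolean run(boolean sampleEntersSafepoint) {
        Arrays.fill(heap, 0);
        top = 0;
        int a = allocate();      // thread A: raw memory obtained, not yet initialized
        if (sampleEntersSafepoint) {
            safepointGc();       // allocation sample takes a stack trace and hits a safepoint
        }
        int b = allocate();      // thread B allocates after the GC
        heap[b] = 0xB;           // thread B initializes its object
        heap[a] = 0xA;           // thread A resumes and initializes at its stale address
        return heap[b] != 0xB;   // corruption: B's object no longer holds B's data
    }

    public static void main(String[] args) {
        System.out.println("no safepoint: corrupted = " + run(false));
        System.out.println("safepoint:    corrupted = " + run(true));
    }
}
```

In this model `run(false)` reports no corruption while `run(true)` does, because both threads end up writing to the same slot.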
https://github.com/openjdk/jdk8u.git
commit 04a31b454cd853fb88aafffd411dd113e3f4045f (tag: jdk8u202-b08)
pid31086.bt.log
A possible fix is to use getJavaTraceAsync.