SIGSEGV on Java 21 / aarch64 #137

Open
Pluies opened this issue Jan 8, 2024 · 5 comments

Comments

@Pluies

Pluies commented Jan 8, 2024

Hey folks!

We've been running Pyroscope v0.12.2 within Trino, and after a recent upgrade to JVM 21 we started getting SIGSEGV errors.

Errors look like:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffffa989b478, pid=1, tid=3987
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.1+12 (21.0.1+12) (build 21.0.1+12-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (21.0.1+12-LTS, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x6ce478]  frame::sender_for_entry_frame(RegisterMap*) const+0x128
#
# Core dump will be written. Default location: /data/trino/core.1
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

---------------  S U M M A R Y ------------

Command Line: -Xmx104857M -javaagent:/app/libs/jmx_prometheus_javaagent-0.20.0.jar=9000:/var/lib/trino/prometheus-exporter/prometheus-exporter-config.yaml -Xlog:gc* -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent -XX:+ExitOnOutOfMemoryError -Djdk.attach.allowAttachSelf=true -XX:ReservedCodeCacheSize=512M -XX:PerMethodRecompilationCutoff=10000 -XX:PerBytecodeRecompilationCutoff=10000 -Djdk.nio.maxCachedBufferSize=2000000 -XX:+UnlockDiagnosticVMOptions -XX:+UseAESCTRIntrinsics -XX:+UnlockDiagnosticVMOptions -XX:GCLockerRetryAllocationCount=100 -XX:+UseTransparentHugePages -XX:G1ReservePercent=35 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/heapdumps/ -XX:ErrorFile=/heapdumps/hs_err.logpaid-20u.hprof -Dnode.id=paid-20u-kynty-1-coordinator-0-1704338288 -Dnode.environment=production -Dnode.data-dir=/data/trino -Dplugin.dir=/usr/lib/trino/plugin -Dlog.levels-file=/etc/trino/..2024_01_04_03_17_36.2507751579/log.properties -Dconfig=/etc/trino/..2024_01_04_03_17_36.2507751579/config.properties io.trino.server.TrinoServer

Host: AArch64, 32 cores, 123G, Red Hat Enterprise Linux release 9.3 (Plow)
Time: Thu Jan  4 03:43:33 2024 UTC elapsed time: 1524.811066 seconds (0d 0h 25m 24s)

---------------  T H R E A D  ---------------

Current thread (0x0000ffe36c9970e0):  JavaThread "ContinuousTaskStatusFetcher-20240104_034212_00312_yvutb.87.11.0-3205" daemon [_thread_in_Java, id=3987, stack(0x0000ffe11a65a000,0x0000ffe11a858000) (2040K)]

Stack: [0x0000ffe11a65a000,0x0000ffe11a858000],  sp=0x0000ffe11a852f50,  free space=2019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x6ce478]  frame::sender_for_entry_frame(RegisterMap*) const+0x128
V  [libjvm.so+0x6c96e8]  vframeStreamForte::forte_next()+0x2f8
V  [libjvm.so+0x6c9d4c]  forte_fill_call_trace_given_top(JavaThread*, ASGCT_CallTrace*, int, frame)+0x25c
V  [libjvm.so+0x6ca450]  AsyncGetCallTrace+0x210
C  [libasyncProfiler-linux-arm64-86b5f622ede6435644a9c1857582e54b4d2f2e55.so+0x1eb08]  Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int, StackContext*) [clone .isra.675]+0x75c
C  [libasyncProfiler-linux-arm64-86b5f622ede6435644a9c1857582e54b4d2f2e55.so+0x1ee24]  Profiler::recordSample(void*, unsigned long long, int, Event*)+0x2dc
C  [libasyncProfiler-linux-arm64-86b5f622ede6435644a9c1857582e54b4d2f2e55.so+0x1feb0]  ITimer::signalHandler(int, siginfo_t*, void*)+0x4c
C  [linux-vdso.so.1+0x83c]  __kernel_rt_sigreturn+0x0

(Happy to share the whole hs_err.log if needed 👍 )

I don't know if there's a way to consistently reproduce the issue; it happens randomly once every day or so while running 100 nodes.

I'm thinking this is pyroscope-related as:

  • The native frames mention async-profiler
  • Disabling pyroscope fixes the issue 😅
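
For anyone triaging similar crashes, the first point can be checked mechanically. Below is a minimal sketch, assuming the hs_err log lists frames under a "Native frames:" line and that a blank line ends that section, as in the report above. The helper names `native_frames` and `implicates_async_profiler` are made up for illustration and are not part of Pyroscope or async-profiler.

```python
def native_frames(hs_err_text):
    """Return the frame lines listed under 'Native frames:' in an hs_err log."""
    frames = []
    in_section = False
    for line in hs_err_text.splitlines():
        if line.startswith("Native frames:"):
            in_section = True
            continue
        if in_section:
            if not line.strip():
                break  # assumption: a blank line ends the section
            frames.append(line.strip())
    return frames


def implicates_async_profiler(hs_err_text):
    """True if any native frame mentions AsyncGetCallTrace or libasyncProfiler."""
    return any(
        "AsyncGetCallTrace" in frame or "libasyncProfiler" in frame
        for frame in native_frames(hs_err_text)
    )


# Abbreviated excerpt of the stack from the report above.
SAMPLE = """Stack: [0x0000ffe11a65a000,0x0000ffe11a858000],  sp=0x0000ffe11a852f50,  free space=2019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x6ce478]  frame::sender_for_entry_frame(RegisterMap*) const+0x128
V  [libjvm.so+0x6c96e8]  vframeStreamForte::forte_next()+0x2f8
V  [libjvm.so+0x6ca450]  AsyncGetCallTrace+0x210
C  [linux-vdso.so.1+0x83c]  __kernel_rt_sigreturn+0x0

"""

if __name__ == "__main__":
    print(implicates_async_profiler(SAMPLE))
```

A match only shows that the crash happened while the profiler was walking the stack. It does not prove the profiler is the root cause, and a crash that surfaces in another thread would not be flagged by this check.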

It may be related to commit async-profiler/async-profiler@d3dde7e in async-profiler to support Java 21, that hasn't yet been ported to Grafana's fork?

@korniltsev
Collaborator

It may be related to commit async-profiler/async-profiler@d3dde7e in async-profiler to support Java 21, that hasn't yet been ported to Grafana's fork?

Not only has it not been ported, it also has not been included in any stable release.

@korniltsev
Collaborator

I think we could consider releasing a SNAPSHOT version of pyroscope-java with async-profiler built from the master branch.

@Pluies
Author

Pluies commented Jan 24, 2024

@korniltsev async-profiler just released v3.0 that includes this bugfix 🥳

@korniltsev
Collaborator

Yep, preparing a new pyroscope-java release.

@gabrieldimech

Hi, we are also experiencing a similar issue with agent v0.14.0, with the following info:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f71e00e0e81, pid=1, tid=8
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.2+13 (21.0.2+13) (build 21.0.2+13-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.2+13 (21.0.2+13-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xe01e81] ShenandoahConcUpdateRefsClosure::do_oop(oopDesc**)+0x21
#
# Core dump will be written. Default location: /data/core.1
#
# An error report file with more information is saved as: /app/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

Note that we also use the Datadog agent; however, this issue never occurred until we introduced the pyroscope agent:

java.opts: "-javaagent:/app/datadog/dd-java-agent.jar -Xms{{xms_java_trader}}G -Xmx{{xmx_java_trader}}G -XX:+UseShenandoahGC -javaagent:/app/datadog/pyroscope-agent.jar -Dio.opentelemetry.javaagent.slf4j.simpleLogger.defaultLogLevel=off -XX:+UseStringDeduplication

Any ideas would be much appreciated.
