Work around JVM Bug in LongGCDisruptionTests #50731
There is a JVM bug causing `Thread#suspend` calls to randomly take multiple seconds, which breaks these tests because they call the method numerous times in a loop. Increasing the timeout will not work, since we may call `suspend` tens if not hundreds of times, and even a small number of slow calls adds up to multiple minutes of waiting. This PR detects the specific issue by timing the `Thread#suspend` calls and skips the remainder of the test if it timed out because of the JVM bug. Closes elastic#50047
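To illustrate the detection idea described above, here is a minimal sketch, not the code from this PR. It times an arbitrary call and records in a flag whether any call exceeded a threshold. The class name `SlowCallDetector`, its methods, and the use of a `Runnable` as a stand-in for `thread.suspend()` are all hypothetical; the stand-in keeps the example runnable on JVMs where `Thread#suspend` is deprecated or unsupported.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, for illustration only: flags calls that take longer than a threshold.
public class SlowCallDetector {
    // Set to true once any timed call exceeds the threshold; never reset.
    private final AtomicBoolean sawSlowCall = new AtomicBoolean(false);
    private final long thresholdNanos;

    public SlowCallDetector(long threshold, TimeUnit unit) {
        this.thresholdNanos = unit.toNanos(threshold);
    }

    // Runs the call (a stand-in for thread.suspend()) and records whether it was slow.
    public void timed(Runnable call) {
        final long startTime = System.nanoTime();
        call.run();
        if (System.nanoTime() - startTime > thresholdNanos) {
            sawSlowCall.set(true);
        }
    }

    public boolean sawSlowCall() {
        return sawSlowCall.get();
    }

    public static void main(String[] args) {
        SlowCallDetector detector = new SlowCallDetector(100, TimeUnit.MILLISECONDS);
        // Simulate one call that blocks for longer than the threshold.
        detector.timed(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("saw slow call: " + detector.sawSlowCall());
    }
}
```

A test could consult such a flag after a timeout to decide whether to skip the remaining assertions rather than fail; how the skip is wired into the test framework is left out of this sketch.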
Pinging @elastic/es-distributed (:Distributed/Distributed)
    thread.suspend();
    if (System.nanoTime() - startTime > TimeUnit.SECONDS.toNanos(3L)) {
        sawSlowSuspendBug.set(true);
Could we make this conditional on the JVM version so that we are sure to keep tracking that JVM bug and to drop this fix when it's no longer needed?
Technically we could, but right now we're seeing this failure on all versions. This is a little unexpected on 8, so I figured we'd apply the fix for 8 as well to identify whether it's the same issue (slow suspend) there or if it's blocked somewhere else.
LGTM
Thanks Yannick!