@jankratochvil
Contributor

@jankratochvil jankratochvil commented Jun 22, 2024

fastdebug:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/runtime/handles.inline.hpp:77), pid=878152, tid=878158
#  assert(_thread->is_in_live_stack((address)this)) failed: not on stack?
#
# JRE version:  (24.0) (fastdebug build )
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.azul.openjdk-git, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x1d20658]  constantPoolHandle::constantPoolHandle(Thread*, ConstantPool*)+0x268

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8334763: --enable-asan: assert(_thread->is_in_live_stack((address)this)) failed: not on stack? (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19843/head:pull/19843
$ git checkout pull/19843

Update a local copy of the PR:
$ git checkout pull/19843
$ git pull https://git.openjdk.org/jdk.git pull/19843/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19843

View PR using the GUI difftool:
$ git pr show -t 19843

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19843.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jun 22, 2024

👋 Welcome back jkratochvil! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jun 22, 2024

@jankratochvil This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8334763: --enable-asan: assert(_thread->is_in_live_stack((address)this)) failed: not on stack?

Reviewed-by: kbarrett, stuefe, erikj

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 77 new commits pushed to the master branch:

  • 4e8cbf8: 8335134: Test com/sun/jdi/BreakpointOnClassPrepare.java timeout
  • 3b1ca98: 8334895: OpenJDK fails to configure on linux aarch64 when CDS is disabled after JDK-8331942
  • c35e58a: 8309634: Resolve CONSTANT_MethodRef at CDS dump time
  • 243bae7: 8304693: Remove -XX:-UseVtableBasedCHA
  • 9d986a0: 8335220: C2: Missing check for Opaque4 node in EscapeAnalysis
  • 0e6b0cb: 8334886: jdk/jfr/api/recording/time/TestTimeMultiple.java failed with RuntimeException: getStopTime() > afterStop
  • b6ffb44: 8335135: HttpURLConnection#HttpInputStream does not throw IOException when response is truncated
  • 4ab7e98: 8330842: Support AES CBC with Ciphertext Stealing (CTS) in SunPKCS11
  • 5909d54: 8326820: Metadata artificially kept alive
  • d5375c7: 8333308: javap --system handling doesn't work on internal class names
  • ... and 67 more: https://git.openjdk.org/jdk/compare/7e55ed3b106ed08956d2d38b7c99fb81704667c9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kimbarrett, @dholmes-ora, @tstuefe, @erikj79) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the rfr (Pull request is ready for review) label Jun 22, 2024
@openjdk

openjdk bot commented Jun 22, 2024

@jankratochvil The following label will be automatically applied to this pull request:

  • build

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the build (build-dev@openjdk.org) label Jun 22, 2024
@mlbridge

mlbridge bot commented Jun 22, 2024

Webrevs

@kimbarrett

Note that JDK-8323732 is a duplicate of JDK-8330047 (and is now marked as such), which was fixed in JDK 23.

@jankratochvil jankratochvil changed the title from "8334763: Fix --enable-asan assertion is_in_live_stack for fastdebug" to "8334763: --enable-asan: assert(_thread->is_in_live_stack((address)this)) failed: not on stack?" Jun 23, 2024
@tstuefe
Member

tstuefe commented Jun 23, 2024

I ran into this too (though, strangely, not for asan but for ubsan) and am happy to see this addressed. I agree with @kimbarrett though that globally disabling use after return seems too big a hammer.

@jankratochvil could you please add a bit of analysis to the JBS issue? Which other uses of is_in_live_stack are affected?

@jankratochvil jankratochvil marked this pull request as draft June 23, 2024 12:29
@openjdk openjdk bot removed the rfr (Pull request is ready for review) label Jun 23, 2024
@jankratochvil jankratochvil marked this pull request as ready for review June 23, 2024 13:14
@openjdk openjdk bot added the rfr (Pull request is ready for review) label Jun 23, 2024
@jankratochvil
Contributor Author

I haven't run a test suite locally with ASAN for this, and ASAN is not enabled in GHA, but java -version works.

Member

@dholmes-ora dholmes-ora left a comment

Sorry but can someone explain to me exactly what bit of code ASAN is complaining about and why? Thanks

@jankratochvil
Contributor Author

jankratochvil commented Jun 24, 2024

The JDK sometimes verifies that a StackObj really is on the stack. It does that by comparing the object's address against the bottom and top boundaries of the thread stack. The problem is that when ASAN runs with detect_stack_use_after_return, it allocates some automatic (stack) variables in a separately allocated memory block, off the stack. ASAN calls this memory the "fake stack". The JDK then fails its assertion that the StackObj is on the stack. So we can teach the JDK about the "fake stack", so that pointers into the "fake stack" also count as being on the stack. That's all.
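
The failure mode described above can be sketched in isolation. This is a minimal illustration, not HotSpot code: `StackBounds`, `Region`, `is_in_live_stack_strict` and `is_in_live_stack_asan_aware` are hypothetical names, and the fake stack is modeled as a plain address range instead of being queried from ASAN.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a thread stack that grows downward:
// valid addresses satisfy end <= p < base.
struct StackBounds {
  std::uintptr_t base;  // highest address, exclusive
  std::uintptr_t end;   // lowest address, inclusive
};

// Hypothetical model of ASAN's off-stack "fake stack" block: lo <= p < hi.
struct Region {
  std::uintptr_t lo;
  std::uintptr_t hi;
};

// Plain bounds check, in the spirit of the failing assertion.
inline bool is_in_live_stack_strict(const StackBounds& s, std::uintptr_t p) {
  assert(s.end < s.base);  // sanity check on the modeled bounds
  return p >= s.end && p < s.base;
}

// Variant that also accepts addresses inside the fake stack.
inline bool is_in_live_stack_asan_aware(const StackBounds& s, const Region& fake,
                                        std::uintptr_t p) {
  return is_in_live_stack_strict(s, p) || (p >= fake.lo && p < fake.hi);
}
```

With a real stack at [0x7000, 0x8000) and a fake stack at [0x1000, 0x2000), an object that ASAN relocated to 0x1800 fails the strict check but passes the ASAN-aware one.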

@kimbarrett

It looks like __builtin_frame_address is just incompatible with ASAN.
https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterReturn#compatibility
google/sanitizers#1688
So this seems like it is likely a problem with most/all uses of
os::current_stack_pointer.

With
ASAN_OPTIONS=detect_stack_use_after_return=0
it sometimes compares true.

With
ASAN_OPTIONS=detect_stack_use_after_return=1
it sometimes compares false.

And for both, instead it sometimes crashes with
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
Segmentation fault (core dumped)

I also tried the "portable" version of os::current_stack_pointer
referenced by that PR, and it didn't work either. Having now read
https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterReturn
it's obvious why that wouldn't work either.

I also tried sprinkling around __attribute__((no_sanitize("address"))), and
that didn't change anything.

The "fake stack" mechanism used by the use-after-return sanitizer is just
fundamentally incompatible with any kind of stack bounds checking, and
possibly other things we do in HotSpot.

We could apply the technique from the 2nd commit (using
__asan_addr_is_in_fake_stack) to all uses of os::current_stack_pointer, but
that's not thrilling. And that's assuming there aren't other places we're
doing things with stack pointers that will be broken by the "fake stack"
mechanism.

I'm approaching the conclusion that the original proposal of globally
disabling the use-after-return sanitizer (at least for HotSpot) might indeed
be necessary.
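
For reference, the fake-stack lookup mentioned above might look roughly like the following. This is a sketch under stated assumptions, not the code from the PR's second commit: `SKETCH_HAS_ASAN` and `in_asan_fake_stack` are hypothetical names. `__asan_get_current_fake_stack` and `__asan_addr_is_in_fake_stack` are declared in `<sanitizer/asan_interface.h>`; when the build is not ASAN-instrumented, the helper compiles to a stub that returns false.

```cpp
// Detect an ASAN-instrumented build: Clang exposes __has_feature,
// GCC defines __SANITIZE_ADDRESS__. SKETCH_HAS_ASAN is a hypothetical macro.
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_HAS_ASAN 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__) && !defined(SKETCH_HAS_ASAN)
#  define SKETCH_HAS_ASAN 1
#endif

#ifdef SKETCH_HAS_ASAN
#  include <sanitizer/asan_interface.h>
#endif

// Hypothetical helper: true if p points into the current thread's fake stack.
inline bool in_asan_fake_stack(const void* p) {
#ifdef SKETCH_HAS_ASAN
  void* fake_stack = __asan_get_current_fake_stack();
  if (fake_stack == nullptr) {
    return false;  // this thread currently has no fake stack
  }
  void* beg = nullptr;
  void* end = nullptr;
  // A non-null result means p lies inside a fake frame.
  return __asan_addr_is_in_fake_stack(fake_stack, const_cast<void*>(p),
                                      &beg, &end) != nullptr;
#else
  (void)p;
  return false;  // not an ASAN build, so there is no fake stack
#endif
}
```

Every stack-bounds check would need to consult a helper like this, which is the maintenance cost the comment above is objecting to.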

@dholmes-ora
Member

> That's all.

That's somewhat of an understatement. :)

I'm still unclear how anything ASAN does gets examined by our assertion, but it sounds to me like ASAN is just broken in this area. So I support Kim's conclusion - let's just turn this check off.

@tstuefe
Member

tstuefe commented Jun 24, 2024

> > That's all.
>
> That's somewhat of an understatement. :)
>
> I'm still unclear how anything ASAN does gets examined by our assertion, but it sounds to me like ASAN is just broken in this area. So I support Kim's conclusion - let's just turn this check off.

+1

 - bugreported by Thomas Stuefe
Member

@tstuefe tstuefe left a comment

LGTM. I proposed a slightly clearer wording for the comment; it's up to you whether to take it. Thanks for fixing this.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Jun 24, 2024
 - suggested by Thomas Stuefe
@jankratochvil
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor (Pull request is ready to be sponsored) label Jun 24, 2024
@openjdk

openjdk bot commented Jun 24, 2024

@jankratochvil
Your change (at version effd880) is now ready to be sponsored by a Committer.

@kimbarrett

> /integrate

In general, HotSpot changes require two reviewers (one being a Reviewer).

/reviewers 2

@openjdk

openjdk bot commented Jun 24, 2024

@kimbarrett
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@openjdk openjdk bot removed the sponsor (Pull request is ready to be sponsored) and ready (Pull request is ready to be integrated) labels Jun 24, 2024
fi
if test "x$TOOLCHAIN_TYPE" = "xclang"; then
  ASAN_CFLAGS="$ASAN_CFLAGS -fsanitize-address-use-after-return=never"
fi

This is JDK-wide configuration. Do we need that? Or would it be sufficient to limit this to the JVM?
I'm not sure what would happen with fake stacks at the JVM boundary (in either direction). I also don't
know what happens at the boundary with non-JDK native code.

Thinking about it some more, global disable seems like the only safe thing to do. So never mind my musings
about whether the disable could be limited to the JVM.

fi
if test "x$TOOLCHAIN_TYPE" = "xclang"; then
  ASAN_CFLAGS="$ASAN_CFLAGS -fsanitize-address-use-after-return=never"
fi

There's no change being made for the microsoft toolchain. It seems like the same issues with the fake stack
should arise there.

Contributor Author

@jankratochvil jankratochvil Jun 27, 2024

I tried it on AWS and added a new comment about it, since MSVC has it off by default.
Microsoft's documentation also implies that it is off by default.
However, I have filed JDK-8335228, because the MS-Windows OpenJDK build crashes if one explicitly enables -fsanitize-address-use-after-return there.

I guess we'll have to monitor this in future VS updates, which might start enabling it. But this seems okay for now.

Comment on lines +441 to +443
# detect_stack_use_after_return causes ASAN to offload stack-local
# variables to c-heap and therefore breaks assumptions in hotspot
# that rely on data (e.g. Marks) living in thread stacks.
Member

Ah! Now I understand what it is doing. Ouch!

@openjdk openjdk bot added the sponsor (Pull request is ready to be sponsored) and ready (Pull request is ready to be integrated) labels Jun 26, 2024
@openjdk openjdk bot removed the sponsor (Pull request is ready to be sponsored) label Jun 27, 2024

@kimbarrett kimbarrett left a comment

Looks good.

@kimbarrett

/sponsor

@openjdk

openjdk bot commented Jun 27, 2024

@kimbarrett The PR has been updated since the change author (@jankratochvil) issued the integrate command - the author must perform this command again.

@jankratochvil
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor (Pull request is ready to be sponsored) label Jun 27, 2024
@openjdk

openjdk bot commented Jun 27, 2024

@jankratochvil
Your change (at version 08153d9) is now ready to be sponsored by a Committer.

@kimbarrett

/sponsor

@openjdk

openjdk bot commented Jun 28, 2024

Going to push as commit b4df380.
Since your change was applied there have been 78 commits pushed to the master branch:

  • cd46c87: 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path
  • 4e8cbf8: 8335134: Test com/sun/jdi/BreakpointOnClassPrepare.java timeout
  • 3b1ca98: 8334895: OpenJDK fails to configure on linux aarch64 when CDS is disabled after JDK-8331942
  • c35e58a: 8309634: Resolve CONSTANT_MethodRef at CDS dump time
  • 243bae7: 8304693: Remove -XX:-UseVtableBasedCHA
  • 9d986a0: 8335220: C2: Missing check for Opaque4 node in EscapeAnalysis
  • 0e6b0cb: 8334886: jdk/jfr/api/recording/time/TestTimeMultiple.java failed with RuntimeException: getStopTime() > afterStop
  • b6ffb44: 8335135: HttpURLConnection#HttpInputStream does not throw IOException when response is truncated
  • 4ab7e98: 8330842: Support AES CBC with Ciphertext Stealing (CTS) in SunPKCS11
  • 5909d54: 8326820: Metadata artificially kept alive
  • ... and 68 more: https://git.openjdk.org/jdk/compare/7e55ed3b106ed08956d2d38b7c99fb81704667c9...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Jun 28, 2024
@openjdk openjdk bot closed this Jun 28, 2024
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated), rfr (Pull request is ready for review) and sponsor (Pull request is ready to be sponsored) labels Jun 28, 2024
@openjdk

openjdk bot commented Jun 28, 2024

@kimbarrett @jankratochvil Pushed as commit b4df380.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

build (build-dev@openjdk.org), integrated (Pull request has been integrated)

5 participants