Skip to content

Conversation

@snazarkin
Copy link
Contributor

@snazarkin snazarkin commented Apr 12, 2024

An alternative for preemptively switching the W^X thread mode on macOS with an AArch64 CPU. This implementation triggers the switch in response to the SIGBUS signal if the si_addr belongs to the CodeCache area. With this approach, it is now feasible to eliminate all WX guards and avoid potentially costly operations. However, no significant improvement or degradation in performance has been observed. Additionally, considering the issue with AsyncGetCallTrace, the patched JVM has been successfully operated with asgct_bottom and async-profiler.

Additional testing:

  • MacOS AArch64 server fastdebug gtets
  • MacOS AArch64 server fastdebug jtreg:hotspot:tier4
  • Benchmarking

@apangin and @parttimenerd could you please check the patch on your scenarios??


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8330171: Lazy W^X switch implementation (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18762/head:pull/18762
$ git checkout pull/18762

Update a local copy of the PR:
$ git checkout pull/18762
$ git pull https://git.openjdk.org/jdk.git pull/18762/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18762

View PR using the GUI difftool:
$ git pr show -t 18762

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18762.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 12, 2024

👋 Welcome back snazarki! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 12, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 12, 2024
@openjdk
Copy link

openjdk bot commented Apr 12, 2024

@snazarkin The following labels will be automatically applied to this pull request:

  • graal
  • hotspot
  • serviceability
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org serviceability serviceability-dev@openjdk.org hotspot hotspot-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Apr 12, 2024
@mlbridge
Copy link

mlbridge bot commented Apr 12, 2024

Webrevs

@VladimirKempik
Copy link

Hello Sergey.
W^X mode was initially forced by Apple to prevent writing to executable memory, as a security feature.
This change just eliminates this security feature at all, doesn't it ?
Basically: "want to write to Executable memory ? ok, here you go"

@snazarkin snazarkin changed the title 8330171: Lazy W^X swtich implementation 8330171: Lazy W^X switch implementation Apr 12, 2024
@theRealAph
Copy link
Contributor

Hello Sergey. W^X mode was initially forced by Apple to prevent writing to executable memory, as a security feature. This change just eliminates this security feature at all, doesn't it ? Basically: "want to write to Executable memory ? ok, here you go"

Yes @VladimirKempik, you are right. No, we should not do this.

Instead, when we enter the VM we could track the current state of W^X and whenever we enter a block that needs to write into code space we would set W if needed. When we leave the VM or when we call back into Java we would set X, if needed. The cost of doing this would be small, but we'd have to find all the blocks that need to write into code space. This might be more effort than we want to make, though.

So where would be need to make the transitions to W? At a guess, whenever we start assembling something, and in all of the methods in nativeInst_aarch64.?pp, and in class Patcher. And to X, in the call stub and a few other places.

That would minimize the transitions exactly to the set of places we actually need.

@VladimirKempik
Copy link

VladimirKempik commented Apr 12, 2024

This ( changes of this PR) can be left as an addition to existing mechanism. Disabled by default and can be enabled with a special (DEVELOP) option. So this can't be enabled on production builds, but can be useful to debug w^x issues on debug builds ( with additional logging)

@snazarkin
Copy link
Contributor Author

snazarkin commented Apr 12, 2024

Thanks @theRealAph, @VladimirKempik

Instead, when we enter the VM we could track the current state of W^X and whenever we enter a block that needs to write into code space we would set W if needed. When we leave the VM or when we call back into Java we would set X, if needed. The cost of doing this would be small, but we'd have to find all the blocks that need to write into code space. This might be more effort than we want to make, though.

It is the way in which it is implemented in the current code. Unfortunately, it is hardly maintainable solution that suffers from issues like [1-5].

As I understand it, your concern is that the implementation doesn't prevent rogue from writing to the code cache with some unsafe api?

  1. https://bugs.openjdk.org/browse/JDK-8302736
  2. https://bugs.openjdk.org/browse/JDK-8327990
  3. https://bugs.openjdk.org/browse/JDK-8327036
  4. https://bugs.openjdk.org/browse/JDK-8304725
  5. https://bugs.openjdk.org/browse/JDK-8307549

@mlbridge
Copy link

mlbridge bot commented Apr 13, 2024

Mailing list message from Andrew Haley on hotspot-dev:

On 4/12/24 18:18, Sergey Nazarkin wrote:

?It is the way in which it is implemented in the current code.

No, it's not. That's not what we do at all.

We don't set W^X when we need it: instead, we set it at certain times in the hope
that it'll be needed. I'm suggesting we should set W^X *exactly* where we need it,
such as at patching methods. Not at VM entry.

Get rid of all the assert_wx_state. If deoptimization needs WXWrite, then it should
set it, not hope for someone else to have done it.

@tstuefe
Copy link
Member

tstuefe commented Apr 13, 2024

I have one question, and I'm sorry if it has been answered before. How expensive is changing the mode? Is it just a status variable in user-space pthread lib? Or does it need a system call?

In other words, how fine granular can we get without incurring too high a cost?

@theRealAph
Copy link
Contributor

I have one question, and I'm sorry if it has been answered before. How expensive is changing the mode? Is it just a status variable in user-space pthread lib? Or does it need a system call?

In other words, how fine granular can we get without incurring too high a cost?

It's expensive. We've seen it cause significant slowdowns in Java->VM transitions.

@mlbridge
Copy link

mlbridge bot commented Apr 14, 2024

Mailing list message from Kim Barrett on hotspot-dev:

On Apr 13, 2024, at 8:35 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:

On 4/12/24 18:18, Sergey Nazarkin wrote:

?It is the way in which it is implemented in the current code.

No, it's not. That's not what we do at all.

We don't set W^X when we need it: instead, we set it at certain times in the hope
that it'll be needed. I'm suggesting we should set W^X *exactly* where we need it,
such as at patching methods. Not at VM entry.

Get rid of all the assert_wx_state. If deoptimization needs WXWrite, then it should
set it, not hope for someone else to have done it.

There was a recent internal-to-Oracle discussion about W^X, which led to
something that I think is along the lines of what @aph is suggesting,
including a prototype. I will poke some of the folks who were more deeply
involved in that discussion (I was just casually following along, and am not
knowledgeable in this area), but it's a weekend now, so it might take them
some time to respond.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20240413/343e5f80/signature.asc>

@lewurm
Copy link
Member

lewurm commented Apr 15, 2024

I agree that this PR effectively removes the whole idea behind JIT_MAP and thus is a bad idea security wise. Still it has some value.

@snazarkin do you have numbers on how many transitions are done with your PR vs. the current state when running the same program? That would give us a lower bound on the amount of transitions needed and define a goal for future improvements in that area.

@snazarkin
Copy link
Contributor Author

do you have numbers on how many transitions are done with your PR vs. the current state when running the same program?

With just simple java -version it is ~180 vs ~9500 (new vs old), for java -help ~1120 vs ~86300. For the applications the ration is about the same.

@reinrich
Copy link
Member

What about granting WXWrite only if the current thread is in _thread_in_vm?
That would be more restrictive and roughly equivalent how it currently works. Likely there are some places then that should be granted WXWrite eagerly because they need WXWrite without _thread_in_vm. E.g. the JIT compiler threads should have WXWrite and never WXExec (I assume) which should be checked in the signal handler.

@snazarkin
Copy link
Contributor Author

The patch doesn't protect against native agents, as this is obviously impossible. The current code doesn't do that either. For the bytecode, it doesn't prevent the attacker from abusing unsafe api to modify code cache. However unsafe functions are already considered "safe" and we proactively enable WXWrite as well as move thread to _thread_in_vm state (@reinrich). JITed code can't write to the cache either with or without the patch.

I totally get the sense of loss of security. But is this really the case?

@theRealAph
Copy link
Contributor

The patch doesn't protect against native agents, as this is obviously impossible. The current code doesn't do that either. For the bytecode, it doesn't prevent the attacker from abusing unsafe api to modify code cache. However unsafe functions are already considered "safe" and we proactively enable WXWrite as well as move thread to _thread_in_vm state (@reinrich). JITed code can't write to the cache either with or without the patch.

I totally get the sense of loss of security. But is this really the case?

I think it is. W^X is intended (amongst other things) to protect against the use of gadgets, from buffer overflow exploits in non-java code to ROP programming. At present, in order to generate code and execute it, you first have to be able to make the JIT code writable, then write the code, then make it executable. then jump to the code. And the exploit writer might have to do some or all of this by finding gadgets. If we were to merge this patch then all the attacker would have to do is write code to memory and find a way to jump to it, and the automatic switch-on-segfault in this patch would do the all the work the attacker needs.

It makes far more sense to tag those places that actually need to change W^X access, and only switch there.

You could argue that any switching of W^X on a write to code space, then switching it back on jumping (or returning) to Java code, even what we already do, is effectively the same thing. Kinda, but it's not on just any attempt to write to code space or any attempt to jump into code, it's at the places we choose, and we can be careful to limit those places.

But surely the JDK is not the most vulnerable part of the stack anyway? I'd agree with that, of course, but I don't think that's sufficient reason to decide to bypass an OS security mechanism.

We are trying to reduce the size of the attack surface.

@stooart-mon
Copy link
Contributor

To add a little to @theRealAph 's point, we should avoid painting ourselves into a corner. I don't know how the platform is going to evolve, but I'd be nervous about fighting against the intentions of the protections.

@openjdk
Copy link

openjdk bot commented May 9, 2024

@snazarkin this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout w_x_z
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label May 9, 2024
Copy link
Member

@dean-long dean-long left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a sweet-spot middle-ground between the two extremes: full-lazy, ideal for performance, and fine-grained execute-by-default, ideal for security. I don't think we should change to full-lazy and remove all the guard rails at this time. I am investigating execute-by-default, and it looks promising.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jun 6, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented Jun 10, 2024

@snazarkin This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 8, 2024

@snazarkin This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org merge-conflict Pull request has merge conflict with target branch serviceability serviceability-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org

Development

Successfully merging this pull request may close these issues.

8 participants