Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8310656: RISC-V: __builtin___clear_cache can fail silently. #14670

Closed
wants to merge 4 commits into from

Conversation

robehn
Copy link
Contributor

@robehn robehn commented Jun 27, 2023

Hi, please consider.

We recently had a bug where user were missing permissions to use this syscall.
Which caused crashing on, according to hs_err on things like "addi x11, x24, 0" with SIGILL.
If it fails it is even possible to execute valid but 'old' instruction which may not lead to a crash, instead the program misbehaves.

To avoid this mess I suggest that we first test the syscall during vm init and we use it directly.
This way we can make sure it never fails.

Tested failing syscall with qemu, tested t1 in qemu, t1 on jh7110 in-progress.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8310656: RISC-V: __builtin___clear_cache can fail silently. (Bug - P2)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14670/head:pull/14670
$ git checkout pull/14670

Update a local copy of the PR:
$ git checkout pull/14670
$ git pull https://git.openjdk.org/jdk.git pull/14670/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14670

View PR using the GUI difftool:
$ git pr show -t 14670

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14670.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 27, 2023

👋 Welcome back rehn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 27, 2023
@openjdk
Copy link

openjdk bot commented Jun 27, 2023

@robehn The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jun 27, 2023
@mlbridge
Copy link

mlbridge bot commented Jun 27, 2023

Webrevs

@luhenry
Copy link
Member

luhenry commented Jun 27, 2023

For context, this was tracked down for adoptium/adoptium-support#697

@robehn
Copy link
Contributor Author

robehn commented Jun 27, 2023

Thanks!

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got curious. Looks good.

I was not even aware that __builtin___clear_cache had an errno. Is this documented somewhere? We should probably check arm64 too.

Question, would cacheflush(2) have worked too? Instead of the syscall?

src/hotspot/os_cpu/linux_riscv/riscv_flush_icache.cpp Outdated Show resolved Hide resolved
@openjdk
Copy link

openjdk bot commented Jun 28, 2023

@robehn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8310656: RISC-V: __builtin___clear_cache can fail silently.

Reviewed-by: luhenry, stuefe, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 28, 2023
@RealFYang
Copy link
Member

RealFYang commented Jun 28, 2023

Hi, I don't quite understand how this issue triggers. Could it happen on real hardware platforms like Unmatched or others?
Since __builtin___clear_cache is a GCC built-in function which doesn't return an error code, my previous understanding is that it will never fail to do its job. And the GCC manual doesn't even mention the cases where this could fail. Also I see uses of this built-in funtions on other platforms like AArch64.

void __builtin___clear_cache (void *begin, void *end) [Built-in Function]
This function is used to flush the processor’s instruction cache for the region of memory
between begin inclusive and end exclusive. Some targets require that the instruction
cache be flushed, after modifying memory containing code, in order to obtain
deterministic behavior.
If the target does not require instruction cache flushes, __builtin___clear_cache
has no effect. Otherwise either instructions are emitted in-line to clear the instruction
cache or a call to the __clear_cache function in libgcc is made.

@luhenry
Copy link
Member

luhenry commented Jun 28, 2023

@RealFYang the issue arises when you run Java inside Docker. In that case, Docker will block certain syscalls (like riscv_flush_icache before Docker v23) unless you disable this security feature. __builtin___clear_cache will then silently fail (the syscall will fail with EPERM), and no icache flush will actually happen.

That doesn't happen on QEMU because the riscv_flush_icache syscall never actually gets called (it's "interpreted" by qemu inside the docker container), but it reproduces very reliably on HiFive Unmatched. To reproduce, simply run java -version on an Unmatched board inside a Docker container on stock Ubuntu or Debian (these still ship Docker v20).

@robehn
Copy link
Contributor Author

robehn commented Jun 28, 2023

Got curious. Looks good.

I was not even aware that __builtin___clear_cache had an errno. Is this documented somewhere? We should probably check arm64 too.

Question, would cacheflush(2) have worked too? Instead of the syscall?

Hi Thomas, thanks for looking, and thanks for being curios!

And sorry in advanced for this long answer, I felt that I needed to add some context.

Cross modifying code/CMODX is not specified on RV.
RV allows both coherent and non-coherent I/D cache.
Right now we have to assume non-coherent I/D cache (I have suggest hwprobe should return this).

But there is two pieces we can use, ifence and full IPI shootdown.
ifence is a local hart instruction read fence.
With non-coherent I/D cache ifence is not sufficient.
And ifence is an extension, not on all RV cpus.

Kernel 4.15 got the syscall riscv_flush_icache added with the IPI shootdown.
There is also a hart local version SYS_RISCV_FLUSH_ICACHE_LOCAL, which can be used instead of ifence.

If we ignore the case of having an UP (as you know we removed UP from the VM),
to make sure all threads/harts have a fresh I-cache our only option, at the moment,
is to emit that syscall => IPI.

There is still some cases in the VM I believe is not entirely correct.
The IPI is very expensive and system-wide intrusive, so if we are running on
I/D cache coherent system we should be able to utilize the ifence instruction and avoid (some?) IPI's.

Regarding the cacheflush wrapper it does not seem to implemented and if I tested it I get undef ref when linking. I only see the syscall in that header.

It looks like glibc/gcc don't consider failing syscalls, maybe this was uncommon, but now when everyone is running in different containers it's certainly more common. E.g. cacheflush do not list EPERM as a return value.

Note that there is a vdso wrapper for this syscall, which just do the ecall (syscall instruction), so I fail to see the point in trying to use that. And since we are doing an a full system IPI the speed of the syscall is not essential.

So I feel more comfortable with the VM explicit emitting this syscall, so everyone can see exactly what we are doing.

Signed-off-by: Robbin Ehn <rehn@rivosinc.com>
@robehn
Copy link
Contributor Author

robehn commented Jun 30, 2023

Pushed small update.

Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Two nits remain.

src/hotspot/cpu/riscv/icache_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/os_cpu/linux_riscv/riscv_flush_icache.cpp Outdated Show resolved Hide resolved
@tstuefe
Copy link
Member

tstuefe commented Jul 1, 2023

@robehn Thanks for the thorough explanation!

So I feel more comfortable with the VM explicit emitting this syscall, so everyone can see exactly what we are doing.

I agree, rather do the real thing explicitly than use - and then have to second-guess - a wrapper.

@robehn
Copy link
Contributor Author

robehn commented Jul 1, 2023

Merge and updated for:
"8311145 Remove check_with_errno duplicates".
and those nits.

Thanks @RealFYang @tstuefe !

@RealFYang Let me know if that flush address update is better!

@RealFYang
Copy link
Member

@robehn : Yes, looks better to me now. Thank you for the update.

@VladimirKempik
Copy link

We had an issue with __builtin___clear_cache and clang-compiled jvm linked with libstdc++ ( instead of libc++) where calling __builtin___clear_cache was resulting in the NOP.
This effectively fixes that case as well.

@robehn
Copy link
Contributor Author

robehn commented Jul 2, 2023

/integrate

@openjdk
Copy link

openjdk bot commented Jul 2, 2023

Going to push as commit faf1b82.
Since your change was applied there has been 1 commit pushed to the master branch:

  • 0e3d91d: 8311215: [BACKOUT] JDK-8047998 Abort the vm if MaxNewSize is not the same as NewSize when MaxHeapSize is the same as InitialHeapSize

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 2, 2023
@openjdk openjdk bot closed this Jul 2, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 2, 2023
@openjdk
Copy link

openjdk bot commented Jul 2, 2023

@robehn Pushed as commit faf1b82.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@robehn
Copy link
Contributor Author

robehn commented Jul 2, 2023

/backport jdk21

@openjdk
Copy link

openjdk bot commented Jul 2, 2023

@robehn the backport was successfully created on the branch robehn-backport-faf1b822 in my personal fork of openjdk/jdk21. To create a pull request with this backport targeting openjdk/jdk21:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit faf1b822 from the openjdk/jdk repository.

The commit being backported was authored by Robbin Ehn on 2 Jul 2023 and was reviewed by Ludovic Henry, Thomas Stuefe and Fei Yang.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk21:

$ git fetch https://github.com/openjdk-bots/jdk21.git robehn-backport-faf1b822:robehn-backport-faf1b822
$ git checkout robehn-backport-faf1b822
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk21.git robehn-backport-faf1b822

@VladimirKempik
Copy link

would you mind backporting it to 17u-dev as well ?

@luhenry
Copy link
Member

luhenry commented Nov 15, 2023

/backport jdk17u-dev

@robehn
Copy link
Contributor Author

robehn commented Nov 17, 2023

would you mind backporting it to 17u-dev as well ?

Sorry missed this.

@luhenry it will not apply clean due to "8311145: Remove check_with_errno duplicates"

@luhenry
Copy link
Member

luhenry commented Nov 17, 2023

Backport to jdk17u-dev in progress at openjdk/jdk17u-dev#1968

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
5 participants