Conversation

@offamitkumar
Member

@offamitkumar offamitkumar commented Apr 7, 2025

Unsafe::setMemory intrinsic implementation for s390x.

Stub Code:

StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes)
--------------------------------------------------------------------------------
  0x000003ffb04b63c0:   ogrk	%r1,%r2,%r3
  0x000003ffb04b63c4:   nill	%r1,7
  0x000003ffb04b63c8:   je	0x000003ffb04b6410
  0x000003ffb04b63cc:   nill	%r1,3
  0x000003ffb04b63d0:   je	0x000003ffb04b6460
  0x000003ffb04b63d4:   nill	%r1,1
  0x000003ffb04b63d8:   jlh	0x000003ffb04b64a0
  0x000003ffb04b63dc:   risbg	%r4,%r4,48,55,8
  0x000003ffb04b63e2:   risbgz	%r1,%r3,32,63,62
  0x000003ffb04b63e8:   je	0x000003ffb04b6402
  0x000003ffb04b63ec:   nopr
  0x000003ffb04b63ee:   nopr
  0x000003ffb04b63f0:   sth	%r4,0(%r2)
  0x000003ffb04b63f4:   sth	%r4,2(%r2)
  0x000003ffb04b63f8:   agfi	%r2,4
  0x000003ffb04b63fe:   brct	%r1,0x000003ffb04b63f0
  0x000003ffb04b6402:   nilf	%r3,2
  0x000003ffb04b6408:   ber	%r14
  0x000003ffb04b640a:   sth	%r4,0(%r2)
  0x000003ffb04b640e:   br	%r14
  0x000003ffb04b6410:   risbg	%r4,%r4,48,55,8
  0x000003ffb04b6416:   risbg	%r4,%r4,32,47,16
  0x000003ffb04b641c:   risbg	%r4,%r4,0,31,32
  0x000003ffb04b6422:   risbgz	%r1,%r3,32,63,60
  0x000003ffb04b6428:   je	0x000003ffb04b6446
  0x000003ffb04b642c:   nopr
  0x000003ffb04b642e:   nopr
  0x000003ffb04b6430:   stg	%r4,0(%r2)
  0x000003ffb04b6436:   stg	%r4,8(%r2)
  0x000003ffb04b643c:   agfi	%r2,16
  0x000003ffb04b6442:   brct	%r1,0x000003ffb04b6430
  0x000003ffb04b6446:   nilf	%r3,8
  0x000003ffb04b644c:   ber	%r14
  0x000003ffb04b644e:   stg	%r4,0(%r2)
  0x000003ffb04b6454:   br	%r14
  0x000003ffb04b6456:   nopr
  0x000003ffb04b6458:   nopr
  0x000003ffb04b645a:   nopr
  0x000003ffb04b645c:   nopr
  0x000003ffb04b645e:   nopr
  0x000003ffb04b6460:   risbg	%r4,%r4,48,55,8
  0x000003ffb04b6466:   risbg	%r4,%r4,32,47,16
  0x000003ffb04b646c:   risbgz	%r1,%r3,32,63,61
  0x000003ffb04b6472:   je	0x000003ffb04b6492
  0x000003ffb04b6476:   nopr
  0x000003ffb04b6478:   nopr
  0x000003ffb04b647a:   nopr
  0x000003ffb04b647c:   nopr
  0x000003ffb04b647e:   nopr
  0x000003ffb04b6480:   st	%r4,0(%r2)
  0x000003ffb04b6484:   st	%r4,4(%r2)
  0x000003ffb04b6488:   agfi	%r2,8
  0x000003ffb04b648e:   brct	%r1,0x000003ffb04b6480
  0x000003ffb04b6492:   nilf	%r3,4
  0x000003ffb04b6498:   ber	%r14
  0x000003ffb04b649a:   st	%r4,0(%r2)
  0x000003ffb04b649e:   br	%r14
  0x000003ffb04b64a0:   risbgz	%r1,%r3,32,63,63
  0x000003ffb04b64a6:   je	0x000003ffb04b64c2
  0x000003ffb04b64aa:   nopr
  0x000003ffb04b64ac:   nopr
  0x000003ffb04b64ae:   nopr
  0x000003ffb04b64b0:   stc	%r4,0(%r2)
  0x000003ffb04b64b4:   stc	%r4,1(%r2)
  0x000003ffb04b64b8:   agfi	%r2,2
  0x000003ffb04b64be:   brct	%r1,0x000003ffb04b64b0
  0x000003ffb04b64c2:   nilf	%r3,1
  0x000003ffb04b64c8:   ber	%r14
  0x000003ffb04b64ca:   stc	%r4,0(%r2)
  0x000003ffb04b64ce:   br	%r14
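
Reading the listing above, each fill path appears to follow the same pattern: replicate the fill byte across the store width (the `risbg` sequence), run a loop unrolled twice, then store one trailing element if the size calls for it. The following is a minimal C++ model of that pattern, not the HotSpot code itself. The names `replicate_byte` and `fill_elements` are hypothetical, and the plain `memcpy` stores only stand in for the atomic `stc`/`sth`/`st`/`stg` instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: replicate the low byte across `elem_size` bytes,
// mirroring what the risbg sequence in the listing appears to do.
static uint64_t replicate_byte(uint8_t b, unsigned elem_size) {
    uint64_t v = b;
    if (elem_size >= 2) v |= v << 8;   // like risbg %r4,%r4,48,55,8
    if (elem_size >= 4) v |= v << 16;  // like risbg %r4,%r4,32,47,16
    if (elem_size >= 8) v |= v << 32;  // like risbg %r4,%r4,0,31,32
    return v;
}

// Hypothetical model of one fill path. T is the store unit (uint8_t,
// uint16_t, uint32_t or uint64_t); `size` must be a multiple of sizeof(T).
// Returns the number of element stores performed.
template <typename T>
static size_t fill_elements(unsigned char* dest, size_t size, uint8_t b) {
    assert(size % sizeof(T) == 0);
    const T pattern = static_cast<T>(replicate_byte(b, sizeof(T)));
    size_t stores = 0;
    // Loop count is size / (2 * sizeof(T)), like the risbgz right shift.
    for (size_t n = size / (2 * sizeof(T)); n > 0; --n) {
        std::memcpy(dest, &pattern, sizeof(T));              // first store
        std::memcpy(dest + sizeof(T), &pattern, sizeof(T));  // second store
        dest += 2 * sizeof(T);                               // like agfi
        stores += 2;
    }
    // Tail: one leftover element if the sizeof(T) bit of size is set (nilf).
    if (size & sizeof(T)) {
        std::memcpy(dest, &pattern, sizeof(T));
        ++stores;
    }
    return stores;
}
```

For example, `fill_elements<uint64_t>(buf, 24, 0x5A)` performs two stores in the loop and one in the tail.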

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8353500: [s390x] Intrinsify Unsafe::setMemory (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24480/head:pull/24480
$ git checkout pull/24480

Update a local copy of the PR:
$ git checkout pull/24480
$ git pull https://git.openjdk.org/jdk.git pull/24480/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24480

View PR using the GUI difftool:
$ git pr show -t 24480

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24480.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Apr 7, 2025

👋 Welcome back amitkumar! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 7, 2025

@offamitkumar This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8353500: [s390x] Intrinsify Unsafe::setMemory

Reviewed-by: lucy, mdoerr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 939 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title 8353500 8353500: [s390x] Intrinsify Unsafe::setMemory Apr 7, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 7, 2025
@openjdk

openjdk bot commented Apr 7, 2025

@offamitkumar The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 7, 2025
@mlbridge

mlbridge bot commented Apr 7, 2025

Webrevs

@offamitkumar
Member Author

with patch:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30   2.351 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30   2.655 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30   2.614 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30   2.783 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30   2.760 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30   2.891 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30   2.697 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30   2.769 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30   3.689 ± 0.016  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30   3.127 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  15.900 ± 0.046  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30   4.140 ± 0.057  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  53.748 ± 0.872  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30   9.245 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30   2.346 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30   2.647 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30   2.617 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30   2.786 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30   2.755 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30   2.892 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30   2.699 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30   2.765 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30   3.691 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30   3.175 ± 0.053  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  15.892 ± 0.028  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  15.122 ± 0.347  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  53.588 ± 0.315  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  52.775 ± 0.169  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30   2.333 ± 0.216  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30   1.878 ± 0.092  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30   2.301 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30   2.400 ± 0.201  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30   2.666 ± 0.052  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30   2.209 ± 0.084  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30   3.086 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30   2.294 ± 0.217  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30   4.631 ± 0.013  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30   2.164 ± 0.124  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  13.959 ± 0.042  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30   3.078 ± 0.211  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  51.435 ± 0.712  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30   7.879 ± 0.140  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30   2.486 ± 0.169  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30   2.163 ± 0.065  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30   2.307 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30   2.489 ± 0.121  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30   2.653 ± 0.025  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30   2.830 ± 0.161  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30   3.086 ± 0.008  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30   3.124 ± 0.189  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30   4.634 ± 0.015  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30   4.552 ± 0.194  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  13.977 ± 0.031  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  14.310 ± 0.177  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  52.244 ± 1.414  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  53.824 ± 0.580  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'

without patch:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30   2.368 ± 0.029  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30   2.647 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30   2.615 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30   2.782 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30   2.760 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30   2.889 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30   2.702 ± 0.017  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30   2.766 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30   3.748 ± 0.045  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30   3.122 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  24.901 ± 0.106  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  20.841 ± 0.154  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  24.498 ± 0.233  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  24.290 ± 0.050  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30   2.345 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30   2.648 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30   2.619 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30   2.784 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30   2.756 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30   2.892 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30   2.702 ± 0.011  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30   2.765 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30   3.702 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30   3.121 ± 0.010  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  25.130 ± 0.058  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  24.891 ± 0.128  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  24.385 ± 0.061  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  24.444 ± 0.076  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  19.611 ± 0.495  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  18.797 ± 0.126  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  22.808 ± 0.075  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  18.797 ± 0.047  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  22.934 ± 0.114  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  19.580 ± 0.061  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  22.798 ± 0.063  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  18.029 ± 0.689  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  22.736 ± 0.034  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  17.799 ± 0.276  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  22.777 ± 0.033  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  19.271 ± 0.017  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  22.758 ± 0.068  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  22.752 ± 0.057  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  19.115 ± 0.069  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  22.795 ± 0.067  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  22.754 ± 0.057  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  22.797 ± 0.064  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  22.803 ± 0.078  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  22.738 ± 0.044  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  22.815 ± 0.074  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  22.732 ± 0.026  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  22.754 ± 0.063  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  22.743 ± 0.042  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  23.250 ± 1.193  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  22.838 ± 0.182  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  22.748 ± 0.033  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  22.740 ± 0.039  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'

__ z_risbg(tmp, size, 32, 128/* risbgz */ + 63, 64 - exact_log2(2 * elem_size), 0); // just do the right shift and set cc
__ z_bre(L_Tail);

__ align(16); // loop alignment
Contributor

align(32) would be more helpful:

  • The instruction engine fetches octoword (32-byte) bundles.
  • The tight loop is < 32 bytes -> all in one bundle, does not cross a cache line boundary.
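
The effect of the alignment choice can be checked with a little address arithmetic. The sketch below uses two hypothetical helpers, `crosses_boundary` and `align_up`, and can be fed the 2-byte loop from the stub listing in the PR description (start `0x3ffb04b63f0`, 18 bytes up to the end of `brct`). The 32-byte bundle size is taken from the comment above, not from any measurement.

```cpp
#include <cstdint>

// Hypothetical helper: does the byte range [start, start + len) span more
// than one `bundle`-sized, `bundle`-aligned block?
static bool crosses_boundary(uint64_t start, uint64_t len, uint64_t bundle) {
    if (len == 0) return false;
    return (start / bundle) != ((start + len - 1) / bundle);
}

// Hypothetical helper: round `addr` up to the next multiple of `alignment`,
// which must be a power of two.
static uint64_t align_up(uint64_t addr, uint64_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

With the 16-byte alignment in the listing, the loop starts at offset 16 within its 32-byte block, so `crosses_boundary(0x3ffb04b63f0, 18, 32)` is true. After `align_up(..., 32)` the same 18 bytes fit in a single block.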

// multiple of 2
do_setmemory_atomic_loop(2, dest, size, byteVal, _masm);

__ align(16);
Contributor

What is this alignment good for?

Contributor

Branch target alignment. There is no fallthrough path from before this point. Should it be 32?

__ z_ogrk(rScratch1, dest, size);

__ z_nill(rScratch1, 7);
__ z_bre(L_fill8Bytes); // branch if 0
Contributor

Pls use z_braz() to reflect check semantics

@TheRealMDoerr
Contributor

Since this is taken from #24254: Maybe you can review that one, too?

@offamitkumar offamitkumar marked this pull request as draft April 8, 2025 10:05
@openjdk openjdk bot removed the rfr Pull request is ready for review label Apr 8, 2025
@offamitkumar offamitkumar marked this pull request as ready for review April 9, 2025 08:52
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 9, 2025
@TheRealMDoerr
Contributor

This looks good to me. I suggest measuring performance with the latest version.

@offamitkumar
Member Author

The results look almost the same:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30   2.349 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30   2.647 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30   2.614 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30   2.779 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30   2.759 ± 0.016  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30   2.887 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30   2.697 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30   2.771 ± 0.034  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30   3.700 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30   3.165 ± 0.042  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  17.266 ± 0.830  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30   4.479 ± 0.019  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  54.563 ± 1.222  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30   9.141 ± 0.069  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30   2.338 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30   2.647 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30   2.618 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30   2.780 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30   2.752 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30   2.889 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30   2.695 ± 0.002  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30   2.763 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30   3.684 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30   3.115 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  16.376 ± 0.018  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  15.394 ± 0.080  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  55.838 ± 1.325  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  52.927 ± 0.874  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30   2.281 ± 0.206  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30   2.076 ± 0.147  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30   2.562 ± 0.004  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30   2.020 ± 0.105  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30   2.938 ± 0.052  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30   2.412 ± 0.007  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30   3.349 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30   2.304 ± 0.220  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30   5.005 ± 0.005  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30   2.113 ± 0.110  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  14.160 ± 0.401  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30   3.200 ± 0.170  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  55.619 ± 0.672  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30   7.613 ± 0.186  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30   2.324 ± 0.224  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30   2.483 ± 0.004  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30   2.565 ± 0.005  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30   2.669 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30   2.916 ± 0.031  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30   3.042 ± 0.029  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30   3.360 ± 0.037  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30   3.401 ± 0.074  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30   5.012 ± 0.014  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30   4.592 ± 0.156  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  13.981 ± 0.392  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  14.876 ± 0.894  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  55.273 ± 0.546  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  53.228 ± 1.325  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'

@offamitkumar offamitkumar marked this pull request as draft April 16, 2025 04:50
@openjdk openjdk bot removed the rfr Pull request is ready for review label Apr 16, 2025
@offamitkumar
Member Author

This result is from a shared machine, but it looks like the regression is fixed.

We saw the regression because, in the unaligned case, only 1-byte store instructions (i.e. stc) were being emitted. Alignment depends on two factors, the size and the address we are storing to, so we can't always tell whether a given benchmark run will be an aligned or an unaligned case.

I will do further testing and see whether more optimization can be done, then mark this PR ready for review.

Benchmark                       (aligned)  (size)  Mode  Cnt  Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30  2.893 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30  3.122 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30  3.286 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30  3.401 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30  3.291 ± 0.021  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30  3.455 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30  3.471 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30  3.215 ± 0.033  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30  4.632 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30  3.815 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  9.695 ± 0.036  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  5.296 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  9.682 ± 0.011  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  9.508 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30  2.887 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30  3.134 ± 0.024  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30  3.285 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30  3.397 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30  3.297 ± 0.049  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30  3.445 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30  3.471 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30  3.204 ± 0.023  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30  4.630 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30  3.811 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  9.676 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  9.690 ± 0.031  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  9.678 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  4.180 ± 0.010  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  2.636 ± 0.060  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  2.379 ± 0.006  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  7.743 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  2.531 ± 0.113  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  7.746 ± 0.012  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  3.183 ± 0.006  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  7.742 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  2.580 ± 0.095  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  7.870 ± 0.184  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  2.523 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  7.757 ± 0.033  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  3.580 ± 0.005  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  7.744 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  8.090 ± 0.110  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  2.683 ± 0.025  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  7.747 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  7.738 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  7.745 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  7.773 ± 0.064  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  7.736 ± 0.008  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  7.747 ± 0.010  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  7.748 ± 0.030  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  7.735 ± 0.008  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  7.747 ± 0.020  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  7.746 ± 0.013  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  7.743 ± 0.012  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  7.741 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  2.739 ± 0.005  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'

@offamitkumar
Member Author

Thanks! That sounds like mvc should better not be used for Unsafe operations. Seeing no failures in some tests doesn't prove that it's safe.

@TheRealMDoerr But in this case mvc will be used only if the store is unaligned, and for unaligned stores we don't care about atomicity. In the other cases we will use sth, st, or stg according to the alignment. The current C++ implementation also emits an mvc instruction for the unaligned case, which is the behaviour this stub will replicate.

If we don't go ahead with mvc, then we are seeing regression, as you have noticed in the previous result.

@TheRealMDoerr
Contributor

As I said, mvc usage may be a bug. It was probably not intended that gcc generates it for Unsafe operations. Atomicity is never a problem when filling memory with Bytes. The code is designed to have a defined behavior when hitting signals. That's why UnsafeMemoryAccessMark is used.

@TheRealMDoerr
Contributor

If we don't go ahead with mvc, then we are seeing regression, as you have noticed in the previous result.

Are these corner cases relevant at all?

@offamitkumar
Member Author

If we don't go ahead with mvc, then we are seeing regression, as you have noticed in the previous result.

Are these corner cases relevant at all?

I am not sure about that. But the hit was significant in case of 255 & 256 byte.

@TheRealMDoerr
Contributor

The invariant on other platforms is that all Bytes before the non-writable address have been written when hitting a signal. I don't know if that is really required on s390. It may be a risk to use a different behavior. The code can be used to write memory mapped files or other stuff.
If this behavior is not required, why not use mvc always?

@uweigand

The invariant on other platforms is that all Bytes before the non-writable address have been written when hitting a signal. I don't know if that is really required on s390. It may be a risk to use a different behavior. The code can be used to write memory mapped files or other stuff. If this behavior is not required, why not use mvc always?

I thought the reason for not using mvc always is atomicity within array elements? That is, if you're writing an array of 4- or 8-byte values, then the change to each of those array elements should be atomic w.r.t. other CPUs. If that is true, you cannot use mvc. (However, that requirement would not be relevant for arrays of 1-byte values.)

@TheRealMDoerr
Contributor

However, that requirement would not be relevant for arrays of 1-byte values.

Correct. Unsafe::setMemory fills a memory region with 1-byte values. So, atomicity can't be a problem.

@theRealAph
Contributor

There's a lot of confusion about this. There is no requirement that all bytes before the non-writable address have been written when hitting a signal. Behaving nicely when writing beyond allocated memory is "best effort" only: we're trying to be nice, that's all.

The atomicity requirement is here, in the specification of Unsafe::setMemory:

     * <p>The stores are in coherent (atomic) units of a size determined
     * by the address and length parameters.  If the effective address and
     * length are all even modulo 8, the stores take place in 'long' units.
     * If the effective address and length are (resp.) even modulo 4 or 2,
     * the stores take place in units of 'int' or 'short'.
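
Read literally, the Javadoc above picks the store unit from the low bits of the address and the length taken together. The following is a minimal sketch of that rule using a hypothetical function, `coherent_unit`, which is not part of HotSpot or the JDK. OR-ing the two values first mirrors the `ogrk`/`nill` sequence at the top of the stub listing.

```cpp
#include <cstdint>

// Hypothetical helper: size in bytes of the coherent store unit implied by
// the quoted Javadoc for a given effective address and length.
static unsigned coherent_unit(uint64_t address, uint64_t length) {
    // A low bit of `bits` is clear only if it is clear in both inputs.
    const uint64_t bits = address | length;
    if ((bits & 7) == 0) return 8;  // 'long' units
    if ((bits & 3) == 0) return 4;  // 'int' units
    if ((bits & 1) == 0) return 2;  // 'short' units
    return 1;                       // single bytes, no wider unit implied
}
```

Under this rule an 8-byte-aligned address with length 64 gets 8-byte units, while the same address with length 63 falls back to single bytes. That may be why the size 63 and size 64 rows differ so much in the aligned benchmark results.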

@TheRealMDoerr
Contributor

TheRealMDoerr commented May 22, 2025

Ah, thanks! I was not aware of that. That means the current implementation is probably wrong in some cases (mvc generated by gcc). Or is mvc only used in the single Byte aligned case?

Contributor

@TheRealMDoerr TheRealMDoerr left a comment

The new proposal is probably ok, then.

@theRealAph
Contributor

Ah, thanks! I was not aware of that. That means the current implementation is probably wrong in some cases (mvc generated by gcc). Or is mvc only used in the single Byte aligned case?

Yes, that's right, just for the byte-aligned case.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label May 26, 2025
@offamitkumar
Member Author

Tier-1 tests are clean with the fastdebug VM.

These are the performance numbers on my z16 zVM:

Benchmark                       (aligned)  (size)  Mode  Cnt  Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30  2.889 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30  3.115 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30  3.271 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30  3.382 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30  3.295 ± 0.062  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30  3.428 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30  3.482 ± 0.049  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30  3.188 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30  4.612 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30  3.795 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  5.376 ± 0.037  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  4.846 ± 0.033  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  7.723 ± 0.263  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  7.299 ± 0.017  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30  2.883 ± 0.017  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30  3.110 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30  3.271 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30  3.385 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30  3.268 ± 0.024  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30  3.431 ± 0.010  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30  3.459 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30  3.186 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30  4.614 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30  3.799 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  5.282 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  4.891 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  8.038 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  7.890 ± 0.108  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  3.785 ± 0.062  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  3.772 ± 0.075  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  3.433 ± 0.052  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  3.727 ± 0.172  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  3.414 ± 0.062  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  3.313 ± 0.117  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  3.198 ± 0.015  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  2.843 ± 0.158  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  3.278 ± 0.004  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  2.925 ± 0.113  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  3.800 ± 0.006  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  3.400 ± 0.050  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  7.032 ± 0.120  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  6.423 ± 0.013  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  3.645 ± 0.148  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  3.638 ± 0.152  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  3.377 ± 0.068  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  3.692 ± 0.119  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  3.436 ± 0.027  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  3.427 ± 0.038  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  3.192 ± 0.014  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  3.035 ± 0.046  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  3.294 ± 0.049  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  3.042 ± 0.061  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  3.579 ± 0.006  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  3.449 ± 0.035  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  8.633 ± 0.317  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  7.003 ± 0.085  ns/op

@RealLucy
Contributor

The atomicity spec cited by @theRealAph severely limits the optimisation options. Depending on the data alignment, you have to use 8, 4, or 2-byte stores. Only for the unaligned case there are no hard restrictions, just the soft "let's be nice" conventions.
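
A minimal sketch of what "use 8, 4, or 2-byte stores" means for a fill loop is shown here. The function name `fill_units` is hypothetical and the code is plain C++, not the stub. Note the caveat: `std::memcpy` of a unit is usually compiled to a single store, but the C++ language does not guarantee that, whereas the stub emits the machine store instruction directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fill `len` bytes at `dst` with `value`, writing one
// Unit (std::uint64_t, std::uint32_t, std::uint16_t or std::uint8_t) per step.
template <typename Unit>
void fill_units(void* dst, std::size_t len, std::uint8_t value) {
  assert(len % sizeof(Unit) == 0);  // length must be a whole number of units
  // Replicate the byte into every byte position of the unit.
  const Unit pattern = static_cast<Unit>(0x0101010101010101ULL * value);
  auto* p = static_cast<unsigned char*>(dst);
  for (std::size_t i = 0; i < len; i += sizeof(Unit)) {
    // One unit-sized write per iteration (assumed to become a single store).
    std::memcpy(p + i, &pattern, sizeof(Unit));
  }
}
```

A caller would pick `Unit` from the alignment of the address and length, for instance `fill_units<std::uint32_t>(p, 12, 0x7F)` for a 4-byte-aligned region of 12 bytes.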

With that said, the vector implementation should be ok. It is just not as nice as a byte store loop. There could be as many as 15 uninitialised bytes if just the last byte of a vector store is not writable. I would take that risk.

UnsafeMemoryAccessMark umam(this, true, false);

__ z_vlvgb(Z_V0, byteVal, 0);
__ z_vrepb(Z_V0, Z_V0, 0);
Contributor

You could also use z_vzero(Vreg) to preload the vector register with all zeroes. Saves an instruction.

Member Author

I am not loading 0 here. This is my intention: z_vlvgb puts the value of byteVal into byte element 0 of Z_V0, and z_vrepb then replicates that element (1 byte) across the whole register.

z_vzero would make sense if we were always zeroing out the memory, but that is not always the case. In most cases we fill with some non-zero 1-byte value.
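
The effect of the two instructions can be modelled in portable C++ as follows. This is a rough illustration only: `Vec128` and `splat_byte` are hypothetical names standing in for a 16-byte vector register and the insert-then-replicate sequence.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector register viewed as 16 bytes.
using Vec128 = std::array<std::uint8_t, 16>;

// Models the pair of instructions: insert the byte into element 0,
// then replicate element 0 into all 16 byte elements.
inline Vec128 splat_byte(std::uint8_t byte_val) {
  Vec128 v{};            // contents before the insert do not matter
  v[0] = byte_val;       // corresponds to vlvgb %v0,%r4,0
  for (auto& b : v) {    // corresponds to vrepb %v0,%v0,0
    b = v[0];
  }
  return v;
}
```

With a fill value of zero the result equals an all-zero register, which is the only case where a zeroing instruction could replace the pair.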

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 26, 2025
@TheRealMDoerr
Copy link
Contributor

The large number of conditional branches may cause a regression in real life scenarios with a large variance of sizes and alignments.

@offamitkumar
Member Author

> The large number of conditional branches may cause a regression in real life scenarios with a large variance of sizes and alignments.

I can try to run the same benchmark with larger sizes, but again it wouldn't replicate a real-life scenario. Could you suggest some other benchmark?

@offamitkumar
Member Author

As of now I am not seeing any regression in the benchmark, and vector store + mvc is not performing better than the vector-store-only solution. So I am moving ahead with the integration.

@offamitkumar
Member Author

Thanks to all for the help, reviews, and suggestions you provided.

/integrate

@openjdk

openjdk bot commented May 30, 2025

Going to push as commit 2000551.
Since your change was applied there have been 1027 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 30, 2025
@openjdk openjdk bot closed this May 30, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 30, 2025
@openjdk

openjdk bot commented May 30, 2025

@offamitkumar Pushed as commit 2000551.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@theRealAph
Contributor

What are all those noprs for?

@offamitkumar
Member Author

> What are all those noprs for?

Sorry, that is old code; the nops were inserted for loop alignment. This is the newer stub code:

- - - [BEGIN] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
StubRoutines::unsafe_setmemory [0x000003ffa84b63c0, 0x000003ffa84b644c] (140 bytes)
--------------------------------------------------------------------------------
BFD: unknown S/390 disassembler option: s390
.long	0x00000000
  0x000003ffa84b63c0:   vlvgb	%v0,%r4,0
  0x000003ffa84b63c6:   vrepb	%v0,%v0,0
  0x000003ffa84b63cc:   aghi	%r3,-32
  0x000003ffa84b63d0:   jl	0x000003ffa84b63ec
  0x000003ffa84b63d4:   vst	%v0,0(%r2)
  0x000003ffa84b63da:   vst	%v0,16(%r2)
  0x000003ffa84b63e0:   aghi	%r2,32
  0x000003ffa84b63e4:   aghi	%r3,-32
  0x000003ffa84b63e8:   jhe	0x000003ffa84b63d4
  0x000003ffa84b63ec:   tmll	%r3,16
  0x000003ffa84b63f0:   je	0x000003ffa84b63fe
  0x000003ffa84b63f4:   vst	%v0,0(%r2)
  0x000003ffa84b63fa:   aghi	%r2,16
  0x000003ffa84b63fe:   tmll	%r3,8
  0x000003ffa84b6402:   je	0x000003ffa84b6410
  0x000003ffa84b6406:   vsteg	%v0,0(%r2),0
  0x000003ffa84b640c:   aghi	%r2,8
  0x000003ffa84b6410:   tmll	%r3,7
  0x000003ffa84b6414:   je	0x000003ffa84b644a
  0x000003ffa84b6418:   tmll	%r3,4
  0x000003ffa84b641c:   je	0x000003ffa84b642a
  0x000003ffa84b6420:   vstef	%v0,0(%r2),0
  0x000003ffa84b6426:   aghi	%r2,4
  0x000003ffa84b642a:   tmll	%r3,2
  0x000003ffa84b642e:   je	0x000003ffa84b643c
  0x000003ffa84b6432:   vsteh	%v0,0(%r2),0
  0x000003ffa84b6438:   aghi	%r2,2
  0x000003ffa84b643c:   tmll	%r3,1
  0x000003ffa84b6440:   je	0x000003ffa84b644a
  0x000003ffa84b6444:   vsteb	%v0,0(%r2),0
  0x000003ffa84b644a:   br	%r14
--------------------------------------------------------------------------------
- - - [END] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
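
For readers who do not read s390x assembly, the control flow of the listing above can be modelled roughly in C++. This is my reading of the disassembly, not the HotSpot source: `set_memory_model` is a hypothetical name, and `std::memcpy` stands in for the `vst`/`vsteg`/`vstef`/`vsteh`/`vsteb` stores, so it says nothing about atomicity or fault handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the stub: dst ~ %r2, size ~ %r3, value ~ %r4.
inline void set_memory_model(std::uint8_t* dst, std::int64_t size,
                             std::uint8_t value) {
  assert(size >= 0);
  std::uint8_t v[16];
  std::memset(v, value, sizeof v);    // vlvgb + vrepb: splat the byte

  size -= 32;                         // aghi %r3,-32
  while (size >= 0) {                 // jl skips the loop, jhe repeats it
    std::memcpy(dst, v, 16);          // vst %v0,0(%r2)
    std::memcpy(dst + 16, v, 16);     // vst %v0,16(%r2)
    dst += 32;
    size -= 32;
  }

  // size is now in [-32, -1]. Subtracting 32 never changes the low five
  // bits, so they still hold the number of bytes left (0..31).
  const std::uint64_t rem = static_cast<std::uint64_t>(size);
  if (rem & 16) { std::memcpy(dst, v, 16); dst += 16; }  // tmll 16 / vst
  if (rem & 8)  { std::memcpy(dst, v, 8);  dst += 8;  }  // tmll 8  / vsteg
  if ((rem & 7) == 0) return;                            // tmll 7: early exit
  if (rem & 4)  { std::memcpy(dst, v, 4);  dst += 4;  }  // tmll 4  / vstef
  if (rem & 2)  { std::memcpy(dst, v, 2);  dst += 2;  }  // tmll 2  / vsteh
  if (rem & 1)  { std::memcpy(dst, v, 1); }              // tmll 1  / vsteb
}
```

The tail needs no separate remainder computation: the bit tests work directly on the negative counter left over from the loop.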

@offamitkumar offamitkumar deleted the not_safe_intrinsic branch June 2, 2025 03:35

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

5 participants