@jatin-bhateja
Member

@jatin-bhateja jatin-bhateja commented May 3, 2025

This PR adds the missing EVEX compressed displacement attributes that are used to compute the scale factor (N) of a compressed displacement.
AVX512 instructions with a memory operand use the compressed disp8 encoding if the displacement is a multiple of the scale (N) and the quotient fits in a signed byte. N depends on the vector length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of the Intel SDM for more details.

E.g., consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40. Both operate on a full 64-byte vector, hence the scale N = 64. The displacement of the latter is a multiple of the scale and its quotient (0x40 / 64 = 1) fits in one byte, so it can be represented by a 1-byte displacement encoding. The former is also a multiple of 64, but its quotient (0x4080C1) does not fit in a signed byte, so it requires 4 bytes to represent the displacement in the instruction encoding.

1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24
    EVEX        OP   MR   SIB       DISP       IMM
--------------|----|----|----|---------------|-----|
62 6b c1 40     25   84   ec     40 30 20 10     ff

2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24
For a full vector width operation, the scale matches the vector size, hence scale N = 64.
compressed DISP8 = displacement / N = 0x40 (64) / 64 = 0x1
    EVEX       OP   MR SIB    DISP     IMM
-------------|----|---|---|-----------|---|
62 6b c1 40    25  44   ec      01     ff 
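
The choice between the two displacement encodings above can be sketched as follows. This is only an illustration, not the HotSpot assembler code: the function name `encode_displacement` is hypothetical, and it ignores the case where a zero displacement is omitted from the encoding entirely.

```python
from typing import Tuple


def encode_displacement(disp: int, n: int) -> Tuple[str, bytes]:
    """Pick the displacement encoding for an EVEX memory operand.

    disp is the displacement in bytes, n is the disp8*N scale factor.
    Returns the encoding kind and the little-endian displacement bytes.
    """
    # Compressed disp8 is usable only if disp is a multiple of N
    # and the quotient fits in a signed 8-bit field.
    if disp % n == 0 and -128 <= disp // n <= 127:
        return "disp8*N", (disp // n).to_bytes(1, "little", signed=True)
    # Otherwise fall back to a full, unscaled 32-bit displacement.
    return "disp32", disp.to_bytes(4, "little", signed=True)


small = encode_displacement(0x40, 64)
large = encode_displacement(0x10203040, 64)
print(small[0], small[1].hex())  # kind and DISP byte for example 2)
print(large[0], large[1].hex())  # kind and DISP bytes for example 1)
```

The resulting bytes correspond to the DISP columns shown for the two instructions: `01` for the compressed form and `40 30 20 10` for the 4-byte form.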

Kindly review and share your feedback.

Best Regards,
Jatin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25021/head:pull/25021
$ git checkout pull/25021

Update a local copy of the PR:
$ git checkout pull/25021
$ git pull https://git.openjdk.org/jdk.git pull/25021/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25021

View PR using the GUI difftool:
$ git pr show -t 25021

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25021.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 3, 2025

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 3, 2025

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry

Reviewed-by: epeter, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 384 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented May 3, 2025

@jatin-bhateja The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label May 3, 2025
@jatin-bhateja jatin-bhateja marked this pull request as ready for review May 4, 2025 07:49
@openjdk openjdk bot added the rfr Pull request is ready for review label May 4, 2025
@mlbridge

mlbridge bot commented May 4, 2025

Webrevs

Contributor

@eme64 eme64 left a comment

@jatin-bhateja Thank you for looking into this!

The fix looks generally reasonable, thanks for adding all the tests!

@eme64
Contributor

eme64 commented May 12, 2025

@jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :)

@jatin-bhateja jatin-bhateja changed the title 8351950: C2: masked vector MIN/MAX AVX512: SIGFPE / no valid evex tuple_table entry C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry May 12, 2025
@openjdk openjdk bot removed the rfr Pull request is ready for review label May 12, 2025
@jatin-bhateja
Member Author

@jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :)

Please use the latest version

@jatin-bhateja jatin-bhateja changed the title C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry May 12, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label May 12, 2025
@eme64
Contributor

eme64 commented May 13, 2025

@jatin-bhateja Ah ok. Last tests had passed, but I'll re-run now with your newest updates.

@eme64
Contributor

eme64 commented May 15, 2025

@jatin-bhateja Tests for commit 2 / v01 have all passed 🟢

Contributor

@eme64 eme64 left a comment

The fix looks reasonable to me. But I don't understand the x64 change, so we need someone from Intel to review here. The Java tests look ok to me :)

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 15, 2025
@sviswa7

sviswa7 commented May 19, 2025

Corrections to the address attributes are needed in a few more places:

  1. evpmovzxbd tuple type needs change from HVM to QVM.
  2. Address attribute missing for two additional instructions taking Address as input/output: vpermb, paddd.
  3. The input_size_in_bits should be EVEX_32bit for cvtsi2ssq, cvtsi2sdq.
  4. The input_size_in_bits should be EVEX_64bit for evpgatherdq, evpscatterdq, evgatherdpd, evscatterdpd.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label May 20, 2025
@sviswa7

sviswa7 commented May 20, 2025

Thanks for the update. It looks like you missed changing the input_size_in_bits to EVEX_64bit for evgatherdpd.

@eme64
Contributor

eme64 commented May 21, 2025

@jatin-bhateja @sviswa7 Can you explain the impact of the EVEX_HVM, EVEX_QVM etc, and what is the impact if we get them wrong? Performance? Wrong results? How can we test that they are correct?


@sviswa7 sviswa7 left a comment

Looks good to me. Thanks for fixing this issue.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 21, 2025
@sviswa7

sviswa7 commented May 21, 2025

@jatin-bhateja @sviswa7 Can you explain the impact of the EVEX_HVM, EVEX_QVM etc, and what is the impact if we get them wrong? Performance? Wrong results? How can we test that they are correct?

@eme64 In EVEX, the displacement for memory in the addressing mode is encoded using the compressed disp8 encoding scheme. The EVEX_FVM, EVEX_HVM, EVEX_QVM etc. denote the tuple type and are used to determine the scaling factor for the displacement. Please see section "2.7.5 Compressed Displacement (disp8*N) Support in EVEX" in Intel SDM Volume 2. So to answer your question, if the tuple type is incorrect we will see wrong results if the displacement is nonzero.
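
To illustrate why a wrong tuple type gives wrong results rather than just slower code, here is a small sketch. It assumes the assembler encodes with one tuple type while the hardware decodes with another. The `SCALE_N` table reflects my reading of the N values for the FVM/HVM/QVM/OVM tuple types in SDM section 2.7.5 and should be checked against the manual. The helper names are hypothetical and are not HotSpot functions.

```python
# Scale factor N per tuple type and vector length in bits.
# Assumption: values as read from Intel SDM Vol. 2, section 2.7.5.
SCALE_N = {
    "FVM": {128: 16, 256: 32, 512: 64},  # full vector memory
    "HVM": {128: 8, 256: 16, 512: 32},   # half vector memory
    "QVM": {128: 4, 256: 8, 512: 16},    # quarter vector memory
    "OVM": {128: 2, 256: 4, 512: 8},     # eighth vector memory
}


def encode_disp8(disp, tuple_type, vlen):
    """Return the compressed disp8 value, or None if disp32 is needed."""
    quotient, remainder = divmod(disp, SCALE_N[tuple_type][vlen])
    if remainder != 0 or not -128 <= quotient <= 127:
        return None
    return quotient


def decode_disp8(disp8, tuple_type, vlen):
    """Return the byte displacement the hardware would apply."""
    return disp8 * SCALE_N[tuple_type][vlen]


# Assembler wrongly treats a 512-bit QVM instruction as HVM.
encoded = encode_disp8(0x40, "HVM", 512)
actual = decode_disp8(encoded, "QVM", 512)
print(hex(0x40), "was requested, but the instruction accesses", hex(actual))
```

A zero displacement survives the mismatch because 0 scaled by any N is still 0, which is why the failure only shows up for nonzero displacements.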

@sviswa7

sviswa7 commented May 21, 2025

For testing, the best way would be to create a SIMD instruction encoding test tool along similar lines to 52d752c in a separate future PR.

@eme64
Contributor

eme64 commented May 22, 2025

@sviswa7 Thanks for the explanations!
Could we also test it with Java code that generates all sorts of address shapes, e.g. with various offsets and scaling factors?

I'll re-run testing now, just to be sure.

@jatin-bhateja
Member Author

jatin-bhateja commented May 22, 2025

@sviswa7 Thanks for the explanations! Could we also test it with Java code that generates all sorts of address shapes, e.g. with various offsets and scaling factors?

I'll re-run testing now, just to be sure.

Hi @eme64, on targets with AVX512 features, compressed disp8 encoding is not an optional feature. If the displacement of an instruction's memory operand is a multiple of the scale (N, determined from the vector length, lane size, embedded broadcast flag, etc.) and the quotient fits in a signed byte, the EVEX encoding always records the compressed displacement, i.e. encoded disp8 = displacement / N.

I agree with the suggestion of adding test points to the assembler test tool in a separate follow-up patch, as it's an activity of its own. Here is the JBS tracker for it: https://bugs.openjdk.org/browse/JDK-8357567

@jatin-bhateja
Member Author

@sviswa7 Thanks for the explanations! Could we also test it with Java code that generates all sorts of address shapes, e.g. with various offsets and scaling factors?

I'll re-run testing now, just to be sure.

Hi @eme64, please let me know if your tests are clean and it's good to land this.

Contributor

@eme64 eme64 left a comment

Testing passed, change looks reasonable!

Looking forward to more testing in the future!

@jatin-bhateja
Member Author

Thanks @eme64 and @sviswa7

@jatin-bhateja
Member Author

/integrate

@openjdk

openjdk bot commented May 26, 2025

Going to push as commit 7002233.
Since your change was applied there have been 384 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 26, 2025
@openjdk openjdk bot closed this May 26, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 26, 2025
@openjdk

openjdk bot commented May 26, 2025

@jatin-bhateja Pushed as commit 7002233.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

3 participants