[NVPTX] Fix 64 bits rotations with large shift values #89399

Merged
merged 1 commit into from
May 1, 2024

Conversation

npmiller
Contributor

ROTL and ROTR can take a shift amount larger than the element size, in which case the effective shift amount should be the shift amount modulo the element size.

This patch adds the modulo step when the shift amount isn't known at compile time. Without it the existing implementation would end up shifting beyond the type size and give incorrect results.
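To illustrate why the modulo step matters, here is a minimal C++ sketch of a 64-bit rotate-left built from two shifts. It is not the NVPTX lowering from this patch: the helper names (`shl_clamped`, `shr_clamped`, `rotl64_unmasked`, `rotl64_masked`) are hypothetical, and the clamped shifts are an assumed model of hardware shifts that produce 0 once the amount reaches the register width.

```cpp
#include <cstdint>

// Assumed model of a shift whose amount is clamped to the register width:
// shifting a 64-bit value by 64 or more yields 0. (A plain C++ shift by
// >= 64 is undefined behavior, hence the explicit check.)
constexpr uint64_t shl_clamped(uint64_t x, uint64_t n) {
  return n >= 64 ? 0 : x << n;
}
constexpr uint64_t shr_clamped(uint64_t x, uint64_t n) {
  return n >= 64 ? 0 : x >> n;
}

// Hypothetical "before" version: uses the shift amount as given.
// For n > 64 both partial shifts go past the type size and the result is 0.
constexpr uint64_t rotl64_unmasked(uint64_t a, uint64_t n) {
  return shl_clamped(a, n) | shr_clamped(a, 64 - n);
}

// Hypothetical "after" version: reduce the amount modulo 64 first.
constexpr uint64_t rotl64_masked(uint64_t a, uint64_t n) {
  const uint64_t s = n & 63;  // n mod 64, since 64 is a power of two
  return shl_clamped(a, s) | shr_clamped(a, 64 - s);
}

// Rotating by 65 should behave like rotating by 1.
static_assert(rotl64_masked(0x8000000000000001ULL, 65) == 3, "masked rotate wraps");
static_assert(rotl64_unmasked(0x8000000000000001ULL, 65) == 0, "unmasked rotate loses bits");
```

In this model the unmasked version only goes wrong for amounts above 64, which is consistent with the description that the problem shows up for large, non-constant shift values.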

cc @Artem-B

Member

@Artem-B Artem-B left a comment

LGTM overall.

declare i64 @llvm.fshr.i64(i64, i64, i64)

; SM35: rotl64
define i64 @rotl64(i64 %a, i64 %n) {
Member

This test should probably be converted to use llvm/utils/update_llc_test_checks.py.
We do care about the arguments and the exact instruction sequences here.

Member

^^^ we still want to improve the test.

Contributor Author

Oh right! Updated with the script now. It was confusing for a little bit because the script doesn't understand -march, so I had to swap to --mtriple.

Resolved review thread on llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@npmiller force-pushed the fix-rot-shifts branch 2 times, most recently from 38eca33 to 138f196, on April 29, 2024 15:55
@npmiller
Contributor Author

@Artem-B I don't have commit permissions, would you mind landing this? I'll try to follow up with the suggestions when I have time.

Member

@Artem-B Artem-B left a comment

LGTM.

Comment on lines +78 to +86
; SM35-NEXT: ld.param.u64 %rd1, [rotate64_param_0];
; SM35-NEXT: {
; SM35-NEXT: .reg .b32 %dummy;
; SM35-NEXT: mov.b64 {%dummy,%r1}, %rd1;
; SM35-NEXT: }
; SM35-NEXT: {
; SM35-NEXT: .reg .b32 %dummy;
; SM35-NEXT: mov.b64 {%r2,%dummy}, %rd1;
; SM35-NEXT: }
Member

Looks like a minor optimization opportunity for the future -- this could be done as mov.b64 {%r2, %r1}, %rd1.
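As a rough illustration of that suggestion, the C++ sketch below models what the unpacking form of `mov.b64` computes: the first destination receives the low 32 bits and the second the high 32 bits. In the checked sequence each half is extracted by a separate instruction with a throwaway `%dummy` register, whereas a single unpack yields both. The `Halves` struct and `unpack64` function are hypothetical names used only for this sketch.

```cpp
#include <cstdint>

// Hypothetical container for the two 32-bit halves of a 64-bit value.
struct Halves {
  uint32_t lo;  // bits 0..31, the first destination of the unpack
  uint32_t hi;  // bits 32..63, the second destination of the unpack
};

// Models a single unpack that produces both halves at once, instead of
// two separate extractions that each discard one half.
constexpr Halves unpack64(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

static_assert(unpack64(0x1122334455667788ULL).lo == 0x55667788u, "low half");
static_assert(unpack64(0x1122334455667788ULL).hi == 0x11223344u, "high half");
```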

@Artem-B Artem-B merged commit 7396ab1 into llvm:main May 1, 2024
3 of 4 checks passed
2 participants