
AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it #79104

Merged
merged 17 commits into from
Jan 23, 2024


changpeng
Contributor

int_amdgcn_global_load_tr does not request a non-temporal transpose load, so we should not
generate the non-temporal hint for it. We need to implement getTgtMemIntrinsic for this
intrinsic so that the corresponding MemSDNode is created, and we do not set the non-temporal
flag on it because the intrinsic does not specify one.

NOTE: getTgtMemIntrinsic needs to be implemented for every memory intrinsic.
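As a rough illustration, the self-contained sketch below models what the new getTgtMemIntrinsic case has to report: the intrinsic is described as a chained load of a given memory type through its first (pointer) operand, and its memory-operand flags contain only the load bit, with no non-temporal bit. All names here (MemFlag, IntrinsicInfoSketch, describeGlobalLoadTr) are hypothetical stand-ins, not LLVM's actual TargetLowering::IntrinsicInfo or MachineMemOperand API, and the opcode and type are plain strings for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for MachineMemOperand flags (not LLVM's real values).
enum MemFlag : uint32_t {
  MOLoad = 1u << 0,
  MOStore = 1u << 1,
  MOVolatile = 1u << 2,
  MONonTemporal = 1u << 3,
};

// Hypothetical, simplified analogue of TargetLowering::IntrinsicInfo.
struct IntrinsicInfoSketch {
  std::string Opc;         // e.g. "INTRINSIC_W_CHAIN" for a load with a chain
  unsigned PtrOperand = 0; // index of the pointer argument
  std::string MemVT;       // memory type of the access
  uint32_t Flags = 0;      // memory-operand flags
};

// Hypothetical analogue of the global_load_tr case: describe the intrinsic as
// a plain load. MONonTemporal is deliberately NOT set, because the intrinsic
// carries no non-temporal request.
bool describeGlobalLoadTr(IntrinsicInfoSketch &Info,
                          const std::string &ResultVT) {
  Info.Opc = "INTRINSIC_W_CHAIN";
  Info.PtrOperand = 0; // the intrinsic's only argument is the global pointer
  Info.MemVT = ResultVT;
  Info.Flags = MOLoad;
  return true;
}
```

The key design point is what is absent: any later stage that derives a cache hint from the flags sees an ordinary load.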

  Some upcoming intrinsics use these new types
  getIConstantVRegValWithLookThrough can return std::nullopt.
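getIConstantVRegValWithLookThrough returns a std::optional, so callers must check for an empty result before using it. A minimal sketch of that defensive pattern follows, assuming a hypothetical lookupConstant() that maps virtual registers to known constants and a hypothetical caller isZeroOffset(); neither is LLVM code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical stand-in for a vreg -> constant lookup. Like
// getIConstantVRegValWithLookThrough, it yields std::nullopt when the
// register is not defined by a known constant.
std::optional<int64_t> lookupConstant(const std::map<unsigned, int64_t> &Defs,
                                      unsigned VReg) {
  auto It = Defs.find(VReg);
  if (It == Defs.end())
    return std::nullopt;
  return It->second;
}

// Hypothetical caller: bail out conservatively instead of dereferencing an
// empty optional.
bool isZeroOffset(const std::map<unsigned, int64_t> &Defs, unsigned VReg) {
  std::optional<int64_t> C = lookupConstant(Defs, VReg);
  if (!C)
    return false; // not a known constant
  return *C == 0;
}
```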
@@ -1348,6 +1348,14 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
MachineMemOperand::MOVolatile;
return true;
}
+case Intrinsic::amdgcn_global_load_tr: {
Contributor


This case should also be handled in getAddrModeArguments below.
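getAddrModeArguments is the TargetLowering hook through which CodeGenPrepare learns which operand of a memory intrinsic is the address, and what type is accessed, so it can optimize the address computation for that use. A minimal self-contained sketch of the shape of such a case follows; the IntrinsicID enum, the CallSketch struct and getAddrModeArgumentsSketch are hypothetical simplifications, not LLVM's IntrinsicInst or Value API, and some_other_mem_intrinsic is a placeholder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical intrinsic IDs; only global_load_tr comes from this PR.
enum class IntrinsicID {
  amdgcn_global_load_tr,
  some_other_mem_intrinsic,
  not_a_memory_op,
};

// Hypothetical, simplified stand-in for an IntrinsicInst.
struct CallSketch {
  IntrinsicID ID;
  std::vector<std::string> Args; // operand names; Args[0] is the pointer here
  std::string ResultTy;
};

// Report the address operand(s) and the accessed type, or return false if the
// intrinsic is not a memory access the target knows how to describe.
bool getAddrModeArgumentsSketch(const CallSketch &II,
                                std::vector<std::string> &Ops,
                                std::string &AccessTy) {
  switch (II.ID) {
  case IntrinsicID::amdgcn_global_load_tr:
  case IntrinsicID::some_other_mem_intrinsic:
    Ops.push_back(II.Args.at(0)); // pointer operand
    AccessTy = II.ResultTy;       // a load accesses its result type
    return true;
  default:
    return false;
  }
}
```

Grouping the new intrinsic with existing cases that share the "pointer is operand 0" layout keeps the hook in step with getTgtMemIntrinsic.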

Contributor

jayfoad left a comment


LGTM.

@@ -13,9 +13,8 @@ define amdgpu_kernel void @global_load_tr_b64(ptr addrspace(1) %addr, ptr addrsp
; GFX12-SDAG-W32-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
; GFX12-SDAG-W32-NEXT: v_mov_b32_e32 v2, 0
; GFX12-SDAG-W32-NEXT: s_wait_kmcnt 0x0
-; GFX12-SDAG-W32-NEXT: global_load_tr_b64 v[0:1], v2, s[0:1] offset:32 th:TH_LOAD_NT
+; GFX12-SDAG-W32-NEXT: global_load_tr_b64 v[0:1], v2, s[0:1] offset:32
Collaborator


Thanks!
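The updated check line shows the visible effect: once the memory operand no longer carries the non-temporal flag, the load is printed without a th: modifier instead of with th:TH_LOAD_NT. The sketch below models only that last mapping step; kNonTemporal, temporalHintSuffix and printLoadTrB64 are hypothetical, and real GFX12 cache-policy selection considers more TH and scope values than the single non-temporal case shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical flag bit standing in for MachineMemOperand::MONonTemporal.
constexpr uint32_t kNonTemporal = 1u << 3;

// Hypothetical, simplified mapping from memory-operand flags to the temporal
// hint suffix printed on a GFX12 load. Only the non-temporal case is modeled;
// the default hint is omitted from the printed form, as in the check line.
std::string temporalHintSuffix(uint32_t MemFlags) {
  if (MemFlags & kNonTemporal)
    return " th:TH_LOAD_NT";
  return "";
}

// Hypothetical printer for the instruction in the test above.
std::string printLoadTrB64(uint32_t MemFlags) {
  return "global_load_tr_b64 v[0:1], v2, s[0:1] offset:32" +
         temporalHintSuffix(MemFlags);
}
```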

changpeng merged commit 32073b8 into llvm:main on Jan 23, 2024
3 of 4 checks passed