-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it #79104
Conversation
Some upcoming intrinsics use these new types
Some upcoming intrinsics use these new types
It could be possible for getIConstantVRegValWithLookThrough to return std::nullopt.
It could be possible for getIConstantVRegValWithLookThrough to return std::nullopt.
…not specify it Summary: int_amdgcn_global_load_tr did not specify non-temporal load transpose, thus we should not genetrate the non-temporal hint for the load. We need to implement getTgtMemIntrinsic to create the corresponding MemSDNode. And we don't set the non-temporal flag because the intrinsic did not specify it. NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.
@@ -1348,6 +1348,14 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, | |||
MachineMemOperand::MOVolatile; | |||
return true; | |||
} | |||
case Intrinsic::amdgcn_global_load_tr: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case should also be handled in getAdrModeArguments below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -13,9 +13,8 @@ define amdgpu_kernel void @global_load_tr_b64(ptr addrspace(1) %addr, ptr addrsp | |||
; GFX12-SDAG-W32-NEXT: s_load_b128 s[0:3], s[0:1], 0x24 | |||
; GFX12-SDAG-W32-NEXT: v_mov_b32_e32 v2, 0 | |||
; GFX12-SDAG-W32-NEXT: s_wait_kmcnt 0x0 | |||
; GFX12-SDAG-W32-NEXT: global_load_tr_b64 v[0:1], v2, s[0:1] offset:32 th:TH_LOAD_NT | |||
; GFX12-SDAG-W32-NEXT: global_load_tr_b64 v[0:1], v2, s[0:1] offset:32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…not specify it Summary: int_amdgcn_global_load_tr did not specify non-temporal load transpose, thus we should not genetrate the non-temporal hint for the load. We need to implement getTgtMemIntrinsic to create the corresponding MemSDNode. And we don't set the non-temporal flag because the intrinsic did not specify it. NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.
int_amdgcn_global_load_tr did not specify non-temporal load transpose, thus we should
not genetrate the non-temporal hint for the load. We need to implement getTgtMemIntrinsic
to create the corresponding MemSDNode. And we don't set the non-temporal flag because
the intrinsic did not specify it.
NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.