@Artem-B Artem-B commented Oct 2, 2025

#153478 made v2i32 legal on newer GPUs, but we cannot lower all operations on it yet. Expand the `trunc` operation until we implement efficient lowering.

llvmbot commented Oct 2, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Artem Belevich (Artem-B)

Changes

#153478 made v2i32 legal on newer GPUs, but we cannot lower all operations on it yet. Expand the `trunc` operation until we implement efficient lowering.


Full diff: https://github.com/llvm/llvm-project/pull/161715.diff

2 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+3)
  • (modified) llvm/test/CodeGen/NVPTX/f32x2-convert-i32x2.ll (+37)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 3ac7c2874408b..48e539037dcc7 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -638,6 +638,9 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   // No support for these operations with v2f32/v2i32
   setOperationAction(ISD::INSERT_VECTOR_ELT, {MVT::v2f32, MVT::v2i32}, Expand);
   setOperationAction(ISD::VECTOR_SHUFFLE, {MVT::v2f32, MVT::v2i32}, Expand);
+
+  setOperationAction(ISD::TRUNCATE, MVT::v2i16, Expand);
+
   // Need custom lowering in case the index is dynamic.
   if (STI.hasF32x2Instructions())
     setOperationAction(ISD::EXTRACT_VECTOR_ELT, {MVT::v2f32, MVT::v2i32},
diff --git a/llvm/test/CodeGen/NVPTX/f32x2-convert-i32x2.ll b/llvm/test/CodeGen/NVPTX/f32x2-convert-i32x2.ll
index 18fb87935d17d..4bfae5c437b85 100644
--- a/llvm/test/CodeGen/NVPTX/f32x2-convert-i32x2.ll
+++ b/llvm/test/CodeGen/NVPTX/f32x2-convert-i32x2.ll
@@ -115,5 +115,42 @@ define ptx_kernel void @inlineasm(ptr %p) {
   store <2 x float> %mul, ptr %p, align 8
   ret void
 }
+
+define ptx_kernel void @trunc_v2i32(<2 x i32> %0) {
+; CHECK-SM90A-LABEL: trunc_v2i32(
+; CHECK-SM90A:       {
+; CHECK-SM90A-NEXT:    .reg .b32 %r<7>;
+; CHECK-SM90A-NEXT:    .reg .b64 %rd<2>;
+; CHECK-SM90A-EMPTY:
+; CHECK-SM90A-NEXT:  // %bb.0:
+; CHECK-SM90A-NEXT:    ld.param.v2.b32 {%r1, %r2}, [trunc_v2i32_param_0];
+; CHECK-SM90A-NEXT:    prmt.b32 %r3, %r1, %r2, 0x3340U;
+; CHECK-SM90A-NEXT:    mov.b32 %r4, 0;
+; CHECK-SM90A-NEXT:    prmt.b32 %r5, %r4, 0, 0x3340U;
+; CHECK-SM90A-NEXT:    prmt.b32 %r6, %r5, %r3, 0x5410U;
+; CHECK-SM90A-NEXT:    mov.b64 %rd1, 0;
+; CHECK-SM90A-NEXT:    st.b32 [%rd1], %r6;
+; CHECK-SM90A-NEXT:    ret;
+;
+; CHECK-SM100-LABEL: trunc_v2i32(
+; CHECK-SM100:       {
+; CHECK-SM100-NEXT:    .reg .b32 %r<7>;
+; CHECK-SM100-NEXT:    .reg .b64 %rd<3>;
+; CHECK-SM100-EMPTY:
+; CHECK-SM100-NEXT:  // %bb.0:
+; CHECK-SM100-NEXT:    ld.param.b64 %rd1, [trunc_v2i32_param_0];
+; CHECK-SM100-NEXT:    mov.b64 {%r1, %r2}, %rd1;
+; CHECK-SM100-NEXT:    mov.b32 %r3, 0;
+; CHECK-SM100-NEXT:    prmt.b32 %r4, %r3, 0, 0x3340U;
+; CHECK-SM100-NEXT:    prmt.b32 %r5, %r1, %r2, 0x3340U;
+; CHECK-SM100-NEXT:    prmt.b32 %r6, %r4, %r5, 0x5410U;
+; CHECK-SM100-NEXT:    mov.b64 %rd2, 0;
+; CHECK-SM100-NEXT:    st.b32 [%rd2], %r6;
+; CHECK-SM100-NEXT:    ret;
+  %2 = trunc <2 x i32> %0 to <2 x i8>
+  %3 = shufflevector <2 x i8> zeroinitializer, <2 x i8> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  store <4 x i8> %3, ptr null, align 4
+  ret void
+}
 ;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
 ; CHECK: {{.*}}
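
The `prmt.b32` sequence in the CHECK lines above can be followed by hand with a small emulator. The sketch below is not part of the patch: `prmt` and `trunc_v2i32_packed` are hypothetical helper names, and `prmt` models only the default (generic) mode of the PTX instruction, where bytes 0-3 come from the first operand and bytes 4-7 from the second. It suggests that selector `0x3340` gathers the low byte of each i32 lane into the low half of a register, and `0x5410` then places that pair above two zero bytes, which matches the `<4 x i8>` value the test stores.

```cpp
#include <cstdint>

// Emulates PTX prmt.b32 in its default (generic) mode. This is an
// illustrative model, not code from the NVPTX backend.
// Bytes 0-3 are taken from `a`, bytes 4-7 from `b`. Each 4-bit selector
// nibble chooses the source byte for one result byte; if bit 3 of the nibble
// is set, the sign bit of the chosen byte is replicated instead.
inline std::uint32_t prmt(std::uint32_t a, std::uint32_t b,
                          std::uint32_t selector) {
  const std::uint64_t pool = (static_cast<std::uint64_t>(b) << 32) | a;
  std::uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    const std::uint32_t nibble = (selector >> (4 * i)) & 0xFu;
    std::uint32_t byte =
        static_cast<std::uint32_t>((pool >> (8 * (nibble & 0x7u))) & 0xFFu);
    if (nibble & 0x8u)
      byte = (byte & 0x80u) ? 0xFFu : 0x00u;
    result |= byte << (8 * i);
  }
  return result;
}

// Mirrors the SM90A sequence from the test (hypothetical helper name).
inline std::uint32_t trunc_v2i32_packed(std::uint32_t x0, std::uint32_t x1) {
  const std::uint32_t low_bytes = prmt(x0, x1, 0x3340u); // %r3
  const std::uint32_t zeros = prmt(0u, 0u, 0x3340u);     // %r5
  return prmt(zeros, low_bytes, 0x5410u);                // %r6
}
```

With inputs `0x11223344` and `0xAABBCCDD`, the first `prmt` yields `0x1111DD44` and the final value is `0xDD440000`, i.e. bytes `{0x00, 0x00, 0x44, 0xDD}` in little-endian memory order.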

@Artem-B Artem-B changed the title from "[NVPTX] expand trunk v2i32->v2i16" to "[NVPTX] expand trunc v2i32->v2i16" on Oct 2, 2025
@Artem-B Artem-B requested a review from Prince781 October 2, 2025 18:21
@Prince781 Prince781 left a comment

We also need to expand any_extend and zero_extend: https://godbolt.org/z/dfeeGEWqr

@Artem-B Artem-B changed the title from "[NVPTX] expand trunc v2i32->v2i16" to "[NVPTX] expand trunc/ext on v2i32" on Oct 2, 2025

Artem-B commented Oct 2, 2025

@Prince781 Done. PTAL.

@Prince781 Prince781 left a comment

LGTM. Thanks!

@Artem-B Artem-B merged commit a2b6602 into llvm:main Oct 2, 2025
7 of 8 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
llvm#153478 made v2i32 legal on newer GPUs, but we cannot lower all
operations on it yet. Expand the `trunc/ext` operations until we implement
efficient lowering.
Artem-B added a commit that referenced this pull request Oct 9, 2025
…162391)

Follow-up on #153478 and #161715.

v2i32 register class exists mostly to facilitate v2f32's use of integer
registers. There are no actual instructions that can apply to v2i32
directly (except bitwise logical ops). Everything else must be done
elementwise.
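
As a rough illustration of what "done elementwise" means for the operations discussed in this PR, the sketch below models a two-lane truncate and zero-extend on plain C++ arrays. It is only an analogy for the effect of marking an operation `Expand`, not backend code; the type aliases and function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for the <2 x i32> and <2 x i16> vector types.
using V2I32 = std::array<std::uint32_t, 2>;
using V2I16 = std::array<std::uint16_t, 2>;

// trunc <2 x i32> to <2 x i16>: each lane is truncated on its own and the
// results are reassembled into a vector.
inline V2I16 trunc_v2i32(const V2I32 &v) {
  return {static_cast<std::uint16_t>(v[0]), static_cast<std::uint16_t>(v[1])};
}

// zext <2 x i16> to <2 x i32>: each lane is widened on its own. A zero-extend
// is also one valid way to realize any_extend, whose upper bits are
// unspecified.
inline V2I32 zext_v2i16(const V2I16 &v) {
  return {static_cast<std::uint32_t>(v[0]), static_cast<std::uint32_t>(v[1])};
}
```

The point of the sketch is that no single packed instruction is assumed: every lane goes through a scalar conversion, which is the shape of code the expansion falls back to.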
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025