[NVPTX] Improve lowering of v4i8 #67866

Artem-B · 2023-09-29T23:43:45Z

Make v4i8 a legal type and plumb through lowering of the relevant instructions.
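Legalizing v4i8 means a <4 x i8> value is carried in a single 32-bit register. The sketch below models that representation, assuming lane i occupies bits [8i, 8i+7] (the order a little-endian bitcast to i32 produces). The helper names are hypothetical. They only illustrate the bit manipulation that build_vector, extractelement and insertelement reduce to, and are not the actual NVPTX lowering code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling a <4 x i8> value held in one 32-bit register.
// Lane i occupies bits [8*i, 8*i + 7] (little-endian element order).

// build_vector: pack four bytes into one register.
uint32_t buildV4i8(const std::array<uint8_t, 4>& lanes) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 4; ++i)
    r |= static_cast<uint32_t>(lanes[i]) << (8 * i);
  return r;
}

// extractelement: shift the lane down and keep the low 8 bits.
uint8_t extractV4i8(uint32_t v, unsigned lane) {
  return static_cast<uint8_t>(v >> (8 * lane));
}

// insertelement: clear the target lane, then OR in the new byte.
uint32_t insertV4i8(uint32_t v, uint8_t elt, unsigned lane) {
  const uint32_t mask = 0xFFu << (8 * lane);
  return (v & ~mask) | (static_cast<uint32_t>(elt) << (8 * lane));
}
```

For example, packing the lanes {1, 2, 3, 4} gives 0x04030201, and extracting lane 2 of that value gives 3.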

ThomasRaoux

LGTM

ThomasRaoux

Actually, trying out the patch in Triton causes some invalid PTX to be emitted (see comment).

llvm/lib/Target/NVPTX/NVPTXInstrInfo.td

Verified that NVPTX tests pass and that ptxas is able to compile the PTX produced by the llc tests.

llvm/lib/Target/NVPTX/NVPTXInstrInfo.td

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

To make things work consistently for v4i8, we need to implement other vector ops.

github-actions · 2023-10-05T20:43:16Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Removed unused code.

Artem-B · 2023-10-05T22:46:52Z

I still need to test it on live code, though we do not have much code that would end up using v4i8.

The generated PTX checked into llvm/test/CodeGen/NVPTX/i8x4-instructions.ll could use extra scrutiny.

ThomasRaoux · 2023-10-06T01:39:09Z

I ran the patch on our triton kernels and I don't see any functional problems left.

ThomasRaoux

Looks like it required quite a lot of cases to be handled :(
Thanks for doing this; it solves some of the problems Triton had with the latest LLVM. The changes look good to me.

llvm/test/CodeGen/NVPTX/i8x4-instructions.ll

Artem-B · 2023-10-06T17:42:48Z

I see one suspicious failure in tensorflow tests. I suspect I've messed something up in v4i8 comparison.

Artem-B · 2023-10-06T18:47:49Z

Yup, there is a problem with how constant v4i8 values are computed:

Successfully custom legalized node
 ... replacing: t10: v4i8 = BUILD_VECTOR Constant:i16<-128>, Constant:i16<-128>, Constant:i16<-128>, Constant:i16<-128>
     with:      t17: v4i8 = bitcast Constant:i32<-128>
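In the log above, four lanes of -128 (0x80 as a byte) become the 32-bit constant -128, i.e. 0xFFFFFF80, which decodes to the lanes {0x80, 0xFF, 0xFF, 0xFF}: only lane 0 is right. The expected packed value is 0x80808080. One computation that reproduces the bad value is OR-ing the sign-extended lane constants into place without first truncating them to 8 bits. The sketch below contrasts that with the masked version. It assumes the lanes arrive sign-extended in a wider type (as the i16 operands in the log do), and it illustrates the arithmetic only; it is not the code from the fix.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs four lane constants into one 32-bit immediate
// (lane i in bits [8*i, 8*i + 7]). Lanes arrive sign-extended to 16 bits.

// Buggy variant: without masking, each lane's sign bits spill into
// every higher lane, so four lanes of -128 collapse to -128 (0xFFFFFF80).
int32_t packUnmasked(const std::array<int16_t, 4>& lanes) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 4; ++i)
    r |= static_cast<uint32_t>(static_cast<int32_t>(lanes[i])) << (8 * i);
  return static_cast<int32_t>(r);
}

// Fixed variant: truncate each lane to 8 bits before shifting it into place.
int32_t packMasked(const std::array<int16_t, 4>& lanes) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 4; ++i)
    r |= (static_cast<uint32_t>(lanes[i]) & 0xFFu) << (8 * i);
  return static_cast<int32_t>(r);
}
```

With all lanes set to -128, packUnmasked returns -128 while packMasked returns the bit pattern 0x80808080.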

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

Artem-B · 2023-10-06T19:25:23Z

Resolved by 9821e90

Artem-B · 2023-10-06T23:07:36Z

Found another issue. We merge four independent byte loads with align 1 into a 32-bit load, which fails at runtime on misaligned pointers.

%t0 = type { [17 x i8] }

@shared_storage = linkonce_odr local_unnamed_addr addrspace(3) global %t0 undef, align 1

define <4 x i8> @in_v4i8(<4 x i8> %x, <4 x i8> %y) nounwind {
  %v = load <4 x i8>, ptr getelementptr inbounds (i8, ptr addrspacecast (ptr addrspace(3) @shared_storage to ptr), i64 9), align 1
  ret <4 x i8> %v
}

        mov.u64         %rd1, shared_storage;
        cvta.shared.u64         %rd2, %rd1;
        ld.u32  %r1, [%rd2+9];
        st.param.b32    [func_retval0+0], %r1;
        ret;
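A 32-bit ld requires a 4-byte-aligned address, but the IR above only guarantees align 1 and the address is shared_storage + 9, so the single ld.u32 is not valid here. The sketch below models the decision the lowering has to make, assuming a hypothetical helper that receives the alignment known from the IR: keep the single 32-bit access only when the known alignment is at least 4, and otherwise fall back to byte loads that are reassembled into the packed register. The byte-load fallback is an assumption chosen for illustration; the actual fix may split the access differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: how to load a <4 x i8> given the alignment (in bytes)
// that the IR guarantees for the pointer.
enum class V4i8Load { SingleU32, FourU8 };

V4i8Load chooseLoad(uint64_t knownAlign) {
  // A 32-bit ld needs a 4-byte-aligned address; anything weaker
  // has to be split into byte-sized loads.
  return knownAlign >= 4 ? V4i8Load::SingleU32 : V4i8Load::FourU8;
}

// Reassembles four byte loads into the packed 32-bit representation
// (lane i in bits [8*i, 8*i + 7]); safe for any address alignment.
uint32_t loadV4i8Bytewise(const uint8_t* p) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 4; ++i)
    r |= static_cast<uint32_t>(p[i]) << (8 * i);
  return r;
}
```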

Artem-B · 2023-10-09T19:21:58Z

clang-format failure on GitHub is weird -- it just silently exits with an error.
I ran the same command locally and fixed one place it was not happy about.

The buildkite failure somewhere in RISC-V appears to be unrelated.

abadams · 2023-10-15T20:14:38Z

I believe this may be causing failures for urem/srem. See #69124.

Edit: It is also causing failures when the result of a <4 x i8> comparison is sign-extended.
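When chasing reports like these, a per-lane reference model of the IR semantics can be useful to compare generated code against. The sketch below assumes the packed layout with lane i in bits [8i, 8i+7]. It encodes only LLVM IR semantics (urem and srem act independently on each byte, and sign-extending an i1 comparison result yields 0xFF for true and 0x00 for false in each lane); it is not a diagnosis of #69124, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Per-lane reference semantics for packed <4 x i8> values
// (lane i in bits [8*i, 8*i + 7]).

static uint8_t lane(uint32_t v, unsigned i) {
  return static_cast<uint8_t>(v >> (8 * i));
}

// urem <4 x i8>: each lane is divided independently as an unsigned byte.
uint32_t uremV4i8(uint32_t a, uint32_t b) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 4; ++i) {
    assert(lane(b, i) != 0 && "division by zero is UB in IR as well");
    r |= static_cast<uint32_t>(lane(a, i) % lane(b, i)) << (8 * i);
  }
  return r;
}

// srem <4 x i8>: the same, but each lane is interpreted as a signed byte.
uint32_t sremV4i8(uint32_t a, uint32_t b) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 4; ++i) {
    const int8_t x = static_cast<int8_t>(lane(a, i));
    const int8_t y = static_cast<int8_t>(lane(b, i));
    assert(y != 0 && "division by zero is UB in IR as well");
    r |= static_cast<uint32_t>(static_cast<uint8_t>(x % y)) << (8 * i);
  }
  return r;
}

// sext (icmp slt <4 x i8> a, b) to <4 x i8>: true lanes become 0xFF, false 0x00.
uint32_t sextSltV4i8(uint32_t a, uint32_t b) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 4; ++i)
    if (static_cast<int8_t>(lane(a, i)) < static_cast<int8_t>(lane(b, i)))
      r |= 0xFFu << (8 * i);
  return r;
}
```

Note that a lane-wise urem of 0x0A090807 by 0x03030303 gives 0x01000201, whereas a plain 32-bit urem of the same bit patterns gives a completely different value, so the packed register cannot simply be fed to a 32-bit remainder instruction.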

llvm/lib/Target/NVPTX/NVPTXInstrInfo.td

[NVPTX] Improve lowering of v4i8

4771c97

Make v4i8 a legal type and plumb through lowering of the relevant instructions.

Artem-B requested a review from ThomasRaoux September 29, 2023 23:43

Artem-B mentioned this pull request Sep 29, 2023

[NVPTX] Preserve v16i8 vector loads when legalizing #67322

Closed

ThomasRaoux approved these changes Oct 2, 2023

ThomasRaoux requested changes Oct 2, 2023

llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (outdated)

More work on fleshing out extractelt/build_vector for v4i8

bda4bd3

Verified that NVPTX tests pass and that ptxas is able to compile the PTX produced by the llc tests.

Artem-B commented Oct 3, 2023

llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (outdated)

ThomasRaoux reviewed Oct 3, 2023

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

Down the rabbit hole we go.

e55bb97

To make things work consistently for v4i8, we need to implement other vector ops.

Artem-B added 3 commits October 5, 2023 14:22

Added vector_shuffle lowering to PRMT.

655c6d5
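PTX's prmt.b32 d, a, b, c in its default mode numbers the eight bytes of {b, a} as 0-7 (bytes 0-3 from a, 4-7 from b) and uses nibble i of c to pick the byte for lane i of d; if a nibble's top bit is set, the selected byte's sign bit is replicated instead. Because a <4 x i8> shufflevector mask also numbers lanes 0-3 for the first operand and 4-7 for the second, each mask index can serve directly as a selector nibble. The sketch below emulates the instruction and builds the selector. Both function names are hypothetical, undef mask lanes are arbitrarily mapped to byte 0, and this models the PTX semantics rather than NVPTX's actual selection code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Emulates PTX `prmt.b32 d, a, b, c;` in its default (generic) mode.
// Source bytes are numbered 0-7 across {b, a}: bytes 0-3 come from a,
// bytes 4-7 from b. Nibble i of c selects the byte written to lane i of d.
uint32_t prmt(uint32_t a, uint32_t b, uint32_t c) {
  const uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
  uint32_t d = 0;
  for (unsigned i = 0; i < 4; ++i) {
    const uint32_t sel = (c >> (4 * i)) & 0xFu;
    uint32_t byte = static_cast<uint32_t>((pool >> (8 * (sel & 7u))) & 0xFFu);
    if (sel & 8u)  // top bit of the nibble: replicate the byte's sign bit
      byte = (byte & 0x80u) ? 0xFFu : 0x00u;
    d |= byte << (8 * i);
  }
  return d;
}

// Hypothetical helper: turns a <4 x i8> shufflevector mask (0-3 pick from
// the first operand, 4-7 from the second, -1 = undef) into a prmt selector.
uint32_t shuffleMaskToSelector(const std::array<int, 4>& mask) {
  uint32_t sel = 0;
  for (unsigned i = 0; i < 4; ++i) {
    const uint32_t idx = mask[i] < 0 ? 0u : static_cast<uint32_t>(mask[i]);
    sel |= (idx & 7u) << (4 * i);
  }
  return sel;
}
```

With a = 0x03020100 and b = 0x07060504, the mask <7, 0, 5, 2> becomes selector 0x2507, and prmt yields 0x02050007, i.e. the lanes {b[3], a[0], b[1], a[2]}.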

Address clang-format complaints.

f915e5b

Use .lo/ls/hi/hs suffixes for unsigned setp instructions.

ef3d5de
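The commit title refers to PTX's integer setp comparison operators: signed comparisons are spelled lt/le/gt/ge, while unsigned ones are spelled lo (lower), ls (lower or same), hi (higher) and hs (higher or same). A minimal sketch of that mapping follows, assuming a hypothetical IntCond enum that stands in for LLVM's integer ISD::CondCode values; it is not the table used in NVPTXInstrInfo.td.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for LLVM's integer ISD::CondCode values.
enum class IntCond { EQ, NE, SLT, SLE, SGT, SGE, ULT, ULE, UGT, UGE };

// PTX setp comparison operator for each integer condition.
// Signed comparisons use lt/le/gt/ge; unsigned ones use lo/ls/hi/hs.
std::string_view setpCmpOp(IntCond cc) {
  switch (cc) {
  case IntCond::EQ:  return "eq";
  case IntCond::NE:  return "ne";
  case IntCond::SLT: return "lt";
  case IntCond::SLE: return "le";
  case IntCond::SGT: return "gt";
  case IntCond::SGE: return "ge";
  case IntCond::ULT: return "lo";  // lower
  case IntCond::ULE: return "ls";  // lower or same
  case IntCond::UGT: return "hi";  // higher
  case IntCond::UGE: return "hs";  // higher or same
  }
  return "";
}
```

For example, an unsigned less-than on 32-bit operands would be written as `setp.lo.u32 %p1, %r1, %r2;`.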

Removed unused code.

Artem-B requested a review from ThomasRaoux October 5, 2023 22:44

Merge branch 'llvm:main' into v4i8reg

6a183b8

ThomasRaoux approved these changes Oct 6, 2023

llvm/test/CodeGen/NVPTX/i8x4-instructions.ll

Artem-B requested a review from d0k October 6, 2023 17:43

Artem-B commented Oct 6, 2023

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

Fixed calculation of constant v4i8 values.

9821e90

d0k approved these changes Oct 6, 2023

Artem-B and others added 2 commits October 6, 2023 14:17

Updated a test.

3879bdb

Merge branch 'llvm:main' into v4i8reg

8b81e17

Artem-B and others added 2 commits October 6, 2023 16:21

Fixed unaligned load/store of v4i8

899ab5a

Merge branch 'llvm:main' into v4i8reg

695284b

clang-format

7494e8c

Artem-B merged commit cbafb6f into llvm:main Oct 9, 2023
2 of 3 checks passed

stepthomas mentioned this pull request Oct 10, 2023

AMDGPU stepthomas atomic csub no rtn forms ver2 stepthomas/llvm-project#1

Closed

justinfargnoli reviewed Jan 25, 2024

llvm/lib/Target/NVPTX/NVPTXInstrInfo.td

justinfargnoli mentioned this pull request Jan 30, 2024

[NVPTX] Disable incorrect peephole optimizations #79920

Closed
