[NVPTX] Expand EXTLOAD for v8f16 and v8bf16 #72672

peterbell10 · 2023-11-17T16:09:43Z

In triton-lang/triton#2483 I encountered a bug in the NVPTX codegen. Given `load<8 x half>` followed by `fpext to <8 x float>`, we get:

ld.shared.v4.b16 	{%f1, %f2, %f3, %f4}, [%r15+8];
ld.shared.v4.b16 	{%f5, %f6, %f7, %f8}, [%r15];

This loads float16 values into float registers without any conversion, so the result is simply garbage.

This PR brings `v8f16` and `v8bf16` into line with the other vector types by expanding their EXTLOAD to load + cvt.
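To illustrate why the missing conversion yields garbage, here is a minimal host-side sketch in Python. It is not the LLVM change itself. It assumes the 16 loaded bits end up zero-extended in the low half of a 32-bit float register, and the helper names (`half_bits`, `reinterpret_as_f32`, `convert_to_f32`) are hypothetical, chosen only for this example.

```python
import struct


def half_bits(x: float) -> int:
    """Return the IEEE 754 binary16 bit pattern of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def reinterpret_as_f32(bits16: int) -> float:
    """Model of the buggy path (assumption: the 16 raw bits are
    zero-extended into a 32-bit register and then read as binary32)."""
    return struct.unpack("<f", struct.pack("<I", bits16))[0]


def convert_to_f32(bits16: int) -> float:
    """Model of the expanded path: load the 16 bits, then convert
    half -> float (the role cvt plays after the load)."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]


# One <8 x half> vector; every value is exactly representable in binary16.
values = [1.5, -2.0, 0.25, 1024.0, 3.0, -0.5, 8.0, 100.0]
bits = [half_bits(v) for v in values]

garbage = [reinterpret_as_f32(b) for b in bits]
correct = [convert_to_f32(b) for b in bits]

for v, g, c in zip(values, garbage, correct):
    print(f"{v:>8}: reinterpreted={g:.3e}  converted={c}")
```

Under that assumption every reinterpreted lane is a tiny denormal float unrelated to the stored value, while the load-then-convert path reproduces the original values exactly.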

cc @manman-ren @Artem-B @jlebar

github-actions · 2023-11-17T16:13:25Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jlebar · 2023-11-17T16:50:36Z

Oh wow that's a bad bug.


peterbell10 force-pushed the nvptx-illegal-extload branch from d1de0e2 to f9bc0e6 on November 17, 2023 16:30

jlebar approved these changes Nov 17, 2023

ThomasRaoux merged commit 4263b2e into llvm:main Nov 17, 2023
2 of 3 checks passed