[NVPTX] Expand EXTLOAD for v8f16 and v8bf16 #72672

Merged Nov 17, 2023 (1 commit)

peterbell10 (Contributor) commented Nov 17, 2023

In triton-lang/triton#2483 I've encountered a bug in the NVPTX codegen. Given `load<8 x half>` followed by `fpext to <8 x float>`, we get

ld.shared.v4.b16 	{%f1, %f2, %f3, %f4}, [%r15+8];
ld.shared.v4.b16 	{%f5, %f6, %f7, %f8}, [%r15];

This loads float16 values into float registers without any conversion, so the result is simply garbage.

This PR brings `v8f16` and `v8bf16` into line with the other vector types by expanding the extending load into a load followed by `cvt`.

cc @manman-ren @Artem-B @jlebar
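
To illustrate why skipping the conversion produces garbage, here is a minimal Python sketch. It is not part of the PR and does not model the NVPTX backend; it only assumes that the 16 loaded bits end up zero-extended in a 32-bit float register, and the helper names (`f16_bits`, `reinterpret_as_f32`, `convert_to_f32`) are hypothetical.

```python
import struct


def f16_bits(value: float) -> int:
    """Return the 16-bit IEEE half-precision encoding of value."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def reinterpret_as_f32(bits16: int) -> float:
    """Model the buggy path: zero-extend the 16 bits and read them as an f32.

    Assumption: the loaded bits sit in the low half of the 32-bit register.
    """
    return struct.unpack("<f", struct.pack("<I", bits16))[0]


def convert_to_f32(bits16: int) -> float:
    """Model the fixed path: decode the half value, as a cvt.f32.f16 would."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]


# Compare both paths for a few values that are exactly representable in f16.
for value in (1.0, -2.5, 0.5, 100.0):
    bits = f16_bits(value)
    print(
        f"{value:>6}: bits=0x{bits:04x} "
        f"reinterpreted={reinterpret_as_f32(bits):.3e} "
        f"converted={convert_to_f32(bits)}"
    )
```

Under that assumption every reinterpreted value is a tiny positive denormal, and even the sign is lost, while the converted values match the inputs. This is why the extending load has to be expanded into a plain load plus an explicit conversion.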

github-actions bot commented Nov 17, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

jlebar (Member) commented Nov 17, 2023

Oh wow that's a bad bug.

@ThomasRaoux ThomasRaoux merged commit 4263b2e into llvm:main Nov 17, 2023
2 of 3 checks passed
sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023