Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMDGPUAtomicOptimizer: re-enable uniform path for fadd/fsub with result #97554

Closed
jayfoad opened this issue Jul 3, 2024 · 1 comment · Fixed by #97604
Closed

AMDGPUAtomicOptimizer: re-enable uniform path for fadd/fsub with result #97554

jayfoad opened this issue Jul 3, 2024 · 1 comment · Fixed by #97604

Comments

@jayfoad
Copy link
Contributor

jayfoad commented Jul 3, 2024

See #96479: the uniform path for fadd/fsub was disabled because it can return the wrong value in the first active lane of the result in the presence of NaNs and signed zeros.

We should find a way to fix and re-enable it, e.g. by overwriting the first active lane of %y * +0.0 with -0.0. And maybe this can be optimized more if certain values are known not to be zero or nan.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 3, 2024

@llvm/issue-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

See #96479: the uniform path for fadd/fsub was disabled because it can return the wrong value in the first active lane of the result in the presence of NaNs and signed zeros.

We should find a way to fix and re-enable it, e.g. by overwriting the first active lane of %y * +0.0 with -0.0. And maybe this can be optimized more if certain values are known not to be zero or nan.

aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this issue Jul 14, 2024
…lt (llvm#97604)

Fix various problems to do with the first active lane of the result of
optimized fp atomics, as explained in the comment.

Fixes llvm#97554
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this issue Jul 16, 2024
…lt (llvm#97604)

Fix various problems to do with the first active lane of the result of
optimized fp atomics, as explained in the comment.

Fixes llvm#97554
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants