Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization of uaddo(x, x) #57330

Open
chfast opened this issue Aug 24, 2022 · 6 comments
Open

Missed optimization of uaddo(x, x) #57330

chfast opened this issue Aug 24, 2022 · 6 comments

Comments

@chfast
Copy link
Contributor

chfast commented Aug 24, 2022

_Bool src(unsigned x, unsigned* acc) {
    unsigned s = x + x;
    *acc = s;
    return (s < x);
}

_Bool tgt(unsigned x, unsigned* acc) {
    return __builtin_uadd_overflow(x, x, acc);
}

https://godbolt.org/z/PsGoGGWj8

The uaddo is not created for the case of x + x. This would save us single cmp instruction. This may be because the x + x is optimized to x << 1.

Discovered while analyzing #57316.

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 24, 2022

@llvm/issue-subscribers-backend-x86

@RKSimon
Copy link
Collaborator

RKSimon commented Aug 24, 2022

@rotateright Do you think we'd be better off handling this in InstCombine?

https://alive2.llvm.org/ce/z/ZHhUPZ suggests we don't match the general pattern either

@chfast
Copy link
Contributor Author

chfast commented Aug 24, 2022

This pattern is matched on IR level but only in CodeGenPrepare (so after opt -O3). I have no idea why. https://alive2.llvm.org/ce/z/3z32e3

@rotateright
Copy link
Contributor

The choice to not canonicalize to the intrinsics in IR (move the transform to CGP) was made with:
https://reviews.llvm.org/D8889

Even in CGP, we've made several adjustments since then to get/avoid different asm for various targets.

It's possible that we've progressed enough in analysis that the decision can be revisited, but it requires looking at the output for several different patterns on multiple targets to make sure nothing regresses.

@rotateright
Copy link
Contributor

Filed a beginner bug -- #57338 -- to reduce the icmp:
https://alive2.llvm.org/ce/z/sTrumT

@rotateright
Copy link
Contributor

This example (even if it seems unlikely in practice) also demonstrates the difficulty of reconciling IR and codegen on patterns like this:
https://godbolt.org/z/TxKqhbecE
After we decouple the math from the overflow calc, it could be hard to bring them back together (for example, they moved to different basic blocks).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants