Missed optimizations in conditionally selected constants #53006

Closed
uncleasm opened this issue Jan 5, 2022 · 4 comments

uncleasm commented Jan 5, 2022

Compared to gcc, clang produces slightly more verbose code for the following function:

int test1(int a) { return a ? -1 : 1; }
        xor     eax, eax
        test    edi, edi
        sete    al
        add     eax, eax
        add     eax, -1
        ret

AFAIK, the optimal sequence is

neg edi
sbb eax, eax
or eax, 1

which could be generated from

int test0(int a) { return a ? -1 : 0; }
        neg     edi
        sbb     eax, eax
        ret

int test1b(int a) { return test0(a) | 1; }

As a side note, gcc can produce the expected output from test1b but not from test1.
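
To make the equivalence concrete, here is a minimal C sketch that models both instruction sequences arithmetically. It is only an illustration: the function names are hypothetical, and the carry flag is modeled as a plain integer rather than taken from real compiler output.

```c
/* Reference semantics of test1 from the report. */
int select_ref(int a) { return a ? -1 : 1; }

/* Model of the sequence clang currently emits:
 *   xor eax,eax ; test edi,edi ; sete al ; add eax,eax ; add eax,-1
 * i.e. 2 * (a == 0) - 1. */
int select_sete_add(int a) {
    int z = (a == 0);   /* sete al: 1 if a is zero, else 0 */
    return z + z - 1;   /* add eax,eax ; add eax,-1 */
}

/* Model of the proposed sequence:
 *   neg edi ; sbb eax,eax ; or eax,1
 * "neg" sets the carry flag iff its operand is nonzero, and
 * "sbb eax,eax" materializes -CF (0 or all-ones). */
int select_sbb_or(int a) {
    int cf = (a != 0);  /* carry flag after neg edi */
    int mask = -cf;     /* sbb eax,eax: 0 or -1 */
    return mask | 1;    /* or eax,1: 1 if a == 0, -1 otherwise */
}
```

Both models agree with `select_ref` for every input, since each depends only on whether `a` is zero.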

RKSimon (Collaborator) commented Jan 6, 2022

rotateright (Contributor) commented

We have combining/lowering to create the neg+sbb already. It could be enhanced to match a new pattern and tack on the 'or' as the final op.

rotateright (Contributor) commented

This won't do anything for this exact example, but here's a proposal to improve the existing x86 lowering for select via SBB:
https://reviews.llvm.org/D116765

rotateright added a commit that referenced this issue Jan 7, 2022
select (X != 0), -1, Y --> 0 - X; or (sbb), Y
select (X != 0), Y, -1 --> X - 1; or (sbb), Y

We already had these x86 carry-flag transforms, but one was over-specified to
handle a "0" select arm only. That's just a special-case of the more general
pattern (the 'or' will be deleted if Y is zero).

This is part of solving #53006, but it misses that example because some other
combine has already converted that exact pattern into math ops.

Differential Revision: https://reviews.llvm.org/D116765
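
As a rough illustration of the two patterns in the commit message, the following C sketch models the carry flag produced by `0 - X` and `X - 1` and checks that OR-ing the resulting all-ones/zero mask with Y reproduces the select. The function names are hypothetical and this is not the LLVM implementation, only an arithmetic model of it.

```c
#include <stdint.h>

/* select (X != 0), -1, Y  -->  0 - X; or (sbb), Y
 * The subtraction 0 - X borrows (sets carry) iff X != 0. */
int32_t sel_ne_allones_y(uint32_t x, int32_t y) {
    int32_t carry = (0u < x);  /* borrow out of 0 - x */
    int32_t mask = -carry;     /* sbb r,r: -1 if carry, else 0 */
    return mask | y;           /* -1 if x != 0, else y */
}

/* select (X != 0), Y, -1  -->  X - 1; or (sbb), Y
 * The subtraction X - 1 borrows (sets carry) iff X == 0. */
int32_t sel_ne_y_allones(uint32_t x, int32_t y) {
    int32_t carry = (x < 1u);  /* borrow out of x - 1 */
    int32_t mask = -carry;     /* sbb r,r: -1 if carry, else 0 */
    return mask | y;           /* y if x != 0, else -1 */
}
```

With Y equal to zero the `or` contributes nothing, which matches the commit's remark that the zero-arm case is just a special case of the general pattern.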

rotateright (Contributor) commented Jan 9, 2022

We should be using the sbb hack on this example and more often in general now. Whether that translates to better real-world performance is an open question though.

There's a false-dependency hazard with sbb that might exist on some Intel microarchitectures, so it might be better to use cmov, as seen in the example from #53071.
