Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce bswap to smaller type #53867

Closed
chfast opened this issue Feb 15, 2022 · 5 comments
Closed

Reduce bswap to smaller type #53867

chfast opened this issue Feb 15, 2022 · 5 comments
Assignees

Comments

@chfast
Copy link
Contributor

chfast commented Feb 15, 2022

In some cases where shift is followed by bswap the bswap type may be reduced.
E.g.

  %2 = zext i16 %0 to i64
  %3 = shl nuw i64 %2, 48
  %4 = tail call i64 @llvm.bswap.i64(i64 %3)
  %5 = trunc i64 %4 to i16

should just be

  %2 = tail call i16 @llvm.bswap.i16(i16 %0)

https://godbolt.org/z/fq9e591EM
https://alive2.llvm.org/ce/z/rskDKL

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 16, 2022

CC @rotateright @LebedevRI

https://alive2.llvm.org/ce/z/Qif2dE

Should we limit this to just being fed from a trunc or should this be a SimplifyDemandedBits driven fold?

----------------------------------------
define i64 @src(i16 %0) {
%1:
  %2 = zext i16 %0 to i64
  %3 = shl nuw i64 %2, 48
  %4 = bswap i64 %3
  %5 = and i64 %4, 65535
  ret i64 %5
}
=>
define i64 @tgt(i16 %0) {
%1:
  %2 = bswap i16 %0
  %3 = zext i16 %2 to i64
  ret i64 %3
}
Transformation seems to be correct!

@LebedevRI
Copy link
Member

As a rule-of-thumb, relying on a trunc is always fragile, i think demandedbits may be better here.

@RKSimon RKSimon self-assigned this Feb 18, 2022
RKSimon added a commit that referenced this issue Feb 19, 2022
Test based off issues #51391 and #53867 - we're going to end up needing InstCombine + DAG variants of this fold as DAG can create BSWAP nodes as part of load folding
@RKSimon
Copy link
Collaborator

RKSimon commented Feb 25, 2022

Optimized codegen is handled in DAG by 370ebc9 but we should be trying to simplify IR in InstCombine as well

@rotateright
Copy link
Contributor

Narrowing bswap in IR:
https://reviews.llvm.org/D122166

See also canonicalizing bswap+shift in IR:
https://reviews.llvm.org/D122010

rotateright added a commit that referenced this issue Mar 22, 2022
This is the IR counterpart to 370ebc9
which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
@rotateright
Copy link
Contributor

We get the motivating C source tests in IR now (and codegen), so we can close this report.
It is possible to go further (using known bits of the shift amount for example), so if that matters, please open a new issue.

rotateright added a commit that referenced this issue Mar 23, 2022
The first attempt at this missed a validity check.
This version includes a test of the narrow source
type for modulo-16-bits.

Original commit message:

This is the IR counterpart to 370ebc9
which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants