Reduce bswap to smaller type #53867

chfast · 2022-02-15T19:14:01Z

In some cases where a shift is followed by a bswap, the bswap can be narrowed to a smaller type.
E.g.

  %2 = zext i16 %0 to i64
  %3 = shl nuw i64 %2, 48
  %4 = tail call i64 @llvm.bswap.i64(i64 %3)
  %5 = trunc i64 %4 to i16

should just be

  %2 = tail call i16 @llvm.bswap.i16(i16 %0)

https://godbolt.org/z/fq9e591EM
https://alive2.llvm.org/ce/z/rskDKL
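
For readers who want to check the equivalence without Alive2, here is a minimal brute-force sketch in Python. The helper names (`bswap`, `wide_pattern`, `narrow_pattern`) are made up for this illustration and are not LLVM APIs; the code only models the integer arithmetic of the two IR snippets above.

```python
def bswap(value: int, bits: int) -> int:
    """Reverse the byte order of a `bits`-wide unsigned integer."""
    assert bits % 8 == 0 and 0 <= value < (1 << bits)
    return int.from_bytes(value.to_bytes(bits // 8, "little"), "big")


def wide_pattern(x: int) -> int:
    """Model of: zext i16 -> i64, shl 48, bswap.i64, trunc to i16."""
    shifted = (x << 48) & 0xFFFF_FFFF_FFFF_FFFF  # shl nuw i64 %2, 48
    return bswap(shifted, 64) & 0xFFFF           # trunc i64 %4 to i16


def narrow_pattern(x: int) -> int:
    """Model of: bswap.i16 applied directly to the input."""
    return bswap(x, 16)


# Exhaustively compare both forms over every i16 input.
mismatches = [x for x in range(1 << 16) if wide_pattern(x) != narrow_pattern(x)]
print(len(mismatches))  # 0
```

All 65536 inputs agree, which matches the Alive2 result linked above for this specific shift amount and type pair.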

RKSimon · 2022-02-16T20:46:54Z

CC @rotateright @LebedevRI

https://alive2.llvm.org/ce/z/Qif2dE

Should we limit this to the case where the bswap feeds a trunc, or should this be a SimplifyDemandedBits-driven fold?

----------------------------------------
define i64 @src(i16 %0) {
%1:
  %2 = zext i16 %0 to i64
  %3 = shl nuw i64 %2, 48
  %4 = bswap i64 %3
  %5 = and i64 %4, 65535
  ret i64 %5
}
=>
define i64 @tgt(i16 %0) {
%1:
  %2 = bswap i16 %0
  %3 = zext i16 %2 to i64
  ret i64 %3
}
Transformation seems to be correct!

LebedevRI · 2022-02-16T20:50:12Z

As a rule of thumb, relying on a trunc is always fragile; I think demanded bits may be better here.
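
To illustrate why a demanded-bits approach covers both the `trunc` and the `and 65535` variants, here is a small Python sketch of how a demanded mask could be pushed backwards through the bswap and the shift. The function names are hypothetical and this is not LLVM's SimplifyDemandedBits code; it only relies on the fact that bswap is a fixed permutation of bits.

```python
MASK64 = (1 << 64) - 1


def bswap64(value: int) -> int:
    """Reverse the byte order of a 64-bit unsigned integer."""
    return int.from_bytes((value & MASK64).to_bytes(8, "little"), "big")


def demanded_through_bswap(demanded_out: int) -> int:
    """bswap permutes bits, so the demanded input bits are the swapped mask."""
    return bswap64(demanded_out)


def demanded_through_shl(demanded_out: int, amount: int) -> int:
    """Input bit i of a shl lands at output bit i + amount."""
    return demanded_out >> amount


# The user (trunc to i16, or `and 65535`) only demands the low 16 bits.
demanded_at_user = 0xFFFF
demanded_at_bswap_input = demanded_through_bswap(demanded_at_user)
demanded_at_shl_input = demanded_through_shl(demanded_at_bswap_input, 48)

print(hex(demanded_at_bswap_input))  # 0xffff000000000000
print(hex(demanded_at_shl_input))    # 0xffff
```

Only the top two bytes of the bswap input are demanded, and after undoing the shift by 48 those are exactly the 16 bits of the original narrow value, so the wide bswap carries no extra information regardless of which instruction discards the upper bits.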

RKSimon · 2022-02-25T17:23:02Z

Optimized codegen is handled in DAG by 370ebc9, but we should be trying to simplify IR in InstCombine as well.

rotateright · 2022-03-21T19:09:13Z

Narrowing bswap in IR:
https://reviews.llvm.org/D122166

See also canonicalizing bswap+shift in IR:
https://reviews.llvm.org/D122010
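
The D122010 patch itself is not quoted in this thread, so the following is only a sketch of the identity that "moving the shift after the bswap" appears to rest on, under the assumption that the shift amount is a whole number of bytes: `bswap(x << C) == bswap(x) >> C` (logical shift). The Python helpers are illustrative names, not LLVM APIs.

```python
MASK32 = (1 << 32) - 1


def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


def shl_then_bswap(x: int, amount: int) -> int:
    """bswap (shl x, amount) on i32."""
    return bswap32((x << amount) & MASK32)


def bswap_then_lshr(x: int, amount: int) -> int:
    """lshr (bswap x), amount on i32."""
    return bswap32(x) >> amount


samples = [0, 1, 0x12345678, 0x80000001, 0xDEADBEEF, MASK32]

# Holds when the shift amount is a multiple of 8 ...
byte_shifts_ok = all(
    shl_then_bswap(x, c) == bswap_then_lshr(x, c)
    for x in samples
    for c in (0, 8, 16, 24)
)
# ... but not in general: a 1-bit shift already breaks it.
bit_shift_ok = shl_then_bswap(1, 1) == bswap_then_lshr(1, 1)

print(byte_shifts_ok, bit_shift_ok)  # True False
```

Shifting left by whole bytes drops the high bytes and pads with zero bytes at the bottom; after a byte reversal that is the same as dropping the low bytes of the swapped value, which is why the byte-multiple restriction matters.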

This is the IR counterpart to 370ebc9 which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet what would happen for illegal types in codegen - too rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift after the bswap as proposed in D122010, but it is independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166

rotateright · 2022-03-22T15:19:33Z

We get the motivating C source tests in IR now (and codegen), so we can close this report.
It is possible to go further (using known bits of the shift amount for example), so if that matters, please open a new issue.

The first attempt at this missed a validity check. This version includes a test of the narrow source type for modulo-16-bits.

Original commit message:

This is the IR counterpart to 370ebc9 which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet what would happen for illegal types in codegen - too rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift after the bswap as proposed in D122010, but it is independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
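
The modulo-16-bits check mentioned above presumably reflects the LangRef rule that `llvm.bswap` is only defined for integer types with an even number of bytes. A rough Python model of a narrowing fold with that guard follows; `narrow_bswap_of_zext` is a hypothetical name for this sketch and does not reproduce the actual InstCombine code.

```python
from typing import Optional


def bswap(value: int, bits: int) -> int:
    """Reverse the byte order of a `bits`-wide unsigned integer."""
    assert bits % 8 == 0 and 0 <= value < (1 << bits)
    return int.from_bytes(value.to_bytes(bits // 8, "little"), "big")


def wide_form(x: int, src_bits: int, wide_bits: int) -> int:
    """lshr (bswap (zext x)), wide_bits - src_bits."""
    return bswap(x, wide_bits) >> (wide_bits - src_bits)


def narrow_bswap_of_zext(x: int, src_bits: int) -> Optional[int]:
    """Narrowed form: bswap on the source type, or None if no valid bswap exists."""
    if src_bits % 16 != 0:
        # Assumed validity check: there is no llvm.bswap for e.g. i8 or i24.
        return None
    return bswap(x, src_bits)


# i16 -> i64 and i32 -> i64 both narrow and agree with the wide form.
ok16 = all(wide_form(x, 16, 64) == narrow_bswap_of_zext(x, 16) for x in range(1 << 16))
ok32 = all(
    wide_form(x, 32, 64) == narrow_bswap_of_zext(x, 32)
    for x in (0, 1, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF)
)
# i8 is rejected by the guard, so the wide form has to stay.
rejected = narrow_bswap_of_zext(0xAB, 8)

print(ok16, ok32, rejected)  # True True None
```

Without the guard, a fold like this would try to create a bswap on a type such as i8 or i24, for which no intrinsic exists.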

chfast added the llvm:optimizations label Feb 15, 2022

RKSimon mentioned this issue Feb 18, 2022

Missing combine of shl+bswap to rol #51391

Closed

RKSimon self-assigned this Feb 18, 2022

RKSimon added a commit that referenced this issue Feb 19, 2022

[X86] Add bswap(shl()) test

ebeb191

Test based off issues #51391 and #53867 - we're going to end up needing InstCombine + DAG variants of this fold as DAG can create BSWAP nodes as part of load folding

RKSimon assigned rotateright Mar 18, 2022

rotateright closed this as completed Mar 22, 2022
