riscv 64-bit popcount uses inefficient constant materialization #86207
@llvm/issue-subscribers-backend-risc-v Author: Eli Friedman (efriedma-quic)
Consider:
Targeting rv64, this generates:
There are 4 constant integers involved in this computation: 0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F, and 0x0101010101010101. The way we're materializing the constants is not efficient. In isolation, each of these takes 4 instructions to materialize, which I think is optimal... but the constants are related to each other.
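For context, here is a minimal sketch of the classic bit-parallel popcount that uses these four constants. It is not the snippet from the issue, and the exact sequence LLVM emits may differ; the function name `popcount64_swar` is made up for illustration.

```c
#include <stdint.h>

/* Classic bit-parallel (SWAR) population count for a 64-bit value.
   Illustrative only: popcount64_swar is a hypothetical name, and this
   is not claimed to match LLVM's generated code instruction for instruction. */
uint64_t popcount64_swar(uint64_t x) {
    const uint64_t m1  = 0x5555555555555555ULL; /* every other bit */
    const uint64_t m2  = 0x3333333333333333ULL; /* every other bit pair */
    const uint64_t m4  = 0x0F0F0F0F0F0F0F0FULL; /* low nibble of each byte */
    const uint64_t h01 = 0x0101010101010101ULL; /* a one in every byte */

    x = x - ((x >> 1) & m1);         /* 2-bit partial sums */
    x = (x & m2) + ((x >> 2) & m2);  /* 4-bit partial sums */
    x = (x + (x >> 4)) & m4;         /* 8-bit partial sums */
    return (x * h01) >> 56;          /* sum of all bytes lands in the top byte */
}
```

Each of the four constants has to live in a register on rv64, which is why their materialization cost shows up directly in the generated code.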
When materializing a constant, we can't know its context, and it's hard to know how constants relate to one another. See llvm-project/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp, lines 8669 to 8676 in 72c729f.
But some targets may be able to materialize these constants easily, so I think this should be a custom lowering in the RISCV target.
You could write a generic pass that collects all the constants in a block and checks whether one constant can be produced using a shift+xor of another constant. Not sure how generally useful such a pass would be.
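The core check of such a pass could look roughly like the sketch below. This is only an assumption about how the test might be written, outside of any LLVM infrastructure; `find_shift_xor` is a hypothetical helper, not an existing LLVM function.

```c
#include <stdint.h>

/* Hypothetical helper: look for a shift amount s in 1..63 such that
   target == base ^ (base << s)  -> returns  s
   target == base ^ (base >> s)  -> returns -s
   Returns 0 if no single shift+xor of `base` produces `target`. */
int find_shift_xor(uint64_t base, uint64_t target) {
    for (int s = 1; s < 64; ++s) {
        if ((base ^ (base << s)) == target)
            return s;
        if ((base ^ (base >> s)) == target)
            return -s;
    }
    return 0;
}
```

For one pair this is at most 126 comparisons, so the cost of a real pass would mostly depend on how many constant pairs per block it has to try.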
Interesting idea, but is it computationally feasible to find bitwise relations between constants?

uintN_t mask1 = ((uintN_t)-1 / 0xFF) * 0x55;
uintN_t mask2 = ((uintN_t)-1 / 0xFF) * 0x33;
uintN_t mask4 = ((uintN_t)-1 / 0xFF) * 0x0F;
uintN_t multiplier = ((uintN_t)-1 / 0xFF);

We also should not be restricted to bitwise operations when deriving constants like these: if multiplications are allowed, I can derive all the necessary constants with 3 multiplications instead of the 6 bitwise operations you suggested. That raises the question: are these simplifications worth it, especially at the general optimization levels ("-O2" and "-Os")?
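As a concrete instance of the multiplication idea, here is a sketch with `uintN_t` fixed to `uint64_t`. The struct and function names are hypothetical, chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical container for the four popcount constants. */
typedef struct {
    uint64_t mask1;
    uint64_t mask2;
    uint64_t mask4;
    uint64_t multiplier;
} PopcountConstsMul;

/* Derive every mask from the byte-replicating multiplier 0x0101...01,
   using one multiplication per mask (three in total). */
PopcountConstsMul derive_by_multiplication(void) {
    PopcountConstsMul c;
    c.multiplier = (uint64_t)-1 / 0xFF; /* all-ones divided by 255 */
    c.mask1 = c.multiplier * 0x55;      /* replicate 0x55 into every byte */
    c.mask2 = c.multiplier * 0x33;      /* replicate 0x33 into every byte */
    c.mask4 = c.multiplier * 0x0F;      /* replicate 0x0F into every byte */
    return c;
}
```

Whether three multiplies beat a handful of shift/xor instructions depends on the target's multiplier latency, so this only shows that the derivation is correct, not that it is profitable.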
For some targets like RISCV, it is costly to materialize constants used in lowering `ISD::CTPOP`/`ISD::VP_CTPOP`. We can query the materialization cost via `TargetTransformInfo::getIntImmCost` and if the cost is larger than 2, we should construct the constant via two instructions. This fixes llvm#86207.
…stly For RISCV, it is costly to materialize constants used in lowering `ISD::CTPOP`/`ISD::VP_CTPOP`. We can query the materialization cost via `RISCVMatInt::getIntMatCost` and if the cost is larger than 2, we should construct the constant via two instructions. This fixes llvm#86207.
The constants are related to each other:
0x3333333333333333 == (0x0F0F0F0F0F0F0F0F ^ (0x0F0F0F0F0F0F0F0F << 2))
0x5555555555555555 == (0x3333333333333333 ^ (0x3333333333333333 << 1))
0x0101010101010101 == (0x0F0F0F0F0F0F0F0F & (0x0F0F0F0F0F0F0F0F >> 3))
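These relations can be checked with a small sketch that starts from 0x0F0F0F0F0F0F0F0F alone and derives the other three constants with two operations each. The struct and function names are hypothetical; this illustrates the arithmetic, not the backend change itself.

```c
#include <stdint.h>

/* Hypothetical container for the four popcount constants. */
typedef struct {
    uint64_t m1;  /* 0x5555555555555555 */
    uint64_t m2;  /* 0x3333333333333333 */
    uint64_t m4;  /* 0x0F0F0F0F0F0F0F0F */
    uint64_t h01; /* 0x0101010101010101 */
} PopcountConstsBitwise;

/* Materialize only m4, then derive each remaining constant with a
   shift plus one logical operation. */
PopcountConstsBitwise derive_from_m4(void) {
    PopcountConstsBitwise c;
    c.m4  = 0x0F0F0F0F0F0F0F0FULL;
    c.m2  = c.m4 ^ (c.m4 << 2); /* 0x0F ^ 0x3C == 0x33 in every byte */
    c.m1  = c.m2 ^ (c.m2 << 1); /* 0x33 ^ 0x66 == 0x55 in every byte */
    c.h01 = c.m4 & (c.m4 >> 3); /* keeps only bit 0 of every byte */
    return c;
}
```

That is one 4-instruction materialization plus 6 cheap operations, compared with 16 instructions when each constant is built in isolation.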