Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv 64-bit popcount uses inefficient constant materialization #86207

Open
efriedma-quic opened this issue Mar 21, 2024 · 4 comments
Open

riscv 64-bit popcount uses inefficient constant materialization #86207

efriedma-quic opened this issue Mar 21, 2024 · 4 comments

Comments

@efriedma-quic
Copy link
Collaborator

Consider:

int a(unsigned long long x) { return __builtin_popcountll(x); }

Targeting rv64, this generates:

a:
        srli    a1, a0, 1
        lui     a2, 349525
        addiw   a2, a2, 1365
        slli    a3, a2, 32
        add     a2, a2, a3
        and     a1, a1, a2
        sub     a0, a0, a1
        lui     a1, 209715
        addiw   a1, a1, 819
        slli    a2, a1, 32
        add     a1, a1, a2
        and     a2, a0, a1
        srli    a0, a0, 2
        and     a0, a0, a1
        add     a0, a0, a2
        srli    a1, a0, 4
        add     a0, a0, a1
        lui     a1, 61681
        addiw   a1, a1, -241
        slli    a2, a1, 32
        add     a1, a1, a2
        and     a0, a0, a1
        lui     a1, 4112
        addiw   a1, a1, 257
        slli    a2, a1, 32
        add     a1, a1, a2
        mul     a0, a0, a1
        srli    a0, a0, 56
        ret

There are 4 constant integers involved in this computation: 0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F, and 0x0101010101010101. The way we're materializing the constants is not efficient. In isolation, each of these takes 4 instructions to materialize, which I think is optimal... but the constants are related to each other. 0x3333333333333333 == (0x0F0F0F0F0F0F0F0F ^ (0x0F0F0F0F0F0F0F0F << 2)). 0x5555555555555555 == (0x3333333333333333 ^ (0x3333333333333333 << 1)). 0x0101010101010101 == (0x0F0F0F0F0F0F0F0F & (0x0F0F0F0F0F0F0F0F >> 3)).

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 21, 2024

@llvm/issue-subscribers-backend-risc-v

Author: Eli Friedman (efriedma-quic)

Consider:
int a(unsigned long long x) { return __builtin_popcountll(x); }

Targeting rv64, this generates:

a:
        srli    a1, a0, 1
        lui     a2, 349525
        addiw   a2, a2, 1365
        slli    a3, a2, 32
        add     a2, a2, a3
        and     a1, a1, a2
        sub     a0, a0, a1
        lui     a1, 209715
        addiw   a1, a1, 819
        slli    a2, a1, 32
        add     a1, a1, a2
        and     a2, a0, a1
        srli    a0, a0, 2
        and     a0, a0, a1
        add     a0, a0, a2
        srli    a1, a0, 4
        add     a0, a0, a1
        lui     a1, 61681
        addiw   a1, a1, -241
        slli    a2, a1, 32
        add     a1, a1, a2
        and     a0, a0, a1
        lui     a1, 4112
        addiw   a1, a1, 257
        slli    a2, a1, 32
        add     a1, a1, a2
        mul     a0, a0, a1
        srli    a0, a0, 56
        ret

There are 4 constant integers involved in this computation: 0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F, and 0x0101010101010101. The way we're materializing the constants is not efficient. In isolation, each of these takes 4 instructions to materialize, which I think is optimal... but the constants are related to each other. 0x3333333333333333 == (0x0F0F0F0F0F0F0F0F ^ (0x0F0F0F0F0F0F0F0F &lt;&lt; 2)). 0x5555555555555555 == (0x3333333333333333 ^ (0x3333333333333333 &lt;&lt; 1)). 0x0101010101010101 == (0x0F0F0F0F0F0F0F0F &amp; (0x0F0F0F0F0F0F0F0F &gt;&gt; 3)).

@wangpc-pp
Copy link
Contributor

When materializing constants, we can't know its context and it's hard to know the connection of constants.
The solution may be using your formulas instead of using APInt directly here:

// This is the "best" algorithm from
// http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
SDValue Mask55 =
DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x55)), dl, VT);
SDValue Mask33 =
DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x33)), dl, VT);
SDValue Mask0F =
DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x0F)), dl, VT);

But some targets may be able to materialize these constants easily, so I think this should be a custom lowering in RISCV target.

@efriedma-quic
Copy link
Collaborator Author

You could write a generic pass that collects all the constants in a block and checks whether one constant can be produced using a shift+xor of another constant. Not sure how generally useful such a pass would be.

@Explorer09
Copy link

Interesting idea, but is it computationally feasible to find bitwise relations between constants?
I mean, for example, the constants used in bit count algorithms can be genericized to these, using division:

uintN_t mask1 = ((uintN_t)-1 / 0xFF) * 0x55;
uintN_t mask2 = ((uintN_t)-1 / 0xFF) * 0x33;
uintN_t mask4 = ((uintN_t)-1 / 0xFF) * 0x0F;
uintN_t multiplier = ((uintN_t)-1 / 0xFF);

And we should be not restricted to bitwise operations only to derive constants like these, and hence, if multiplications are allowed, I can use 3 multiplications instead of 6 bitwise operations you suggested to derive all necessary constants. It comes to the question of: Are these simplifications worth it, especially regarding the general optimization levels ("-O2" and "-Os")?

wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Mar 26, 2024
For some targets like RISCV, it is costly to materialize constants
used in lowering `ISD::CTPOP`/`ISD::VP_CTPOP`.

We can query the materialization cost via `TargetTransformInfo::getIntImmCost`
and if the cost is larger than 2, we should construct the constant
via two instructions.

This fixes llvm#86207.
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Mar 28, 2024
…stly

For RISCV, it is costly to materialize constants used in lowering
`ISD::CTPOP`/`ISD::VP_CTPOP`.

We can query the materialization cost via `RISCVMatInt::getIntMatCost`
and if the cost is larger than 2, we should construct the constant
via two instructions.

This fixes llvm#86207.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment