Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal code generation for 256-bit integer addition containing constant bits in arm64 #60571

Closed
aqjune opened this issue Feb 6, 2023 · 5 comments

Comments

@aqjune
Copy link
Contributor

aqjune commented Feb 6, 2023

Hello all,

It seems llc is producing suboptimal code for 256-bit int add when target is ARM64.

https://godbolt.org/z/MfEMGKE8n
The command I used is llc --mtriple=arm64-unknown-unknown -mcpu=neoverse-n1 -O3 (which is also used in the Godbolt instance).

The above code evaluates [x3;x2;x1;x0] + [y3;y2;y1;0x1] and stores the results into the memory.

The generated code is

        adds    x8, x1, x4
        adcs    x9, x2, x5
        adc     x10, x3, x6
        adds    x11, x0, #1
        adcs    x8, x8, xzr
        stp     x11, x8, [x7]
        adcs    x8, x9, xzr
        cinc    x9, x10, hs
        stp     x8, x9, [x7, #16]
        ret

.. which doesn't quite seem optimal.
I think it can be further reduced to something like this (please ignore register names here, they are arbitrary):

adds x15, x0, #1
adcs x16, x1, x2
adcs x17, x3, x4
adc x18, x5, x6
stp x15, x16, [x7]
stp x17, x18, [x7, #16]
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 6, 2023

@llvm/issue-subscribers-backend-aarch64

@aqjune
Copy link
Contributor Author

aqjune commented Feb 6, 2023

If the low 64 bits of y are not set to 00..01, llc 's output looks okay: https://godbolt.org/z/rsfoWzv4j

@aqjune
Copy link
Contributor Author

aqjune commented Feb 7, 2023

I can see that IR Dump After AArch64 Instruction Selection (aarch64-isel) lowers to the sequence.

@aqjune
Copy link
Contributor Author

aqjune commented Feb 7, 2023

It is DAGCombine that does this sequence of transformations:

x + (((1  y1<<64) | y2<<128) | y3<<192)
=>
x + (((y1<<64 | 1) | y2<<128) | y3<<192)
=>
x + (((y1<<64 | y2<<128) | 1 ) | y3<<192)
=>
x + (((y1<<64 | y2<<128) | y3<<192) | 1)
=>
(x + ((y1<<64 | y2<<128) | y3<<192)) + 1

at DAGCombiner.cpp, line 2516:

    // Reassociate (add (or x, c), y) -> (add add(x, y), c)) if (or x, c) is
    // equivalent to (add x, c).
    // Reassociate (add (xor x, c), y) -> (add add(x, y), c)) if (xor x, c) is
    // equivalent to (add x, c).
    auto ReassociateAddOr = [&](SDValue N0, SDValue N1) {
      if (isADDLike(N0, DAG) && N0.hasOneUse() &&
          isConstantOrConstantVector(N0.getOperand(1), /* NoOpaque */ true)) {
        return DAG.getNode(ISD::ADD, DL, VT,
                           DAG.getNode(ISD::ADD, DL, VT, N1, N0.getOperand(0)),
                           N0.getOperand(1));
      }
      return SDValue();
    };
    if (SDValue Add = ReassociateAddOr(N0, N1))
      return Add;
    if (SDValue Add = ReassociateAddOr(N1, N0))
      return Add;

I think this transformation isn't beneficial if the add is not a legal operation in the architecture.

@aqjune
Copy link
Contributor Author

aqjune commented Feb 15, 2023

Made a patch here: https://reviews.llvm.org/D144116

aqjune added a commit that referenced this issue Mar 8, 2023
…t if benefit is unclear

This patch resolves suboptimal code generation reported by #60571 .

DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid.
However, if `.. + const` is broken down into a sequences of adds with carries, the benefit is not clear, introducing two more add(-with-carry) ops (total 6) in the case of the reported issue whereas the optimal sequence must only have 4 add(-with-carry)s.

This patch resolves this issue by allowing this conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit because it does not introduce more adds.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144116
@aqjune aqjune closed this as completed Mar 8, 2023
CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this issue Mar 17, 2023
…t if benefit is unclear

This patch resolves suboptimal code generation reported by llvm/llvm-project#60571 .

DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid.
However, if `.. + const` is broken down into a sequences of adds with carries, the benefit is not clear, introducing two more add(-with-carry) ops (total 6) in the case of the reported issue whereas the optimal sequence must only have 4 add(-with-carry)s.

This patch resolves this issue by allowing this conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit because it does not introduce more adds.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144116
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants