Suboptimal code generation for 256-bit integer addition containing constant bits in arm64 #60571
If the low 64 bits of y are not set to 00..01, llc's output looks okay: https://godbolt.org/z/rsfoWzv4j
I can see that it is DAGCombine that does this sequence of transformations:

```
x + (((1 | y1<<64) | y2<<128) | y3<<192)
=> x + (((y1<<64 | 1) | y2<<128) | y3<<192)
=> x + (((y1<<64 | y2<<128) | 1) | y3<<192)
=> x + (((y1<<64 | y2<<128) | y3<<192) | 1)
=> (x + ((y1<<64 | y2<<128) | y3<<192)) + 1
```

The last step happens at:

```cpp
// Reassociate (add (or x, c), y) -> (add add(x, y), c)) if (or x, c) is
// equivalent to (add x, c).
// Reassociate (add (xor x, c), y) -> (add add(x, y), c)) if (xor x, c) is
// equivalent to (add x, c).
auto ReassociateAddOr = [&](SDValue N0, SDValue N1) {
  if (isADDLike(N0, DAG) && N0.hasOneUse() &&
      isConstantOrConstantVector(N0.getOperand(1), /* NoOpaque */ true)) {
    return DAG.getNode(ISD::ADD, DL, VT,
                       DAG.getNode(ISD::ADD, DL, VT, N1, N0.getOperand(0)),
                       N0.getOperand(1));
  }
  return SDValue();
};
if (SDValue Add = ReassociateAddOr(N0, N1))
  return Add;
if (SDValue Add = ReassociateAddOr(N1, N0))
  return Add;
```

I think this transformation isn't beneficial if the add is not a legal operation in the architecture.
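For context on why the combine is valid at all: when the constant only sets bits that are known to be zero in `x`, the `or` behaves exactly like an `add`, so `(x | c) + y` equals `(x + y) + c`. A minimal, self-contained check of that identity (the example values are my own, chosen for illustration):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint64_t x = 0xdeadbeef00000000ULL; // low bits known to be zero
  uint64_t c = 0x1ULL;                // constant only touches those zero bits
  uint64_t y = 0x0123456789abcdefULL;

  assert((x & c) == 0);                    // precondition: the or is "ADD-like"
  assert(((x | c) + y) == ((x + y) + c));  // the reassociation DAGCombine performs
  return 0;
}
```

The identity itself is fine; the concern above is purely about cost: for an illegal type like i256, the rewritten form legalizes into an extra carry chain.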
Made a patch here: https://reviews.llvm.org/D144116
…t if benefit is unclear

This patch resolves the suboptimal code generation reported in #60571. DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid. However, if `.. + const` is broken down into a sequence of adds with carries, the benefit is not clear: in the reported case it introduces two more add(-with-carry) ops (6 in total), whereas the optimal sequence needs only 4 add(-with-carry)s. This patch allows the conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit, since that does not introduce more adds.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144116
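To make the 4-versus-6 count concrete, here is a limb-level sketch of the desired lowering. The limb layout and the use of Clang's `__builtin_addcll` are my own illustration, not code from the patch:

```cpp
#include <cstdint>

// Desired lowering: the constant low limb (the 0x1) is folded directly into
// the first add, so the whole 256-bit sum is one chain of four carry-linked
// adds (adds / adcs / adcs / adc on AArch64).
void add256_low_const(const uint64_t x[4], const uint64_t y_hi[3],
                      uint64_t out[4]) {
  unsigned long long c0, c1, c2, c3;
  out[0] = __builtin_addcll(x[0], 1, 0, &c0);
  out[1] = __builtin_addcll(x[1], y_hi[0], c0, &c1);
  out[2] = __builtin_addcll(x[2], y_hi[1], c1, &c2);
  out[3] = __builtin_addcll(x[3], y_hi[2], c2, &c3);
  (void)c3; // carry out of the top limb is discarded
}
```

In the reassociated form `(x + y) + 1`, the trailing `+ 1` needs its own carry-propagating additions through the upper limbs, which is where the extra add-with-carry operations come from.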
Hello all,
It seems llc is producing suboptimal code for a 256-bit integer add when the target is ARM64.
https://godbolt.org/z/MfEMGKE8n
The command I used is
llc --mtriple=arm64-unknown-unknown -mcpu=neoverse-n1 -O3
(which is also used in the Godbolt instance). The above code evaluates
[x3;x2;x1;x0] + [y3;y2;y1;0x1]
and stores the result into memory. The generated code is
.. which doesn't quite seem optimal.
I think it can be further reduced to something like this (please ignore register names here, they are arbitrary):
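For completeness, a hedged C++ reconstruction of the computation being compiled (the limb layout and names are mine, not the actual Godbolt source):

```cpp
#include <cstdint>

// Reconstruction of the described computation: [x3;x2;x1;x0] + [y3;y2;y1;0x1],
// with each 256-bit value held as four 64-bit limbs, least significant first.
void add256(const uint64_t x[4], const uint64_t y[4], uint64_t out[4]) {
  unsigned __int128 acc = 0;        // Clang/GCC 128-bit extension for the carry
  for (int i = 0; i < 4; ++i) {
    acc += x[i];
    acc += y[i];                    // y[0] is the constant 0x1 in the report
    out[i] = (uint64_t)acc;         // low 64 bits of the running sum
    acc >>= 64;                     // carry into the next limb
  }
}
```

The point of the report is that this should lower to a single chain of four carry-linked adds even when the low limb of y is a constant.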