[GlobalIsel] Combine ADDO #82927
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-globalisel

Author: Thorsten Schütt (tschuett)

Changes: Perform the requested arithmetic and produce a carry output in addition to the normal result. Clang has these as builtins (__builtin_add_overflow_p). The middle end has intrinsics for them (sadd_with_overflow). AArch64: ADDS (add and set flags). On Neoverse V2, they run at half the throughput of basic arithmetic and are limited to a subset of the pipelines.

Patch is 145.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82927.diff

13 Files Affected:
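For reference, a minimal standalone example of the builtin family the description mentions; Clang lowers these to the llvm.[su]add.with.overflow intrinsics. This uses __builtin_add_overflow, the store-and-report variant:

```c++
#include <climits>
#include <cstdio>

int main() {
  int Sum = 0;
  // INT_MAX + 1 overflows a signed int: the builtin returns true and
  // stores the wrapped result in Sum.
  bool Overflowed = __builtin_add_overflow(INT_MAX, 1, &Sum);
  printf("sum=%d overflowed=%d\n", Sum, Overflowed);
}
```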
For @uaddo.select.i64, known bits seems to fail.
For AMDGPU, there seems to be a lot of noise and some size improvements.
For AArch64, when the condition register of a scalar select is def'd by an overflow op, we could select fewer instructions.
/// Represents overflowing add operations.
/// G_UADDO, G_SADDO
class GAddCarryOut : public GBinOpCarryOut {
There is an odd interference with the more general GAddSubCarryOut. Do you really need this class?
The common pattern is to assert that only the expected opcode is in MI. I use cast<GAddCarryOut>; I don't want an unnoticed sub to slip in.
GAddSubCarryOut has an isSub() method for that purpose.
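For illustration, a minimal sketch of the classof gating being discussed, assuming the usual GenericMachineInstrs.h conventions (not the verbatim upstream definition):

```c++
// Sketch: classof restricts which opcodes cast<GAddCarryOut> accepts,
// so an unexpected G_USUBO/G_SSUBO asserts instead of slipping through.
// (Assumes the llvm/CodeGen/GlobalISel/GenericMachineInstrs.h context.)
class GAddCarryOut : public GBinOpCarryOut {
public:
  static bool classof(const MachineInstr *MI) {
    switch (MI->getOpcode()) {
    case TargetOpcode::G_UADDO:
    case TargetOpcode::G_SADDO:
      return true;
    default:
      return false;
    }
  }
};
// By contrast, cast<GAddSubCarryOut>(&MI) also accepts the sub opcodes
// and would require a separate Add->isSub() check.
```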
// We want do fold the [u|s]addo.
if (!MRI.hasOneNonDBGUse(Carry))
It is not obvious why multiple uses prevent folding / make it unprofitable. Could you clarify the comment?
Same for Dst above.
Sure, will do. If there is more than one use, then we would be generating new instructions while keeping the old ones.
(match (wip_match_opcode G_SADDO, G_UADDO):$root,
   [{ return Helper.matchAddOverflow(*${root}, ${matchinfo}); }]),
(apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])>;
I don't know about the current direction, but if we're going to tablegen as much as possible, then we would need separate rules for each of the transformations (constant folding / swapping constant to RHS / replacing with G_ADD etc.).
early exit after RHS, then LHS
We can still try the known bits optimizations, even when both are std::nullopt.
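A rough sketch of the control flow under discussion; getIConstantVRegVal and the variable names are assumptions based on the surrounding snippets:

```c++
// Hypothetical shape of the operand probing: no early exit when both
// operands are non-constant, because the known-bits folds can still apply.
std::optional<APInt> MaybeLHS = getIConstantVRegVal(LHS, MRI);
std::optional<APInt> MaybeRHS = getIConstantVRegVal(RHS, MRI);
if (MaybeLHS && MaybeRHS) {
  // Constant-fold both the result and the carry.
} else if (MaybeLHS) {
  // Canonicalize: move the constant to the RHS.
}
// Fall through: known-bits reasoning handles the std::nullopt cases.
```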
Replacing this line with:

GAddSubCarryOut *Add = cast<GAddSubCarryOut>(&MI);

seems dangerous and defeats the purpose of the assert in the cast.
could be moved to pure tablegen as a separate pattern
I removed bit width and another else-after-return.
LGTM. I think it would be better to split these different combines into separate tablegen defined roots
Maybe try some non-s1 typed cases
LLT DstTy = MRI.getType(Dst);
LLT CarryTy = MRI.getType(Carry);
// We want do fold the [u|s]addo. |
Typo "want to", but the comment seems unrelated to the code. Why would multiple uses of the top level instruction prevent combining?
If the result Dst has multiple uses, then we cannot replace it.
Yes we can and we should. The combine will still be correct and beneficial. You can add tests for the multiple-use case.
As a general rule, if you are matching a pattern, you only need one-use checks for the inner nodes in the pattern, not the outer node. The reason for the checks is to ensure that when you remove the outer node, the inner nodes also get removed.
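A hypothetical two-node match makes the rule concrete (InnerDstReg is an invented placeholder):

```c++
// Matching Outer(Inner(x, C1), C2): only Inner's result needs a
// one-use check, so that rewriting Outer actually lets Inner die.
if (!MRI.hasOneNonDBGUse(InnerDstReg))
  return false; // folding would keep the inner node alive
// No use check on Outer itself: replacing its defs is correct and
// profitable no matter how many users they have.
```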
return true;
}
// We want do fold the [u|s]addo. |
Likewise.
B.buildConstant(Dst, Result);
B.buildConstant(Carry, Overflow);
In the vector-typed case don't you need to build new splat constants here? The patch is missing vector-typed tests for all these folds.
AArch64 does not support vectorized overflow ops. Under the hood, buildConstant builds scalars or build vectors. Support for G_SPLAT_VECTOR is still missing.
> AArch64 does not support vectorized overflow ops.

You should still add the tests. Your combine-overflow.mir test runs pre-legalization, so any MIR should be allowed there.

> buildConstant builds under the hood scalars or build vectors.

Ah! I didn't know that, thanks.
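A simplified sketch of the splatting behavior just described; the real MachineIRBuilder::buildConstant logic differs in its details:

```c++
// Sketch: for a fixed-vector destination, emit one scalar G_CONSTANT
// and splat it with G_BUILD_VECTOR; scalars go straight through.
LLT DstTy = MRI.getType(Dst);
if (DstTy.isFixedVector()) {
  auto Elt = B.buildConstant(DstTy.getElementType(), Result);
  SmallVector<Register> Ops(DstTy.getNumElements(), Elt.getReg(0));
  B.buildBuildVector(Dst, Ops);
} else {
  B.buildConstant(Dst, Result);
}
```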
return true;
}
// Fold (addo x, 0) -> x, no borrow |
Nit: "carry" not "borrow"
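For context, the fold being commented on has roughly this shape (hand-simplified; names follow the surrounding snippets):

```c++
// addo(x, 0) can never set the carry/overflow bit: forward x to Dst
// and materialize a false carry.
if (MaybeRHS && MaybeRHS->isZero()) {
  MatchInfo = [=](MachineIRBuilder &B) {
    B.buildCopy(Dst, LHS);
    B.buildConstant(Carry, 0);
  };
  return true;
}
```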
((IsSigned && AddLHS->getFlag(MachineInstr::MIFlag::NoSWrap)) ||
 (!IsSigned && AddLHS->getFlag(MachineInstr::MIFlag::NoUWrap)))) {
Suggested change:
- ((IsSigned && AddLHS->getFlag(MachineInstr::MIFlag::NoSWrap)) ||
-  (!IsSigned && AddLHS->getFlag(MachineInstr::MIFlag::NoUWrap)))) {
+ AddLHS->getFlag(IsSigned ? MachineInstr::MIFlag::NoSWrap : MachineInstr::MIFlag::NoUWrap)) {
bool Overflow;
APInt NewC = IsSigned ? MaybeAddRHS->sadd_ov(*MaybeRHS, Overflow)
                      : MaybeAddRHS->uadd_ov(*MaybeRHS, Overflow);
if (!Overflow && isConstantLegalOrBeforeLegalizer(DstTy)) {
I don't think you need to check isConstantLegalOrBeforeLegalizer(DstTy), because it's the same type as MaybeRHS and MaybeAddRHS, which we already know are constants.
MatchInfo = [=](MachineIRBuilder &B) {
  B.buildSAddo(Dst, Carry, RHS, LHS);
};
return true;
Nit: personally I think early return hurts symmetry here. I would prefer if/else followed by a single "return".
The anti-symmetry is a result of another else-after-return violation.
Ping @tschuett, please address post-commit review. At least: