-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AArch64] Combine SEXT_INREG(CSET) to CSETM. #156429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] Combine SEXT_INREG(CSET) to CSETM. #156429
Conversation
Add the following patterns to performSignExtendInRegCombine: SIGN_EXTEND_INREG (CSEL 0, 1, cc), i1 --> CSEL 0, -1, cc SIGN_EXTEND_INREG (CSEL 1, 0, cc), i1 --> CSEL -1, 0, cc The combined forms can be matched to a CSETM.
} | ||
|
||
// Sign extend of CSET -> CSETM. | ||
if (Opc == AArch64ISD::CSEL && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth having a hasOneUse() check for the CSEL? Unlikely perhaps but somebody might want the zero and sign extended results at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the CSEL has more uses, I think this shouldn't make things worse: it shouldn't increase the instruction count since we avoid the sign extension, and should help reduce the latency of the sign-extended result since we compute it from the CSEL operands, right? Are you concerned about some other effect that emitting both CSET and CSETM might have? (I'm happy to add the hasOneUse() check, I'm just confirming I'm not missing anything. :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not backed by any science I just wasn't sure if having two instructions requiring the same CC flags might hinder scheduling more? It seems an unlikely scenario so I'm happy to defer the decision until we have a real world use case to analyse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning was that I believe we may already generate multiple instructions requiring the same CC flags elsewhere, for example when selecting CSINC from (add x, CSEL (...)) patterns, and it wasn't obvious to me how doing this in a combine would hypothetically be worse.
If you're happy with the current approach I'll go ahead and merge it on the basis that we can tighten the conditions if this turns out to be an issue. :)
} | ||
|
||
// Sign extend of CSET -> CSETM. | ||
if (Opc == AArch64ISD::CSEL && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not backed by any science I just wasn't sure if having two instructions requiring the same CC flags might hinder scheduling more? It seems an unlikely scenario so I'm happy to defer the decision until we have a real world use case to analyse.
Add the following patterns to performSignExtendInRegCombine:
The combined forms can be matched to a CSETM.
I'd like to eventually propose enabling
convertSelectOfConstantsToMath
, assuming others think that would be a good idea, primarily to improve a few cases such as this. However, doing so currently introduces a few code gen regressions. This patch fixes some of them, and hopefully is a worthwhile addition on its own (please let me know otherwise).