Skip to content

Conversation

rj-jesus
Copy link
Contributor

@rj-jesus rj-jesus commented Sep 2, 2025

Add the following patterns to performSignExtendInRegCombine:

  • SIGN_EXTEND_INREG (CSEL 0, 1, cc), i1 --> CSEL 0, -1, cc
  • SIGN_EXTEND_INREG (CSEL 1, 0, cc), i1 --> CSEL -1, 0, cc

The combined forms can be matched to a CSETM.

I'd like to eventually propose enabling convertSelectOfConstantsToMath, assuming others think that would be a good idea, primarily to improve a few cases such as this. However, doing so currently introduces a few code gen regressions. This patch fixes some of them, and hopefully is a worthwhile addition on its own (please let me know otherwise).

Add the following patterns to performSignExtendInRegCombine:
  SIGN_EXTEND_INREG (CSEL 0, 1, cc), i1 --> CSEL 0, -1, cc
  SIGN_EXTEND_INREG (CSEL 1, 0, cc), i1 --> CSEL -1, 0, cc

The combined forms can be matched to a CSETM.
}

// Sign extend of CSET -> CSETM.
if (Opc == AArch64ISD::CSEL &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth having a hasOneUse() check for the CSEL? Unlikely perhaps but somebody might want the zero and sign extended results at the same time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the CSEL has more uses, I think this shouldn't make things worse: it shouldn't increase the instruction count since we avoid the sign extension, and should help reduce the latency of the sign-extended result since we compute it from the CSEL operands, right? Are you concerned about some other effect that emitting both CSET and CSETM might have? (I'm happy to add the hasOneUse() check, I'm just confirming I'm not missing anything. :) )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not backed by any science I just wasn't sure if having two instructions requiring the same CC flags might hinder scheduling more? It seems an unlikely scenario so I'm happy to defer the decision until we have a real world use case to analyse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning was that I believe we may already generate multiple instructions requiring the same CC flags elsewhere, for example when selecting CSINC from (add x, CSEL (...)) patterns, and it wasn't obvious to me how doing this in a combine would hypothetically be worse.

If you're happy with the current approach I'll go ahead and merge it on the basis that we can tighten the conditions if this turns out to be an issue. :)

}

// Sign extend of CSET -> CSETM.
if (Opc == AArch64ISD::CSEL &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not backed by any science I just wasn't sure if having two instructions requiring the same CC flags might hinder scheduling more? It seems an unlikely scenario so I'm happy to defer the decision until we have a real world use case to analyse.

@rj-jesus rj-jesus merged commit 8989ec5 into llvm:main Sep 3, 2025
9 checks passed
@rj-jesus rj-jesus deleted the rjj/aarch64-fold-sext-inreg-cset-to-csetm branch September 3, 2025 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants