[RISCV] Add some vsetvli insertion test cases with vmv.s.x+reduction. NFC #75544
These test cases were intended to get a single vsetvli by using the vmv.s.x intrinsic with the same LMUL as the reduction. This works for FP, but does not work for integer. I believe llvm#71501 will break this for FP too. Hopefully the vsetvli pass can be taught to fix this.
LGTM
For the record, the difference between integer and float is very weird here.
Congrats, you nerd sniped me. :) I dug into this, and it turns out to be a difference caused by the LMUL1 subreg stuff (which we apparently only do for int?), and a gap in the backwards walk code. A draft patch follows. (Sorry for the huge inline comment, attaching files appears broken today?) Note that the fixme inline needs to be addressed before this can become a real patch, and there's definitely some style cleanup needed.
One of the requirements to be able to delete a vsetvli in the backwards pass is that the preceding vsetvli must have the same AVL. This handles the case where the AVLs are registers by using MachineRegisterInfo to check if there are any definitions between the two vsetvlis. The Dominates helper was taken from MachineDominatorTree and scans through the instructions in the block, which is less than ideal. But it's only called when the two registers are the same, which should be rare. This also replaces the equally-zero check with the existing hasEquallyZeroAVL function, which is needed to handle the case where the AVLs are the same. Based on the draft patch in llvm#75544 (comment).
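To illustrate the register-AVL condition described above, here is a rough, self-contained sketch. It is not the code from the patch: `ToyInstr`, `makeVSETVLI`, `makeDef`, and `sameAVLValue` are hypothetical names, and the model assumes a single basic block where each instruction defines at most one register. The real pass works on MachineInstrs and queries MachineRegisterInfo instead of scanning a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel meaning "no register".
const int NoReg = -1;

// Toy stand-in for a machine instruction (assumption: at most one def).
struct ToyInstr {
  bool IsVSETVLI; // true if this instruction is a vsetvli
  int AVLReg;     // AVL register read by a vsetvli, else NoReg
  int DefReg;     // register written by this instruction, else NoReg
};

// Build a vsetvli that reads AVLReg and optionally writes DefReg.
ToyInstr makeVSETVLI(int AVLReg, int DefReg = NoReg) {
  return ToyInstr{true, AVLReg, DefReg};
}

// Build an ordinary instruction that writes Reg.
ToyInstr makeDef(int Reg) { return ToyInstr{false, NoReg, Reg}; }

// Returns true if the vsetvlis at indices First < Second read the same AVL
// register and nothing from First up to (but excluding) Second redefines it,
// so both instructions observe the same AVL value.
bool sameAVLValue(const std::vector<ToyInstr> &Block, std::size_t First,
                  std::size_t Second) {
  assert(First < Second && Second < Block.size());
  const ToyInstr &A = Block[First];
  const ToyInstr &B = Block[Second];
  if (!A.IsVSETVLI || !B.IsVSETVLI)
    return false;
  if (A.AVLReg == NoReg || A.AVLReg != B.AVLReg)
    return false;
  // The first vsetvli is included in the scan: if it overwrites its own
  // AVL register, the second vsetvli reads a different value.
  for (std::size_t I = First; I < Second; ++I)
    if (Block[I].DefReg == A.AVLReg)
      return false;
  return true;
}
```

In this toy model, `{makeVSETVLI(10), makeDef(11), makeVSETVLI(10)}` passes the check, while replacing the middle instruction with `makeDef(10)` makes it fail because the AVL register is redefined between the two vsetvlis.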
Currently vfmv.s.f intrinsics are directly selected to their pseudos via a tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their corresponding VL SDNode, then get selected from a pattern in RISCVInstrInfoVVLPatterns.td. This patch brings vfmv.s.f in line with the other move instructions, and shows how the LMUL reducing combine for VFMV_S_F_VL and VMV_S_X_VL results in the discrepancy in the test added in llvm#75544. Split out from llvm#71501, where we did this to preserve the behaviour of selecting vmv_s_x for VFMV_S_F_VL for small enough immediates.