[RISCV] Missing opportunities to optimize RVV instructions #80392
@llvm/issue-subscribers-backend-risc-v

Author: Wang Pengcheng (wangpc-pp)
At the SelectionDAG level, we have several code paths that generate RVV pseudos:
1. RVV intrinsics -> RVV pseudos.
2. ISD nodes -> RVV pseudos.
3. RISCVISD nodes -> RVV pseudos.
4. RVV intrinsics -> RISCVISD nodes -> RVV pseudos.
5. ISD nodes -> RISCVISD nodes -> RVV pseudos.
6. etc.
Most of the optimizations for RVV are based on RISCVISD nodes, so we may miss some opportunities to optimize code. For example (https://godbolt.org/z/f1jWEfhG7):

```c
vuint8m1_t dup(uint8_t* data) {
  return __riscv_vmv_v_x_u8m1(*data, __riscv_vsetvlmax_e8m1());
}

vuint8m1_t dup2(uint8_t* data) {
  return __riscv_vlse8_v_u8m1(data, 0, __riscv_vsetvlmax_e8m1());
}
```

```asm
dup:
        vsetvli a1, zero, e8, m1, ta, ma
        vlse8.v v8, (a0), zero
        ret
dup2:
        vsetvli a1, zero, e8, m1, ta, ma
        vlse8.v v8, (a0), zero
        ret
```

These two snippets compile to the same assembly because we lower the `vmv.v.x` intrinsic to `RISCVISD::VMV_V_X` first, and can then optimize it to a zero-stride load if profitable. But this is not the common case:

```c
vuint16m2_t vadd(vuint16m2_t a, vuint8m1_t b) {
  int vl = __riscv_vsetvlmax_e8m1();
  vuint16m2_t c = __riscv_vzext_vf2_u16m2(b, vl);
  return __riscv_vadd_vv_u16m2(a, c, vl);
}

vuint16m2_t vwaddu(vuint16m2_t a, vuint8m1_t b) {
  return __riscv_vwaddu_wv_u16m2(a, b, __riscv_vsetvlmax_e16m2());
}
```

```asm
vadd:
        vsetvli a0, zero, e16, m2, ta, ma
        vzext.vf2 v12, v10
        vadd.vv v8, v8, v12
        ret
vwaddu:
        vsetvli a0, zero, e8, m1, ta, ma
        vwaddu.wv v8, v8, v10
        ret
```

We can't optimize `vzext.vf2` + `vadd.vv` to `vwaddu.wv`, because we lower these intrinsics to RVV pseudos directly. Of course, there is the same problem for the `ISD nodes -> RVV pseudos` path:

```c
typedef vuint8m1_t v16xi8 __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen)));
typedef vuint16m2_t v16xi32 __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen * 2)));

v16xi32 add(v16xi32 a, v16xi8 b) {
  v16xi32 c = __riscv_vzext_vf2_u16m2(b, 16);
  return a + c;
}
```

```asm
add:
        vsetivli zero, 16, e16, m2, ta, ma
        vzext.vf2 v12, v10
        vadd.vv v8, v12, v8
        ret
```

I think we need a universal representation (RISCVISD?) on which to do these optimizations. But when GlobalISel is supported, will we need to redo all the optimizations on GIR again? Or should we move all optimizations to later MIR passes?
The last example can be optimized with full use of ISD nodes instead of mixing in intrinsics.
---
RISCVFoldMasks and #71764 are an effort to move some of the SelectionDAG code out into MIR passes.
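To make the "move optimizations to later MIR passes" direction concrete, here is a rough pseudocode sketch of what such a peephole over RVV pseudos could look like. All names here are hypothetical; this is not the actual RISCVFoldMasks code:

```
// Pseudocode only: a post-ISel peephole over MIR (hypothetical names).
for each MachineInstr MI in the machine function:
    if MI is PseudoVADD_VV and one source is defined by a PseudoVZEXT_VF2 Z:
        if Z has a single use and MI and Z agree on VL, SEW and policy:
            replace MI with PseudoVWADDU_WV(other source of MI, source of Z)
            erase Z if it is now dead
```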
I can't remember where I first heard this argument, but I think there was a question as to whether or not intrinsics should be optimised away, since there might be the expectation that if the user writes …

---
Thanks! I think my unawareness of this work just shows how easy these optimizations are to miss. 😄

---
Yeah! Thanks for mentioning this work!

As my example shows, we have already broken this convention for …

---
Not directly related to this, but I'm not sure …