Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for no…
…n-adjacent load/store Consider this following case: int bar (int *x, int a, int b, int n) { x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); int sum1 = 0; int sum2 = 0; for (int i = 0; i < n; ++i) { sum1 += x[2*i] - a; sum1 += x[2*i+1] * b; sum2 += x[2*i] - b; sum2 += x[2*i+1] * a; } return sum1 + sum2; } Before this patch: bar: ble a3,zero,.L5 csrr t0,vlenb csrr a6,vlenb slli t1,t0,3 vsetvli a5,zero,e32,m4,ta,ma sub sp,sp,t1 vid.v v20 vmv.v.x v12,a1 vand.vi v4,v20,1 vmv.v.x v16,a2 vmseq.vi v4,v4,1 slli t3,a6,2 vsetvli zero,a5,e32,m4,ta,ma vmv1r.v v0,v4 viota.m v8,v4 add a7,t3,sp vsetvli a5,zero,e32,m4,ta,mu vand.vi v28,v20,-2 vadd.vi v4,v28,1 vs4r.v v20,0(a7) ----- spill vrgather.vv v24,v12,v8 vrgather.vv v20,v16,v8 vrgather.vv v24,v16,v8,v0.t vrgather.vv v20,v12,v8,v0.t vs4r.v v4,0(sp) ----- spill slli a3,a3,1 addi t4,a6,-1 neg t1,a6 vmv4r.v v0,v20 vmv.v.i v4,0 j .L4 .L13: vsetvli a5,zero,e32,m4,ta,ma .L4: mv a7,a3 mv a4,a3 bleu a3,a6,.L3 csrr a4,vlenb .L3: vmv.v.x v8,t4 vl4re32.v v12,0(sp) ---- spill vand.vv v20,v28,v8 vand.vv v8,v12,v8 vsetvli zero,a4,e32,m4,ta,ma vle32.v v16,0(a0) vsetvli a5,zero,e32,m4,ta,ma add a3,a3,t1 vrgather.vv v12,v16,v20 add a0,a0,t3 vrgather.vv v20,v16,v8 vsub.vv v12,v12,v0 vsetvli zero,a4,e32,m4,tu,ma vadd.vv v4,v4,v12 vmacc.vv v4,v24,v20 bgtu a7,a6,.L13 csrr a1,vlenb slli a1,a1,2 add a1,a1,sp li a4,-1 csrr t0,vlenb vsetvli a5,zero,e32,m4,ta,ma vl4re32.v v12,0(a1) ---- spill vmv.v.i v8,0 vmul.vx v0,v12,a4 li a2,0 slli t1,t0,3 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmseq.vv v0,v0,v8 vand.vi v12,v12,1 vmerge.vvm v16,v8,v4,v0 vmseq.vv v12,v12,v8 vmv.s.x v1,a2 vmv1r.v v0,v12 vredsum.vs v16,v16,v1 vmerge.vvm v8,v8,v4,v0 vmv.x.s a0,v16 vredsum.vs v8,v8,v1 vmv.x.s a5,v8 add sp,sp,t1 addw a0,a0,a5 jr ra .L5: li a0,0 ret We can there are multiple horrible register spillings. The root cause of this issue is for a scalar IR load: _5 = *_4; We didn't check whether it is a continguous load/store or gather/scatter load/store Since it will be translate into: 1. MASK_LEN_GATHER_LOAD (..., perm indice). 2. Continguous load/store + VEC_PERM (..., perm indice) It's obvious that no matter which situation, we will end up with consuming one vector register group (perm indice) that we didn't count it before. So this case we pick LMUL = 4 which is incorrect choice for dynamic LMUL cost model. The key of this patch is: if ((type == load_vec_info_type || type == store_vec_info_type) && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info))) { ... } Add one more register consumption if it is not an adjacent load/store. After this patch, it pick LMUL = 2 which is optimal: bar: ble a3,zero,.L4 csrr a6,vlenb vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v6,a2 srli a2,a6,1 vmv.v.x v4,a1 vid.v v12 slli a3,a3,1 vand.vi v0,v12,1 addi t1,a2,-1 vmseq.vi v0,v0,1 slli a6,a6,1 vsetvli zero,a5,e32,m2,ta,ma neg a7,a2 viota.m v2,v0 vsetvli a5,zero,e32,m2,ta,mu vrgather.vv v16,v4,v2 vrgather.vv v14,v6,v2 vrgather.vv v16,v6,v2,v0.t vrgather.vv v14,v4,v2,v0.t vand.vi v18,v12,-2 vmv.v.i v2,0 vadd.vi v20,v18,1 .L3: minu a4,a3,a2 vsetvli zero,a4,e32,m2,ta,ma vle32.v v8,0(a0) vsetvli a5,zero,e32,m2,ta,ma vmv.v.x v4,t1 vand.vv v10,v18,v4 vrgather.vv v6,v8,v10 vsub.vv v6,v6,v14 vsetvli zero,a4,e32,m2,tu,ma vadd.vv v2,v2,v6 vsetvli a1,zero,e32,m2,ta,ma vand.vv v4,v20,v4 vrgather.vv v6,v8,v4 vsetvli zero,a4,e32,m2,tu,ma mv a4,a3 add a0,a0,a6 add a3,a3,a7 vmacc.vv v2,v16,v6 bgtu a4,a2,.L3 vsetvli a1,zero,e32,m2,ta,ma vand.vi v0,v12,1 vmv.v.i v4,0 li a3,-1 vmseq.vv v0,v0,v4 vmv.s.x v1,zero vmerge.vvm v6,v4,v2,v0 vredsum.vs v6,v6,v1 vmul.vx v0,v12,a3 vadd.vi v0,v0,-1 vand.vi v0,v0,1 vmv.x.s a4,v6 vmseq.vv v0,v0,v4 vmv.s.x v1,zero vmerge.vvm v4,v4,v2,v0 vredsum.vs v4,v4,v1 vmv.x.s a0,v4 addw a0,a0,a4 ret .L4: li a0,0 ret No spillings. gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Fix big LMUL issue. (get_store_value): New function. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: New test.
- Loading branch information