
[RISCV] Missing opportunities to optimize RVV instructions #80392

Open
wangpc-pp opened this issue Feb 2, 2024 · 7 comments

@wangpc-pp
Contributor

At the SelectionDAG level, we have several code paths that generate RVV pseudos:

  1. RVV intrinsics -> RVV pseudos.
  2. ISD nodes -> RVV pseudos.
  3. RISCVISD nodes -> RVV pseudos.
  4. RVV intrinsics -> RISCVISD nodes -> RVV pseudos.
  5. ISD nodes -> RISCVISD nodes -> RVV pseudos.
  6. etc.

Most of the optimizations for RVV are based on RISCVISD nodes, so we may miss opportunities to optimize some code.
For example (https://godbolt.org/z/f1jWEfhG7):

vuint8m1_t dup(uint8_t* data) {
    return __riscv_vmv_v_x_u8m1(*data, __riscv_vsetvlmax_e8m1());
}

vuint8m1_t dup2(uint8_t* data) {
    return __riscv_vlse8_v_u8m1(data, 0, __riscv_vsetvlmax_e8m1());
}
dup:
        vsetvli a1, zero, e8, m1, ta, ma
        vlse8.v v8, (a0), zero
        ret
dup2:
        vsetvli a1, zero, e8, m1, ta, ma
        vlse8.v v8, (a0), zero
        ret

These two snippets produce the same assembly because we lower the vmv.v.x intrinsic to RISCVISD::VMV_V_X first, and can then optimize it to a zero-strided load if profitable.
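For reference, here is a much-simplified sketch of what that fold looks like conceptually during instruction selection. This is illustrative only, not a verbatim excerpt of RISCVISelDAGToDAG.cpp; the in-tree node is spelled RISCVISD::VMV_V_X_VL, the helper name is invented, and the operand layout is simplified:

// Hypothetical sketch, inside the RISC-V backend (assumes the usual
// RISCVISelLowering.h / RISCVSubtarget.h definitions).
static bool shouldFoldSplatOfLoad(SDNode *Node,
                                  const RISCVSubtarget &Subtarget) {
  // Because the intrinsic is lowered to a RISCVISD node first, selection
  // can look through the splat and see the scalar load feeding it.
  if (Node->getOpcode() != RISCVISD::VMV_V_X_VL)
    return false;
  // Operand layout simplified: take the splatted scalar operand.
  SDValue Scalar = Node->getOperand(1);
  auto *Ld = dyn_cast<LoadSDNode>(Scalar);
  // Indexed loads also produce an updated pointer; leave those alone.
  if (!Ld || Ld->isIndexed())
    return false;
  // Only fold when the core implements zero-strided loads efficiently
  // (the HasOptimizedZeroStrideLoad subtarget feature discussed below).
  return Subtarget.hasOptimizedZeroStrideLoad();
}

When a predicate like this holds, selection can emit a vlse with x0 as the stride instead of a scalar load plus vmv.v.x, which is exactly the vlse8.v ... zero seen above.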
But this is not the case for other intrinsics:

vuint16m2_t vadd(vuint16m2_t a, vuint8m1_t b) {
    int vl = __riscv_vsetvlmax_e8m1();
    vuint16m2_t c = __riscv_vzext_vf2_u16m2(b, vl);
    return __riscv_vadd_vv_u16m2(a, c, vl);
}

vuint16m2_t vwaddu(vuint16m2_t a, vuint8m1_t b) {
    return __riscv_vwaddu_wv_u16m2(a, b, __riscv_vsetvlmax_e16m2());
}
vadd:
        vsetvli a0, zero, e16, m2, ta, ma
        vzext.vf2       v12, v10
        vadd.vv v8, v8, v12
        ret
vwaddu:
        vsetvli a0, zero, e8, m1, ta, ma
        vwaddu.wv       v8, v8, v10
        ret

We can't optimize vzext.vf2+vadd.vv into vwaddu.wv because we lower these intrinsics directly to RVV pseudos.
Of course, the same problem exists for the ISD->RVV pseudos path:

typedef vuint8m1_t v16xi8 __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen)));
typedef vuint16m2_t v16xi32 __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen * 2)));

v16xi32 add(v16xi32 a, v16xi8 b) {
    v16xi32 c = __riscv_vzext_vf2_u16m2(b, 16);
    return a + c;
}
add:
        vsetivli        zero, 16, e16, m2, ta, ma
        vzext.vf2       v12, v10
        vadd.vv v8, v12, v8
        ret

I think we need a universal representation (RISCVISD?) on which to do these optimizations. But when GlobalISel is supported, will we need to redo all the optimizations on generic MIR? Or should we move all optimizations to later MIR passes?
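To make that concrete: if the vzext/vadd intrinsics were first lowered to RISCVISD VL nodes, a combine along these lines could form vwaddu.wv. This is a hypothetical sketch, not in-tree code; the node names exist in RISCVISelLowering.h, but the helper name is invented and the operand layout of the binary _VL nodes is simplified to (LHS, RHS, Passthru, Mask, VL):

// Hypothetical sketch: fold ADD_VL(a, VZEXT_VL(b)) -> VWADDU_W_VL(a, b),
// i.e. vzext.vf2 + vadd.vv -> vwaddu.wv, over RISCVISD VL nodes.
static SDValue combineAddVLToVWAdduWV(SDNode *N, SelectionDAG &DAG) {
  assert(N->getOpcode() == RISCVISD::ADD_VL && "expected ADD_VL");
  SDValue LHS = N->getOperand(0), RHS = N->getOperand(1);
  SDValue Passthru = N->getOperand(2);
  SDValue Mask = N->getOperand(3), VL = N->getOperand(4);

  // Canonicalize the zero-extended operand to RHS.
  if (RHS.getOpcode() != RISCVISD::VZEXT_VL)
    std::swap(LHS, RHS);
  if (RHS.getOpcode() != RISCVISD::VZEXT_VL || !RHS.hasOneUse())
    return SDValue();

  SDValue Narrow = RHS.getOperand(0);
  MVT WideVT = N->getSimpleValueType(0);
  // Only an SEW-doubling extend (vzext.vf2) matches vwaddu.wv.
  if (WideVT.getScalarSizeInBits() !=
      2 * Narrow.getSimpleValueType().getScalarSizeInBits())
    return SDValue();
  // The extend and the add must agree on mask and VL (simplified check).
  if (RHS.getOperand(1) != Mask || RHS.getOperand(2) != VL)
    return SDValue();

  return DAG.getNode(RISCVISD::VWADDU_W_VL, SDLoc(N), WideVT,
                     {LHS, Narrow, Passthru, Mask, VL});
}

Today a pattern like this can never fire on the intrinsic path, because the RISCVISD nodes it would match are never created.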


@topperc
Collaborator

topperc commented Feb 2, 2024

The last example can be optimized by using ISD nodes throughout instead of mixing in intrinsics.

v16xi32 add(v16xi32 a, v16xi8 b) {
    v16xi32 c = __builtin_convertvector(b, v16xi32);
    return a + c;
}
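(With __builtin_convertvector the extension is expressed as a generic ISD::ZERO_EXTEND rather than an opaque intrinsic call, so the existing DAG combines presumably get a chance to form vwaddu.wv.)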

@lukel97
Contributor

lukel97 commented Feb 2, 2024

But when GlobalISel is supported, will we need to redo all the optimizations on generic MIR? Or should we move all optimizations to later MIR passes?

RISCVFoldMasks and #71764 are an effort to move some of the SelectionDAG code out into MIR passes.

We can't optimize vzext.vf2+vadd.vv into vwaddu.wv because we lower these intrinsics directly to RVV pseudos.

I can't remember where I first heard this argument, but I think there was a question as to whether intrinsics should be optimised away at all, since there might be an expectation that if the user writes __riscv_vzext_vf2_u16m2 then there should be a vzext.vf2 in the resulting code.

@wangpc-pp
Contributor Author

The last example can be optimized by using ISD nodes throughout instead of mixing in intrinsics.

v16xi32 add(v16xi32 a, v16xi8 b) {
    v16xi32 c = __builtin_convertvector(b, v16xi32);
    return a + c;
}

Thanks! I think my not knowing about this just illustrates the potential for missed optimizations. 😄

@wangpc-pp
Contributor Author

But when GlobalISel is supported, will we need to redo all the optimizations on generic MIR? Or should we move all optimizations to later MIR passes?

RISCVFoldMasks and #71764 are an effort to move some of the SelectionDAG code out into MIR passes.

Yeah! Thanks for mentioning this work!

We can't optimize vzext.vf2+vadd.vv into vwaddu.wv because we lower these intrinsics directly to RVV pseudos.

I can't remember where I first heard this argument, but I think there was a question as to whether intrinsics should be optimised away at all, since there might be an expectation that if the user writes __riscv_vzext_vf2_u16m2 then there should be a vzext.vf2 in the resulting code.

As my example shows, we have already broken this convention for the vmv.v.x intrinsic.

@topperc
Collaborator

topperc commented Feb 2, 2024

Not directly related to this, but I'm not sure HasOptimizedZeroStrideLoad should default to true.

@wangpc-pp
Contributor Author

Not directly related to this, but I'm not sure HasOptimizedZeroStrideLoad should default to true.

See https://reviews.llvm.org/D137699. cc @preames
