
[RISCV] Shrink vslidedown when lowering fixed extract_subvector #65598

Merged
merged 3 commits · Sep 11, 2023

Conversation

lukel97
Contributor

@lukel97 lukel97 commented Sep 7, 2023

As noted in #65392 (comment), when lowering an extract of a fixed-length vector from another vector, we don't need to perform the vslidedown on the full vector type. Instead we can extract the smallest subregister that contains the subvector to be extracted and perform the vslidedown with a smaller LMUL. E.g. with +Zvl128b:

v2i64 = extract_subvector nxv4i64, 2

is currently lowered as

vsetivli zero, 2, e64, m4, ta, ma
vslidedown.vi v8, v8, 2

This patch shrinks the vslidedown to LMUL=2:

vsetivli zero, 2, e64, m2, ta, ma
vslidedown.vi v8, v8, 2

This works because we know there are at least 128*2 = 256 bits in v8 at LMUL=2, and we only need the first 256 bits to extract a v2i64 at index 2.

lowerEXTRACT_VECTOR_ELT already has this logic, so this extracts it out and reuses it.

I've split this out into a separate PR rather than include it in #65392, with the hope that we'll be able to generalize it later.

This patch refactors extract_subvector lowering to lower to extract_subreg directly, and to shortcut whenever the index is 0 when extracting a scalable vector. This doesn't change any of the existing behaviour, but makes an upcoming patch that extends the scalable path slightly easier to read.
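The LMUL-shrinking arithmetic above can be sketched as a small Python helper (hypothetical, not LLVM code): given the minimum VLEN implied by the Zvl extension, the element width, and the subvector size and extraction index, find the smallest power-of-two LMUL whose register group covers every element the slide reads.

```python
def min_lmul_for_extract(vlen_min, elt_bits, sub_elts, index):
    """Smallest power-of-two LMUL whose register group covers the
    extraction: the slide reads elements [index, index + sub_elts),
    i.e. the first (index + sub_elts) * elt_bits bits of the source."""
    bits_needed = (index + sub_elts) * elt_bits
    lmul = 1
    while lmul * vlen_min < bits_needed and lmul < 8:
        lmul *= 2
    return lmul

# With +Zvl128b (VLEN >= 128): extracting v2i64 at index 2 needs the
# first (2 + 2) * 64 = 256 bits, which fit in an LMUL=2 register group.
print(min_lmul_for_extract(128, 64, 2, 2))  # -> 2
```

This mirrors the example in the description: the vslidedown can run at m2 instead of m4 because the last element touched lies within the first two registers of the group.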

Collaborator

@preames preames left a comment


LGTM

@lukel97 lukel97 force-pushed the extract-subvector-fixed-shrink branch from adaa712 to 07e5fed on September 11, 2023 16:18
@lukel97 lukel97 merged commit e33f3f0 into llvm:main Sep 11, 2023
1 of 2 checks passed
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 18, 2023
If we know the VL and offset of a vslidedown_vl, we can work out the minimum
number of registers it's going to operate across. We can reuse the logic from
extract_vector_elt to perform it in a smaller type and reduce the LMUL.

The aim is to generalize llvm#65598 and hopefully extend this to vslideup_vl too so
that we can get the same optimisation for insert_subvector and
insert_vector_elt.

One observation from adding this is that the vslide*_vl nodes all take a mask
operand, but currently anything other than vmset_vl will fail to select, as all
the patterns expect true_mask. So we need to create a new vmset_vl instead of
using extract_subvector on the existing vmset_vl.
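The minimum-register reasoning in this commit message can be sketched as a Python helper (hypothetical, not LLVM code): with VL and offset known, a slide only touches the first (offset + VL) elements, so the number of registers it operates across follows from a ceiling division against the minimum VLEN.

```python
import math

def min_registers_for_slide(vlen_min, elt_bits, vl, offset):
    # A vslidedown with known VL and offset only reads elements
    # [offset, offset + vl), i.e. the first (offset + vl) * elt_bits
    # bits of the source register group.
    bits_touched = (offset + vl) * elt_bits
    return math.ceil(bits_touched / vlen_min)

# VL=2, offset=2, e64 elements, VLEN >= 128: the slide touches 256 bits,
# spanning at most 2 registers, so it can run at LMUL=2.
print(min_registers_for_slide(128, 64, 2, 2))  # -> 2
```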
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 18, 2023
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 21, 2023
Similar to llvm#65598, if we're using a vslideup to insert a fixed length vector
into another vector, then we can work out the minimum number of registers it
will need to slide up across given the minimum VLEN, and shrink the type
operated on to reduce LMUL accordingly.

This is somewhat dependent on llvm#65916, since it introduces a subregister copy
that triggers a crash with -early-live-intervals in one of the tests.
lukel97 added a commit that referenced this pull request Sep 21, 2023 (#65997)

Similar to #65598, if we're using a vslideup to insert a fixed length
vector into another vector, then we can work out the minimum number of
registers it will need to slide up across given the minimum VLEN, and
shrink the type operated on to reduce LMUL accordingly.

This is somewhat dependent on #66211, since it introduces a subregister copy that triggers a crash with -early-live-intervals in one of the tests.

Stacked upon #66211