
[RISCV] Extract subregister if VLEN is known when lowering extract_subvector #65392

Closed
wants to merge 3 commits

Conversation

lukel97 (Contributor) commented Sep 5, 2023

If we know VLEN at compile time, then we can work out which subregister an index into a fixed-length vector lands in.
We can use this information when lowering extract_subvector to perform the vslidedown on a smaller subregister. This allows us to use a smaller LMUL, or, if the extract is aligned to a vector register, to avoid the slide altogether.
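
For illustration, here is a rough sketch of the subregister arithmetic described above (the helper and field names are hypothetical stand-ins, not the patch's code):

// With an exactly known VLEN, work out which VLEN-sized register of an LMUL
// group holds a fixed-length subvector starting at element OrigIdx, and what
// the element index becomes relative to that register.
struct SubRegExtract {
  unsigned SubRegIdx;   // which register of the LMUL group
  unsigned IdxInSubReg; // element offset within that register
};

SubRegExtract computeSubRegExtract(unsigned VLen, unsigned EltSizeInBits,
                                   unsigned OrigIdx) {
  unsigned EltsPerReg = VLen / EltSizeInBits; // e.g. 128 / 64 = 2 for i64
  return {OrigIdx / EltsPerReg, OrigIdx % EltsPerReg};
}

// Example: with VLEN=128, extracting v2i64 at index 2 from a v8i64 gives
// SubRegIdx=1 and IdxInSubReg=0, i.e. the extract is register-aligned and the
// slide can be dropped entirely.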

The logic here is a bit tangled with the scalable path: if people find this too unwieldy, I can separate it out and duplicate it for the fixed-length case.

This technique could be applied to extract_vector_elt, insert_vector_elt and insert_subvector too.

This is stacked upon #65391

This is partly a precommit for an upcoming patch, and partly to remove the
fixed length LMUL restriction similarly to what was done in
https://reviews.llvm.org/D158270, since it's no longer that relevant.
This patch refactors extract_subvector to lower to extract_subreg directly, and
to shortcut whenever the index is 0 when extracting a scalable vector. This
doesn't change any of the existing behaviour, but makes an upcoming patch that
extends the scalable path slightly easier to read.
…bvector

@@ -152,6 +152,11 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
    unsigned VLen = getMaxRVVVectorSizeInBits();
    return VLen == 0 ? 65536 : VLen;
  }
  std::optional<unsigned> getRealKnownVLen() const {
Collaborator:
I'd suggest: getExactVLen().

  // which register of a LMUL group contains the specific subvector as we only
  // know the minimum register size. Therefore we must slide the vector group
  // down the full amount.
  if (SubVecVT.isFixedLengthVector() && !KnownVLen) {
Collaborator:
This isn't fully general.

Consider: 8 x i64 on V (an LMUL4 type), extract the second <2 x i64> quarter. We can still know this lies in the first LMUL2 sub-register. This doesn't allow us to slide less, but it does allow us to reduce the LMUL for the slide.
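
To make this concrete, a small sketch of the observation (hypothetical helper, not LLVM's API): even with only a minimum VLEN known, the elements being slid over are guaranteed to fit in the smallest register group whose minimum size covers them, so the slide can run at that smaller LMUL.

// Pick the smallest LMUL whose guaranteed size (LMUL * MinVLen bits) covers
// the first (Idx + NumSubElts) elements; the slide amount itself is unchanged.
unsigned smallestLMULForSlide(unsigned MinVLen, unsigned EltSizeInBits,
                              unsigned Idx, unsigned NumSubElts) {
  unsigned BitsNeeded = (Idx + NumSubElts) * EltSizeInBits;
  unsigned LMUL = 1;
  while (LMUL * MinVLen < BitsNeeded)
    LMUL *= 2;
  return LMUL;
}

// For the example above: MinVLen=128, i64 elements, extracting <2 x i64> at
// index 2 needs the first 4 elements = 256 bits, which fit in LMUL=2, so the
// slide can run at m2 instead of m4.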

  unsigned EffectiveIdx = OrigIdx;
  unsigned Vscale = *KnownVLen / RISCV::RVVBitsPerBlock;
  if (SubVecVT.isFixedLengthVector()) {
    assert(KnownVLen);
Collaborator:
This assert isn't needed. KnownVLen was already accessed on line 8670.

lukel97 (Author):
IIUC, * on a null optional is just UB; you need to use value() to have an exception thrown. This part is a bit hairy though: I'm avoiding value() so I don't have to define Vscale twice. Any ideas on a better way to structure this?

Collaborator:
I don't have any specific ideas, but I definitely wouldn't rely on UB. A library might put an assert in * rather than throwing an exception.
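
For reference, a minimal sketch of the distinction being discussed (the helper and constant names are stand-ins, not the code in this patch):

#include <cassert>
#include <optional>

constexpr unsigned RVVBitsPerBlock = 64; // stand-in for RISCV::RVVBitsPerBlock

// operator* on an empty std::optional is undefined behaviour, whereas value()
// throws std::bad_optional_access. One way to keep a single definition of
// Vscale without relying on either is a small helper that asserts before
// dereferencing and is only called on the path where the exact VLEN is known.
unsigned getVscale(std::optional<unsigned> KnownVLen) {
  assert(KnownVLen && "only reachable when the exact VLEN is known");
  return *KnownVLen / RVVBitsPerBlock;
}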

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 11, 2023
As noted in
llvm#65392 (comment), when
lowering an extract of a fixed length vector from another vector, we don't need
to perform the vslidedown on the full vector type. Instead we can extract the
smallest subregister that contains the subvector to be extracted and perform
the vslidedown with a smaller LMUL. E.g., with +Zvl128b:

v2i64 = extract_subvector nxv4i64, 2

is currently lowered as

vsetivli zero, 2, e64, m4, ta, ma
vslidedown.vi v8, v8, 2

This patch shrinks the vslidedown to LMUL=2:

vsetivli zero, 2, e64, m2, ta, ma
vslidedown.vi v8, v8, 2

Because we know that there's at least 128*2=256 bits in v8 at LMUL=2, and we
only need the first 256 bits to extract a v2i64 at index 2.

lowerEXTRACT_VECTOR_ELT already has this logic, so this extracts it out and
reuses it.

I've split this out into a separate PR rather than include it in llvm#65392, with
the hope that we'll be able to generalize it later.
lukel97 added a commit that referenced this pull request Sep 11, 2023
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
…#65598)

lukel97 marked this pull request as draft October 4, 2023 20:29
lukel97 (Author) commented Oct 4, 2023

Marking this as a draft as it may need to be reworked after #65598, #66087, and related PRs to reduce LMUL across vslidedowns and vslideups have landed.

lukel97 closed this Nov 29, 2023
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jan 30, 2024
…T_SUBVECTOR

This is a revival of llvm#65392. When we lower an extract_subvector, we first extract the subregister that the subvector is contained in and then do a vslidedown with LMUL=1. We can currently only do this for scalable vectors, though, because the index is scaled by vscale and thus we know what subregister the subvector lies in.

For fixed-length vectors, the index isn't scaled by vscale, so the subvector could lie in any arbitrary subregister, and we have to do a vslidedown with the full LMUL.

The exception to this is when we know the exact VLEN, in which case we can still work out the exact subregister and do the LMUL=1 vslidedown on it.

This patch handles this case by scaling the index by 1/vscale before computing the subregister, and by extending the LMUL=1 path to handle fixed-length vectors.
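
As a rough illustration of the index scaling described above (hypothetical names; the actual patch reuses the existing subregister decomposition for the scalable part):

// With the exact VLEN known, vscale = VLEN / 64 is a compile-time constant,
// so a fixed-length element index can be split into a "scalable" index for
// the subregister computation plus a small remainder for the LMUL=1 slide.
struct ScaledIndex {
  unsigned ScalableIdx; // index in scalable-element units
  unsigned RemIdx;      // leftover elements, folded into the LMUL=1 slide
};

ScaledIndex scaleIndexByInvVscale(unsigned OrigIdx, unsigned Vscale) {
  return {OrigIdx / Vscale, OrigIdx % Vscale};
}

// Example with VLEN=256 (vscale=4): extracting v2i64 at index 2 from a v8i64
// gives ScalableIdx=0 and RemIdx=2, so the subvector lives in the first
// register of the group and a single LMUL=1 vslidedown by 2 finishes the job.
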
lukel97 added a commit that referenced this pull request Feb 13, 2024
…T_SUBVECTOR (#79949)
