This repository has been archived by the owner on Mar 20, 2024. It is now read-only.

vfmv.s.f zeroing of elements > 0 #171

Closed
solomatnikov opened this issue May 9, 2019 · 13 comments

Comments

@solomatnikov

The spec says:

The other elements in the destination vector register ( 0 < index < VLEN/SEW) are zeroed.

I think it should be:

The other elements in the destination vector register group ( 0 < index < LMUL*VLEN/SEW) are zeroed.
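To make the two readings concrete, here is a minimal Python sketch that counts the elements zeroed under each wording. The function name `zeroed_elements` and the configuration values (VLEN=128, SEW=32, LMUL=4) are illustrative assumptions, not taken from the spec.

```python
def zeroed_elements(vlen: int, sew: int, lmul: int = 1) -> range:
    """Indices zeroed after element 0 is written, for a destination
    spanning `lmul` vector registers of `vlen` bits each."""
    return range(1, lmul * vlen // sew)

# Hypothetical configuration, chosen only for illustration.
VLEN, SEW, LMUL = 128, 32, 4

single_register = zeroed_elements(VLEN, SEW)       # current wording: 0 < index < VLEN/SEW
register_group = zeroed_elements(VLEN, SEW, LMUL)  # proposed wording: 0 < index < LMUL*VLEN/SEW

print(len(single_register), len(register_group))
```

With these assumed values, the current wording covers only the elements of one register, while the proposed wording covers every element of the four-register group.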

@aswaterman
Member

A few paragraphs up, it says

The instructions ignore LMUL and vector register groups

so I think it is correct as-is.

@solomatnikov
Author

What's the reason for this inconsistency?

I.e., according to the current version of the spec, other instructions zero up to LMUL*VLEN/SEW.

I don't think requiring zeroing past vl is a good idea, but if it is required, it should be consistent.

@aswaterman
Member

I see your point, but the instructions that write mask registers, e.g. vmand.mm, also work this way: only one vector register is written, regardless of LMUL.

@solomatnikov
Author

At most VLEN bits of the mask register are used for masking of vector ops, regardless of LMUL, so vmand.mm is consistent with that.
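The claim that a mask never needs more than VLEN bits can be checked with a short sketch. It assumes one mask bit per element as the minimum requirement, SEW of at least 8 bits, and integer LMUL of at most 8; the helper name `vlmax` and the value VLEN=128 are illustrative choices.

```python
def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """Maximum number of elements in a register group of `lmul` registers."""
    return lmul * vlen // sew

VLEN = 128  # assumed register width in bits, for illustration only

# Element count for every (SEW, LMUL) pair under the stated assumptions.
element_counts = {
    (sew, lmul): vlmax(VLEN, sew, lmul)
    for sew in (8, 16, 32, 64)
    for lmul in (1, 2, 4, 8)
}

# One mask bit per element fits in a single VLEN-bit register in every case.
fits_in_one_register = all(n <= VLEN for n in element_counts.values())
print(fits_in_one_register)
```

The worst case is SEW=8 with LMUL=8, where the element count equals VLEN exactly, so a single mask register is always enough under these assumptions.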

vfmv.s.f is presumably used for an FP value, not mask bit(s).

@solomatnikov
Author

Plus: #157 (comment)

> In particular: zero-past-VL and preserve-past-VL are both useful behaviors in some situations, and it's easy to imagine a software developer accidentally relying on whichever one the development machine provides. So we could end up in the situation where software runs only under one discipline or the other, risking the possibility of the adoption of a de-facto standard.

@aswaterman
Member

I can see why having this instruction operate on a register group instead of a single register could sometimes be useful. @kasanovic will have to explain why he thought acting only on a single register was the best definition.

@aswaterman
Member

aswaterman commented May 9, 2019

Well, here's one reason... these instructions are a good match to the reduction operations, which consume a scalar input and produce a scalar output (i.e. they only write a vector register, not a vector register group). To initialize that scalar input for the first step of the reduction to a nonzero value, you'd use vfmv or vmv. If those instructions wrote a whole vector register group, it would reduce the usable capacity of the regfile for scalars.
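The pairing described here can be modeled in a few lines of Python. This is a sketch of the semantics as discussed in this thread, not of any implementation: registers are plain lists, `scalar_move` and `reduce_sum` are hypothetical names standing in for the scalar-move and reduction instructions, and zeroing the rest of the destination register is a modeling assumption based on the wording quoted at the top of the issue.

```python
ELEMS_PER_REG = 4  # assumed VLEN/SEW, for illustration only

def scalar_move(value: float) -> list[float]:
    """Write `value` to element 0 of a single register; zero the rest."""
    return [value] + [0.0] * (ELEMS_PER_REG - 1)

def reduce_sum(source_group: list[float], scalar_reg: list[float], vl: int) -> list[float]:
    """Sum the first `vl` elements of a register group plus the scalar in
    element 0 of `scalar_reg`; the result occupies one register only."""
    total = scalar_reg[0] + sum(source_group[:vl])
    return [total] + [0.0] * (ELEMS_PER_REG - 1)

# Initialize the accumulator to a nonzero value, then reduce an
# eight-element group (two registers' worth, i.e. LMUL=2 in this model).
accumulator = scalar_move(10.0)
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
accumulator = reduce_sum(data, accumulator, vl=8)
print(accumulator[0])
```

Both the scalar input and the scalar output live in a single register here, even though the vector operand spans a group.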

@solomatnikov
Author

> Well, here's one reason... these instructions are a good match to the reduction operations, which consume a scalar input and produce a scalar output (i.e. they only write a vector register, not a vector register group). To initialize that scalar input for the first step of the reduction to a nonzero value, you'd use vfmv or vmv. If those instructions wrote a whole vector register group, it would reduce the usable capacity of the regfile for scalars.

This does not look like a common case. As far as I understand, these reduction instructions are only needed at the end of a reduction loop, and doing several of them at the same time is not likely to help.

@kasanovic
Collaborator

Andrew is right that the reason is that these are intended for scalars, which don't need to occupy a whole vector register group. Performing multiple reductions after a loop is not that uncommon, as it captures some matrix tiling scenarios. Later, there will also probably be other instructions that use vector-register-resident scalars as inputs, and we don't want to blow a whole vector register group on them.
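A rough count shows what is at stake for register-file capacity. The sketch below assumes the 32 architectural vector registers, LMUL=8, and a hypothetical kernel that keeps two register groups busy with vector data; the function name `scalar_slots` is illustrative.

```python
NUM_VREGS = 32  # architectural vector registers

def scalar_slots(lmul: int, groups_for_vectors: int, scalar_uses_group: bool) -> int:
    """Number of vector-register-resident scalars that still fit once
    `groups_for_vectors` register groups hold vector data."""
    free_regs = NUM_VREGS - groups_for_vectors * lmul
    # A scalar costs a whole group in one design, a single register in the other.
    return free_regs // lmul if scalar_uses_group else free_regs

# Hypothetical kernel: LMUL=8 with two groups holding vector operands.
as_groups = scalar_slots(lmul=8, groups_for_vectors=2, scalar_uses_group=True)
as_single_registers = scalar_slots(lmul=8, groups_for_vectors=2, scalar_uses_group=False)
print(as_groups, as_single_registers)
```

Under these assumptions, treating each scalar as a full group leaves room for far fewer simultaneous reduction accumulators than the single-register definition does.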

@solomatnikov
Author

> Andrew is right that the reason is that these are intended for scalars, which don't need to occupy a whole vector register group. Performing multiple reductions after a loop is not that uncommon, as it captures some matrix tiling scenarios. Later, there will also probably be other instructions that use vector-register-resident scalars as inputs, and we don't want to blow a whole vector register group on them.

If this is an important case, then scalar registers should be used as input to the reduction instructions just like they are used for vector-scalar FMAs in SGEMM.

@kasanovic
Collaborator

This would require reductions to write the scalar f registers which would lead to a loss of decoupling in many designs.

@solomatnikov
Author

The motivation for this design should be explained in the spec.

Also, it would be useful to have a code example in https://github.com/riscv/riscv-v-spec/blob/master/vector-examples.adoc. I still don't see how this irregularity in the vector ISA spec would make a substantial difference in performance.

@kasanovic
Collaborator

Resolved by the change to undisturbed tails.
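For readers comparing the two behaviors, here is a minimal Python model of a scalar move under each policy. It assumes "undisturbed" means that every element other than the one written keeps its previous value; the function names and register contents are hypothetical.

```python
def scalar_move_zeroing(dest: list[float], value: float) -> list[float]:
    """Behavior debated in this issue: element 0 written, the rest zeroed."""
    return [value] + [0.0] * (len(dest) - 1)

def scalar_move_undisturbed(dest: list[float], value: float) -> list[float]:
    """Undisturbed policy: element 0 written, other elements left as they were."""
    return [value] + dest[1:]

# Hypothetical prior contents of a four-element destination register.
old_contents = [9.0, 8.0, 7.0, 6.0]

zeroed = scalar_move_zeroing(old_contents, 1.5)
undisturbed = scalar_move_undisturbed(old_contents, 1.5)
print(zeroed, undisturbed)
```

With no elements being zeroed, the question of whether zeroing covers one register or a whole group no longer arises in this model.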
