vfmv.s.f zeroing of elements > 0 #171
Comments
A few paragraphs up, it says
so I think it is correct as-is. |
What's the reason for this inconsistency? I.e. according to current version of the spec other instructions zero up to I don't think it's a good idea to require zeroing past |
I see your point, but the instructions that write mask registers, e.g. |
At most
|
Plus: #157 (comment)
|
I can see why having this instruction operate on a register group instead of a single register could sometimes be useful. @kasanovic will have to explain why he thought acting only on a single register was the best definition. |
Well, here's one reason... these instructions are a good match to the reduction operations, which consume a scalar input and produce a scalar output (i.e. they only write a vector register, not a vector register group). To initialize that scalar input for the first step of the reduction to a nonzero value, you'd use |
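The pairing described above can be sketched as a toy Python model. This is only an illustration, not spec text: it assumes the initializing instruction is the `vfmv.s.f` from the issue title, that the reduction behaves like a `vfredsum`-style sum (`vd[0] = vs1[0] + sum(vs2[*])`), and the values of `VLEN`, `SEW`, and `LMUL` are arbitrary example parameters.

```python
# Toy model: one vector register is a list of VLEN/SEW elements.
VLEN, SEW, LMUL = 128, 32, 4      # assumed example parameters
ELEMS = VLEN // SEW               # elements in a single vector register


def vfmv_s_f(vd, scalar):
    """Write the scalar to element 0 and zero the rest of ONE register."""
    vd[0] = scalar
    for i in range(1, ELEMS):
        vd[i] = 0.0


def vfredsum(vd, vs2_group, vs1, vl):
    """Sum reduction: scalar input in vs1[0], scalar result in vd[0].

    Only vs2 is a register group; vs1 and vd are single registers.
    """
    vd[0] = vs1[0] + sum(vs2_group[:vl])


src = [float(i) for i in range(LMUL * ELEMS)]   # source group: 0.0 .. 15.0
acc = [9.9] * ELEMS                             # stale contents
vfmv_s_f(acc, 10.0)                             # nonzero initial value
dst = [0.0] * ELEMS
vfredsum(dst, src, acc, vl=len(src))
print(acc, dst[0])
```

In this model only `src` spans `LMUL` registers, which is the point being made: the scalar operand and the result each fit in one register.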
This does not look like a common case, i.e. as far as I understand, these reduction instructions are only needed after or at the end of a reduction loop, and doing several of them at the same time is not likely to help. |
Andrew is right that the reason is that these are intended for scalars, which don't need to occupy a whole vector register group. Performing multiple reductions after a loop is not that uncommon, as it captures some matrix tiling scenarios. Later, there will probably also be other instructions that use vector-register-resident scalars as inputs, and we don't want to blow a whole vector register group on them. |
If this is an important case, then scalar registers should be used as input to the reduction instructions just like they are used for vector-scalar FMAs in SGEMM. |
This would require reductions to write the scalar f registers which would lead to a loss of decoupling in many designs. |
The motivation for this design should be explained in the spec. Also, it would be useful to have a code example in https://github.com/riscv/riscv-v-spec/blob/master/vector-examples.adoc. I still don't see how this irregularity in the vector ISA spec would make a substantial difference in performance. |
Resolved by the change to undisturbed tails. |
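To illustrate why that change settles the question, here is a minimal Python sketch contrasting the two behaviours. It assumes "undisturbed" means that elements past index 0 keep their previous values; the function names are hypothetical and the register is modelled as a plain list.

```python
def write_scalar_zeroing(vd, scalar):
    """Earlier behaviour under discussion: element 0 written, rest zeroed."""
    vd[0] = scalar
    vd[1:] = [0.0] * (len(vd) - 1)


def write_scalar_undisturbed(vd, scalar):
    """Assumed undisturbed-tail behaviour: only element 0 is written."""
    vd[0] = scalar


old = [1.0, 2.0, 3.0, 4.0]

zeroed = list(old)
write_scalar_zeroing(zeroed, 7.0)

kept = list(old)
write_scalar_undisturbed(kept, 7.0)

print(zeroed)  # [7.0, 0.0, 0.0, 0.0]
print(kept)    # [7.0, 2.0, 3.0, 4.0]
```

With nothing zeroed, the question of how far the zeroing should extend (one register or the whole group) no longer arises.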
The spec says:
I think it should be:
The other elements in the destination vector register group ( 0 < index < LMUL*VLEN/SEW) are zeroed.
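The difference between the two bounds can be made concrete with a small Python model. This is a sketch under stated assumptions: the current text is taken to bound the zeroing at `VLEN/SEW` (a single register), as the replies in this thread imply, and `VLEN`, `SEW`, `LMUL`, the register number, and the `whole_group` flag are hypothetical illustration parameters.

```python
VLEN, SEW, LMUL = 128, 32, 4      # assumed example parameters
ELEMS = VLEN // SEW               # elements per single vector register


def vfmv_s_f(regfile, vd, scalar, whole_group):
    """Write scalar to element 0, zero elements 0 < index < bound.

    whole_group=False: bound = VLEN/SEW        (single register)
    whole_group=True:  bound = LMUL*VLEN/SEW   (proposed in this issue)
    """
    bound = (LMUL if whole_group else 1) * ELEMS
    for i in range(bound):
        # Flat index i lives in register vd + i // ELEMS, element i % ELEMS.
        regfile[vd + i // ELEMS][i % ELEMS] = scalar if i == 0 else 0.0


def fresh_regfile():
    # 32 vector registers, all holding 1.0 so that writes are visible.
    return [[1.0] * ELEMS for _ in range(32)]


single = fresh_regfile()
vfmv_s_f(single, 8, 5.0, whole_group=False)

group = fresh_regfile()
vfmv_s_f(group, 8, 5.0, whole_group=True)

print(single[8], single[9])   # v8 written, v9 untouched
print(group[8], group[11])    # v8..v11 all written
```

Under the single-register reading, the remaining `LMUL - 1` registers of the group are left alone, which is the inconsistency with other instructions that this issue raises.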