vfmv.s.f zeroing of elements > 0 #171

solomatnikov · 2019-05-09T22:44:46Z

The spec says:

The other elements in the destination vector register (0 < index < VLEN/SEW) are zeroed.

I think it should be:

The other elements in the destination vector register group (0 < index < LMUL*VLEN/SEW) are zeroed.
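The two readings differ only in how far the zeroing reaches. The minimal C model below is purely hypothetical: the parameters (VLEN = 128, SEW = 32, 32 registers) and the function names are invented for illustration and are not spec text. It contrasts the rule as quoted from the draft with the proposed group-wide rule:

```c
#include <assert.h>

/* Hypothetical parameters for illustration only. */
#define VLEN 128
#define SEW 32
#define NREGS 32
#define ELEMS_PER_REG (VLEN / SEW)   /* VLEN/SEW = 4 float elements */

/* Toy vector register file: NREGS registers of ELEMS_PER_REG floats each. */
static float vrf[NREGS][ELEMS_PER_REG];

/* Set every element of every register (marks "old" contents). */
static void vrf_fill(float v) {
    for (int r = 0; r < NREGS; r++)
        for (int i = 0; i < ELEMS_PER_REG; i++)
            vrf[r][i] = v;
}

/* Draft wording as quoted: element 0 of vd gets f and the other elements
   of that single register (0 < index < VLEN/SEW) are zeroed; LMUL is ignored. */
static void vfmv_s_f_single(int vd, float f) {
    assert(vd >= 0 && vd < NREGS);
    vrf[vd][0] = f;
    for (int i = 1; i < ELEMS_PER_REG; i++)
        vrf[vd][i] = 0.0f;
}

/* Proposed wording: zero every element of the group vd..vd+lmul-1
   (0 < index < LMUL*VLEN/SEW), then write element 0. */
static void vfmv_s_f_group(int vd, int lmul, float f) {
    assert(lmul >= 1 && vd % lmul == 0 && vd + lmul <= NREGS);
    for (int r = vd; r < vd + lmul; r++)
        for (int i = 0; i < ELEMS_PER_REG; i++)
            vrf[r][i] = 0.0f;
    vrf[vd][0] = f;
}
```

With LMUL = 4 and vd = v16, the single-register rule leaves v17..v19 untouched, while the group rule zeroes them as well.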

aswaterman · 2019-05-09T22:47:35Z

A few paragraphs up, it says

The instructions ignore LMUL and vector register groups

so I think it is correct as-is.

solomatnikov · 2019-05-09T22:56:45Z

What's the reason for this inconsistency?

I.e., according to the current version of the spec, other instructions zero elements up to LMUL*VLEN/SEW.

I don't think zeroing past vl should be required, but if it is required, it should be consistent.

aswaterman · 2019-05-09T22:58:48Z

I see your point, but the instructions that write mask registers, e.g. vmand.mm, also work this way: only one vector register is written, regardless of LMUL.

solomatnikov · 2019-05-09T23:05:48Z

At most VLEN bits of the mask register are used for masking of vector ops, regardless of LMUL, so vmand.mm is consistent with that.

vfmv.s.f is presumably used for an FP value, not mask bit(s).
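The arithmetic behind the mask argument can be checked mechanically. A minimal sketch, assuming one mask bit per element, SEW in {8, 16, 32, 64} and LMUL in {1, 2, 4, 8} (the exact mask layout in the draft under discussion may differ): VLMAX = LMUL*VLEN/SEW peaks at 8*VLEN/8 = VLEN, so a full mask always fits in one register, while the FP data itself can span up to LMUL registers. The helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: VLMAX = LMUL * VLEN / SEW (elements per group). */
static unsigned vlmax(unsigned vlen, unsigned sew, unsigned lmul) {
    assert(sew >= 8 && lmul >= 1);
    return lmul * vlen / sew;
}

/* Returns 1 if, with one mask bit per element, every (SEW, LMUL)
   combination needs at most VLEN mask bits, i.e. one vector register. */
static int mask_fits_one_register(unsigned vlen) {
    static const unsigned sews[] = {8, 16, 32, 64};
    static const unsigned lmuls[] = {1, 2, 4, 8};
    for (unsigned s = 0; s < 4; s++)
        for (unsigned l = 0; l < 4; l++)
            if (vlmax(vlen, sews[s], lmuls[l]) > vlen)
                return 0;
    return 1;
}
```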

solomatnikov · 2019-05-09T23:16:01Z

Plus, from a comment on #157:

In particular: zero-past-VL and preserve-past-VL are both useful behaviors in some situations, and it's easy to imagine a software developer accidentally relying on whichever one the development machine provides. So we could end up in the situation where software runs only under one discipline or the other, risking the possibility of the adoption of a de-facto standard.
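The hazard described in that comment is easy to reproduce in a toy model. The sketch below is hypothetical (a 4-element register group, an invented `vadd` helper, and a two-value tail-policy enum); it runs the same buggy code, which sums all VLMAX elements instead of only the first vl, under both disciplines:

```c
#include <assert.h>
#include <stddef.h>

#define VLMAX 4   /* hypothetical: elements in one register group */

enum tail_policy { TAIL_ZERO, TAIL_UNDISTURBED };

/* Hypothetical model of vd[i] = a[i] + b[i] for i < vl; elements
   vl..VLMAX-1 are zeroed or left unchanged depending on the policy. */
static void vadd(float vd[VLMAX], const float a[VLMAX], const float b[VLMAX],
                 size_t vl, enum tail_policy tp) {
    assert(vl <= VLMAX);
    for (size_t i = 0; i < vl; i++)
        vd[i] = a[i] + b[i];
    if (tp == TAIL_ZERO)
        for (size_t i = vl; i < VLMAX; i++)
            vd[i] = 0.0f;
}

/* Buggy consumer: sums all VLMAX elements instead of the first vl,
   silently relying on a zeroed tail. */
static float sum_all(const float v[VLMAX]) {
    float s = 0.0f;
    for (size_t i = 0; i < VLMAX; i++)
        s += v[i];
    return s;
}

static float run(enum tail_policy tp) {
    float a[VLMAX] = {1, 2, 3, 4};
    float b[VLMAX] = {10, 20, 30, 40};
    float vd[VLMAX] = {100, 100, 100, 100};   /* stale register contents */
    vadd(vd, a, b, 2, tp);                    /* vl = 2 */
    return sum_all(vd);
}
```

Under zeroing, `run` returns 33 (11 + 22); under preserve-past-VL the stale 100s leak in and it returns 233. Code tested only on a zeroing machine would pass there and break elsewhere.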

aswaterman · 2019-05-09T23:24:35Z

I can see why having this instruction operate on a register group instead of a single register could sometimes be useful. @kasanovic will have to explain why he thought acting only on a single register was the best definition.

aswaterman · 2019-05-09T23:33:23Z

Well, here's one reason... these instructions are a good match to the reduction operations, which consume a scalar input and produce a scalar output (i.e. they only write a vector register, not a vector register group). To initialize that scalar input for the first step of the reduction to a nonzero value, you'd use vfmv or vmv. If those instructions wrote a whole vector register group, it would reduce the usable capacity of the regfile for scalars.
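The capacity argument is simple arithmetic. A hypothetical sketch with 32 vector registers, ignoring other constraints such as v0 being used for masks: once a loop body holds `groups_in_use` register groups at a given LMUL, it counts how many vector-register-resident scalars still fit if each scalar occupies one register versus a whole LMUL-register group.

```c
#include <assert.h>

#define NUM_VREGS 32   /* architectural vector registers v0..v31 */

/* Hypothetical helper: scalars that fit in the registers left over after
   `groups_in_use` groups of `lmul` registers each are allocated.
   group_rule == 0: a scalar write occupies one register (rule as drafted).
   group_rule != 0: a scalar write occupies a whole LMUL-register group. */
static unsigned free_scalar_slots(unsigned lmul, unsigned groups_in_use,
                                  int group_rule) {
    assert(lmul >= 1 && lmul * groups_in_use <= NUM_VREGS);
    unsigned free_regs = NUM_VREGS - lmul * groups_in_use;
    unsigned regs_per_scalar = group_rule ? lmul : 1;
    return free_regs / regs_per_scalar;
}
```

For example, at LMUL = 8 with three groups live (24 registers), 8 registers remain: room for 8 scalars under the single-register rule but only 1 under the group rule.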

solomatnikov · 2019-05-10T01:06:04Z

Quoting @aswaterman: "If those instructions wrote a whole vector register group, it would reduce the usable capacity of the regfile for scalars."

This does not look like a common case: as far as I understand, these reduction instructions are only needed at the end of a reduction loop, and performing several of them at the same time is unlikely to help.

kasanovic · 2019-05-10T01:28:42Z

Andrew is right: the reason is that these are intended for scalars, which don't need to occupy a whole vector register group. Performing multiple reductions after a loop is not that uncommon, as it captures some matrix tiling scenarios. Later, there will probably also be other instructions that use vector-register-resident scalars as inputs, and we don't want to blow a whole vector register group on them.
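A hypothetical C model of the matrix-tiling case: a 1x4 tile of C += A*B in which each output column keeps its own vector accumulator during the strip-mined inner-product loop, followed by four independent reductions, each seeded with the old c[j]. The sizes (VL = 4, K = 8) and helper names are assumptions; the reduction is modeled as result = init + sum(src[0..vl-1]), consistent with the scalar-in, scalar-out description above, so each seed and each result needs only element 0 of one register.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4     /* hypothetical vector length, in elements */
#define TILE 4   /* output columns computed together */
#define K 8      /* inner-product length; a multiple of VL in this sketch */

_Static_assert(K % VL == 0, "sketch assumes K is a multiple of VL");

/* Model of a sum reduction: returns init0 + src[0] + ... + src[vl-1].
   Only element 0 of the scalar input and output is meaningful. */
static float fredsum(const float src[VL], size_t vl, float init0) {
    assert(vl <= VL);
    float s = init0;
    for (size_t i = 0; i < vl; i++)
        s += src[i];
    return s;
}

/* c[j] += sum over k of a[k] * b[j][k], for j in 0..TILE-1.
   b[j] holds column j of the B matrix. */
static void tile_update(float c[TILE], const float a[K], float b[TILE][K]) {
    float acc[TILE][VL] = {{0}};            /* one vector accumulator per column */
    for (size_t k0 = 0; k0 < K; k0 += VL)   /* strip-mined inner-product loop */
        for (size_t j = 0; j < TILE; j++)
            for (size_t i = 0; i < VL; i++)
                acc[j][i] += a[k0 + i] * b[j][k0 + i];
    /* After the loop: TILE independent reductions, each seeded with the old
       c[j], the value vfmv.s.f would place in element 0 of a register. */
    for (size_t j = 0; j < TILE; j++)
        c[j] = fredsum(acc[j], VL, c[j]);
}
```

Each of the TILE seeds is live at the same time as the accumulators, so a wider tile multiplies the number of registers a group-writing vfmv.s.f would consume.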

solomatnikov · 2019-05-10T01:41:53Z

Quoting @kasanovic: "Performing multiple reductions after a loop is not that uncommon as it captures some matrix tiling scenarios."

If this is an important case, then scalar registers should be used as input to the reduction instructions just like they are used for vector-scalar FMAs in SGEMM.

kasanovic · 2019-05-10T01:43:29Z

This would require reductions to write the scalar f registers which would lead to a loss of decoupling in many designs.

solomatnikov · 2019-05-10T18:07:16Z

The motivation for this design should be explained in the spec.

Also, it would be useful to have a code example in https://github.com/riscv/riscv-v-spec/blob/master/vector-examples.adoc. I still don't see how this irregularity in the vector ISA spec would make a substantial difference in performance.

kasanovic · 2019-09-20T16:28:34Z

Resolved by the change to undisturbed tails.

kasanovic closed this as completed Sep 20, 2019
