Non-rounding averaging add and subtract #739

dsharletg · 2021-09-22T23:16:17Z

I noticed that in the v1.0 spec, rounding add and subtract averaging instructions are present (vaadd[u] and vasub[u]), but there do not appear to be any non-rounding versions. Both are very useful, because they can be used in trees to produce averaging trees of minimal bias. For example, (a + 2b + c + 2)/4 = (b + (a + c)/2 + 1)/2 = vaadd(b, missing_non_rounding_averaging(a, c)). With only the rounding vaadd, we can only compute (a + 2b + c + 3)/4 = vaadd(b, vaadd(a, c)), which has worse error.

The text was updated successfully, but these errors were encountered:

nick-knight · 2021-09-22T23:47:28Z

Is it possible to vary the fixed-point rounding mode
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#38-vector-fixed-point-rounding-mode-register-vxrm
to simulate the desired behavior?

dsharletg · 2021-09-22T23:50:03Z

I didn't realize the roundoff_signed functions were configurable. However, I still don't think that helps. We have large trees of these averaging ops to implement particular kernels, with the rounding mode of each one varying and carefully chosen to minimize bias and error. It seems like it would be slow/impractical to change the rounding mode frequently (possibly for each individual instruction).

nick-knight · 2021-09-22T23:51:35Z

Let me try asking a different way. What are the precise semantics of your proposed missing_non_rounding_averaging(a, c)?

dsharletg · 2021-09-22T23:53:25Z

We would basically just want ceil_avg(a, b) = (a + b + 1)>>1 and floor_avg(a, b) = (a + b)>>1, so we can use both of these together in larger expressions. For example:

floor_avg(floor_avg(a, c), b) = (a + 2b + c + 0)>>2
floor_avg(ceil_avg(a, c), b) = (a + 2b + c + 1)>>2
ceil_avg(floor_avg(a, c), b) = (a + 2b + c + 2)>>2
ceil_avg(ceil_avg(a, c), b) = (a + 2b + c + 3)>>2

nick-knight · 2021-09-23T00:47:42Z

Sorry, I'm still not completely clear. You seem to expect rational expressions like (a+b)/2 to be interpreted as integers, so you must be assuming some kind of rounding rule when a+b is odd. Here are four possibilities:

Round to positive infinity. This can be implemented by vaadd with vxrm == rnu
Round to negative infinity. This can be implemented by vaadd with vxrm == rdn
Round towards zero. I don't know a single-instruction sequence.
Round away from zero. I don't know a single-instruction sequence.

I admit I don't have experience in computing "averaging trees of minimal bias". I'll step back and let others attempt to tackle your question.

dsharletg · 2021-09-23T02:41:24Z

In my examples, ceil_avg is equivalent to vaadd with vxrm == rnu, and floor_avg is equivalent to vaadd with vxrm == rdn.

The issue is I think we would need to flipping the rounding mode very frequently, which seems like a performance issue. In general, statefulness like this in instructions is quite a bit of a pain to deal with IMHO.

dsharletg · 2021-09-23T02:47:04Z

I edited the previous comment to just use shifts (which I assume to have no clever rounding, just truncating).

dsharletg · 2021-09-23T03:30:34Z

I see now that shifts are also affected by this rounding register... :( IMO, this really makes the instruction set harder to use for a lot of the code I'm used to writing on ARM and other targets.

nick-knight · 2021-09-23T04:49:31Z

Toggling of the vxrm CSR is performed by scalar instructions, which can often be scheduled far in advance of the associated vector instructions. This is comparable to changing the vector length (vl CSR), and the overheads of that have been discussed at length elsewhere. Moreover, although I don't know the details of your use-case, I suspect the pattern of vxrm changes are often known statically, so can be encoded by an immediate (csrwi). I believe this property could be particularly relevant in the case of out-of-order implementations.

I certainly agree that it would reduce static and dynamic code size if vxrm were encoded directly in the instruction. However, in my opinion, it is unsound reasoning to conclude a performance issue based solely on instruction count, at least without a particular implementation in mind.

dsharletg · 2021-09-23T05:19:54Z

It's not really just about performance. There's also the practical matter of it just being kind of a pain to have global state and instructions that are not pure functions:

Assembly is harder to read. To know the behavior, you have to find the last time the control register was set, which may be before some control flow.
This is an extra constraint for the compiler and HW to consider when attempting to reorder instructions.
The only reasonable thing for compilers to do is emit code to change the state register every time one of the affected instructions is used (and then maybe try to optimize redundant setting of the state registers?).

nick-knight · 2021-09-23T15:10:16Z

I completely agree with your practical concerns. This was already an issue for scalar floating-point's dynamic rounding mode and exception flags (frm and fflags CSRs).

Almost every vector instruction depends on global state --- the vl and vtype CSRs --- and this is already a major headache for the C intrinsics. The current LLVM implementation emits a state change (vset{i}vli) before every instruction, and then performs a succession of dizzying passes in an attempt to elide redundant ones. I spend a fraction of my working hours filing tickets about corner-cases where LLVM's vset{i}vli-elision fails. So your point certainly resonates with me.

As far as I'm aware, the C intrinsics interface hasn't even agreed on how to handle vxrm:
riscv-non-isa/rvv-intrinsic-doc#36
riscv-non-isa/rvv-intrinsic-doc#46
riscv-non-isa/rvv-intrinsic-doc#84
riscv-non-isa/rvv-intrinsic-doc#27 (comment)

kasanovic · 2021-09-23T20:22:13Z

Understand and agree with the complaint generally (except that saturation mode is part of instruction encoding). Rounding mode was put in CSR just for encoding space reasons. Also note that in a vector machine, the mode change is only single-cycle occupancy whereas vector operation could run for quite a few cycles, so actual execution time overhead can be low/zero. There were requests for more fixed-point support and plan was to add more later as extensions, and these could include static round versions, but these could chew up a lot of encoding space unless carefully selected. Compiler teams have had success removing extraneous vsetvli instructions using the suggested approach (as Nick mentions), so would definitely try that path.

There are discussions in riscv/riscv-v-spec#739. The reason that ISA makes the vx round mode only available in CSR is for encoding space. I think we can make it non-preserved across calls to use it efficiently.

After discussion in #287, we realized that keeping the convention of vxsat and vxrm to be unspecified made the vector extension become unusable on the software side, because that might become an ABI breakage once we define that later, so we would like to define that within psABI 1.0 release. Here is two potential proposals for vxrm and vxsat in #256: - Same definition as fcsr, and followed the same convention as fenv. - Define as "not preserved across calls and unspecified upon entry". Based on the discussion in #256 and few off list discussions, we have better understanding of the usage for vxrm and vxsat registers now, so we propose take the second proposal: “not preserved across calls and the value is unspecified upon entry” as the ABI for the vxrm and vxsat according following reasons: - Using same policy as floating point environment might cause unnecessary overhead for maintain the convention described in C language specification : "a function call does not alter its caller's floating-point control modes, clear its caller’s ﬂoating-point status ﬂags, nor depend on the state of its caller’s floating-point status ﬂags unless the function is so documented"[1]; all existing libraries are *NOT* documented as might change fixed point rounding mode, that means we must backup and restore the content of vxrm and vxsat at those library functions when we implement those functions with vector fixed-point operations to ensure we didn’t violate the convention. - Another key difference between fixed-point rounding mode and floating point environment is fixed-point didn't have it own default rounding mode, and define one isn't meaningful because fixed-point rounding modes are likely to be setting to specific mode for most fixed-point algorithm, and might be changed frequently within a single algorithm implementation[2] - unlike floating point rounding modes are unlikely to change in most situation. Based on the above reason, we believe that is the best choice for vxrm and vxsat. [1] ISO/IEC 9899:2011: 7.6.3 [2] riscv/riscv-v-spec#739

…fied upon entry After discussion in #287, we realized that keeping the convention of vxsat and vxrm to be unspecified made the vector extension become unusable on the software side, because that might become an ABI breakage once we define that later, so we would like to define that within psABI 1.0 release. Here is two potential proposals for vxrm and vxsat in #256: - Same definition as fcsr, and followed the same convention as fenv. - Define as "not preserved across calls and unspecified upon entry". Based on the discussion in #256 and few off list discussions, we have better understanding of the usage for vxrm and vxsat registers now, so we propose take the second proposal: “not preserved across calls and the value is unspecified upon entry” as the ABI for the vxrm and vxsat according following reasons: - Using same policy as floating point environment might cause unnecessary overhead for maintain the convention described in C language specification : "a function call does not alter its caller's floating-point control modes, clear its caller’s ﬂoating-point status ﬂags, nor depend on the state of its caller’s floating-point status ﬂags unless the function is so documented"[1]; all existing libraries are *NOT* documented as might change fixed point rounding mode, that means we must backup and restore the content of vxrm and vxsat at those library functions when we implement those functions with vector fixed-point operations to ensure we didn't violate the convention. - Another key difference between fixed-point rounding mode and floating point environment is fixed-point didn't have it own default rounding mode, and define one isn't meaningful because fixed-point rounding modes are likely to be setting to specific mode for most fixed-point algorithm, and might be changed frequently within a single algorithm implementation[2] - unlike floating point rounding modes are unlikely to change in most situation. Based on the above reason, we believe that is the best choice for vxrm and vxsat. [1] ISO/IEC 9899:2011: 7.6.3 [2] riscv/riscv-v-spec#739

…fied upon entry After discussion in riscv-non-isa#287, we realized that keeping the convention of vxsat and vxrm to be unspecified made the vector extension become unusable on the software side, because that might become an ABI breakage once we define that later, so we would like to define that within psABI 1.0 release. Here is two potential proposals for vxrm and vxsat in riscv-non-isa#256: - Same definition as fcsr, and followed the same convention as fenv. - Define as "not preserved across calls and unspecified upon entry". Based on the discussion in riscv-non-isa#256 and few off list discussions, we have better understanding of the usage for vxrm and vxsat registers now, so we propose take the second proposal: “not preserved across calls and the value is unspecified upon entry” as the ABI for the vxrm and vxsat according following reasons: - Using same policy as floating point environment might cause unnecessary overhead for maintain the convention described in C language specification : "a function call does not alter its caller's floating-point control modes, clear its caller’s ﬂoating-point status ﬂags, nor depend on the state of its caller’s floating-point status ﬂags unless the function is so documented"[1]; all existing libraries are *NOT* documented as might change fixed point rounding mode, that means we must backup and restore the content of vxrm and vxsat at those library functions when we implement those functions with vector fixed-point operations to ensure we didn't violate the convention. - Another key difference between fixed-point rounding mode and floating point environment is fixed-point didn't have it own default rounding mode, and define one isn't meaningful because fixed-point rounding modes are likely to be setting to specific mode for most fixed-point algorithm, and might be changed frequently within a single algorithm implementation[2] - unlike floating point rounding modes are unlikely to change in most situation. Based on the above reason, we believe that is the best choice for vxrm and vxsat. [1] ISO/IEC 9899:2011: 7.6.3 [2] riscv/riscv-v-spec#739

Findecanor · 2023-05-09T09:26:58Z

BTW. Table 5 in section 3.8 of the spec about the fixed-point rounding modes is misleading, and I think should be clarified.

While rounding mode "rdn" truncates the bits, in the context of division in 2's complement arithmetic, it is flooring, not truncating. It is what you'd want to use to get "no rounding" after shift, like srl/sra in the base set.

Rounding mode "rnu" is the "truncating" mode, giving vaadd[u] behaviour like [v]div[u].
Edit: My mistake. "rtz" rounds up only when the value is negative.

nick-knight · 2023-05-09T14:23:03Z

Agree there is confusion about the term truncation, whose meaning changes when passing from sign-magnitude to two's complement.

Disagree about the relation between RNU and RTZ: they are distinct.

kasanovic added the Resolve after v1.0 Does not need to be resolved for v1.0 draft label Nov 9, 2021

This was referenced Mar 18, 2022

riscv-cc: Describe that vxrm/vxsat is not preserved across calls riscv-non-isa/riscv-elf-psabi-doc#256

Closed

[RFC] Propose rename current fixed-point intrinsic APIs for instructions which read VXRM riscv-non-isa/rvv-intrinsic-doc#144

Closed

kito-cheng mentioned this issue Jun 20, 2022

Define vxrm/vxsat is not preserved across calls and unspecified upon entry field upon entry riscv-non-isa/riscv-elf-psabi-doc#294

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-rounding averaging add and subtract #739

Non-rounding averaging add and subtract #739

dsharletg commented Sep 22, 2021

nick-knight commented Sep 22, 2021

dsharletg commented Sep 22, 2021

nick-knight commented Sep 22, 2021

dsharletg commented Sep 22, 2021 •

edited

Loading

nick-knight commented Sep 23, 2021 •

edited

Loading

dsharletg commented Sep 23, 2021 •

edited

Loading

dsharletg commented Sep 23, 2021

dsharletg commented Sep 23, 2021

nick-knight commented Sep 23, 2021 •

edited

Loading

dsharletg commented Sep 23, 2021

nick-knight commented Sep 23, 2021

kasanovic commented Sep 23, 2021

Findecanor commented May 9, 2023 •

edited

Loading

nick-knight commented May 9, 2023

Non-rounding averaging add and subtract #739

Non-rounding averaging add and subtract #739

Comments

dsharletg commented Sep 22, 2021

nick-knight commented Sep 22, 2021

dsharletg commented Sep 22, 2021

nick-knight commented Sep 22, 2021

dsharletg commented Sep 22, 2021 • edited Loading

nick-knight commented Sep 23, 2021 • edited Loading

dsharletg commented Sep 23, 2021 • edited Loading

dsharletg commented Sep 23, 2021

dsharletg commented Sep 23, 2021

nick-knight commented Sep 23, 2021 • edited Loading

dsharletg commented Sep 23, 2021

nick-knight commented Sep 23, 2021

kasanovic commented Sep 23, 2021

Findecanor commented May 9, 2023 • edited Loading

nick-knight commented May 9, 2023

dsharletg commented Sep 22, 2021 •

edited

Loading

nick-knight commented Sep 23, 2021 •

edited

Loading

dsharletg commented Sep 23, 2021 •

edited

Loading

nick-knight commented Sep 23, 2021 •

edited

Loading

Findecanor commented May 9, 2023 •

edited

Loading