Non-rounding averaging add and subtract #739
Comments
Is it possible to vary the fixed-point rounding mode |
I didn't realize the roundoff_signed functions were configurable. However, I still don't think that helps. We have large trees of these averaging ops to implement particular kernels, with the rounding mode of each one varying and carefully chosen to minimize bias and error. It seems like it would be slow/impractical to change the rounding mode frequently (possibly for each individual instruction). |
Let me try asking a different way. What are the precise semantics of your proposed |
We would basically just want
|
Sorry, I'm still not completely clear. You seem to expect rational expressions like
I admit I don't have experience in computing "averaging trees of minimal bias". I'll step back and let others attempt to tackle your question. |
In my examples, The issue is that I think we would need to flip the rounding mode very frequently, which seems like a performance issue. In general, statefulness like this in instructions is quite a pain to deal with IMHO. |
I edited the previous comment to just use shifts (which I assume to have no clever rounding, just truncating). |
I see now that shifts are also affected by this rounding register... :( IMO, this really makes the instruction set harder to use for a lot of the code I'm used to writing on ARM and other targets. |
Toggling of the I certainly agree that it would reduce static and dynamic code size if |
It's not really just about performance. There's also the practical matter of it just being kind of a pain to have global state and instructions that are not pure functions:
|
I completely agree with your practical concerns. This was already an issue for scalar floating-point's dynamic rounding mode and exception flags ( Almost every vector instruction depends on global state --- the As far as I'm aware, the C intrinsics interface hasn't even agreed on how to handle |
Understand and agree with the complaint generally (except that saturation mode is part of instruction encoding). Rounding mode was put in CSR just for encoding space reasons. Also note that in a vector machine, the mode change is only single-cycle occupancy whereas vector operation could run for quite a few cycles, so actual execution time overhead can be low/zero. There were requests for more fixed-point support and plan was to add more later as extensions, and these could include static round versions, but these could chew up a lot of encoding space unless carefully selected. Compiler teams have had success removing extraneous vsetvli instructions using the suggested approach (as Nick mentions), so would definitely try that path. |
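The idea of removing extraneous mode changes can be illustrated with a minimal sketch. It assumes a toy straight-line instruction list of tuples, where the hypothetical pseudo-op `("set_vxrm", mode)` stands in for a CSR write to vxrm; it is not how any particular compiler implements this, only the basic "skip the write if the mode is already known" step.

```python
def drop_redundant_vxrm_writes(instrs):
    """Remove vxrm writes that set the mode it is already known to hold.

    instrs is a straight-line list of tuples such as ("set_vxrm", "rnu")
    or ("vaadd", "v1", "v2", "v3"). This toy representation is an
    assumption made for illustration only.
    """
    known = None  # mode currently in vxrm, or None if not yet known
    out = []
    for instr in instrs:
        op = instr[0]
        if op == "set_vxrm":
            if instr[1] == known:
                continue  # vxrm already holds this mode, drop the write
            known = instr[1]
        out.append(instr)
    return out


# An averaging tree that alternates between two modes still needs a write
# at each change, but back-to-back ops in the same mode share one write.
program = [
    ("set_vxrm", "rdn"), ("vaadd", "t0", "a", "c"),
    ("set_vxrm", "rdn"), ("vaadd", "t1", "d", "e"),
    ("set_vxrm", "rnu"), ("vaadd", "t2", "b", "t0"),
]
optimized = drop_redundant_vxrm_writes(program)
```

In this sketch the second `set_vxrm rdn` is dropped, so ordering the operations of a tree by rounding mode reduces the number of writes that remain.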
There are discussions in riscv/riscv-v-spec#739. The reason the ISA makes the fixed-point rounding mode available only in a CSR is encoding space. I think we can make it non-preserved across calls so that it can be used efficiently.
After discussion in #287, we realized that leaving the convention for vxsat and vxrm unspecified made the vector extension unusable on the software side, because defining it later could become an ABI break, so we would like to define it within the psABI 1.0 release.

Here are two potential proposals for vxrm and vxsat from #256:
- Same definition as fcsr, following the same convention as fenv.
- Define as "not preserved across calls and unspecified upon entry".

Based on the discussion in #256 and a few off-list discussions, we now have a better understanding of how the vxrm and vxsat registers are used, so we propose taking the second proposal, "not preserved across calls and the value is unspecified upon entry", as the ABI for vxrm and vxsat, for the following reasons:
- Using the same policy as the floating-point environment might cause unnecessary overhead to maintain the convention described in the C language specification: "a function call does not alter its caller's floating-point control modes, clear its caller's floating-point status flags, nor depend on the state of its caller's floating-point status flags unless the function is so documented"[1]. No existing library is documented as possibly changing the fixed-point rounding mode, which means we would have to back up and restore the contents of vxrm and vxsat in those library functions whenever we implement them with vector fixed-point operations, to ensure we do not violate the convention.
- Another key difference between the fixed-point rounding mode and the floating-point environment is that fixed-point has no default rounding mode of its own, and defining one is not meaningful, because most fixed-point algorithms set the rounding mode to a specific value and may change it frequently within a single algorithm implementation[2], whereas floating-point rounding modes are unlikely to change in most situations.

For the reasons above, we believe this is the best choice for vxrm and vxsat.

[1] ISO/IEC 9899:2011: 7.6.3
[2] riscv/riscv-v-spec#739
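To make the proposed convention concrete, here is a small sketch of what "not preserved across calls and unspecified upon entry" implies for a single function body. The tuple-based instruction list and the `set_vxrm` and `call` pseudo-ops are hypothetical; the set of mnemonics lists instructions understood to read vxrm and is not meant to be exhaustive.

```python
# Instructions assumed here to depend on vxrm (illustrative, not exhaustive).
FIXED_POINT_OPS = {
    "vaadd", "vaaddu", "vasub", "vasubu",
    "vsmul", "vssrl", "vssra", "vnclip", "vnclipu",
}


def ops_with_unspecified_vxrm(instrs):
    """Return indices of fixed-point ops that run with an unspecified vxrm.

    Under the proposed convention vxrm is unspecified at function entry
    and again after every call, so it must be written before each use.
    """
    defined = False  # unspecified upon entry
    bad = []
    for i, instr in enumerate(instrs):
        op = instr[0]
        if op == "set_vxrm":
            defined = True
        elif op == "call":
            defined = False  # callee may have changed vxrm
        elif op in FIXED_POINT_OPS and not defined:
            bad.append(i)
    return bad


body = [
    ("vaadd", "t0", "a", "c"),    # index 0: vxrm never set in this function
    ("set_vxrm", "rnu"),
    ("vaadd", "t1", "b", "t0"),   # fine
    ("call", "memcpy"),
    ("vaadd", "t2", "t1", "d"),   # index 4: vxrm clobbered by the call
]
flagged = ops_with_unspecified_vxrm(body)
```

The cost of this choice is one extra write after each call, in exchange for never having to save and restore vxrm inside library functions.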
BTW. Table 5 in section 3.8 of the spec about the fixed-point rounding modes is misleading, and I think should be clarified. While rounding mode "rdn" truncates the bits, in the context of division in 2's complement arithmetic, it is flooring, not truncating. It is what you'd want to use to get "no rounding" after shift, like srl/sra in the base set. Rounding mode "rnu" is the "truncating" mode, giving vaadd[u] behaviour like [v]div[u]. |
Agree there is confusion about the term truncation, whose meaning changes when passing from sign-magnitude to two's complement. Disagree about the relation between RNU and RTZ: they are distinct. |
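The distinction discussed in the last two comments can be checked with a small model. The sketch below follows my reading of the roundoff definition in the v1.0 spec (result is `(v >> d) + r`, with the increment `r` chosen by vxrm); the Python function names are my own, and `trunc_div_pow2` is a helper added only for comparison with round-toward-zero division.

```python
def roundoff_signed(v, d, mode):
    """Arithmetic right shift of v by d bits plus a mode-dependent increment."""
    if d == 0:
        return v
    low = v & ((1 << d) - 1)        # discarded bits v[d-1:0]
    half = (v >> (d - 1)) & 1       # most significant discarded bit v[d-1]
    lsb = (v >> d) & 1              # least significant kept bit v[d]
    if mode == "rnu":               # round to nearest, ties up
        r = half
    elif mode == "rne":             # round to nearest, ties to even
        below = low & ((1 << (d - 1)) - 1)   # v[d-2:0]
        r = half & int(below != 0 or lsb == 1)
    elif mode == "rdn":             # round down: plain arithmetic shift
        r = 0
    elif mode == "rod":             # round to odd ("jam")
        r = int(lsb == 0 and low != 0)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return (v >> d) + r


def vaadd(a, b, mode):
    """Averaging add modelled on unbounded integers: roundoff of a + b by 1."""
    return roundoff_signed(a + b, 1, mode)


def trunc_div_pow2(v, d):
    """Division by 2**d rounding toward zero, for comparison only."""
    return -((-v) >> d) if v < 0 else v >> d


# -3/2 = -1.5: rdn floors to -2, while round-toward-zero gives -1.
# 3/2 = 1.5: rnu rounds up to 2, while round-toward-zero gives 1.
```

So in this model rdn matches sra (flooring), and rnu agrees with round-toward-zero for some negative inputs but not for positive ties, which is consistent with the two modes being distinct.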
I noticed that in the v1.0 spec, rounding add and subtract averaging instructions are present (vaadd[u] and vasub[u]), but there do not appear to be any non-rounding versions. Both are very useful, because they can be used in trees to produce averaging trees of minimal bias. For example, (a + 2b + c + 2)/4 = (b + (a + c)/2 + 1)/2 = vaadd(b, missing_non_rounding_averaging(a, c)). With only the rounding vaadd, we can only compute (a + 2b + c + 3)/4 = vaadd(b, vaadd(a, c)), which has worse error.
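The bias claim can be checked numerically. The following is a rough sketch on small unsigned integers, assuming the rounding average behaves as floor((a + b + 1)/2) and the missing non-rounding average as floor((a + b)/2); the function names are illustrative and not taken from the spec.

```python
def avg_floor(a, b):
    """Non-rounding average: floor((a + b) / 2)."""
    return (a + b) >> 1


def avg_round(a, b):
    """Rounding average: floor((a + b + 1) / 2)."""
    return (a + b + 1) >> 1


def mixed_tree(a, b, c):
    """Rounding average of b with the non-rounding average of a and c."""
    return avg_round(b, avg_floor(a, c))


def rounded_tree(a, b, c):
    """Both stages use the rounding average."""
    return avg_round(b, avg_round(a, c))


def mean_signed_error(kernel, n=16):
    """Mean of kernel(a, b, c) - (a + 2b + c)/4 over all inputs in [0, n)."""
    total = 0.0
    for a in range(n):
        for b in range(n):
            for c in range(n):
                total += kernel(a, b, c) - (a + 2 * b + c) / 4
    return total / n ** 3


bias_mixed = mean_signed_error(mixed_tree)      # 0.125
bias_rounded = mean_signed_error(rounded_tree)  # 0.375
```

Over this input range the mixed tree equals (a + 2b + c + 2)/4 with flooring and has a mean error of 0.125, while the all-rounding tree equals (a + 2b + c + 3)/4 and has a mean error of 0.375.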