Skip to content

Several discrepancies in the implementation of reduction functions #3834

@zephyr111

Description

@zephyr111

In the documentation, we can read that the reduction function returns a type bigger than the input (certainly to avoid overflow):

uniform int16 reduce_add(int8 x)
uniform unsigned int16 reduce_add(unsigned int8 x)
uniform int32 reduce_add(int16 x)
uniform unsigned int32 reduce_add(unsigned int16 x)
uniform int64 reduce_add(int32 x)
uniform unsigned int64 reduce_add(unsigned int32 x)
uniform int64 reduce_add(int64 x)
uniform unsigned int64 reduce_add(unsigned int64 x)

Note that this is not the case for the 64-bit type since it is the biggest supported type so far.

In the stdlib implementation, we can see that functions __reduce_add_intxx are called internally. For example, here is 3 of them:

__declspec(safe) static inline uniform unsigned int16 reduce_add(unsigned int8 x) {
    return __reduce_add_int8(__mask ? x : (int8)0);
}
__declspec(safe) static inline uniform int32 reduce_add(int16 x) { return __reduce_add_int16(__mask ? x : (int16)0); }
__declspec(safe) static inline uniform int64 reduce_add(int32 x) {
    // Zero out the values for lanes that aren't running
    return __reduce_add_int32(__mask ? x : 0);
}

The thing is the underlying functions does not actually use the matching proper types in all cases. See builtins.isph:

EXT inline READNONE uniform int16 __reduce_add_int8(varying int8);
EXT inline READNONE uniform int16 __reduce_add_int16(varying int16);
EXT inline READNONE uniform int32 __reduce_add_int32(varying int32);
EXT inline READNONE uniform int64 __reduce_add_int64(varying int64);

We can see that only the int8 type is correctly supported here.

Fixing this will certainly impact performance (not that much overall since reductions are generally performed once after loops), but the current code is clearly bogus IMHO (at least, this is neither documented nor expected regarding the output types). Do you agree?


There is another issue in the implementation: the unsigned version just call the signed version so big unsigned integers are converted to negative value and I highly doubt the result is correct in this case. See stdlib.ispc:

__declspec(safe) static inline uniform unsigned int64 reduce_add(unsigned int32 x) {
    // Set values for non-running lanes to zero so they don't affect the
    // result.
    return __reduce_add_int32(__mask ? x : 0);
}

IMHO, unsigned functions (like __reduce_add_uint32 for the above implementation) are missing.

Metadata

Metadata

Assignees

Labels

PerformanceAll issues related to performance/code generation

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions