In the documentation, we can read that the reduction function returns a type bigger than the input (certainly to avoid overflow):
uniform int16 reduce_add(int8 x)
uniform unsigned int16 reduce_add(unsigned int8 x)
uniform int32 reduce_add(int16 x)
uniform unsigned int32 reduce_add(unsigned int16 x)
uniform int64 reduce_add(int32 x)
uniform unsigned int64 reduce_add(unsigned int32 x)
uniform int64 reduce_add(int64 x)
uniform unsigned int64 reduce_add(unsigned int64 x)
Note that this is not the case for the 64-bit type since it is the biggest supported type so far.
In the stdlib implementation, we can see that functions __reduce_add_intxx are called internally. For example, here is 3 of them:
__declspec(safe) static inline uniform unsigned int16 reduce_add(unsigned int8 x) {
return __reduce_add_int8(__mask ? x : (int8)0);
}
__declspec(safe) static inline uniform int32 reduce_add(int16 x) { return __reduce_add_int16(__mask ? x : (int16)0); }
__declspec(safe) static inline uniform int64 reduce_add(int32 x) {
// Zero out the values for lanes that aren't running
return __reduce_add_int32(__mask ? x : 0);
}
The thing is the underlying functions does not actually use the matching proper types in all cases. See builtins.isph:
EXT inline READNONE uniform int16 __reduce_add_int8(varying int8);
EXT inline READNONE uniform int16 __reduce_add_int16(varying int16);
EXT inline READNONE uniform int32 __reduce_add_int32(varying int32);
EXT inline READNONE uniform int64 __reduce_add_int64(varying int64);
We can see that only the int8 type is correctly supported here.
Fixing this will certainly impact performance (not that much overall since reductions are generally performed once after loops), but the current code is clearly bogus IMHO (at least, this is neither documented nor expected regarding the output types). Do you agree?
There is another issue in the implementation: the unsigned version just call the signed version so big unsigned integers are converted to negative value and I highly doubt the result is correct in this case. See stdlib.ispc:
__declspec(safe) static inline uniform unsigned int64 reduce_add(unsigned int32 x) {
// Set values for non-running lanes to zero so they don't affect the
// result.
return __reduce_add_int32(__mask ? x : 0);
}
IMHO, unsigned functions (like __reduce_add_uint32 for the above implementation) are missing.
In the documentation, we can read that the reduction function returns a type bigger than the input (certainly to avoid overflow):
Note that this is not the case for the 64-bit type since it is the biggest supported type so far.
In the stdlib implementation, we can see that functions
__reduce_add_intxxare called internally. For example, here is 3 of them:The thing is the underlying functions does not actually use the matching proper types in all cases. See
builtins.isph:We can see that only the
int8type is correctly supported here.Fixing this will certainly impact performance (not that much overall since reductions are generally performed once after loops), but the current code is clearly bogus IMHO (at least, this is neither documented nor expected regarding the output types). Do you agree?
There is another issue in the implementation: the unsigned version just call the signed version so big unsigned integers are converted to negative value and I highly doubt the result is correct in this case. See
stdlib.ispc:IMHO, unsigned functions (like
__reduce_add_uint32for the above implementation) are missing.