Several discrepancies in the implementation of reduction functions

In the documentation, we can read that the reduction function returns a type bigger than the input (certainly to avoid overflow):
```ispc
uniform int16 reduce_add(int8 x)
uniform unsigned int16 reduce_add(unsigned int8 x)
uniform int32 reduce_add(int16 x)
uniform unsigned int32 reduce_add(unsigned int16 x)
uniform int64 reduce_add(int32 x)
uniform unsigned int64 reduce_add(unsigned int32 x)
uniform int64 reduce_add(int64 x)
uniform unsigned int64 reduce_add(unsigned int64 x)
```
Note that this is not the case for the 64-bit type since it is the biggest supported type so far.

In the stdlib implementation, we can see that functions `__reduce_add_intxx` are called internally. For example, here is 3 of them:
```ispc
__declspec(safe) static inline uniform unsigned int16 reduce_add(unsigned int8 x) {
    return __reduce_add_int8(__mask ? x : (int8)0);
}
__declspec(safe) static inline uniform int32 reduce_add(int16 x) { return __reduce_add_int16(__mask ? x : (int16)0); }
__declspec(safe) static inline uniform int64 reduce_add(int32 x) {
    // Zero out the values for lanes that aren't running
    return __reduce_add_int32(__mask ? x : 0);
}
```

The thing is the underlying functions does not actually use the matching proper types in all cases. See `builtins.isph`:
```ispc
EXT inline READNONE uniform int16 __reduce_add_int8(varying int8);
EXT inline READNONE uniform int16 __reduce_add_int16(varying int16);
EXT inline READNONE uniform int32 __reduce_add_int32(varying int32);
EXT inline READNONE uniform int64 __reduce_add_int64(varying int64);
```
We can see that only the `int8` type is correctly supported here.

Fixing this will certainly impact performance (not that much overall since reductions are generally performed once after loops), but the current code is clearly bogus IMHO (at least, this is neither documented nor expected regarding the output types). Do you agree?

-----

There is another issue in the implementation: the unsigned version just call the signed version so big unsigned integers are converted to negative value and I highly doubt the result is correct in this case. See `stdlib.ispc`:

```ispc
__declspec(safe) static inline uniform unsigned int64 reduce_add(unsigned int32 x) {
    // Set values for non-running lanes to zero so they don't affect the
    // result.
    return __reduce_add_int32(__mask ? x : 0);
}
```

IMHO, unsigned functions (like `__reduce_add_uint32` for the above implementation) are missing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Several discrepancies in the implementation of reduction functions #3834

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Several discrepancies in the implementation of reduction functions #3834

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions