Overflow when running reductions on float16 columns in pandas Series #22841
Labels: Dtype Conversions (unexpected or buggy dtype conversions); Numeric Operations (arithmetic, comparison, and logical operations); Reduction Operations (sum, mean, min, max, etc.)
When running reductions on DataFrame columns of dtype float16, we ran into a surprising behaviour:
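A minimal sketch of this kind of reduction, for illustration only: the values and variable names below are made up and are not the data from our report. Each value fits comfortably in float16 (whose largest finite value is 65504), but the true sum does not. The float16 results depend on the pandas version, so they are printed rather than stated.

```python
import numpy as np
import pandas as pd

# 70 values of 1000.0: each is exactly representable in float16,
# but the true sum (70000) exceeds the float16 maximum of 65504.
s = pd.Series(np.full(70, 1000.0, dtype=np.float16))

# Reductions on the float16 column itself (overflow warnings silenced).
with np.errstate(over="ignore", invalid="ignore"):
    total_f16 = s.sum()
    mean_f16 = s.mean()

# Same reductions after explicitly upcasting the column to float64.
total_f64 = s.astype(np.float64).sum()
mean_f64 = s.astype(np.float64).mean()

print("float16 column:", total_f16, mean_f16)
print("float64 column:", total_f64, mean_f64)
```

Upcasting with `astype(np.float64)` before reducing sidesteps the problem, at the cost of a temporary float64 copy of the column.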
After investigating, we found that the accumulator used in, for example, mean() and sum() is too small and eventually overflows. In both nansum() and nanmean() (to which sum() and mean() delegate their work), when the data has a float dtype, the accumulator is downcast to the original dtype of the data:
pandas/pandas/core/nanops.py, line 333 in af7b0ba
pandas/pandas/core/nanops.py, line 352 in af7b0ba
In our case, because the original dtype is float16, the accumulator is downcast to float16, for example in nansum():
(Similar code in nanmean.)

However, pandas did not always behave like that. The current behaviour was added in the following commits, to solve other bugs:
Also, for the "int" codepaths, the accumulator is never downcast and is always float64. Only in the "float" cases is the accumulator given the same dtype as the column.
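To illustrate why the accumulator dtype matters, here is a small sketch that sums the same float16 data with an explicit running total of each type. `running_sum` is a hypothetical helper written for this illustration; it is not pandas or numpy code, and it adds elements one at a time rather than the way numpy's own reductions do.

```python
import numpy as np

def running_sum(values, acc_type):
    """Add values one by one, keeping the running total in acc_type."""
    acc = acc_type(0)
    for v in values:
        # Round the running total back to acc_type after every addition.
        acc = acc_type(acc + acc_type(v))
    return acc

# Each value fits in float16, but the true sum (70000) is above 65504.
values = np.full(70, 1000.0, dtype=np.float16)

with np.errstate(over="ignore"):
    total_f16 = running_sum(values, np.float16)  # overflows to inf
total_f64 = running_sum(values, np.float64)      # 70000.0

print(total_f16, total_f64)
```

With a float64 accumulator the result is exact for this data, while the float16 accumulator overflows as soon as the running total passes the float16 range.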
We'd be willing to submit a pull request, but we're not sure what the best fix here would be. Should we just always have a float64 accumulator in these functions for the float cases, instead of downcasting it? If not, what would a good fix look like?
By the way, things are a bit different in numpy, with the accumulator being set to float64 in more cases, and with the option for users to specify the dtype of the accumulator (and at the same time the output). Having the same option in pandas would have allowed us to at least work around this, by requesting a float64 accumulator. Was there a decision made in pandas not to offer a dtype argument to sum, etc. like numpy does? Otherwise, we could implement that also in the pull request.

(Cc @chrish42)
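For reference, this is a short sketch of the numpy option mentioned above, using made-up values: `ndarray.sum` and `ndarray.mean` accept a `dtype` argument that sets the type of both the accumulator and the result.

```python
import numpy as np

# float16 data whose true sum (70000) does not fit in float16.
values = np.full(70, 1000.0, dtype=np.float16)

# Request a float64 accumulator (and float64 output) explicitly.
total = values.sum(dtype=np.float64)   # 70000.0
mean = values.mean(dtype=np.float64)   # 1000.0

print(total, mean)
```

The input array stays float16; only the reduction is carried out in float64, so no upcast copy of the data has to be made by the caller.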