ENH: missing value handling in basic statistics #5416

josef-pkt · 2018-12-10T17:21:51Z

Currently we don't have much support for missing values outside of the model option for rowwise deletion.

#2630 is for improving descriptive statistics.

For univariate statistics like mean and std, including robust statistics like MAD or the quantile-based skew and kurtosis, it would be better to have case/elementwise deletion instead of rowwise deletion, so that each column keeps all of its own non-missing observations.
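
To illustrate the difference between the two deletion rules, here is a minimal sketch in plain NumPy. The array `x` is made-up example data, and this is not an existing statsmodels function.

```python
import numpy as np

# Made-up example data: two columns with NaNs in different rows.
x = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [4.0, 40.0],
])

# Rowwise deletion: drop every row that contains at least one NaN,
# then compute the statistic on the remaining complete rows.
complete = ~np.isnan(x).any(axis=1)
rowwise_mean = x[complete].mean(axis=0)

# Elementwise deletion: each column uses all of its own non-NaN values.
elementwise_mean = np.nanmean(x, axis=0)
nobs = (~np.isnan(x)).sum(axis=0)  # observations actually used per column

print(rowwise_mean)      # based on 2 complete rows
print(elementwise_mean)  # based on 3 observations per column
print(nobs)
```

With rowwise deletion only two of the four rows survive, while elementwise deletion keeps three observations in each column.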

One problem is that simple vectorization will not work anymore if columns have NaNs/missing values in different rows. For simple statistics there are nan-aware functions, but for the others we might need either a mask solution like the one MaskedArray statistics use (essentially 0/1 weights after replacing the missing values with something finite) or looping over columns/series.
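
A rough sketch of the two options, assuming plain NumPy arrays with NaN as the missing marker. The helper names `masked_mean_std`, `columnwise` and `mad` are hypothetical and only for illustration; `mad` here is the unscaled median absolute deviation, not the statsmodels implementation.

```python
import numpy as np

def masked_mean_std(x, ddof=0):
    """Vectorized mean/std using 0/1 weights (hypothetical helper).

    Columns that are entirely NaN have nobs == 0 and will produce nan.
    """
    x = np.asarray(x, dtype=float)
    w = (~np.isnan(x)).astype(float)   # 1 where observed, 0 where missing
    xf = np.where(w > 0, x, 0.0)       # replace NaN by something finite
    nobs = w.sum(axis=0)
    mean = (w * xf).sum(axis=0) / nobs
    var = (w * (xf - mean) ** 2).sum(axis=0) / (nobs - ddof)
    return mean, np.sqrt(var), nobs

def columnwise(func, x):
    """Fallback: loop over columns and drop NaNs in each one separately."""
    x = np.asarray(x, dtype=float)
    return np.array([func(col[~np.isnan(col)]) for col in x.T])

def mad(a):
    """Unscaled median absolute deviation around the median."""
    return np.median(np.abs(a - np.median(a)))

# Made-up example data with NaNs in different rows.
x = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [4.0, 40.0],
])

mean, std, nobs = masked_mean_std(x)
ma_mean = np.ma.masked_invalid(x).mean(axis=0)  # same idea via MaskedArray
mad_cols = columnwise(mad, x)
```

The weights version stays vectorized but only works for statistics that can be written as weighted sums. Order statistics like the median or quantiles do not fit that pattern, so the column loop is the simple fallback there.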

rlucas7 · 2018-12-11T01:25:08Z

This CV post might also be relevant? (in tsa though)
https://stats.stackexchange.com/questions/381378/abbreviations-statsmodels-handling-nan

josef-pkt added type-enh comp-robust comp-stats labels Dec 10, 2018

josef-pkt mentioned this issue Dec 10, 2018

Descriptive Statistics failing with pandas dataframes #2630

Closed
