BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() #8086

seth-p · 2014-08-21T01:13:34Z

Note that for a constant series, Series.skew() returns 0, while rolling/expanding_skew() return NaN.

In [532]: s = Series([1]*4)

In [533]: s.skew()
Out[533]: 0

In [534]: expanding_skew(s)
Out[534]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [535]: rolling_skew(s, 4)
Out[535]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

While rolling/expanding_kurt() similarly return NaN for a constant series, Series.kurt() produces an error.

In [541]: expanding_kurt(s)
Out[541]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [542]: rolling_kurt(s, 4)
Out[542]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [543]: s.kurt()
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-543-e41f2ec435bd> in <module>()
----> 1 s.kurt()

C:\Python34\lib\site-packages\pandas\core\generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
   3783                                               skipna=skipna)
   3784                 return self._reduce(f, axis=axis,
-> 3785                                     skipna=skipna, numeric_only=numeric_only)
   3786             stat_func.__name__ = name
   3787             return stat_func

C:\Python34\lib\site-packages\pandas\core\series.py in _reduce(self, op, axis, skipna, numeric_only, filter_type, **kwds)
   2007                 filter_type=None, **kwds):
   2008         """ perform a reduction operation """
-> 2009         return op(_values_from_object(self), skipna=skipna, **kwds)
   2010
   2011     def _reindex_indexer(self, new_index, indexer, copy):

C:\Python34\lib\site-packages\pandas\core\nanops.py in _f(*args, **kwargs)
     46                                 'this dtype'.format(f.__name__.replace('nan',
     47                                                                        '')))
---> 48             return f(*args, **kwargs)
     49         return _f
     50

C:\Python34\lib\site-packages\pandas\core\nanops.py in nankurt(values, axis, skipna)
    504     D = _zero_out_fperr(D)
    505
--> 506     result = (((count * count - 1.) * D / (B * B) - 3 * ((count - 1.) ** 2)) /
    507               ((count - 2.) * (count - 3.)))
    508     if isinstance(result, np.ndarray):

ZeroDivisionError: float division by zero

The text was updated successfully, but these errors were encountered:

seth-p · 2014-08-21T01:22:23Z

My 2c: there really shouldn't be two different implementations of skew/kurtosis functions.

jreback · 2014-09-10T00:01:36Z

@seth-p maybe edit the top section with a code-reference so that when this is addressed the section can be fixed (as you have commented for #7926)

seth-p · 2014-09-16T02:37:23Z

@jreback, afraid I'm not sure I follow your last comment. Do you mean the following?

As a test for a fix, un-comment the following lines in test_moments.py:

        #(mom.expanding_skew, lambda v: Series(v).skew(), 3), # restore once GH 8086 is fixed
        #(mom.expanding_kurt, lambda v: Series(v).kurt(), 4), # restore once GH 8086 is fixed

        #(mom.rolling_skew, lambda v: Series(v).skew(), 3), # restore once GH 8086 is fixed
        #(mom.rolling_kurt, lambda v: Series(v).kurt(), 4), # restore once GH 8086 is fixed

jreback · 2014-09-16T14:46:18Z

@seth-p I realize now why you commented these out

jaimefrio · 2014-09-30T05:42:31Z

The solution to this is to modify the behavior of Series.skew() and Series.kurt() so that they return np.nan instead of 0, right?

seth-p · 2014-09-30T05:52:01Z

I think so, assuming they are supposed to return unbiased estimates. Though to be honest I haven't examined them closely beyond observing the inconsistencies noted above.

jaimefrio · 2014-09-30T17:20:19Z

I was looking into this to remove the commented tests for the rolling versions. There are a bunch of tests making sure that the return value is 0, and the code itself explicitly sets things to 0. So even if it is a bug, it was clearly considered a feature when the code was written. Not sure whether changing the behavior can be happily done without breaking someone's code...

jreback · 2014-09-30T17:33:50Z

relatd to this: #7928

I wouldn't return nan unless you have all missing values or its empty. 0 is the calculated value no?

seth-p · 2014-09-30T17:34:06Z

I think, ideally, the commented-out consistency tests (#8086 (comment)) should be there. Obviously some of the cases are 0 / NaN, but others aren't.

Separately, I think should add to _test_moments_consistency() additional parameters skew_unbiased=None, kurt_unbiased=None and skew_biased=None, kurt_biased=None, and consistency checks between them and the corresponding lower-order moments (biased or unbiased, as appropriate), i.e. comparable to this test for biased variance estimates:

                if var is var_biased:
                    # check that biased var(x) == mean(x^2) - mean(x)^2
                    mean_x2 = mean(x * x)
                    assert_equal(var_x, mean_x2 - (mean_x * mean_x))

seth-p · 2014-09-30T17:40:44Z

Regarding 0 vs NaN, even if for now we are keeping the disparate behavior ofrolling/expanding_kurt/skew() and Series.skew/kurt(), I would leave those commented-out tests there (still commented out), as an aspirational goal...

seth-p · 2014-09-30T17:54:45Z

@jreback, if need to divide by a denominator that is 0, then I think should return NaN (analogous to calculating an unbiased standard deviation where need to divide by N-1 when N=1).

jreback · 2014-09-30T18:04:16Z

@seth-p yep that too!
I believe var/std do that as well
shouldn't be filled by 0 by default (though is ATM)

seth-p · 2014-09-30T18:10:16Z

@jreback, no, at least in master, calculating unbiased (ddof=1, the default) var/std will produce NaN for a single non-NaN value (since it wants to divide by N-1 = 0). This should be the case for rolling/expanding_var(), ewmvar() (with bias=False, the default), and Series.var(). If that's not the case, and any of these produces 0 for a single non-NaN value, then I missed something in my tests...

jreback · 2014-09-30T18:12:05Z

@seth-p what I mean is that var/std DO provide a nan when dividing by 0. These should all be consistent (hence this issue).

seth-p · 2014-09-30T18:15:53Z

Ah, sorry, I misunderstood you. Yes, all the var/std functions do produce a NaN when dividing by 0...

jreback · 2014-09-30T18:18:11Z

as far as the tests go, yes I think will have to change those tests and test for nan in those edge cases.

TomAugspurger · 2018-07-06T22:56:28Z

In [133]: s.rolling(4).kurt()
Out[133]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [134]: s.kurt()
Out[134]: 0

seth-p changed the title ~~BUG: expanding/rolling_skew() inconsistent with Series.skew()~~ BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() Aug 21, 2014

seth-p mentioned this issue Aug 21, 2014

BUG: Inconsistencies in calculating second moments of a single value #7900

Closed

TomAugspurger added Algos labels Aug 21, 2014

TomAugspurger added this to the 0.15.0 milestone Aug 21, 2014

This was referenced Sep 15, 2014

PERF: Poor numerical stability of rolling_kurt and rolling_skew #6929

Closed

use z-scores in rolling skew/kurt calculations #8270

Closed

jreback modified the milestones: 0.15.1, 0.15.0 Sep 16, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

TomAugspurger closed this as completed Jul 6, 2018

TomAugspurger modified the milestones: Contributions Welcome, No action Jul 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() #8086

BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() #8086

seth-p commented Aug 21, 2014

seth-p commented Aug 21, 2014

jreback commented Sep 10, 2014

seth-p commented Sep 16, 2014

jreback commented Sep 16, 2014

jaimefrio commented Sep 30, 2014

seth-p commented Sep 30, 2014

jaimefrio commented Sep 30, 2014

jreback commented Sep 30, 2014

seth-p commented Sep 30, 2014

seth-p commented Sep 30, 2014

seth-p commented Sep 30, 2014

jreback commented Sep 30, 2014

seth-p commented Sep 30, 2014

jreback commented Sep 30, 2014

seth-p commented Sep 30, 2014

jreback commented Sep 30, 2014

TomAugspurger commented Jul 6, 2018

BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() #8086

BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() #8086

Comments

seth-p commented Aug 21, 2014

seth-p commented Aug 21, 2014

jreback commented Sep 10, 2014

seth-p commented Sep 16, 2014

jreback commented Sep 16, 2014

jaimefrio commented Sep 30, 2014

seth-p commented Sep 30, 2014

jaimefrio commented Sep 30, 2014

jreback commented Sep 30, 2014

seth-p commented Sep 30, 2014

seth-p commented Sep 30, 2014

seth-p commented Sep 30, 2014

jreback commented Sep 30, 2014

seth-p commented Sep 30, 2014

jreback commented Sep 30, 2014

seth-p commented Sep 30, 2014

jreback commented Sep 30, 2014

TomAugspurger commented Jul 6, 2018