Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() #8086

Closed
seth-p opened this issue Aug 21, 2014 · 17 comments
Closed

BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() #8086

seth-p opened this issue Aug 21, 2014 · 17 comments
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug

Comments

@seth-p
Copy link
Contributor

seth-p commented Aug 21, 2014

Note that for a constant series, Series.skew() returns 0, while rolling/expanding_skew() return NaN.

In [532]: s = Series([1]*4)

In [533]: s.skew()
Out[533]: 0

In [534]: expanding_skew(s)
Out[534]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [535]: rolling_skew(s, 4)
Out[535]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

While rolling/expanding_kurt() similarly return NaN for a constant series, Series.kurt() produces an error.

In [541]: expanding_kurt(s)
Out[541]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [542]: rolling_kurt(s, 4)
Out[542]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [543]: s.kurt()
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-543-e41f2ec435bd> in <module>()
----> 1 s.kurt()

C:\Python34\lib\site-packages\pandas\core\generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
   3783                                               skipna=skipna)
   3784                 return self._reduce(f, axis=axis,
-> 3785                                     skipna=skipna, numeric_only=numeric_only)
   3786             stat_func.__name__ = name
   3787             return stat_func

C:\Python34\lib\site-packages\pandas\core\series.py in _reduce(self, op, axis, skipna, numeric_only, filter_type, **kwds)
   2007                 filter_type=None, **kwds):
   2008         """ perform a reduction operation """
-> 2009         return op(_values_from_object(self), skipna=skipna, **kwds)
   2010
   2011     def _reindex_indexer(self, new_index, indexer, copy):

C:\Python34\lib\site-packages\pandas\core\nanops.py in _f(*args, **kwargs)
     46                                 'this dtype'.format(f.__name__.replace('nan',
     47                                                                        '')))
---> 48             return f(*args, **kwargs)
     49         return _f
     50

C:\Python34\lib\site-packages\pandas\core\nanops.py in nankurt(values, axis, skipna)
    504     D = _zero_out_fperr(D)
    505
--> 506     result = (((count * count - 1.) * D / (B * B) - 3 * ((count - 1.) ** 2)) /
    507               ((count - 2.) * (count - 3.)))
    508     if isinstance(result, np.ndarray):

ZeroDivisionError: float division by zero
@seth-p seth-p changed the title BUG: expanding/rolling_skew() inconsistent with Series.skew() BUG: expanding/rolling_skew/kurt() inconsistent with Series.skew/kurt() Aug 21, 2014
@seth-p
Copy link
Contributor Author

seth-p commented Aug 21, 2014

My 2c: there really shouldn't be two different implementations of skew/kurtosis functions.

@jreback
Copy link
Contributor

jreback commented Sep 10, 2014

@seth-p maybe edit the top section with a code-reference so that when this is addressed the section can be fixed (as you have commented for #7926)

@seth-p
Copy link
Contributor Author

seth-p commented Sep 16, 2014

@jreback, afraid I'm not sure I follow your last comment. Do you mean the following?

As a test for a fix, un-comment the following lines in test_moments.py:

        #(mom.expanding_skew, lambda v: Series(v).skew(), 3), # restore once GH 8086 is fixed
        #(mom.expanding_kurt, lambda v: Series(v).kurt(), 4), # restore once GH 8086 is fixed
        #(mom.rolling_skew, lambda v: Series(v).skew(), 3), # restore once GH 8086 is fixed
        #(mom.rolling_kurt, lambda v: Series(v).kurt(), 4), # restore once GH 8086 is fixed

@jreback
Copy link
Contributor

jreback commented Sep 16, 2014

@seth-p I realize now why you commented these out

@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 16, 2014
@jaimefrio
Copy link
Contributor

The solution to this is to modify the behavior of Series.skew() and Series.kurt() so that they return np.nan instead of 0, right?

@seth-p
Copy link
Contributor Author

seth-p commented Sep 30, 2014

I think so, assuming they are supposed to return unbiased estimates. Though to be honest I haven't examined them closely beyond observing the inconsistencies noted above.

@jaimefrio
Copy link
Contributor

I was looking into this to remove the commented tests for the rolling versions. There are a bunch of tests making sure that the return value is 0, and the code itself explicitly sets things to 0. So even if it is a bug, it was clearly considered a feature when the code was written. Not sure whether changing the behavior can be happily done without breaking someone's code...

@jreback
Copy link
Contributor

jreback commented Sep 30, 2014

relatd to this: #7928

I wouldn't return nan unless you have all missing values or its empty. 0 is the calculated value no?

@seth-p
Copy link
Contributor Author

seth-p commented Sep 30, 2014

I think, ideally, the commented-out consistency tests (#8086 (comment)) should be there. Obviously some of the cases are 0 / NaN, but others aren't.

Separately, I think should add to _test_moments_consistency() additional parameters skew_unbiased=None, kurt_unbiased=None and skew_biased=None, kurt_biased=None, and consistency checks between them and the corresponding lower-order moments (biased or unbiased, as appropriate), i.e. comparable to this test for biased variance estimates:

                if var is var_biased:
                    # check that biased var(x) == mean(x^2) - mean(x)^2
                    mean_x2 = mean(x * x)
                    assert_equal(var_x, mean_x2 - (mean_x * mean_x))

@seth-p
Copy link
Contributor Author

seth-p commented Sep 30, 2014

Regarding 0 vs NaN, even if for now we are keeping the disparate behavior ofrolling/expanding_kurt/skew() and Series.skew/kurt(), I would leave those commented-out tests there (still commented out), as an aspirational goal...

@seth-p
Copy link
Contributor Author

seth-p commented Sep 30, 2014

@jreback, if need to divide by a denominator that is 0, then I think should return NaN (analogous to calculating an unbiased standard deviation where need to divide by N-1 when N=1).

@jreback
Copy link
Contributor

jreback commented Sep 30, 2014

@seth-p yep that too!
I believe var/std do that as well
shouldn't be filled by 0 by default (though is ATM)

@seth-p
Copy link
Contributor Author

seth-p commented Sep 30, 2014

@jreback, no, at least in master, calculating unbiased (ddof=1, the default) var/std will produce NaN for a single non-NaN value (since it wants to divide by N-1 = 0). This should be the case for rolling/expanding_var(), ewmvar() (with bias=False, the default), and Series.var(). If that's not the case, and any of these produces 0 for a single non-NaN value, then I missed something in my tests...

@jreback
Copy link
Contributor

jreback commented Sep 30, 2014

@seth-p what I mean is that var/std DO provide a nan when dividing by 0. These should all be consistent (hence this issue).

@seth-p
Copy link
Contributor Author

seth-p commented Sep 30, 2014

Ah, sorry, I misunderstood you. Yes, all the var/std functions do produce a NaN when dividing by 0...

@jreback
Copy link
Contributor

jreback commented Sep 30, 2014

as far as the tests go, yes I think will have to change those tests and test for nan in those edge cases.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@TomAugspurger
Copy link
Contributor

In [133]: s.rolling(4).kurt()
Out[133]:
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

In [134]: s.kurt()
Out[134]: 0

@TomAugspurger TomAugspurger modified the milestones: Contributions Welcome, No action Jul 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug
Projects
None yet
Development

No branches or pull requests

4 participants