
use z-scores in rolling skew/kurt calculations #8270

Closed
wants to merge 1 commit into from

Conversation

behzadnouri
Contributor

adds some stability to the rolling skew and kurt calculations by using z-scores instead of the original values; related: #6929

on master:

In [3]: np.random.seed(2718281)
In [4]: xs = np.random.rand(10)

In [5]: pd.stats.moments.rolling_skew(xs, 4) - pd.stats.moments.rolling_skew(xs + 5000, 4)
Out[5]: 
array([             nan,              nan,              nan,
        -1.64816426e-03,   1.10392004e-03,   1.09544324e-03,
         7.99253200e-03,   8.66733795e-05,  -4.48450842e-03,
        -6.03475754e-03])

In [6]: pd.stats.moments.rolling_kurt(xs, 4) - pd.stats.moments.rolling_kurt(xs + 100, 4)
Out[6]: 
array([             nan,              nan,              nan,
        -1.39880824e-04,  -2.91491317e-04,  -2.35463343e-04,
        -7.36046530e-04,  -3.99141811e-04,  -8.61177376e-05,
         9.26058376e-06])

on branch:

In [7]: pd.stats.moments.rolling_skew(xs, 4) - pd.stats.moments.rolling_skew(xs + 5000, 4)
Out[7]: 
array([             nan,              nan,              nan,
        -1.03450581e-12,   7.36521955e-13,   2.42605935e-12,
        -2.43799425e-12,  -1.85962357e-12,  -1.06048503e-12,
         1.66638925e-12])

In [8]: pd.stats.moments.rolling_kurt(xs, 4) - pd.stats.moments.rolling_kurt(xs + 100, 4)
Out[8]: 
array([             nan,              nan,              nan,
        -1.24344979e-13,  -7.99360578e-14,   6.75015599e-14,
        -9.05941988e-14,  -5.15143483e-14,  -1.11910481e-13,
        -5.15143483e-14])

This does not handle all situations that cause instability, but I think it is generally more stable to work with z-scores; in particular, affine transformations of the input generate the same results.
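For reference, a minimal sketch of the z-score idea in plain NumPy (not the actual Cython change in this PR): standardize the finite values once, then run the rolling computation on the z-scores. Since skew and kurtosis are invariant under affine transformations y = a*x + b with a > 0, this does not change the answer, only the conditioning.

import numpy as np

def zscore_preprocess(x):
    # standardize using only finite values; NaN/inf entries are left untouched
    # (assumes a non-constant input, i.e. std > 0)
    x = np.asarray(x, dtype=float)
    mask = np.isfinite(x)
    out = x.copy()
    out[mask] = (x[mask] - x[mask].mean()) / x[mask].std()
    return out

With this preprocessing, an input shifted by a large constant (as in the examples above) produces essentially the same z-scores, so the rolling skew/kurt of xs and xs + 5000 agree up to floating-point noise.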

Also, input is a Python 3 built-in. I suggest renaming all instances of input in the algos.pyx file to inp or something else.

@seth-p
Contributor

seth-p commented Sep 15, 2014

Note also #8086.

@behzadnouri
Contributor Author

@seth-p FYI, a constant series has zero standard deviation, so the skew should be nan, not zero. So that is a bug in Series.skew, not rolling/expanding_skew, and this PR is independent of #8086.

@seth-p
Contributor

seth-p commented Sep 15, 2014

Yep, separate issue. Just hoping someone (:-)) will want to unify Series.skew and rolling/expanding_skew so that they rely on a single code base and are consistent. And similarly with the kurt() functions.

@jreback
Contributor

jreback commented Sep 16, 2014

@behzadnouri can you create a separate issue about Series.skew returning nan (not 0) for a constant series?

@jreback added the Numeric Operations label Sep 16, 2014
@jreback
Contributor

jreback commented Sep 16, 2014

@behzadnouri tests!

@jreback
Contributor

jreback commented Sep 16, 2014

does this close #6929, or is there a further extension needed and this is a partial case?

@jreback
Contributor

jreback commented Sep 16, 2014

cc @jaimefrio

@jreback
Contributor

jreback commented Sep 16, 2014

@behzadnouri I realized #8086 is the issue @seth-p is pointing at (unifying skew)

@behzadnouri
Contributor Author

@jreback tests added

@jreback added the Algos label Sep 26, 2014
@jreback added this to the 0.15.1 milestone Sep 26, 2014
@jreback
Contributor

jreback commented Sep 26, 2014

well, I think a more complete solution to #6929 is in order, no?

@seth-p
cc @jaimefrio

@jaimefrio
Contributor

You can do much, much better... I have a PR halfway through here, which uses the Welford-like algorithm described here. The case where an observation is removed and another added is not worked out there (or anywhere else, as far as I can tell), and you have to figure it out on your own.
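For reference (a sketch of mine, not code from the linked PR), here is the add/remove bookkeeping for the second moment only, assuming no NaNs in the window; the analogous third- and fourth-moment updates are the part that has to be worked out by hand:

# Welford-style running mean and M2 (sum of squared deviations); a rolling
# window adds the entering observation and removes the leaving one.
def add_obs(n, mean, M2, x):
    n += 1
    delta = x - mean
    mean += delta / n
    M2 += delta * (x - mean)
    return n, mean, M2

def remove_obs(n, mean, M2, x):
    n -= 1
    if n == 0:
        return 0, 0.0, 0.0
    delta = x - mean            # deviation from the old mean
    mean -= delta / n           # downdate the mean
    M2 -= delta * (x - mean)    # downdate M2 using old and new means
    return n, mean, M2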

@behzadnouri
Contributor Author

@jaimefrio I tried to take that route before making my code changes, and that would definitely be the right way to go if the code were computing expanding-window moments, not rolling-window skew/kurt that also has to handle nan values gracefully.

take kurtosis for example: you need to maintain the effective number of observations n (that is, the count of non-null values within the window) and 4 central moments. that is 5 updating equations, which is fine. but there are 4 cases for each equation, depending on whether the new value and the dropped value are null or not. that is 5 * 4 = 20 equations, with all their corner cases and chances of making mistakes. (think about which n you should use to update the moments if the dropped value is null versus the new value is null: the new n or the old n)
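as a rough illustration (a hypothetical helper, not code from this patch), here is only the simplest of those five updating equations, the effective count; each of the four moment updates needs the same four-way case split, which is where the corner cases pile up:

import numpy as np

def update_count(nobs, x_new, x_old):
    # the entering and leaving values may each be null, giving 4 cases
    if not np.isnan(x_new):
        nobs += 1
    if not np.isnan(x_old):
        nobs -= 1
    return nobs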

and even doing all that, at the end of the day, it is not like:

You can do much, much better...

no! definitely not. for all real-world series that I have seen, and for any time series which does not change scale/location by multiple orders of magnitude, z-scores will do the job with minimal code complexity and much better code maintainability.

@jaimefrio
Contributor

I don't know... A mock-up of your solution runs on my system a little over 2x slower than the current code. And while it improves stability for what arguably are the most common use cases, it is still going to fail for a number of not-so-pathological ones, e.g. x = np.random.rand(1000); x[500:] += 10000; rolling_skew(x, 100, 0) is all wrong. That could be OK, were it not that there is an algorithm that provides superior stability and better performance (roll_var was slowed down about 15%, if memory serves me well).
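A runnable version of that example, for anyone who wants to reproduce it (the seed and the scipy reference value are added here for illustration and are not part of the original setup):

import numpy as np
import pandas as pd
from scipy.stats import skew

np.random.seed(0)
x = np.random.rand(1000)
x[500:] += 10000                     # shift the second half by a huge constant
rolled = pd.stats.moments.rolling_skew(x, 100)
print(rolled[-1])                    # rolling result for the last window
print(skew(x[-100:], bias=False))    # direct reference computation on that window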

I am not much of a Pandas user or contributor, so I may be getting the philosophy all wrong. But on a project that uses home-brewed implementations of skip lists (for roll_median) and of circular deques (for roll_min and roll_max), and that advertises itself as "providing high-performance, easy-to-use data structures", choosing an objectively worse algorithm because it makes the code less complex and more maintainable seems like a pretty weak argument. Especially since you have the complexity packaged away inside a function that can very easily be checked, with 10 or 20 lines of test code, to ensure that all code paths are working fine.

Just to make it explicitly clear, I am -1 on this approach. But if it is deemed the right one, I would suggest that you put your code in the Python wrapper, as I really don't think you are taking any advantage of Cython.

@jreback
Contributor

jreback commented Sep 26, 2014

@jaimefrio the implementation you pointed to above seems pretty straightforward, and you have nicely refactored out the counting issue. It seems pretty reasonable and is likely to have similar perf to the current routines; pls continue the work and submit a PR when ready.

@behzadnouri
Contributor Author

@jaimefrio what you are calling "not-so-pathological" has a jump of more than 34K times the trailing standard deviation:

In [28]: np.diff(x)[499] / x[:500].std()
Out[28]: 34371.41

what actual real-world time series makes such a jump, and would anyone still ask "let's look at the skew numbers"?

choosing an objectively worse algorithm because it makes code less complex and more maintainable seems like a pretty weak argument

you are calling it "objectively worse" based on a contrived example, while at the same time admitting

improves stability for what arguably are the most common use cases,

and looking at this, I honestly do not know what to say about what you call a "pretty weak argument".

@jaimefrio
Contributor

@behzadnouri I was calling it objectively worse because I timed it and it ran 2x slower than the current code. Which is kind of surprisingly fast, since you are iterating at least eight times over the input array before spitting out an answer.

@jaimefrio
Contributor

It turns out that it isn't that hard to come up with very simple examples that fail big time:

import pandas as pd
from pandas.algos import roll_kurt
import numpy as np
from scipy.stats import kurtosis

# Sanity check
x = np.random.rand(int(1e6))
y = x - x.mean(); y /= y.std()
>>> roll_kurt(x, 1000, 0)[-1]
-1.1175114235008137
>>> roll_kurt(y, 1000, 0)[-1]
-1.1175114234982957
>>> kurtosis(y[-1000:], bias=False)
-1.1175114234980807

# Meltdown
x = np.linspace(0, 1000, int(1e6))
y = x - x.mean(); y /= y.std()
>>> roll_kurt(x, 1000, 0)[-1]
10.397302582213468
>>> roll_kurt(y, 1000, 0)[-1]
-0.46924667029766476
>>> kurtosis(x[-1000:], bias=False)
-1.2000000000000006

@behzadnouri
Contributor Author

@jaimefrio the numbers you are quoting do not come out of this code (especially that 10.39..., which is extremely far off from the output of this patch). you may want to check your mock-up code.

That you got even this simpler algorithm wrong proves my point about why code simplicity and maintainability matter.

also, you are looking at a series with such characteristics:

In [48]: (x[-1000:].std()) / x.std()
Out[48]: 0.0010

In [49]: (x[-1000:].mean() - x[:1000].mean()) / x.std()
Out[49]: 3.4606

In [50]: (x[-1000:].mean() - x[:1000].mean()) / x[-1000:].std() # same as `/ x[:1000].std()`
Out[50]: 3460.6

Do you honestly believe that a series which changes location by around 3.5 times its standard deviation (3460 times the rolling-window standard deviation) is a use case for kurtosis as a measure of tail heaviness? does anyone out there care whether the tails are heavy when the location can already move by orders of magnitude?

I am happy that you mentioned earlier that this patch

improves stability for what arguably are the most common use cases,

and that your mock-up code passes the sanity check for the sane example. That said, I do not feel this discussion is an effective use of our time. If you think I am wrong, please spend the time implementing what you think is right, rather than trying to prove me wrong.

Please do not continue this argument; instead, focus on a correct implementation of what you think is a better path.

@jaimefrio
Contributor

Trust me, @behzadnouri, I have better things to do than proving you wrong. This is supposed to be a collaborative effort to produce great software, not a contest to see who has the bestest PR. Let's try to get back to that...

I have done some more tests over the weekend, and it is terribly easy to find simple examples with terrible behavior. Perhaps the most illustrative is a long enough linear ramp, with a sufficiently small window. The kurtosis should be -1.2 for any window size (the values in any window over a linear ramp are evenly spaced, i.e. effectively uniform, and the excess kurtosis of a uniform distribution is -6/5), but here's what you get:

x = np.linspace(0, 1, 10000)
>>> roll_kurt(x, 10, 0)[-1]  # current master, completely broken
39.756569994447986
>>> roll_kurt((x - x.mean()) / x.std(), 10, 0)[-1]  # this PR, better but 30% off
-1.5704869226377387
>>> roll_kurt(x - x.mean(), 10, 0)[-1]  # similar idea to this PR, faster, less robust
-0.15264283115696387
>>> roll_kurt2(x, 10, 0)[-1]  # a stabler algorithm, not bad, but 40% off
-1.6595757481342461
>>> roll_kurt2(x - x.mean(), 10, 0)[-1]  # stabler algo + centering, getting close
-1.4211005796851859
>>> roll_kurt2((x - x.mean()) / x.std(), 10, 0)[-1]  # stabler algo + this PR, almost (4% off)
-1.2454174095606518

There is also the matter of performance... Some timings to put things in context, with the same x as above:

%timeit roll_kurt(x, 10, 0)  # current master is terribly slow
1000 loops, best of 3: 890 µs per loop

%timeit roll_kurt2(x, 10, 0)  # better stability and 2.5x faster
1000 loops, best of 3: 375 µs per loop

%%timeit  # this PR preprocessing mock-up
... mask = np.isfinite(x)
... y = x.copy()
... y[mask] -= np.mean(x[mask])
... y[mask] /= np.std(x[mask])
1000 loops, best of 3: 333 µs per loop

%%timeit  # similar to this PR, 25% faster
... mask = np.isfinite(x)
... y = x.copy()
... y -= np.mean(x[mask])
... y /= np.std(x[mask])
1000 loops, best of 3: 244 µs per loop

%%timeit  # centering w/o normalizing is 2.5x faster
... mask = np.isfinite(x)
... y = x.copy()
... y -= np.mean(x[mask])
10000 loops, best of 3: 103 µs per loop

%timeit y = x - x[mask].mean()  # building the mask is not cheap
10000 loops, best of 3: 64.4 µs per loop

I think there is plenty of food for thought here. My views on this:

  • The current algorithms in the core roll_skew and roll_kurt are very, very bad, and need to be replaced. I will do that shortly.
  • Getting rolling skewness and kurtosis right is hard! Using z-scores, or at least centering, may be a good idea, but you have to be careful with performance.

So based on all of the above, my original -1 is now a +0.25. To make it a full +1, I would suggest:

  • Move the code to the Python wrapper. When the Python functions rolling_skew or rolling_kurt are called, before calling the Cython functions roll_skew or roll_kurt, a call is made to _rolling_moment, which calls _process_data_structure, where a mask of inf values in the arrays is already being calculated, see here. If you can mix both maskings into a single one, you can minimize the effect on performance, and get the extra robustness almost for free.
  • I would skip the division by the standard deviation. It is taking 50% of the time (more if the maskings are joined), and for most cases it seems to have only a marginal effect on robustness. (A rough sketch of the centering-only idea follows this list.)
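A rough sketch of what that centering-only preprocessing could look like at the Python-wrapper level (the helper name and placement here are illustrative, not the actual pandas internals):

import numpy as np

def _center_finite(values):
    # hypothetical helper: subtract the mean of the finite values only,
    # leaving NaN/inf entries in place; the mask could be merged with the
    # inf mask already computed in _process_data_structure.
    mask = np.isfinite(values)
    out = values.astype(float, copy=True)
    out[mask] -= out[mask].mean()
    return out, mask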

@jreback
Contributor

jreback commented Sep 29, 2014

@jaimefrio sounds like a nice comprehensive approach. Pls submit a PR when ready.
