TST/CLN: roll_sum/mean/var/skew/kurt: simplification for non-monotonic indices #36933

twoertwein · 2020-10-07T04:49:54Z

tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff

The removed for-loop doesn't seem to be necessary (I hope this code is tested by an existing test).

I feel like I'm missing an obvious reason why these for-loops are needed: looking at the code I don't think we need them and the tests also pass.

jreback

Yeah, I don't remember why this is like this for the non-monotonic case.

Can you see which tests actually hit this? (It could be very few or none.)

twoertwein · 2020-10-07T17:43:23Z

there are exactly two tests that cover at least the sum part (mean/var/skew/kurtosis do not seem to be covered!).

The tests are test_indexer_constructor_arg and test_indexer_accepts_rolling_args, both from pandas/tests/window/test_base_indexer.py.

I will mark this PR as a draft and look into tests for these functions on the weekend.

I assume that the easiest way to trigger the non-monotonic branch for variable windows is to use a DataFrame with an unsorted datetime index, right?

edit: I think the only way to trigger the variable non-monotonic branch is by passing a BaseIndexer to rolling. A ValueError is raised for a non-monotonic datetime index.
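For reference, a minimal check of the second point, assuming a pandas version from around this PR or later: an offset-based window on an unsorted DatetimeIndex is rejected with a ValueError, while the same data after sort_index() rolls fine. The exact error text differs between pandas versions, so the sketch only looks for the word "monotonic"; try_offset_rolling is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Unsorted DatetimeIndex: 2020-01-03 comes before 2020-01-01 and 2020-01-02.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"]),
)


def try_offset_rolling(series):
    """Hypothetical helper: return the ValueError message, or None on success."""
    try:
        series.rolling("2D").sum()
    except ValueError as err:
        return str(err)
    return None


unsorted_message = try_offset_rolling(s)  # rejected: index is not monotonic
sorted_message = try_offset_rolling(s.sort_index())  # None: sorted index works
```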

twoertwein · 2020-10-08T22:30:50Z

I printed the values I set to zero on master: they are zero. I assume the test case has some randomness or platform-specific behavior?

jreback · 2020-10-09T20:12:04Z

cc @mroeschke

pandas/tests/window/test_rolling.py

jreback · 2020-10-10T15:51:37Z

cc @mroeschke

pandas/tests/window/test_rolling.py

mroeschke · 2020-10-10T17:54:48Z

While we're addressing this:

Could you rename all the is_monotonic_* variables to is_monotonic_increasing_* just for clarity?
Could you see if we have tests for a monotonically decreasing index (and add some tests if there are none)? That's what should hit this code path.

twoertwein · 2020-10-10T18:45:34Z

Could you rename all the is_monotonic_* variables to is_monotonic_increasing_* just for clarity?

Will do

Could you see if we have tests for a monotonically decreasing index (and add some tests if there are none)? That's what should hit this code path.

I think the not is_monotonic_bounds branch calculates the requested statistic separately for each window (a naive calculation, but in Cython) and is probably meant for non-monotonic bounds. If decreasing indices are common (they currently fall into the not is_monotonic_bounds branch), we could have a far better alternative: reuse the is_monotonic_bounds branch but swap the add_/remove_ calls?
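The proposed alternative can be sketched in pure Python. rolling_sum_decreasing is a hypothetical helper (not a pandas function) that assumes half-open windows [start[i], end[i]) whose start and end both decrease monotonically, so values enter at the front of the window and leave at the back. It only illustrates swapping the add/remove roles, not the Cython implementation.

```python
import numpy as np


def rolling_sum_decreasing(values, start, end):
    """Hypothetical incremental rolling sum for monotonically decreasing bounds."""
    out = np.empty(len(start))
    total = 0.0
    for i, (s, e) in enumerate(zip(start, end)):
        if i == 0 or e <= start[i - 1]:
            # First window, or no overlap with the previous one: start fresh.
            total = float(np.sum(values[s:e]))
        else:
            for j in range(s, start[i - 1]):  # values entering at the front
                total += values[j]
            for j in range(e, end[i - 1]):  # values leaving at the back
                total -= values[j]
        out[i] = total
    return out
```

For example, a window of three sliding backwards over np.arange(6.0) (start = [3, 2, 1, 0, 0, 0], end = [6, 5, 4, 3, 2, 1]) gives the sums 12, 9, 6, 3, 1, 0.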

mroeschke · 2020-10-10T19:00:06Z

If you trace back the usage of is_monotonic_bounds, you'll see it checks all of the bounds up front for monotonically increasing indexes.

I didn't implement the non-monotonic bound rolling case, but I'm happy to change it if there's a more efficient way to do it.
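A rough pure-Python analogue of the two branches being discussed, assuming half-open windows [start[i], end[i]) and a plain float accumulator. rolling_sum_sketch is hypothetical and is not the code in pandas' Cython aggregations: when both bound arrays never decrease it updates the running total incrementally, otherwise it resets the total and recomputes every window from scratch, which works for arbitrary bounds.

```python
import numpy as np


def rolling_sum_sketch(values, start, end):
    """Hypothetical variable-window rolling sum; illustrates the two branches."""
    n = len(start)
    out = np.empty(n)
    # Incremental updates are only safe if neither bound ever moves backwards.
    monotonic = bool(np.all(np.diff(start) >= 0) and np.all(np.diff(end) >= 0))
    total = 0.0
    for i in range(n):
        s, e = start[i], end[i]
        if i == 0 or not monotonic or s >= end[i - 1]:
            # Reset the accumulator and recompute this window from scratch.
            total = 0.0
            for j in range(s, e):
                total += values[j]
        else:
            for j in range(start[i - 1], s):  # values leaving the window
                total -= values[j]
            for j in range(end[i - 1], e):  # values entering the window
                total += values[j]
        out[i] = total
    return out
```

With trailing bounds (start = [0, 0, 0, 1, 2, 3], end = [1, 2, 3, 4, 5, 6]) the incremental path is taken; with the same windows in reverse order the bounds are not increasing, so every window takes the reset path.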

twoertwein · 2020-10-10T20:48:13Z

I didn't implement the non-monotonic bound rolling case

Isn't that already handled by the current not is_monotonic_bounds branch? Since that code calculates the statistic for each window completely independently, it can work with any windows.

I would have expected that rolling is much slower for decreasing indices as it processes windows independently of each other. At least in this simple test, that is not the case :)

import numpy as np
import pandas as pd

df = pd.DataFrame({'values': np.random.rand(1000)})
df_reverse = pd.DataFrame({'values': df['values'][::-1]}, index=df.index[::-1])

%timeit df.rolling(window=10).mean()
645 µs ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df_reverse.rolling(window=10).mean()
653 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

twoertwein · 2020-10-11T16:56:46Z

kurt/skew sometimes differ between increasing and decreasing indices (on the order of 1e-8 for kurt and 1e-10 for skew). I would like to believe that the decreasing indices are more accurate, as we reset the accumulated value to zero instead of removing the to-be-deleted values.
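The intuition behind resetting can be illustrated with plain Python floats (no compensated summation, which is a simplification of what the Cython code does): subtracting each value that leaves a window does not, in general, bring the accumulator back to exactly 0.0, whereas resetting does. Skew and kurtosis accumulate higher powers of the values, so residues like this plausibly show up there first; that is an inference, not something measured here.

```python
# Hypothetical window contents; plain Python floats, no compensated summation.
window = [0.1, 0.2]

total = 0.0
for v in window:  # the window's values enter the accumulator
    total += v

# Option 1: subtract each value again as it leaves the window.
removed = total
for v in window:
    removed -= v

# Option 2: discard the accumulator and start from zero.
reset = 0.0

print(removed, reset)  # removed is a tiny nonzero residue, reset is exactly 0.0
```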

@mroeschke is it okay to 1) use deterministic values for testing (instead of np.random.rand) and 2) "fine-tune" the maximal allowed difference so the tests don't fail?
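One way such a test could look, as a sketch under stated assumptions: the input is deterministic (np.arange(-15, 10) squared is an arbitrary choice, not taken from the PR), only mean is shown, and the tolerance is explicit. Because integer windows are positional, the reversed frame's results line up with the original ones after flipping and dropping the window - 1 rows that are NaN. For skew/kurt the atol would need to be tuned as discussed above.

```python
import numpy as np
import pandas as pd

window = 5
# Deterministic input instead of np.random.rand, so a chosen tolerance is reproducible.
df = pd.DataFrame({"values": np.arange(-15, 10, dtype=float) ** 2})
df_reverse = df.iloc[::-1]  # same values, decreasing index

increasing = df.rolling(window=window).mean()["values"].to_numpy()
decreasing = df_reverse.rolling(window=window).mean()["values"].to_numpy()

# After flipping, window j of the reversed frame covers the same rows as
# window j + window - 1 of the original frame.
aligned_decreasing = decreasing[::-1][: len(df) - (window - 1)]
aligned_increasing = increasing[window - 1 :]

np.testing.assert_allclose(aligned_decreasing, aligned_increasing, rtol=0, atol=1e-12)
```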

twoertwein · 2020-10-12T21:26:50Z

should be good now :) @mroeschke @jreback

jreback

Looks fine to me. Please add a whatsnew note (1.2, bug fixes in rolling).

Can you also hand-calculate at least mean/sum in your test (I know you hard-code them, which is good) to make sure that we like those results? I'm just not 100% trusting 1.1.3.
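As an example of hand-calculated expectations, assume a five-element Series on a decreasing integer index with window=3. The windows are positional, so the sums are 5+4+3=12, 4+3+2=9 and 3+2+1=6, and the means are those sums divided by 3. This is an illustrative sketch, not the test added in this PR.

```python
import numpy as np
import pandas as pd

# Values 5..1 on a decreasing integer index 4..0.
s = pd.Series([5.0, 4.0, 3.0, 2.0, 1.0], index=[4, 3, 2, 1, 0])

# Expected values computed by hand; the first two windows are incomplete -> NaN.
expected_sum = pd.Series([np.nan, np.nan, 12.0, 9.0, 6.0], index=s.index)
expected_mean = pd.Series([np.nan, np.nan, 4.0, 3.0, 2.0], index=s.index)

pd.testing.assert_series_equal(s.rolling(window=3).sum(), expected_sum)
pd.testing.assert_series_equal(s.rolling(window=3).mean(), expected_mean)
```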

twoertwein · 2020-10-16T20:28:56Z

okay, I will manually compute the expected results.

About whatsnew: this PR doesn't add any new features and doesn't fix any bugs. Just tests and avoiding for-loops for non-increasing indices.

jreback · 2020-10-17T00:03:53Z

okay, I will manually compute the expected results.

About whatsnew: this PR doesn't add any new features and doesn't fix any bugs. Just tests and avoiding for-loops for non-increasing indices.

So the results in 1.1.3 are OK; we just regressed somehow on master, but since we haven't released 1.2 yet, this is OK. Great.

Ping when ready. #37166 is going to rebase after this is in.

twoertwein · 2020-10-17T00:18:37Z

rebased and added note that the expected statistics for sum/mean have been verified.

twoertwein · 2020-10-17T01:52:41Z

@jreback green'ish (unrelated CI failure)


mroeschke · 2020-10-25T18:39:49Z

Mind merging master and fixing up the code check errors?

twoertwein · 2020-10-25T18:45:18Z

@mroeschke I think the code check errors are not caused by this PR. I rebased it now, let's see whether that fixes it.

twoertwein · 2020-10-25T20:21:02Z

@mroeschke green except two unrelated Windows CI failures

mroeschke · 2020-10-25T22:09:55Z

Thanks @twoertwein. Awesome find and patch!


twoertwein changed the title ~~roll_sum_variable: simplification for non-monotonic indices~~ CLN: roll_sum_variable: simplification for non-monotonic indices Oct 7, 2020

jreback requested changes Oct 7, 2020


jreback added the Window rolling, ewma, expanding label Oct 7, 2020

twoertwein marked this pull request as draft October 7, 2020 17:43

twoertwein force-pushed the master branch from fb00a3e to f544f43 October 8, 2020 16:02

twoertwein marked this pull request as ready for review October 8, 2020 16:04

twoertwein marked this pull request as draft October 8, 2020 17:02

twoertwein force-pushed the master branch from f544f43 to 214e434 October 8, 2020 20:20

twoertwein changed the title ~~CLN: roll_sum_variable: simplification for non-monotonic indices~~ TST/CLN: roll_sum/mean/var/skew/kurt_variable: simplification for non-monotonic indices Oct 8, 2020

twoertwein commented Oct 9, 2020

pandas/tests/window/test_rolling.py

twoertwein force-pushed the master branch from 214e434 to 0aa950d October 9, 2020 20:55

twoertwein marked this pull request as ready for review October 9, 2020 22:20

twoertwein force-pushed the master branch 2 times, most recently from 9092ed0 to 1e6b5b5 October 10, 2020 14:21

twoertwein changed the title ~~TST/CLN: roll_sum/mean/var/skew/kurt_variable: simplification for non-monotonic indices~~ TST/CLN: roll_sum/mean/var/skew/kurt: simplification for non-monotonic indices Oct 10, 2020

twoertwein force-pushed the master branch from 1e6b5b5 to 71864fd October 10, 2020 14:49

mroeschke reviewed Oct 10, 2020

pandas/tests/window/test_rolling.py (outdated)

twoertwein force-pushed the master branch 4 times, most recently from e6a20dd to dd7f48c October 11, 2020 15:45

twoertwein force-pushed the master branch from 237a576 to b42811c October 12, 2020 18:01

twoertwein force-pushed the master branch 2 times, most recently from 449faa1 to 4275c46 October 15, 2020 01:30

twoertwein requested a review from jreback October 15, 2020 03:02

twoertwein force-pushed the master branch from 4275c46 to d31aa12 October 16, 2020 15:09

jreback added this to the 1.2 milestone Oct 16, 2020

jreback requested changes Oct 16, 2020


jreback mentioned this pull request Oct 16, 2020

BUG: Bug in quantile() and median() returned wrong result for non monotonic window borders #37166 (Merged)

twoertwein force-pushed the master branch from d31aa12 to 6e6196f October 17, 2020 00:17

twoertwein force-pushed the master branch from 6e6196f to b828420 October 17, 2020 00:21

twoertwein force-pushed the master branch 2 times, most recently from 785f58c to 336ad7a October 20, 2020 06:02

mroeschke approved these changes Oct 21, 2020

twoertwein force-pushed the master branch 2 times, most recently from a61e861 to 589a90f October 23, 2020 01:59

CLN/TST: roll_sum/mean/var/skew/kurt: simplification for non-monotonic indices, commit 8829bf2

twoertwein force-pushed the master branch from 589a90f to 8829bf2 October 25, 2020 18:44

mroeschke approved these changes Oct 25, 2020


mroeschke merged commit d592e5e into pandas-dev:master Oct 25, 2020

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020

CLN/TST: roll_sum/mean/var/skew/kurt: simplification for non-monotonic indices (pandas-dev#36933), commit e910494

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

CLN/TST: roll_sum/mean/var/skew/kurt: simplification for non-monotonic indices (pandas-dev#36933), commit b537437
