Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix Rolling for multi-index and reversed index. #28297

Merged
merged 12 commits into from Oct 22, 2019
Merged

fix Rolling for multi-index and reversed index. #28297

merged 12 commits into from Oct 22, 2019

Conversation

leftys
Copy link
Contributor

@leftys leftys commented Sep 5, 2019

Fix Rolling operation for level of multi-index and descending time index (that is monotonic, but decreasing).

@leftys leftys marked this pull request as ready for review September 5, 2019 15:04
@WillAyd
Copy link
Member

WillAyd commented Sep 5, 2019

Does this close a particular issue? Also tests are the most critical part of any PR so would be great if you can add

@WillAyd WillAyd added the Window rolling, ewma, expanding label Sep 5, 2019
@leftys
Copy link
Contributor Author

leftys commented Sep 5, 2019

Does this close a particular issue?

I found those two just now:
#19248
#15584

Also tests are the most critical part of any PR so would be great if you can add

Will do.

@pep8speaks
Copy link

pep8speaks commented Sep 6, 2019

Hello @leftys! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-10-19 20:11:05 UTC

@leftys
Copy link
Contributor Author

leftys commented Sep 6, 2019

@WillAyd I added tests and updated whatsnew to reference the two issues.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good generally

pandas/tests/window/test_rolling.py Outdated Show resolved Hide resolved
pandas/tests/window/test_rolling.py Outdated Show resolved Hide resolved
pandas/tests/window/test_rolling.py Outdated Show resolved Hide resolved
pandas/tests/window/test_rolling.py Outdated Show resolved Hide resolved
pandas/tests/window/test_rolling.py Outdated Show resolved Hide resolved
pandas/tests/window/test_timeseries_window.py Outdated Show resolved Hide resolved
pandas/core/window/rolling.py Outdated Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Sep 7, 2019

can you update the top section with the issue references (use a closes #....)

@leftys
Copy link
Contributor Author

leftys commented Sep 7, 2019

@jreback Fixed according to review comments.

@leftys leftys requested a review from jreback September 20, 2019 21:01
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

"""
if not self._on.is_monotonic:
formatted = self.on or "index"
if not (self._on.is_monotonic_increasing or self._on.is_monotonic_decreasing):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised this doesn't match the previous definition of just is_monotonic. Would welcome that as a follow up to align the logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean updating is_monotonic to return True also for monotonic decreasing? I tried that, but some of its callers rely on the fact that the sequence is also increasing and many tests failed afterwards. So I would rather not do that, especially not in this PR.

@leftys leftys requested a review from WillAyd October 15, 2019 20:14
@jreback jreback added this to the 1.0 milestone Oct 16, 2019
[date_range("20190101", periods=3), range(2)], names=["date", "seq"]
),
)
result = df.rolling("10d", on=df.index.get_level_values("date")).sum()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very odd way of calling this as it should be a scalar. why do you think we should allow this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if I understand you. how about passing something like level = 0 to identify the datetime column of multi index to use for comparing rolling window?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right what I mean, is that this doesn't make rolling any more friendly to a Multi-index. we would need to add a level= kwarg that can be used internall to create the on. Would take a followup PR for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@leftys leftys requested a review from jreback October 18, 2019 14:53
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, ping on green

elif isinstance(self.obj, ABCDataFrame) and self.on in self.obj.columns:
return Index(self.obj[self.on])
else:
raise ValueError(
"invalid on specified as {0}, "
"must be a column (if DataFrame) "
"must be a column (of DataFrame), an Index "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the doc-string (and type annoation if its there) for on in Rolling

# non-monotonic
df.index = reversed(df.index.tolist())
def test_non_monotonic_on(self):
df = DataFrame(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number as a comment

df = DataFrame(
{"A": date_range("20130101", periods=5, freq="s"), "B": range(5)}
)
df.set_index("A", inplace=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use inplace in tests at all, rather df = df.set_index(....)

[date_range("20190101", periods=3), range(2)], names=["date", "seq"]
),
)
result = df.rolling("10d", on=df.index.get_level_values("date")).sum()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right what I mean, is that this doesn't make rolling any more friendly to a Multi-index. we would need to add a level= kwarg that can be used internall to create the on. Would take a followup PR for that.

@WillAyd WillAyd merged commit d10e25a into pandas-dev:master Oct 22, 2019
@WillAyd
Copy link
Member

WillAyd commented Oct 22, 2019

Thanks @leftys - very nice PR!

HawkinsBA pushed a commit to HawkinsBA/pandas that referenced this pull request Oct 29, 2019
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
df = DataFrame({"column": [3, 4, 4, 2, 1]}, index=reversed(index))
result = df.rolling("2s").min()
expected = DataFrame(
{"column": [3.0, 3.0, 3.0, 2.0, 1.0]}, index=reversed(index)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the 2 seconds time window is not respected, but the minimum is taken over the whole dataframe. To my understanding, the excpected results would be [3, 4, 2, 1, 1] (closed interval limits) or [3, 4, 2, 2, 1] (open interval end). Even if you consider the next 2 seconds in time order, i.e. the other way round, you would have [3, 3, 4, 2, 1].

Copy link

@revv00 revv00 Feb 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue found here. By using df.rolling().apply(func), we can print the window inside func and it is the while dataframe till current offset.

Copy link
Contributor Author

@leftys leftys Mar 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the expected result is wrong. I debugged it a bit and found out that VariableWindowIndexer indeed works only if the index is monotonic and increasing. I will create another PR to finally fix this.

EDIT: Here it comes: #32386

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: support monotonic_decreasing on rolling rolling( window='10D') does not work for df with MultiIndex
6 participants