Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EHN: Implement closed=left|right|neither in DataFrame.rolling for fixed windows #34315

Closed
2 of 3 tasks
victorbrjorge opened this issue May 22, 2020 · 3 comments · Fixed by #37207
Closed
2 of 3 tasks
Labels
Enhancement Error Reporting Incorrect or improved errors from pandas Window rolling, ewma, expanding
Milestone

Comments

@victorbrjorge
Copy link

victorbrjorge commented May 22, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

data = [0, 1, 1, 0, 0, 1, 0, 1]
df = pd.DataFrame({'binary_col': data}, index=pd.date_range(start='2020-01-01', freq="min", periods=len(data)))

rolling = df.rolling(window=len(df), closed='left', min_periods=1)
rolling.mean()

Problem description

The rolling window ignores the closed='left' parameter. Since it was supposed to be closed only on the left side, the current right side of the window should not be used on the calculations. But it turns out it is.

Original dataframe

index binary_col
2020-01-01 00:00:00 0
2020-01-01 00:01:00 1
2020-01-01 00:02:00 1
2020-01-01 00:03:00 0
2020-01-01 00:04:00 0
2020-01-01 00:05:00 1
2020-01-01 00:06:00 0
2020-01-01 00:07:00 1

Actual output

index binary_col
2020-01-01 00:00:00 0.000000
2020-01-01 00:01:00 0.500000
2020-01-01 00:02:00 0.666667
2020-01-01 00:03:00 0.500000
2020-01-01 00:04:00 0.400000
2020-01-01 00:05:00 0.500000
2020-01-01 00:06:00 0.428571
2020-01-01 00:07:00 0.500000

Expected Output

index binary_col
2020-01-01 00:00:00 NaN
2020-01-01 00:01:00 0.000000
2020-01-01 00:02:00 0.500000
2020-01-01 00:03:00 0.666667
2020-01-01 00:04:00 0.500000
2020-01-01 00:05:00 0.400000
2020-01-01 00:06:00 0.500000
2020-01-01 00:07:00 0.428571

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-70-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : pt_BR.UTF-8

pandas : 1.0.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.2.0.post20200511
Cython : None
pytest : 5.4.2
hypothesis : None
sphinx : 3.0.3

@victorbrjorge victorbrjorge added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 22, 2020
@dsaxton dsaxton added the Window rolling, ewma, expanding label May 23, 2020
@jbrockmendel jbrockmendel removed the Needs Triage Issue that has not been reviewed by a pandas team member label Jun 5, 2020
@phofl
Copy link
Member

phofl commented Sep 29, 2020

The question here is, if window=8 should behave like window="8s", e.g. if window=8 should be a time aware window.

If yes, we would break the following behavior:

index = pd.bdate_range(start='1/1/2018', end='1/10/2018')
data = [0, 1, 1, 0, 0, 1, 0, 1]
df = pd.DataFrame({'binary_col': data},
                  index=index
                  )

rolling = df.rolling(8, closed='right', min_periods=1)
print(rolling.sum())

rolling = df.rolling("8B", closed='right', min_periods=1)
print(rolling.sum())

The first rolling works, the second raises:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_5.py", line 206, in <module>
    rolling = df.rolling("8B", closed='right', min_periods=1)
  File "/home/developer/PycharmProjects/pandas/pandas/core/generic.py", line 10651, in rolling
    return Rolling(
  File "/home/developer/PycharmProjects/pandas/pandas/core/window/rolling.py", line 185, in __init__
    self.validate()
  File "/home/developer/PycharmProjects/pandas/pandas/core/window/rolling.py", line 1964, in validate
    self.window = freq.nanos
  File "pandas/_libs/tslibs/offsets.pyx", line 684, in pandas._libs.tslibs.offsets.BaseOffset.nanos.__get__
ValueError: <8 * BusinessDays> is a non-fixed frequency

If this should not be the same, we should raise in the case above, because window=8 would be a fixed window and closed is not implemented for that.

Thoughts @jbrockmendel @dsaxton ?

@jbrockmendel
Copy link
Member

cc @mroeschke ?

@mroeschke
Copy link
Member

@phofl your example is raising because you are using 8B instead of 8D which is the "equivalent" to window=8.

To the original issue, it's stated in the docs:

For fixed windows, defaults to ‘both’. Remaining cases not implemented for fixed windows.

I suppose this should have raised a NotImplemented error instead of silently ignoring. Additionally, a PR would be welcome to implement the other bound types for fixed windows

@mroeschke mroeschke added Enhancement Error Reporting Incorrect or improved errors from pandas and removed Bug labels Oct 2, 2020
@mroeschke mroeschke changed the title BUG: DataFrame.rolling ignores closed='left' parameter EHN: Implement closed=left|right|neither in DataFrame.rolling for fixed windows Oct 2, 2020
@jreback jreback added this to the 1.2 milestone Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Error Reporting Incorrect or improved errors from pandas Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants