DOC: Clarify pandas.DataFrame.rolling() when using different values for the parameter "closed" #44687

@Iqigai

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({'A': [1] * 5 + [0] * 5})
win_size = 4
df['left'] = df['A'].rolling(win_size, closed='left').sum()
df['right'] = df['A'].rolling(win_size, closed='right').sum()
df['both'] = df['A'].rolling(win_size, closed='both').sum()
df['neither'] = df['A'].rolling(win_size, closed='neither').sum()
df
Out[6]: 
   A  left  right  both  neither
0  1   NaN    NaN   NaN      NaN
1  1   NaN    NaN   NaN      NaN
2  1   NaN    NaN   NaN      NaN
3  1   NaN    4.0   4.0      NaN
4  1   4.0    4.0   5.0      NaN
5  0   4.0    3.0   4.0      NaN
6  0   3.0    2.0   3.0      NaN
7  0   2.0    1.0   2.0      NaN
8  0   1.0    0.0   1.0      NaN
9  0   0.0    0.0   0.0      NaN
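The output above is consistent with the following reading. This is an assumption inferred from the table, not a quote from the pandas docs: for an integer window `w`, the window for row `i` is the interval with endpoints `i - w` and `i`, and `closed` only decides which of those two endpoints are included. The sketch below is a pure-Python model of that rule. `window_rows` and `rolling_sum_model` are hypothetical helpers, not pandas API.

```python
import math


def window_rows(i, window, closed):
    """Row positions in the window ending at row i (hypothetical helper).

    Models an integer window as the interval with endpoints i - window
    and i; `closed` only decides which of the two endpoints are kept.
    """
    first = i - window if closed in ("left", "both") else i - window + 1
    last = i if closed in ("right", "both") else i - 1
    # Clip at the start of the data, as a rolling window does.
    return range(max(first, 0), last + 1)


def rolling_sum_model(values, window, closed, min_periods=None):
    """Pure-Python stand-in for Series.rolling(window, closed=...).sum()."""
    if min_periods is None:
        # For an integer window, pandas defaults min_periods to the window size.
        min_periods = window
    result = []
    for i in range(len(values)):
        rows = window_rows(i, window, closed)
        if len(rows) < min_periods:
            result.append(math.nan)  # too few observations in this window
        else:
            result.append(float(sum(values[r] for r in rows)))
    return result


A = [1] * 5 + [0] * 5
for closed in ("left", "right", "both", "neither"):
    # Number of rows in a full (unclipped) window for this option.
    full = len(window_rows(len(A) - 1, 4, closed))
    print(closed, full, rolling_sum_model(A, 4, closed))
```

Under this model, 'left' and 'right' keep 4 rows, 'both' keeps 5 and 'neither' keeps only 3. Because 3 is below the default `min_periods` of 4, every 'neither' value comes out as NaN, matching the table above.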

Issue Description

There seem to be some inconsistencies in the behavior of the rolling function with respect to the parameter 'closed'. Each option was tested and assigned to its own column in the toy example above.
First, 'neither' returns only NaNs, as shown in the 'neither' column of the output.
Second, with 'right' or 'left' the count of elements reaches the full window size (4), which is not coherent: if one of the endpoints is excluded, the maximum should be win_size - 1. What's more, with closed='both' we even get more elements than the window size: 5 vs. 4. It seems that the closed parameter affects the position and size of the window rather than just the inclusion of its endpoints. If this is not a bug, the exact behavior should be described more clearly.
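One way to see how many rows each `closed` option actually puts into a window is to take the rolling sum of a column of ones. Its maximum is the size of a full window. The sketch assumes a pandas version whose fixed-size windows accept `closed`, as 1.3.4 above does. `ones` and `counts` are illustrative names. It passes `min_periods=1` because, for an integer window, `min_periods` otherwise defaults to the window size, and that default would mask the smaller 'neither' windows.

```python
import pandas as pd

win_size = 4
ones = pd.Series([1.0] * 10)

# min_periods=1 keeps partial windows, so the default threshold
# (min_periods = win_size for an integer window) cannot hide anything.
# The maximum rolling sum of ones is the number of rows in a full window.
counts = {
    closed: int(ones.rolling(win_size, closed=closed, min_periods=1).sum().max())
    for closed in ("right", "left", "both", "neither")
}
print(counts)

# With the default min_periods, 'neither' never collects win_size rows,
# so its column is all NaN; lowering the threshold makes its sums visible.
A = pd.Series([1] * 5 + [0] * 5)
neither_sums = A.rolling(win_size, closed="neither", min_periods=1).sum()
print(neither_sums.tolist())
```

If the interval reading holds, `counts` reports 4 rows for 'right' and 'left', 5 for 'both' and 3 for 'neither'. In other words, `closed` changes how many rows the window holds rather than trimming a fixed set of `win_size` rows.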

Expected Behavior

1- Using 'neither' should yield the same result as 'left' with one element fewer. In fact, this parameter value is redundant, since the same result could be obtained with a window one unit smaller and closed set to 'left'.
2- Using 'left' or 'right' should exclude one of the endpoints from the calculation and never use the whole window.
3- Using 'both' should use a window of exactly the size given by the parameter 'window', including the current row.

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Canada.1252
pandas : 1.3.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 58.2.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.2
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.29.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1

Metadata

Assignees

No one assigned

    Labels

    Docs, Window (rolling, ewma, expanding)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests