offset-based rolling window, multiple issues with closed='left' #26005

pshargo · 2019-04-05T22:01:59Z

Code Sample

import pandas as pd

# Case 1: single row
df1 = pd.DataFrame({'B': [0]}, index=[pd.Timestamp('20130101 09:00:00')])
df1.rolling('1s', closed='left').median()  # <- raises 'MemoryError: skiplist_init failed'

# Case 2: multiple rows, but consecutive entries are farther apart than the window
df2 = pd.DataFrame({'B': [0, 1]},
                   index=[pd.Timestamp('20130101 09:00:00'), pd.Timestamp('20130101 09:00:02')])
df2.rolling('1s', closed='left').median()  # <- raises 'MemoryError: skiplist_init failed'
df2.rolling('1s', closed='left').max()     # <- no error, but the second entry seems incorrect

# Case 3: as long as at least one row has other entries in its window, it runs without
# an exception, but the values are suspect
df3 = pd.DataFrame({'B': [1, 2, 3]},
                   index=[pd.Timestamp('20130101 09:00:00'),
                          pd.Timestamp('20130101 09:00:02'),
                          pd.Timestamp('20130101 09:00:03')])
df3.rolling('1s', closed='left').median()  # <- no exception, but the values seem incorrect
df3.rolling('1s', closed='left').max()     # ditto
df3.rolling('2s', closed='left').median()  # ditto (note the longer window)

Problem description

Obviously, the exception cases are a big problem and should be addressed. The other cases give unexpected results that are inconsistent with other aggregations (such as mean and sum), which do seem to operate correctly. Using closed='right' or closed='both' gives results consistent with my expectations, while closed='neither' shows the same problems as closed='left'. So the common factor appears to be whether each input row is included in its own rolling window: 'right' and 'both' include it, while 'left' and 'neither' exclude it, which makes empty windows possible.
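The distinction drawn above, whether a row lies inside its own window, can be written as a small membership test. This is a minimal sketch assuming an offset window of width `w` ending at the row's time `t` spans `t - w` to `t`, with `closed` choosing which endpoints count (the meaning described in the pandas `rolling` documentation); `in_window` is a hypothetical helper, not a pandas function.

```python
from datetime import datetime, timedelta


def in_window(x, t, width, closed="right"):
    """Hypothetical helper: is timestamp x inside the window of `width` ending at t?

    'right'   -> (t - width, t]
    'left'    -> [t - width, t)
    'both'    -> [t - width, t]
    'neither' -> (t - width, t)
    """
    lo = t - width
    # The left edge t - width counts only for 'left' and 'both'
    left_ok = x >= lo if closed in ("left", "both") else x > lo
    # The row's own time t counts only for 'right' and 'both'
    right_ok = x <= t if closed in ("right", "both") else x < t
    return left_ok and right_ok


t = datetime(2013, 1, 1, 9, 0, 0)
one_second = timedelta(seconds=1)
for closed in ("right", "both", "left", "neither"):
    # Does the row at time t belong to its own window?
    print(closed, in_window(t, t, one_second, closed))
```

Only 'right' and 'both' print True, so with 'left' and 'neither' a row with no earlier observations in range has an empty window, which is the situation in Cases 1 and 2.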

Expected Output

Case 1: since there are no other entries in the input row's window, I would expect the median aggregation to return NaN. (This would be consistent with mean, max, etc. for this case.)

                     B
2013-01-01 09:00:00  NaN

Case 2: since neither input row should have any other entries in its window, I would expect the median and max results to be NaN. (This would be consistent with what the mean aggregation returns for this case.)

                     B
2013-01-01 09:00:00  NaN
2013-01-01 09:00:02  NaN

Cases 3a and 3b (1s window): since neither of the first two input rows should have any other entries in its window, I would expect their median and max results to be NaN. Since the last row does have an entry in its window (the second row), I would expect both the median and max to be 2.0. (This would be consistent with what the mean aggregation returns for this case.)

                     B
2013-01-01 09:00:00  NaN
2013-01-01 09:00:02  NaN
2013-01-01 09:00:03  2.0

Case 3c (2s window): since the first row should have no entries in its window, I would expect the first output row to be NaN. The second row has the first entry in its window, so I would expect its output to be 1.0. Similarly, the last row has the second entry in its window, so I would expect its output to be 2.0.

                     B
2013-01-01 09:00:00  NaN
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
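The expected tables above can be reproduced with a brute-force reference that scans every row for each window and returns NaN when the window is empty. This is a slow O(n²) sketch for checking results, not how pandas computes rolling aggregations; the function name `rolling_left` is hypothetical, and it assumes a sorted timestamp index and the closed='left' window `[t - width, t)`.

```python
import math
import statistics
from datetime import datetime, timedelta


def rolling_left(times, values, width, agg):
    """Hypothetical brute-force offset rolling aggregation with closed='left'.

    For each row at time t, collect the values whose timestamp x satisfies
    t - width <= x < t, so the row itself is never included.
    Empty windows yield NaN instead of calling `agg` on no data.
    """
    out = []
    for t in times:
        window = [v for x, v in zip(times, values) if t - width <= x < t]
        out.append(float(agg(window)) if window else math.nan)
    return out


base = datetime(2013, 1, 1, 9, 0, 0)
times3 = [base, base + timedelta(seconds=2), base + timedelta(seconds=3)]
values3 = [1, 2, 3]

print(rolling_left(times3, values3, timedelta(seconds=1), statistics.median))  # [nan, nan, 2.0]
print(rolling_left(times3, values3, timedelta(seconds=1), max))                # [nan, nan, 2.0]
print(rolling_left(times3, values3, timedelta(seconds=2), statistics.median))  # [nan, 1.0, 2.0]
```

The same function gives `[nan]` for Case 1 and `[nan, nan]` for Case 2, matching the expected outputs described above.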

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-17134-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.24.2
pytest: None
pip: 19.0.1
setuptools: 40.6.3
Cython: None
numpy: 1.16.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.0
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


WillAyd · 2019-04-09T20:01:04Z

Thanks for the report. Might be an off-by-one error with closed='left' for these. Investigation and PRs would certainly be welcome.
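One way to see where an off-by-one can creep in is to compute each row's window as a half-open slice `[start, end)` of a sorted list of timestamps. With closed='left' the current row is excluded, so `start == end` (an empty window) becomes possible, which is exactly the situation in Cases 1 and 2. This is a sketch using the standard-library `bisect` module on plain numbers (seconds), assuming unique, ascending timestamps; it is not pandas' internal bounds code, and `window_bounds` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def window_bounds(times, width, closed="right"):
    """Hypothetical helper: (starts, ends) so row i's window is times[starts[i]:ends[i]].

    Assumes `times` is sorted ascending with no duplicates. The left edge
    t - width is included for 'left'/'both'; the row's own time t is
    included for 'right'/'both'.
    """
    starts, ends = [], []
    for t in times:
        lo = t - width
        # bisect_left keeps a value equal to lo; bisect_right skips past it
        start = bisect_left(times, lo) if closed in ("left", "both") else bisect_right(times, lo)
        # bisect_right keeps the row at t; bisect_left stops just before it
        end = bisect_right(times, t) if closed in ("right", "both") else bisect_left(times, t)
        starts.append(start)
        ends.append(end)
    return starts, ends


# Case 2: rows at 0 s and 2 s with a 1 s window
print(window_bounds([0, 2], 1, closed="left"))   # ([0, 1], [0, 1]) -> both windows empty
print(window_bounds([0, 2], 1, closed="right"))  # ([0, 1], [1, 2]) -> each row sees only itself
```

Any aggregation consuming these bounds has to treat `start == end` as "no data" and emit NaN; code that assumes every window holds at least the current row will misbehave once the row is excluded.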


WillAyd added the Bug, Datetime (Datetime data dtype), and Window (rolling, ewma, expanding) labels Apr 9, 2019

WillAyd added this to the Contributions Welcome milestone Apr 9, 2019

This was referenced Jun 17, 2019

BUG: Fix rolling median and quantile with closed='left' and closed='neither' (#26005) #26910 (merged)

BUG: Fix skiplist init error with empty window #26940 (merged)

jreback pushed a commit that referenced this issue Jun 21, 2019

BUG: Fix rolling median and quantile with closed='left' and closed='neither' (#26005) (#26910)

9088f5e

ihsansecer mentioned this issue Jun 30, 2019

BUG: Fix empty closed window issue with rolling min and max #27140 (merged)
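The min/max fix concerns windows that contain no observations. A common way to compute a rolling maximum over variable-width windows is a monotonic deque; with that approach an empty window shows up as an empty deque and should produce NaN rather than a stale value. The sketch below assumes precomputed half-open bounds `[starts[i], ends[i])` that never move backwards (true for a sorted time index) and no missing values; `rolling_max` is a hypothetical name, and pandas' own implementation may differ.

```python
import math
from collections import deque


def rolling_max(values, starts, ends):
    """Hypothetical monotonic-deque rolling max over half-open windows.

    Row i aggregates values[starts[i]:ends[i]]; both bound lists must be
    non-decreasing. The deque holds indices whose values are strictly
    decreasing, so its front is always the current window's maximum.
    """
    out = []
    dq = deque()
    next_idx = 0
    for s, e in zip(starts, ends):
        # Push newly covered elements, discarding smaller ones from the back
        while next_idx < e:
            v = values[next_idx]
            while dq and values[dq[-1]] <= v:
                dq.pop()
            dq.append(next_idx)
            next_idx += 1
        # Drop elements that have slid out on the left
        while dq and dq[0] < s:
            dq.popleft()
        # An empty deque means an empty window: report NaN
        out.append(float(values[dq[0]]) if dq else math.nan)
    return out


# Case 3 with closed='left' and a 1 s window: bounds for rows at 0 s, 2 s, 3 s
print(rolling_max([1, 2, 3], [0, 1, 1], [0, 1, 2]))  # [nan, nan, 2.0]
```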


jreback modified the milestones: Contributions Welcome, 0.25.0 Jul 1, 2019

jreback closed this as completed in #27140 Jul 1, 2019
