Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Offset-based rolling window, with closed='left' and max as aggregation function, make python crash #21704

Closed
louis-red opened this issue Jul 2, 2018 · 5 comments

Comments

Projects
None yet
6 participants
@louis-red
Copy link
Contributor

commented Jul 2, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
from pandas.tseries.frequencies import to_offset

df = pd.Series(data=np.arange(10), 
               index=pd.date_range('2000', periods=10))

df.rolling(to_offset('3D'), closed='left').max()

Problem description

With this rolling and aggregation function, python just crashes. It does too with .min() or .agg(np.max) as aggregation steps. It does not when closed='right' or with mean as an aggregation function.

Expected Output

2000-01-01    NaN
2000-01-02    0.0
2000-01-03    1.0
2000-01-04    2.0
2000-01-05    3.0
2000-01-06    4.0
2000-01-07    5.0
2000-01-08    6.0
2000-01-09    7.0
2000-01-10    8.0
Freq: D, dtype: float64

or, if closed='left' raises complicated problems (as it is not implemented for fixed windows), disable it too for the offset-based windows.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@louis-red louis-red changed the title Offset-based rolling window, with closed='left' and min or max as aggregation function, make python crash Offset-based rolling window, with closed='left' and max as aggregation function, make python crash Jul 2, 2018

@louis-red louis-red changed the title Offset-based rolling window, with closed='left' and max as aggregation function, make python crash BUG: Offset-based rolling window, with closed='left' and max as aggregation function, make python crash Jul 2, 2018

@uds5501

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2018

The bug exists on master branch.

@WillAyd

This comment has been minimized.

Copy link
Member

commented Jul 2, 2018

Thanks for the report - that does look strange. Investigation and PRs are always welcome!

@grzjab

This comment has been minimized.

Copy link

commented Jul 4, 2018

Same thing happen with closed='neither', but it works for closed='right'

@louis-red

This comment has been minimized.

Copy link
Contributor Author

commented Jul 5, 2018

[Disclaimer : I don't know much about Cython]
The way Python crashes surely mean that it comes from a cython segmentation fault. And the fact that closed='right' works but not left may come from an off-by-one error. As of why the aggregation function matters, I have no clue.
Would love to investigate but as of today I do not have enough experience in C or Cython.

changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 11, 2018

BUG: datetime rolling min/max segfaults when closed=left (pandas-dev#…
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes all three of the aforementioned bugs, at the cost of casting the
output array to floating point even if the input is integer. This is less
than ideal if the output has no NaNs, but is still consistent with roll_sum
behavior.
@changhiskhan

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2018

I just hit this as well. @JacquotLeHaricot you're intuition was spot on. Definitely an off-by-one.

changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 11, 2018

BUG: datetime rolling min/max segfaults when closed=left (pandas-dev#…
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes all three of the aforementioned bugs, at the cost of casting the
output array to floating point even if the input is integer. This is less
than ideal if the output has no NaNs, but is still consistent with roll_sum
behavior.

@jreback jreback added this to the 0.24.0 milestone Jul 12, 2018

changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 12, 2018

BUG: datetime rolling min/max segfaults when closed=left (pandas-dev#…
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes all three of the aforementioned bugs, at the cost of casting the
output array to floating point even if the input is integer. This is less
than ideal if the output has no NaNs, but is still consistent with roll_sum
behavior.

changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 13, 2018

BUG: datetime rolling min/max segfaults when closed=left (pandas-dev#…
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes these three bugs, at the cost of casting the output array to
floating point even if the input is integer. This is less than ideal if the
output has no NaNs, but is consistent with roll_sum behavior.

changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 13, 2018

BUG: datetime rolling min/max segfaults when closed=left (pandas-dev#…
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes these three bugs, at the cost of casting the output array to
floating point even if the input is integer. This is less than ideal if the
output has no NaNs, but is consistent with roll_sum behavior.

WillAyd added a commit that referenced this issue Jul 20, 2018

@WillAyd WillAyd added the Window label Sep 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.