Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Offset-based rolling window, with closed='left' and max as aggregation function, make python crash #21704

Closed
louis-red opened this issue Jul 2, 2018 · 5 comments · Fixed by #21853
Labels
Bug Window rolling, ewma, expanding
Milestone

Comments

@louis-red
Copy link
Contributor

louis-red commented Jul 2, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
from pandas.tseries.frequencies import to_offset

df = pd.Series(data=np.arange(10), 
               index=pd.date_range('2000', periods=10))

df.rolling(to_offset('3D'), closed='left').max()

Problem description

With this rolling and aggregation function, python just crashes. It does too with .min() or .agg(np.max) as aggregation steps. It does not when closed='right' or with mean as an aggregation function.

Expected Output

2000-01-01    NaN
2000-01-02    0.0
2000-01-03    1.0
2000-01-04    2.0
2000-01-05    3.0
2000-01-06    4.0
2000-01-07    5.0
2000-01-08    6.0
2000-01-09    7.0
2000-01-10    8.0
Freq: D, dtype: float64

or, if closed='left' raises complicated problems (as it is not implemented for fixed windows), disable it too for the offset-based windows.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@louis-red louis-red changed the title Offset-based rolling window, with closed='left' and min or max as aggregation function, make python crash Offset-based rolling window, with closed='left' and max as aggregation function, make python crash Jul 2, 2018
@louis-red louis-red changed the title Offset-based rolling window, with closed='left' and max as aggregation function, make python crash BUG: Offset-based rolling window, with closed='left' and max as aggregation function, make python crash Jul 2, 2018
@uds5501
Copy link
Contributor

uds5501 commented Jul 2, 2018

The bug exists on master branch.

@WillAyd
Copy link
Member

WillAyd commented Jul 2, 2018

Thanks for the report - that does look strange. Investigation and PRs are always welcome!

@grzjab
Copy link

grzjab commented Jul 4, 2018

Same thing happen with closed='neither', but it works for closed='right'

@louis-red
Copy link
Contributor Author

[Disclaimer : I don't know much about Cython]
The way Python crashes surely mean that it comes from a cython segmentation fault. And the fact that closed='right' works but not left may come from an off-by-one error. As of why the aggregation function matters, I have no clue.
Would love to investigate but as of today I do not have enough experience in C or Cython.

changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 11, 2018
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes all three of the aforementioned bugs, at the cost of casting the
output array to floating point even if the input is integer. This is less
than ideal if the output has no NaNs, but is still consistent with roll_sum
behavior.
@changhiskhan
Copy link
Contributor

I just hit this as well. @JacquotLeHaricot you're intuition was spot on. Definitely an off-by-one.

changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 11, 2018
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes all three of the aforementioned bugs, at the cost of casting the
output array to floating point even if the input is integer. This is less
than ideal if the output has no NaNs, but is still consistent with roll_sum
behavior.
@jreback jreback added this to the 0.24.0 milestone Jul 12, 2018
changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 12, 2018
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes all three of the aforementioned bugs, at the cost of casting the
output array to floating point even if the input is integer. This is less
than ideal if the output has no NaNs, but is still consistent with roll_sum
behavior.
changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 13, 2018
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes these three bugs, at the cost of casting the output array to
floating point even if the input is integer. This is less than ideal if the
output has no NaNs, but is consistent with roll_sum behavior.
changhiskhan added a commit to changhiskhan/pandas that referenced this issue Jul 13, 2018
…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes these three bugs, at the cost of casting the output array to
floating point even if the input is integer. This is less than ideal if the
output has no NaNs, but is consistent with roll_sum behavior.
@WillAyd WillAyd added the Window rolling, ewma, expanding label Sep 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants