BUG: Rolling min_periods not working on groupby object #36040

justinessert · 2020-09-01T14:57:53Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

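import pandas as pd

df = pd.DataFrame({
    'segment': 'A',
    'data': range(10)
})

# Plain rolling max: returns the expected values
df.rolling(5, center=True, min_periods=1).max()

# Grouped rolling max: the last two rows come back as NaN
df.groupby('segment').rolling(5, center=True, min_periods=1).max().reset_index(drop=True)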

Problem description

For the DataFrame above, with a single segment 'A', the result of df.rolling(5, center=True, min_periods=1).max() should be identical to that of df.groupby('segment').rolling(5, center=True, min_periods=1).max().reset_index(drop=True). Instead, the latter operation has NaNs in the last two positions of the data column.

Expected Output

Both operations should return the sequence [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0, 9.0]. Instead, df.groupby('segment').rolling(5, center=True, min_periods=1).max().reset_index(drop=True) returns [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, NaN, NaN].
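The comparison described above can be written as a small self-checking script. This is only a sketch: it selects the 'data' column explicitly (an assumption made here so that the result has the same shape across pandas versions), and the names plain, grouped and matches are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"segment": "A", "data": range(10)})

# Reference result: centered rolling max without any grouping.
plain = df["data"].rolling(5, center=True, min_periods=1).max()

# Same operation through groupby; with a single segment it should be identical.
grouped = (
    df.groupby("segment")["data"]
    .rolling(5, center=True, min_periods=1)
    .max()
    .reset_index(drop=True)
)

# True when both results agree element by element.
matches = bool(plain.equals(grouped))

print(plain.tolist())
print(grouped.tolist())
print("grouped matches plain:", matches)
```

On a version affected by this report, the second list would be expected to end in two NaN values and the final line to print False.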

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : f2ca0a2
python : 3.7.7.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.0
pip : 20.1.1
setuptools : 47.3.0.post20200616
Cython : None
pytest : 6.0.0
hypothesis : None
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.0
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.0
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None


justinessert · 2020-09-01T15:02:47Z

[Edited]

Additionally, I have found that if there are two segments in the DataFrame, the group boundaries are not respected: the windows at the end of the first segment pick up values from the second, and the NaNs appear only at the end of the last segment.

df = pd.DataFrame({
    'segment': ['A']*10 + ['B']*10,
    'data': range(20)
})
df.groupby('segment').rolling(5, center=True, min_periods=1).max()

Here, the expected and actual results of df.groupby('segment').rolling(5, center=True, min_periods=1).max() are:
for segment 'A', expected [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0, 9.0] but got [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
for segment 'B', expected [12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 19.0, 19.0] but got [12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, NaN, NaN]
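One way to obtain the expected per-segment values independently of the grouped rolling code path is to apply the plain rolling operation to each segment separately. A minimal sketch, where the expected dictionary is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({"segment": ["A"] * 10 + ["B"] * 10, "data": range(20)})

# Run the ungrouped rolling max on each segment's rows in isolation,
# so no window can reach across a segment boundary.
expected = {
    name: group["data"].rolling(5, center=True, min_periods=1).max().tolist()
    for name, group in df.groupby("segment")
}

for name, values in expected.items():
    print(name, values)
```

Comparing these lists against the output of the grouped call shows which rows are affected.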

jreback · 2020-09-02T21:20:29Z

cc @mroeschke

wfvining · 2020-09-11T17:51:31Z

Seeing what I think is the same problem with the following example.

x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
x.groupby(x % 2).rolling(window=3, min_periods=1, center=True).sum()

I expect to see

0  1     6.0
   3    12.0
   5    18.0
   7    14.0
1  0     4.0
   2     9.0
   4    15.0
   6    12.0
dtype: float64

But instead I get

0  1     6.0
   3    12.0
   5    18.0
   7     1.0
1  0     4.0
   2     9.0
   4    15.0
   6     NaN
dtype: float64

If center or min_periods are not specified then I get the expected behavior.
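To make the expected numbers above concrete, here is a pure-Python sketch of a centered rolling sum applied per group. The helper centered_rolling_sum is hypothetical and written only for illustration; it assumes an odd window size and is not pandas' implementation.

```python
import math

def centered_rolling_sum(values, window, min_periods):
    """Centered rolling sum over a list, clipping windows at both ends.

    Hypothetical helper for illustration; assumes an odd window size.
    """
    half = window // 2
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half)        # clip at the start of the group
        hi = min(n, i + half + 1)    # clip at the end of the group
        chunk = values[lo:hi]
        # Emit NaN only when fewer than min_periods observations are available.
        out.append(float(sum(chunk)) if len(chunk) >= min_periods else math.nan)
    return out

x = [1, 2, 3, 4, 5, 6, 7, 8]

# Split into groups by x % 2, keeping the original order within each group.
groups = {}
for value in x:
    groups.setdefault(value % 2, []).append(value)

result = {key: centered_rolling_sum(vals, window=3, min_periods=1)
          for key, vals in groups.items()}
print(result)
```

Because each window is clipped to its own group, the last element of each group sums only the two values that remain, which gives 14.0 and 12.0 rather than 1.0 and NaN.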

justinessert · 2020-10-10T15:58:05Z

Issue resolved with PR 36567

mroeschke · 2020-10-10T16:46:42Z

We'll officially close this issue with your PR in #37035

justinessert added the Bug and Needs Triage labels Sep 1, 2020

TomAugspurger removed the Needs Triage label Sep 4, 2020

justinessert mentioned this issue Sep 4, 2020

API: reimplement FixedWindowIndexer.get_window_bounds to fix groupby bug #36132

Closed

5 tasks

wfvining mentioned this issue Sep 11, 2020

Identify when the sun is up based on power or irradiance pvlib/pvanalytics#67

Merged

9 tasks

justinessert mentioned this issue Oct 10, 2020

API: reimplement FixedWindowIndexer.get_window_bounds #37035

Merged

5 tasks

justinessert closed this as completed Oct 10, 2020

mroeschke reopened this Oct 10, 2020

mroeschke added this to the 1.2 milestone Oct 10, 2020

jreback added the Groupby and Window labels Oct 10, 2020

jreback closed this as completed in #37035 Oct 10, 2020

wfvining mentioned this issue Nov 10, 2020

BUG: rolling groupby does not respect min_periods when center=True #37743

Closed

3 tasks
