BUG: groupby.pct_change() does not work properly in Pandas 0.23.0. Grouping is ignored. #21200

Closed
Pferdow30 opened this issue May 25, 2018 · 4 comments

@Pferdow30
commented May 25, 2018

Code Sample

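>>> import pandas as pd
>>> import numpy as np

>>> # Eight values in column 'a', alternating between two groups
>>> df = pd.DataFrame(data=np.random.rand(8, 1), columns=['a'])
>>> df['grp'] = 1
>>> df.loc[::2, 'grp'] = 2
>>> # Grouped pct_change (affected by the bug in 0.23.0)
>>> df['%_groupby'] = df.groupby('grp')['a'].pct_change()
>>> # Reference: the same quantity computed from grouped shifts
>>> df['%_shift'] = df.groupby('grp')['a'].shift(0) / df.groupby('grp')['a'].shift(1) - 1
>>> print(df)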

Problem description

When a DataFrame contains several groups, groupby followed by pct_change is expected to compute the percentage change within each group. In pandas 0.23.0, however, the grouping is ignored: the change is computed between consecutive rows of the whole column, as the %_groupby column in the output shows.

Output:

     a  grp  %_groupby   %_shift
0  1.0    2        NaN       NaN
1  1.1    1   0.100000       NaN
2  1.2    2   0.090909  0.200000
3  1.3    1   0.083333  0.181818
4  1.4    2   0.076923  0.166667
5  1.5    1   0.071429  0.153846
6  1.6    2   0.066667  0.142857
7  1.7    1   0.062500  0.133333

Expected Output

     a  grp  %_groupby   %_shift
0  1.0    2        NaN       NaN
1  1.1    1        NaN       NaN
2  1.2    2   0.200000  0.200000
3  1.3    1   0.181818  0.181818
4  1.4    2   0.166667  0.166667
5  1.5    1   0.153846  0.153846
6  1.6    2   0.142857  0.142857
7  1.7    1   0.133333  0.133333
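
The two tables can be reproduced deterministically. The sketch below assumes that the values 1.0 to 1.7 shown in the tables were used in place of the random data in the code sample. It suggests that the reported %_groupby column is what a plain, ungrouped pct_change returns, while the expected column follows from a grouped shift (the same reference calculation as %_shift).

```python
import numpy as np
import pandas as pd

# Assumption: fixed values matching the tables, instead of np.random.rand
df = pd.DataFrame({'a': [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]})
df['grp'] = 1
df.loc[::2, 'grp'] = 2

# Ungrouped change: compare with the %_groupby column under "Output"
ungrouped = df['a'].pct_change()

# Per-group change built from a grouped shift: compare with "Expected Output"
per_group = df['a'] / df.groupby('grp')['a'].shift(1) - 1

print(pd.DataFrame({'ungrouped': ungrouped, 'per_group': per_group}).round(6))
```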

Output of pd.show_versions()

INSTALLED VERSIONS


commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@SimonAlecks

Contributor

commented May 25, 2018

I can see that the pct_change function in groupby.py (around line 3944) does not implement this properly, whereas the method it overrides implements it correctly for a DataFrame. I'd like to think this should be relatively straightforward to remedy.
I'll take a crack at a PR for this, although I haven't contributed to pandas before, so we'll see if I can complete it in a timely manner.
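
For illustration, a group-aware percentage change can be expressed through a grouped shift. The helper below is hypothetical (its name and signature are not part of pandas), it is not the patch that was later merged, and unlike pct_change it does no filling of missing values:

```python
import pandas as pd

def grouped_pct_change(df, by, column, periods=1):
    """Hypothetical helper: percentage change of `column` within each `by` group."""
    # Shifting inside each group keeps values from leaking across group boundaries
    previous = df.groupby(by)[column].shift(periods)
    return df[column] / previous - 1

df = pd.DataFrame({'a': [1.0, 1.1, 1.2, 1.3], 'grp': [2, 1, 2, 1]})
result = grouped_pct_change(df, 'grp', 'a')
print(result)
```

The first row of each group has no predecessor, so it comes out as NaN.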

@jreback

Contributor

commented May 25, 2018

maybe related to #11811

@jreback changed the title from "``groupby`` followed by ``pct_change`` does not work properly in Pandas 0.23.0. Grouping is ignored." to "BUG: groupby.pct_change() does not work properly in Pandas 0.23.0. Grouping is ignored." May 25, 2018

@jreback added this to the Next Major Release milestone May 25, 2018

@SimonAlecks referenced this issue May 28, 2018

Merged

pct change bug issue 21200 #21235

3 of 3 tasks complete
@rontho1992


commented Jun 5, 2018

I found something along these lines when shifting in reverse, that is, with a negative periods value:

import pandas_datareader.data as web
import pandas as pd

tickers = ['F', 'AAPL', 'NFLX', 'AMZN', 'GOOG']

# Download daily prices per ticker and stack them into one long frame
frames = []
for ticker in tickers:
    data = web.DataReader(ticker, 'iex', '2018-01-01', '2018-06-01')
    data['ticker'] = ticker
    frames.append(data)
df = pd.concat(frames)

df = df.reset_index()
# Negative period inside pct_change
df['5_day_growth'] = df.groupby('ticker').close.pct_change(periods=-5)
# Positive period, then shift the result up by five rows
df['5_day_growth_alt'] = df.groupby('ticker').close.pct_change(periods=5).shift(-5)

The alternate method (a positive periods value followed by a shift) gives the output I expected; passing a negative periods value to pct_change does not.

print(df[['date','ticker','close','5_day_growth', '5_day_growth_alt']].head(6))

          date ticker    close  5_day_growth  5_day_growth_alt
0  2018-01-02      F  12.1939     -0.032115          0.033181
1  2018-01-03      F  12.2903     -0.020717          0.021155
2  2018-01-04      F  12.5022     -0.013672          0.013862
3  2018-01-05      F  12.7141     -0.002268          0.002273
4  2018-01-08      F  12.6659      0.003820         -0.003805
5  2018-01-09      F  12.5985      0.073894         -0.068810
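
Part of the difference between the two columns appears to follow from how a negative period is defined, independently of grouping. Assuming the usual pandas semantics, pct_change(periods=n) computes x[t] / x[t - n] - 1, so periods=-5 yields x[t] / x[t + 5] - 1, whereas pct_change(periods=5).shift(-5) yields x[t + 5] / x[t] - 1. A minimal sketch on an ungrouped Series, using the six F closing prices printed above:

```python
import pandas as pd

# Closing prices for F, copied from the printed table
close = pd.Series([12.1939, 12.2903, 12.5022, 12.7141, 12.6659, 12.5985])

# Negative period: x[t] / x[t+5] - 1
negative_period = close.pct_change(periods=-5)

# Positive period, then shifted up: x[t+5] / x[t] - 1
shifted_positive = close.pct_change(periods=5).shift(-5)

print(negative_period.iloc[0], shifted_positive.iloc[0])
```

The two first-row values agree with the 5_day_growth and 5_day_growth_alt columns to within rounding of the displayed prices. Note also that the trailing .shift(-5) in the alternate column is applied to the combined result rather than per ticker, so rows near the end of one ticker can pick up values from the next.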
@WillKoehrsen


commented Jun 28, 2018

A workaround for this is using apply. This should produce the desired result:

df['%_groupby'] = df.groupby('grp')['a'].apply(lambda x: x.pct_change())
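
A variant of the same workaround uses transform, which returns a result aligned to the original index. This is a sketch, offered under the assumption that in some later pandas releases apply may prepend the group keys to the index, which would get in the way of direct column assignment. The sample data reuses the values from the tables in the original report.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]})
df['grp'] = [2, 1, 2, 1, 2, 1, 2, 1]

# transform applies the function per group and keeps the original row order
df['%_groupby'] = df.groupby('grp')['a'].transform(lambda x: x.pct_change())
print(df)
```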

matthewgilbert added a commit to matthewgilbert/strategy that referenced this issue Aug 22, 2018

Fix return calculations for pandas 0.23.*
There was a bug introduced in pandas 0.23.* using pct_change() on a
groupby. Details at pandas-dev/pandas#21200

@jreback modified the milestones: Contributions Welcome, 0.24.0 Dec 12, 2018
