Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pandas dataframe pct_change function arguments #39672

Closed
jpcam opened this issue Feb 8, 2021 · 14 comments · Fixed by #40266
Closed

Pandas dataframe pct_change function arguments #39672

jpcam opened this issue Feb 8, 2021 · 14 comments · Fixed by #40266

Comments

@jpcam
Copy link

jpcam commented Feb 8, 2021

Typically financial data is ordered by columns in reverse chronology ie columns '2020', '2019', '2018', '2017'.. But we need the pct_change function in reverse order. Looking at the Pandas reference (last example): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html?highlight=pct_change#pandas.DataFrame.pct_change

df.pct_change(axis='columns')

       2016    2015         2014

GOOG NaN -0.151997 -0.086016

APPL NaN 0.337604 0.012002

Such a calculation makes no sense or at least is not useful. We need to look at change from today over past period, not past period over future period. Other than iterating in reverse order or transposing, can the pct_change function be augmented with a new argument to reverse the calculation to the form: (A0-A1)/A1 instead of (A1-A0)/A0 ??

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Feb 8, 2021

Thanks @jpcam

I agree that the last example might not be a useful use-case, and it could do with being re-written as

In [5]: df.pct_change(axis='columns', periods=-1)
Out[5]:
          2016      2015  2014
GOOG  0.179241  0.094112   NaN
APPL -0.252395 -0.011860   NaN

, do you want to submit a pull request? If so, here's the contributing guide

@joybh98
Copy link
Contributor

joybh98 commented Feb 9, 2021

@MarcoGorelli if @jpcam is not working on it, I'll like take up this issue.

@MarcoGorelli
Copy link
Member

go ahead, thanks

@jpcam
Copy link
Author

jpcam commented Feb 10, 2021

@joybh98 Thanks for having a go at it. Conceptually it appears trivial but could take different solutions which will depend how efficient you want it to run. keep me posted on your progress. have fun!

@MarcoGorelli
Copy link
Member

@jpcam which different solutions? All that needs updating here is the example in the docs so that it uses periods=-1

@joybh98
Copy link
Contributor

joybh98 commented Feb 11, 2021

@andrei are you working on this issue ? I don't want to step in your shoes

@andrei-assa
Copy link

Hi, sorry, I actually did not see the full discussion before I made the changes. I am actually a newbie and found this through the 'good first issue' tag. Looks like my edits did not pass some tests, so I can step back and try to learn what went wrong. As a side-note, just want to say, pandas is super awesome and totally changed the way I was working on certain projects, and would love to learn from others about contributing, so please send me a message if you have useful tips for a new contributor like me. Thanks!

@joybh98
Copy link
Contributor

joybh98 commented Feb 12, 2021

@andrei-assa sure, i'll open a PR ASAP on this and you can help me there, btw for future reference, the issue had a doc tag, which meant changes to the documentation was needed, not the source code itself.

@MarcoGorelli
Copy link
Member

Thanks @andrei-assa - see here for the contributing guide

@andrei-assa
Copy link

Thanks @MarcoGorelli and @joybh98. Perhaps documentation can include mention that, under US Generally Accepted Accounting Principles (US GAAP), the convention for presenting period elements is: for period 0 to t represented on axis=1 with axis indices 0 to n, major financial statements typically represent periods in reverse order, i.e. beginning with period t at index 0 and ending with period 0 at index n. This is exactly the opposite of most timeseries, where for periods 0 -> t and indices 0 -> n, period 0 is represented at index 0 and period t and index n. The reversal of this convention under US GAAP is what necessitates the periods=-1 argument. Not sure if this can be explained succinctly in documentation; if you think it is worth mentioning, I can come up with a 1-line version of this explanation.

@MarcoGorelli
Copy link
Member

No need IMO, pandas is meant to be general-purpose, just modifying the example so it matches a more common use-case (i.e. relative change forwards in time) should be enough

@andrei-assa
Copy link

@MarcoGorelli Thanks for your reply. If you don't mind, can you suggest any open issues for me to try to tackle so I can get practice contributing? Thanks!

@MarcoGorelli
Copy link
Member

@joybh98 have you already started your PR? If not, shall we leave it to @andrei-assa (for whom it'd actually be a first issue)?

Else, any 'good first issue' is usually a good place to start

@joybh98
Copy link
Contributor

joybh98 commented Feb 12, 2021

@MarcoGorelli I don't have any problem, @andrei-assa go ahead!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment