-
-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pandas dataframe pct_change function arguments #39672
Comments
Thanks @jpcam I agree that the last example might not be a useful use-case, and it could do with being re-written as In [5]: df.pct_change(axis='columns', periods=-1)
Out[5]:
2016 2015 2014
GOOG 0.179241 0.094112 NaN
APPL -0.252395 -0.011860 NaN , do you want to submit a pull request? If so, here's the contributing guide |
@MarcoGorelli if @jpcam is not working on it, I'll like take up this issue. |
go ahead, thanks |
@joybh98 Thanks for having a go at it. Conceptually it appears trivial but could take different solutions which will depend how efficient you want it to run. keep me posted on your progress. have fun! |
@jpcam which different solutions? All that needs updating here is the example in the docs so that it uses |
@andrei are you working on this issue ? I don't want to step in your shoes |
Hi, sorry, I actually did not see the full discussion before I made the changes. I am actually a newbie and found this through the 'good first issue' tag. Looks like my edits did not pass some tests, so I can step back and try to learn what went wrong. As a side-note, just want to say, pandas is super awesome and totally changed the way I was working on certain projects, and would love to learn from others about contributing, so please send me a message if you have useful tips for a new contributor like me. Thanks! |
@andrei-assa sure, i'll open a PR ASAP on this and you can help me there, btw for future reference, the issue had a doc tag, which meant changes to the documentation was needed, not the source code itself. |
Thanks @andrei-assa - see here for the contributing guide |
Thanks @MarcoGorelli and @joybh98. Perhaps documentation can include mention that, under US Generally Accepted Accounting Principles (US GAAP), the convention for presenting period elements is: for period 0 to t represented on axis=1 with axis indices 0 to n, major financial statements typically represent periods in reverse order, i.e. beginning with period t at index 0 and ending with period 0 at index n. This is exactly the opposite of most timeseries, where for periods 0 -> t and indices 0 -> n, period 0 is represented at index 0 and period t and index n. The reversal of this convention under US GAAP is what necessitates the periods=-1 argument. Not sure if this can be explained succinctly in documentation; if you think it is worth mentioning, I can come up with a 1-line version of this explanation. |
No need IMO, pandas is meant to be general-purpose, just modifying the example so it matches a more common use-case (i.e. relative change forwards in time) should be enough |
@MarcoGorelli Thanks for your reply. If you don't mind, can you suggest any open issues for me to try to tackle so I can get practice contributing? Thanks! |
@joybh98 have you already started your PR? If not, shall we leave it to @andrei-assa (for whom it'd actually be a first issue)? Else, any 'good first issue' is usually a good place to start |
@MarcoGorelli I don't have any problem, @andrei-assa go ahead! |
Typically financial data is ordered by columns in reverse chronology ie columns '2020', '2019', '2018', '2017'.. But we need the pct_change function in reverse order. Looking at the Pandas reference (last example): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html?highlight=pct_change#pandas.DataFrame.pct_change
df.pct_change(axis='columns')
GOOG NaN -0.151997 -0.086016
APPL NaN 0.337604 0.012002
Such a calculation makes no sense or at least is not useful. We need to look at change from today over past period, not past period over future period. Other than iterating in reverse order or transposing, can the pct_change function be augmented with a new argument to reverse the calculation to the form: (A0-A1)/A1 instead of (A1-A0)/A0 ??
The text was updated successfully, but these errors were encountered: