-
-
Notifications
You must be signed in to change notification settings - Fork 18.8k
Description
Problem description
The behaviour of pct_change
when there are negative numbers could be improved.
In [3]: df = pd.DataFrame([10, -10, 10])
In [4]: df.pct_change()
Out[4]:
0
0 NaN
1 -2.0
2 -2.0
There are basically two ways of calculating %change with negative numbers, depending on what you want.
1) Usual (p_1 - p_0) / p_0
or (p_1 / p_0) - 1
This results in %changes that have correct dimension but wrong direction; %change and difference have different signs.
On the other hand, these %changes can be compounded, as shown below:
In [12]: (df.pct_change().replace(np.NaN, 0) + 1).cumprod() * 10
Out[12]:
0
0 10.0
1 -10.0
2 10.0
2) Using absolute denominator (p_1 - p_0) / abs(p_0)
This results in %changes that have correct dimension and direction, but that cannot be compounded.
This post explains it really well: https://math.stackexchange.com/a/3986352
Bloomberg, for example, reports %change like this.
Suggestion
There is no "one size fits all" here. However, I feel there is space for pandas
to be proactive and alert the user that negative prices have certain implications.
I would like to suggest the following small modification to pct_change
:
- Add an argument
calc_type
with 3 possible values: None
: current calculation + warning if negative values are found."signed"
: current calculation."absolute"
: calculation using absolute denominators.
The advantage of this? To alert users that the current calculation may not be what they expect if the data has a stray negative number, and to force them to be explicit when that happens.
One might say that "users should read the documentation" and that pandas
should not validate your data for you, which is also correct. As a user, however, I would have been massively thankful if pandas
had alerted me of the behaviour of pct_change
.
I imagine many people out there don't really think about this if they expect their data to be strictly positive. I myself never gave this too much thought until crude oil turned negative on 20-Mar-2020.
Output of pd.show_versions()
In [1]: import pandas as pd
In [2]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit : f2c8480af2f25efdbd803218b9d87980f416563e
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-71-generic
Version : #79-Ubuntu SMP Wed Mar 24 10:56:57 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.2.3
numpy : 1.20.2
pytz : 2020.5
dateutil : 2.8.1
pip : 21.0.1
setuptools : 44.0.0
Cython : 0.29.21
pytest : 6.2.1
hypothesis : 6.0.1
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.20.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : 0.16.2
xlrd : None
xlwt : None
numba : 0.52.0