Skip to content

pct_change and negative values #40911

@botant

Description

@botant

Problem description

The behaviour of pct_change when there are negative numbers could be improved.

In [3]: df = pd.DataFrame([10, -10, 10])

In [4]: df.pct_change()
Out[4]: 
     0
0  NaN
1 -2.0
2 -2.0

There are basically two ways of calculating %change with negative numbers, depending on what you want.

1) Usual (p_1 - p_0) / p_0 or (p_1 / p_0) - 1

This results in %changes that have correct dimension but wrong direction; %change and difference have different signs.

On the other hand, these %changes can be compounded, as shown below:

In [12]: (df.pct_change().replace(np.NaN, 0) + 1).cumprod() * 10
Out[12]: 
      0
0  10.0
1 -10.0
2  10.0

2) Using absolute denominator (p_1 - p_0) / abs(p_0)

This results in %changes that have correct dimension and direction, but that cannot be compounded.

This post explains it really well: https://math.stackexchange.com/a/3986352

Bloomberg, for example, reports %change like this.

Suggestion

There is no "one size fits all" here. However, I feel there is space for pandas to be proactive and alert the user that negative prices have certain implications.

I would like to suggest the following small modification to pct_change:

  • Add an argument calc_type with 3 possible values:
  • None: current calculation + warning if negative values are found.
  • "signed": current calculation.
  • "absolute": calculation using absolute denominators.

The advantage of this? To alert users that the current calculation may not be what they expect if the data has a stray negative number, and to force them to be explicit when that happens.

One might say that "users should read the documentation" and that pandas should not validate your data for you, which is also correct. As a user, however, I would have been massively thankful if pandas had alerted me of the behaviour of pct_change.

I imagine many people out there don't really think about this if they expect their data to be strictly positive. I myself never gave this too much thought until crude oil turned negative on 20-Mar-2020.

Output of pd.show_versions()

In [1]: import pandas as pd
In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : f2c8480af2f25efdbd803218b9d87980f416563e
python           : 3.8.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-71-generic
Version          : #79-Ubuntu SMP Wed Mar 24 10:56:57 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_GB.UTF-8
LOCALE           : en_GB.UTF-8

pandas           : 1.2.3
numpy            : 1.20.2
pytz             : 2020.5
dateutil         : 2.8.1
pip              : 21.0.1
setuptools       : 44.0.0
Cython           : 0.29.21
pytest           : 6.2.1
hypothesis       : 6.0.1
sphinx           : 3.2.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.20.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.2
sqlalchemy       : None
tables           : None
tabulate         : 0.8.7
xarray           : 0.16.2
xlrd             : None
xlwt             : None
numba            : 0.52.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementNeeds DiscussionRequires discussion from core team before further action

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions