BUG: DataFrame.shift shows different behavior for axis=1 when freq is specified #47039

Closed
3 tasks done
wjsi opened this issue May 17, 2022 · 2 comments · Fixed by #47051
Labels
Bug, Datetime (Datetime data dtype), Index (Related to the Index class or subclasses), Regression (Functionality that used to work in a prior pandas version)

@wjsi
Contributor

wjsi commented May 17, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np, pandas as pd

rs = np.random.RandomState(0)

raw = pd.DataFrame(
    rs.randint(1000, size=(10, 8)), columns=["col" + str(i + 1) for i in range(8)]
)
raw.index = pd.date_range("2020-1-1", periods=10)
raw.columns = pd.date_range("2020-3-1", periods=8)


print(raw)
"""result is
            2020-03-01  2020-03-02  2020-03-03  2020-03-04  2020-03-05  2020-03-06  2020-03-07  2020-03-08
2020-01-01         684         559         629         192         835         763         707         359
2020-01-02           9         723         277         754         804         599          70         472
2020-01-03         600         396         314         705         486         551          87         174
2020-01-04         600         849         677         537         845          72         777         916
2020-01-05         115         976         755         709         847         431         448         850
2020-01-06          99         984         177         755         797         659         147         910
2020-01-07         423         288         961         265         697         639         544         543
2020-01-08         714         244         151         675         510         459         882         183
2020-01-09          28         802         128         128         932          53         901         550
2020-01-10         488         756         273         335         388         617          42         442
"""

print(raw.shift(periods=2, freq="D", axis=1))
"""result is
            2020-03-01  2020-03-02  2020-03-03  2020-03-04  2020-03-05  2020-03-06  2020-03-07  2020-03-08
2020-01-01         NaN         NaN         684         559         629         192         835         763
2020-01-02         NaN         NaN           9         723         277         754         804         599
2020-01-03         NaN         NaN         600         396         314         705         486         551
2020-01-04         NaN         NaN         600         849         677         537         845          72
2020-01-05         NaN         NaN         115         976         755         709         847         431
2020-01-06         NaN         NaN          99         984         177         755         797         659
2020-01-07         NaN         NaN         423         288         961         265         697         639
2020-01-08         NaN         NaN         714         244         151         675         510         459
2020-01-09         NaN         NaN          28         802         128         128         932          53
2020-01-10         NaN         NaN         488         756         273         335         388         617
"""

print(raw.shift(periods=2, freq="D", axis=1, fill_value=0))
"""result is
            2020-03-03  2020-03-04  2020-03-05  2020-03-06  2020-03-07  2020-03-08  2020-03-09  2020-03-10
2020-01-01         684         559         629         192         835         763         707         359
2020-01-02           9         723         277         754         804         599          70         472
2020-01-03         600         396         314         705         486         551          87         174
2020-01-04         600         849         677         537         845          72         777         916
2020-01-05         115         976         755         709         847         431         448         850
2020-01-06          99         984         177         755         797         659         147         910
2020-01-07         423         288         961         265         697         639         544         543
2020-01-08         714         244         151         675         510         459         882         183
2020-01-09          28         802         128         128         932          53         901         550
2020-01-10         488         756         273         335         388         617          42         442
"""

Issue Description

According to the documentation, when freq is specified, only the axis labels are shifted and the data are left unchanged. With axis=0 this works as documented. With axis=1, however, the data are shifted as if freq had not been specified. Strangely, when fill_value is also supplied, the freq argument is honored. This behavior is not documented in https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.shift.html.
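For reference, the documented axis=0 behavior can be illustrated with a minimal sketch. The small frame and its column names here are hypothetical and are not part of the original report; the point is only that with freq the row labels move while the values stay where they are.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with a daily DatetimeIndex on the rows.
df = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=pd.date_range("2020-01-01", periods=3),
    columns=["a", "b"],
)

# With freq, the index labels move forward two days; the values stay in place
# and no NaN is introduced.
shifted = df.shift(periods=2, freq="D")

print(shifted)
```

The issue is that the same call with axis=1 and datetime-like columns does not follow this pattern.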

Expected Behavior

shift should behave the same for both axis values, whether or not fill_value is supplied.

Because the change might break existing behavior, it would be better to make the correction starting with 1.5.0.
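The expected axis=1 result described in this section can be sketched without relying on the affected code path. This is only an illustration under the assumption that "expected" means relabeling the columns two days later and keeping the data; the names `expected` and `via_transpose` are introduced here for the example.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)
raw = pd.DataFrame(
    rs.randint(1000, size=(10, 8)),
    index=pd.date_range("2020-1-1", periods=10),
    columns=pd.date_range("2020-3-1", periods=8),
)

# Relabel the columns two days later; the data are untouched.
expected = raw.copy()
expected.columns = raw.columns.shift(2, freq="D")

# Alternative: shift along axis=0 of the transpose, then transpose back.
via_transpose = raw.T.shift(periods=2, freq="D").T

print(expected.columns[0], expected.columns[-1])
```

Both approaches go through the axis=0 path, which the report says works correctly, so they may serve as a workaround on affected versions.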

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.9.12.final.0
python-bits : 64
OS : Darwin
OS-release : 21.4.0
Version : Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : zh_CN.UTF-8
pandas : 1.4.2
numpy : 1.21.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 61.2.0
Cython : 0.29.24
pytest : 6.2.4
hypothesis : 6.46.5
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.1.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.4
brotli :
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.0
numba : None
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
snappy : None
sqlalchemy : 1.4.27
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

@wjsi added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on May 17, 2022
@simonjayhawkins
Member

Thanks @wjsi for the report.

As is defined in doc, when freq is specified, only values in axes are changed, while data are kept. When axis=0, things work well. However, when axis=1, data are shifted as if freq is not specified. The strange thing is that when fill_value is supplied with a value, freq argument worked.

I think pandas 1.1.5 was giving the expected result for raw.shift(periods=2, freq="D", axis=1). Will label as a regression pending further investigation.

@simonjayhawkins added the Datetime (Datetime data dtype), Index (Related to the Index class or subclasses), and Regression (Functionality that used to work in a prior pandas version) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on May 17, 2022
@simonjayhawkins added this to the Contributions Welcome milestone on May 17, 2022
@simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue on May 17, 2022
@simonjayhawkins
Member

I think pandas 1.1.5 was giving the expected result

From bisect I'm getting

first bad commit: [44ce988] BUG: df.diff axis=1 mixed dtypes (#36710)

but that change started raising TypeError: cannot insert DatetimeArray with incompatible label

There was a regression reported for that PR (#38434) and fixed in #38504, so it may be that fix that now produces the incorrect behavior (not checked though).
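A bisect like this needs a pass/fail check for each commit. The sketch below is one hypothetical way to write such a check; it is not the script that was actually used. The helper names `is_label_only_shift` and `check_axis1_freq_shift` are assumptions introduced here, and the TypeError mentioned above is treated as a failure.

```python
import numpy as np
import pandas as pd


def is_label_only_shift(original, result, periods=2, freq="D"):
    """Return True if result has shifted column labels and unchanged data."""
    same_labels = result.columns.equals(original.columns.shift(periods, freq=freq))
    same_data = np.array_equal(result.to_numpy(), original.to_numpy())
    return bool(same_labels and same_data)


def check_axis1_freq_shift(df):
    """Run the call from the report and classify the outcome as good or bad."""
    try:
        result = df.shift(periods=2, freq="D", axis=1)
    except TypeError:
        # Some commits in the bisected range raised TypeError instead.
        return False
    return is_label_only_shift(df, result)


frame = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=pd.date_range("2020-01-01", periods=3),
    columns=pd.date_range("2020-03-01", periods=4),
)
print(check_axis1_freq_shift(frame))
```

A True result would mark the commit as good; False would mark it as bad.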
