BUG: reset_index() loses the frequency of a DatetimeIndex #59273

Open
3 tasks done
annika-rudolph opened this issue Jul 18, 2024 · 5 comments
Labels: Bug, Closing Candidate (may be closeable, needs more eyeballs)

Comments

@annika-rudolph
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> index = pd.DatetimeIndex(pd.date_range(start="2000", freq="YS", periods=10), name="Date")
>>> df = pd.DataFrame(data=list(range(10)), index=index)
>>> print(df.index.freq)
<YearBegin: month=1>
>>> print(df.reset_index()["Date"]._values.freq)
None
>>> df = df.reset_index().set_index("Date")
>>> print(df.index.freq)
None

Issue Description

Calling reset_index() on a DataFrame with a DatetimeIndex drops the index's frequency. Although the newly created column is backed by a DatetimeArray, it does not carry the freq attribute. As a result, a reset_index() -> set_index() round trip does not restore the original index, which can cause problems downstream.

Expected Behavior

I would expect that reset_index().set_index() lets me recover the original index :)
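Until the behaviour changes, the frequency can be put back by hand after the round trip. The following is only a sketch of two possible workarounds, assuming the index is regular enough for pandas to infer its frequency; the column name "value" and the variable names are made up for illustration.

```python
import pandas as pd

# Same kind of index as in the reproducible example above.
index = pd.date_range(start="2000", freq="YS", periods=10, name="Date")
df = pd.DataFrame({"value": range(10)}, index=index)

# The round trip drops the frequency, as reported.
round_tripped = df.reset_index().set_index("Date")
print(round_tripped.index.freq)  # None

# Workaround 1: rebuild the index and let pandas infer the frequency.
restored = round_tripped.copy()
restored.index = pd.DatetimeIndex(round_tripped.index, freq="infer")
print(restored.index.freq == df.index.freq)  # True

# Workaround 2: if the expected frequency is known, conform to it explicitly.
via_asfreq = round_tripped.asfreq(df.index.freq)
print(via_asfreq.index.freq == df.index.freq)  # True
```

Inference only succeeds for gap-free, regular timestamps, and asfreq() reindexes, so it would introduce NaN rows if timestamps were missing. Neither is a substitute for the frequency being retained in the first place.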

Installed Versions

INSTALLED VERSIONS

commit : bfe5be0
python : 3.10.12
python-bits : 64
OS : Linux
OS-release : 5.15.153.1-microsoft-standard-WSL2
Version : #1 SMP Fri Mar 29 23:14:13 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 0+untagged.34794.gbfe5be0
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 22.0.2
Cython : 3.0.10
sphinx : 7.3.7
IPython : 8.23.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.3.8
fastparquet : 2024.2.0
fsspec : 2024.3.1
html5lib : 1.1
hypothesis : 6.100.1
gcsfs : 2024.3.1
jinja2 : 3.1.3
lxml.etree : 5.2.1
matplotlib : 3.8.4
numba : 0.59.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.2
psycopg2 : 2.9.9
pymysql : 1.4.6
pyarrow : 16.0.0
pyreadstat : 1.2.7
pytest : 8.1.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.3.1
scipy : 1.13.0
sqlalchemy : 2.0.29
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.3.0
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

@annika-rudolph annika-rudolph added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 18, 2024
@aram-cinnamon
Contributor

take

@aram-cinnamon
Contributor

I did some digging, and it seems it's intended that freq becomes None in a column:

if isinstance(values, (DatetimeArray, TimedeltaArray)) and values.freq is not None:
    # freq is only stored in DatetimeIndex/TimedeltaIndex, not in Series/DataFrame
    values = values._with_freq(None)

The above was added in PR #41425, which states: "The long-term behavior is definitely going to always drop the freq (more specifically, DTA/TDA won't have freq, xref #31218). So this PR standardizes always-dropping."
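From the user side, the effect of that branch can be observed with public accessors only. This is a small sketch, not a description of the internals: the index keeps its freq, a Series built from it holds none, and Series.dt.freq only reports a frequency inferred from the values (the alias string it returns depends on the pandas version, so it is compared as an offset here).

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

index = pd.date_range(start="2000", freq="YS", periods=10, name="Date")
column = pd.Series(index)

# The index stores its frequency...
print(index.freq)  # <YearBegin: month=1>

# ...but the array backing the Series does not.
print(column.array.freq)  # None

# .dt.freq is inferred from the values, not stored on the column.
inferred = column.dt.freq
print(to_offset(inferred) == index.freq)  # True
```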

@annika-rudolph What do you think?
Also @jbrockmendel @jreback @mroeschke @jorisvandenbossche you created/reviewed/were mentioned in the PR. What are your thoughts on this issue?

@annika-rudolph
Contributor Author

annika-rudolph commented Jul 23, 2024

Thanks for digging into this! It is what I suspected :)

From a user perspective, frequencies on a DatetimeIndex are quite important, even more so since some functionality (such as business-day frequencies and resample) will be dropped for PeriodIndex, which for us means that we recently moved everything to DatetimeIndex. It would therefore be nice if DatetimeIndex could cover the same functionality as PeriodIndex and, specifically, if the freq attribute could be retained across all transformations.
reset_index() -> set_index() is a common pattern that I see a lot when working with a MultiIndex, which is also very relevant in many of my projects.

It seems to me that the decision on always dropping the frequency was taken some time ago (before deciding to drop PeriodIndex functionality?), so maybe it could be reconsidered?

@yuanx749
Contributor

I encountered this issue and did some debugging. It is the reshape below that leads to loss of freq.

values = values.reshape(1, -1)

But as mentioned by @aram-cinnamon, I think this behaviour is expected.

@rhshadrach rhshadrach added Closing Candidate May be closeable, needs more eyeballs and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 12, 2024
@TorstenPietrek

With the deprecation of PeriodIndex functionality progressing and the current recommendation to use a DatetimeIndex in its place, I think the freq property of a DatetimeIndex should not be dropped. If the freq property is dropped, the object no longer holds any information about the periods, so it cannot serve as a replacement for PeriodIndex. @jbrockmendel it would be great to hear your opinion on that.
