QST: FutureWarning: Resampling with a PeriodIndex is deprecated, how to resample now? #57033

Open
2 tasks done
andreas-wolf opened this issue Jan 23, 2024 · 7 comments

Comments

@andreas-wolf

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/77862775/pandas-2-2-futurewarning-resampling-with-a-periodindex-is-deprecated

Question about pandas

Pandas version 2.2 raises a warning when using this code:

import pandas as pd

df = pd.DataFrame.from_dict({"something": {pd.Period("2022", "Y-DEC"): 2.5}})
# FutureWarning: Resampling with a PeriodIndex is deprecated.
# Cast index to DatetimeIndex before resampling instead.
print(df.resample("M").ffill())

#          something
# 2022-01        2.5
# 2022-02        2.5
# 2022-03        2.5
# 2022-04        2.5
# 2022-05        2.5
# 2022-06        2.5
# 2022-07        2.5
# 2022-08        2.5
# 2022-09        2.5
# 2022-10        2.5
# 2022-11        2.5
# 2022-12        2.5

Converting the index to a DatetimeIndex first does not give the same result:

df.index = df.index.to_timestamp()
print(df.resample("ME").ffill())  # "ME" (month end) replaces the deprecated "M" alias for DatetimeIndex

#             something
# 2022-01-31        2.5

I use PeriodIndex all over the place and need to resample a lot, filling gaps with ffill.
How can I do this with pandas 2.2?

andreas-wolf added the Needs Triage and Usage Question labels on Jan 23, 2024
@MarcoGorelli
Member

This comes from #55968, and here's the relevant issue: #53481

I'd suggest creating a DatetimeIndex and using ffill.

MarcoGorelli removed the Needs Triage label on Jan 23, 2024
@matteo-zanoni

I just hit this too.
For upsampling, PeriodIndex gave a different result than DatetimeIndex: in particular, PeriodIndex would create a row for every new period within each old period:

import pandas as pd

s = pd.Series(1, index=pd.period_range(pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2), freq="d"))
s.resample("1h").ffill()

This would create a series including ALL hours of 2024-01-02.

If, instead, we first convert to DatetimeIndex:

import pandas as pd

s = pd.Series(1, index=pd.period_range(pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2), freq="d"))
s.index = s.index.to_timestamp()
s.resample("1h").ffill()

The output will only contain 1 hour of 2024-01-02.
Even if the Series is constructed directly with a DatetimeIndex (even one containing the frequency information), the result is the same:

import pandas as pd

s = pd.Series(1, index=pd.date_range(pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2), freq="d"))
s.resample("1h").ffill()

Will there be no way to get the old PeriodIndex behaviour in future versions?
IMO upsampling is quite common, and the way PeriodIndex implemented it is more useful. It would be a shame to lose it.
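The same DatetimeIndex route appears to cover this hourly case as well. A minimal sketch, assuming the desired result is every hour of both daily periods (48 rows), as the PeriodIndex resampler produced: the hourly grid has to run to the last period's `end_time`, because after `to_timestamp()` the last day is represented only by its midnight timestamp.

```python
import pandas as pd

s = pd.Series(
    1, index=pd.period_range(pd.Timestamp(2024, 1, 1), pd.Timestamp(2024, 1, 2), freq="D")
)

# Hourly grid from the start of the first day to the end of the last day.
grid = pd.date_range(s.index[0].start_time, s.index[-1].end_time, freq="h")

# Each hour takes the value of the most recent day start at or before it,
# i.e. the day it falls in.
hourly = s.set_axis(s.index.to_timestamp()).reindex(grid, method="ffill")
print(len(hourly))  # 48
```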

@andreas-wolf
Author

andreas-wolf commented Feb 27, 2024

> IMO upsampling is quite common, and the way PeriodIndex implemented it is more useful. It would be a shame to lose it.

I agree. Now you need to do the reindexing manually, whereas with PeriodIndex this was a one-liner.
Furthermore, resampling with a DatetimeIndex seems to change the data type (a bug?). Here is some sample code:

import pandas as pd

# some sample data
data = {2023: 1, 2024: 2}
df = pd.DataFrame(list(data.values()), index=pd.PeriodIndex(data.keys(), freq="Y"))

# Old style resampling, just a one-liner
old_style_resampling = df.resample("M").ffill()
print(old_style_resampling)
print(type(old_style_resampling.iloc[0, 0]))

# Convert index to DatetimeIndex
df.index = pd.to_datetime(df.index.start_time)
last_date = df.index[-1] + pd.offsets.YearEnd()
df_extended = df.reindex(
    df.index.union(pd.date_range(start=df.index[-1], end=last_date, freq="D"))
).ffill()
new_style_resampling = df_extended.resample("ME").ffill()
print(new_style_resampling)
print(type(new_style_resampling.iloc[0, 0]))
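On the dtype question: a likely cause is the intermediate `reindex(...)` in this workaround rather than the resampler itself. Rows added by a plain `reindex` are NaN, and NaN cannot be stored in an int64 column, so the column becomes float64 before `.ffill()` runs. A small sketch, with a simplified two-row frame standing in for the sample data, contrasting that with `reindex(..., method="ffill")`, which fills during the lookup:

```python
import pandas as pd

df = pd.DataFrame([1, 2], index=pd.to_datetime(["2023-01-01", "2024-01-01"]))
grid = pd.date_range("2023-01-01", "2024-12-01", freq="MS")

# Plain reindex: the 22 new rows are NaN, so column 0 is upcast to float64
# and the later .ffill() keeps it as float64.
upcast = df.reindex(grid).ffill()

# reindex with method="ffill" fills during the lookup, so no NaN is created
# and column 0 stays int64.
kept = df.reindex(grid, method="ffill")

print(upcast[0].dtype, kept[0].dtype)  # float64 int64
```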

I would also vote for keeping PeriodIndex resampling.
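The manual reindexing can be wrapped in a small helper so that upsampling a PeriodIndex is a one-liner again. This is a sketch only: `upsample_ffill` is a hypothetical name, not a pandas API, and it assumes a sorted, unique PeriodIndex whose frequency is coarser than the target. It stays in period space: `pd.period_range` builds the finer periods, `PeriodIndex.asfreq` maps each one to the coarse period containing it, and coarse periods missing from the data are forward-filled.

```python
import pandas as pd


def upsample_ffill(obj, freq):
    """Hypothetical helper (not a pandas API): upsample a PeriodIndex-ed
    Series or DataFrame to the finer period frequency `freq`.

    Assumes obj.index is a sorted, unique PeriodIndex coarser than `freq`.
    """
    idx = obj.index
    # Finer periods from the start of the first coarse period to the end of the last.
    target = pd.period_range(
        idx[0].asfreq(freq, how="start"), idx[-1].asfreq(freq, how="end"), freq=freq
    )
    # The coarse period that contains each finer period.
    coarse = target.asfreq(idx.freqstr)
    # Look up each row by its containing period; coarse periods missing from
    # obj come back as NaN and are forward-filled from the previous value.
    out = obj.reindex(coarse).ffill()
    out.index = target
    return out


df = pd.DataFrame({"value": [1, 2]}, index=pd.period_range("2023", periods=2, freq="Y"))
monthly = upsample_ffill(df, "M")  # 24 rows, 2023-01 ... 2024-12
print(monthly)
```

When every coarse period is present, no NaN is introduced, so an integer column keeps its int64 dtype.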

@ChadFulton

Related to my comments in #56588, I think that this is another example where Period is being deprecated too fast without a clear replacement in mind.

@jbrockmendel
Member

Are all the relevant cases about upsampling and never downsampling? A big part of the motivation for deprecating was that PeriodIndexResampler._downsample is deeply broken in a way that didn't seem worth fixing. Potentially we could just deprecate downsampling and not upsampling?

@andreas-wolf
Author

The upsampling example is just the one where it's most obvious what will be missing once PeriodIndex resampling no longer works.

If downsampling stopped working, I would have to convert the index, downsample, and convert the index back again. That does not sound very compelling.
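For downsampling, one possible workaround that avoids both the resampler and the round trip through timestamps is to group by the coarser period each row falls in. A sketch, assuming a monthly PeriodIndex and `sum` as the aggregation; it relies on `groupby` with `PeriodIndex.asfreq`, not on any resampling API.

```python
import pandas as pd

# Two years of monthly values 0, 1, ..., 23 on a monthly PeriodIndex.
monthly = pd.Series(range(24), index=pd.period_range("2023-01", periods=24, freq="M"))

# Map each month to the year containing it, then aggregate per year.
yearly = monthly.groupby(monthly.index.asfreq("Y")).sum()
print(yearly)  # 2023 -> 66, 2024 -> 210
```

Unlike resampling, a year with no rows at all would simply be missing from `yearly` instead of appearing as an empty bin.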

PeriodIndex resampling (up and down) is very convenient when combining data sources with daily, monthly, quarterly and yearly frequencies. I can't remember a project where I did not use period resampling. That convenience was always an argument for using pandas instead of other libraries like polars, where you have to handle all the conversions yourself.

From my point of view, PeriodIndex has always been one of the great things about pandas.

I have very limited experience with pandas internals, so I don't understand how downsampling can be so deeply broken that it's not worth fixing, when "just" converting to a DatetimeIndex would fix it. Couldn't the DatetimeIndex logic be used internally to fix it?

@MarcosYanase

I agree with @andreas-wolf.
My projects have a lot of DataFrames using PeriodIndex with different frequencies, and resampling is a very useful tool for calculations.
Keeping it at least for upsampling would be excellent, but for downsampling a workaround (like the one Andreas posted) is not trivial.
What are the bugs related to PeriodIndexResampler._downsample?

Projects
None yet
Development

No branches or pull requests

6 participants