ENH: Add level to asfreq for MultiIndex #33647

Open
2 tasks done
MaxPowerWasTaken opened this issue Apr 19, 2020 · 3 comments

Comments

@MaxPowerWasTaken

MaxPowerWasTaken commented Apr 19, 2020

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.


Question about pandas

I posted my question with a bit more context on Stack Overflow here, and got an impressive answer, but I doubt (or want to doubt) that the answer I got is the simplest way to set the freq attribute of a DatetimeIndex that's part of a MultiIndex.

Given a MultiIndexed Series loaded from a CSV that looks like this...

import numpy as np
import pandas as pd

# generate example data
users = ['A', 'B', 'C', 'D']
#dates = pd.date_range("2020-02-01 00:00:00", "2020-04-04 20:00:00", freq="H")
dates = pd.date_range("2020-02-01 00:00:00", "2020-02-04 20:00:00", freq="H")
idx = pd.MultiIndex.from_product([users, dates])
idx.names = ["user", "datehour"]
y = pd.Series(np.random.choice(a=[0, 1], size=len(idx)), index=idx).rename('y')

# write to csv and reload (turns out this matters)
y.to_csv('reprod_example.csv')
y = pd.read_csv('reprod_example.csv', parse_dates=['datehour'])
y = y.set_index(['user', 'datehour']).y

>>> y.head()
user  datehour           
A     2020-02-01 00:00:00    0
      2020-02-01 01:00:00    0
      2020-02-01 02:00:00    1
      2020-02-01 03:00:00    0
      2020-02-01 04:00:00    0
Name: y, dtype: int64

...is there a way to set the freq attribute of the DateTime index level that's simpler / more intuitive than this answer I received?...

y = pd.read_csv('reprod_example.csv', parse_dates=['datehour'])
y = y.groupby('user').apply(lambda df: df.set_index('datehour').asfreq('H')).y

... setting an index inside a groupby/apply/lambda in order to update the freq attribute is not what I expected this would take. Certainly not the first thing I (or the answerer) tried.
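
On why the CSV round trip matters: a CSV file stores only the timestamps, not the index's freq attribute, so a reloaded DatetimeIndex comes back with freq=None. The following is a minimal sketch on a plain (non-MultiIndex) DatetimeIndex, using an in-memory buffer instead of a file. The variable names are illustrative, and pd.offsets.Hour() is used in place of the "H" alias, which newer pandas versions deprecate.

```python
import io

import numpy as np
import pandas as pd

hour = pd.offsets.Hour()
dates = pd.date_range("2020-02-01 00:00", periods=4, freq=hour, name="datehour")
s = pd.Series(np.arange(4), index=dates, name="y")
freq_before = s.index.freq  # the Hour offset passed to date_range

# Round-trip through CSV text (stands in for writing and reading a file)
buf = io.StringIO()
s.to_csv(buf)
buf.seek(0)
reloaded = pd.read_csv(buf, parse_dates=["datehour"], index_col="datehour")["y"]
freq_after = reloaded.index.freq  # None: the CSV carries no frequency metadata

# asfreq reindexes onto a regular grid, so the resulting index has freq set again
restored = reloaded.asfreq(hour)
```

With a MultiIndex there is no single DatetimeIndex to call asfreq on, which is presumably why the Stack Overflow answer goes through groupby.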

Thanks! Love Pandas, just thought this was maybe a good opportunity to check if there's a more idiomatic way to do this, since the use case seems not super atypical.

MaxPowerWasTaken added the Needs Triage and Usage Question labels Apr 19, 2020
@TomAugspurger
Contributor

Thanks for the report. Does this work?

y.index.levels[1].freq = "H"

If not, I think that we could support something like y.asfreq("H", level=1) here.
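
Until something like a level keyword exists, the proposed behavior can be approximated with a small helper. This is only a sketch: asfreq_level is a hypothetical name, not a pandas API, and it assumes a Series whose MultiIndex has uniquely named levels, one of which holds datetimes. It applies asfreq separately for each combination of the remaining levels and glues the pieces back together.

```python
import numpy as np
import pandas as pd


def asfreq_level(s, freq, level):
    """Hypothetical helper (not part of pandas): run asfreq on one datetime
    level of a MultiIndexed Series, separately per group of the other levels."""
    names = list(s.index.names)
    target = names[level] if isinstance(level, int) else level
    others = [n for n in names if n != target]
    # Group by a single name when possible so the group keys stay scalars
    by = others[0] if len(others) == 1 else others
    pieces = {}
    for key, sub in s.groupby(level=by):
        # Drop the grouping levels so only the DatetimeIndex remains
        pieces[key] = sub.droplevel(others).asfreq(freq)
    return pd.concat(pieces, names=others + [target])


# Example data: two users, hourly timestamps, one row removed to create a gap
users = ["A", "B"]
dates = pd.date_range("2020-02-01 00:00", periods=6, freq=pd.offsets.Hour())
idx = pd.MultiIndex.from_product([users, dates], names=["user", "datehour"])
idx = idx.delete(2)  # removes ("A", 2020-02-01 02:00)
y = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="y")

regular = asfreq_level(y, pd.offsets.Hour(), level=1)
```

Here regular has the missing ("A", 02:00) row back, filled with NaN. Note that this makes each group's timestamps regularly spaced; it is not a claim about what the freq attribute of the resulting MultiIndex level reports.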

TomAugspurger added the Needs Info and Timeseries labels and removed the Needs Triage label Sep 4, 2020
@plammens
Contributor

Related to #2141

mroeschke changed the title from "Most idiomatic way to set freq of Datetime Index in MultiIndex?" to "ENH: Add level to asfreq for MultiIndex" Mar 27, 2021
mroeschke added the Enhancement, Frequency, and MultiIndex labels and removed the Needs Info, Timeseries, and Usage Question labels Mar 27, 2021
@hakimbazet

> Thanks for the report. Does this work?
>
> y.index.levels[1].freq = "H"
>
> If not, I think that we could support something like y.asfreq("H", level=1) here.

It would be nice to have this, because currently, to change the frequency within a MultiIndex, I need to do something like this:

y.groupby(["user", pd.Grouper(freq="H", level="datehour")])["y"].first()
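
A self-contained sketch of that Grouper pattern, for illustration only. It assumes y is a Series (so no column selection is needed), wraps both index levels in pd.Grouper, and downsamples to a 2-hour frequency via pd.offsets.Hour(2) so the effect is visible. The example data and variable names are made up.

```python
import numpy as np
import pandas as pd

users = ["A", "B"]
dates = pd.date_range("2020-02-01 00:00", periods=4, freq=pd.offsets.Hour())
idx = pd.MultiIndex.from_product([users, dates], names=["user", "datehour"])
y = pd.Series(np.arange(len(idx)), index=idx, name="y")

# Group by user and by 2-hour bins of the datetime level,
# then keep the first observation in each bin
two_hourly = y.groupby(
    [
        pd.Grouper(level="user"),
        pd.Grouper(level="datehour", freq=pd.offsets.Hour(2)),
    ]
).first()
```

Unlike asfreq, this is an aggregation: each bin is reduced with first(), so it behaves more like resampling than like reindexing onto a regular grid.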
