resampling on a MultiIndex #21283

alex-git-rd · 2018-06-01T06:12:30Z

I posed this issue on SO but was hoping to get a more detailed explanation. I have a df that has an (id, date) MultiIndex. I would like to resample the df within each id. Intuitively I thought the code should look something like:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={"val": np.arange(30),
                        "id": np.tile([1, 2], 15),
                        "date": np.repeat(pd.date_range(start="2000-01-01", periods=15, name="date"), 2)
                        })

df = df.set_index(["id", "date"]).sort_index()
# This is the call that raises an exception
df.groupby("id")["val"].resample(rule="M", closed="right", label="right").apply(lambda x: np.sqrt(sum(x)/10))

This raises an exception, and the working answer suggests keeping only date in the index and leaving id as a column, then grouping by that column and resampling. That is, instead of df.set_index(["id", "date"]).sort_index(), use just df.set_index("date").sort_index(), and everything else works as is.

I'm a bit confused about why my original attempt failed. My understanding of groupby is that it creates an object holding the grouping information, and every method on that object respects that grouping by operating on each group separately. Specifically, the index level or column the df is grouped on should be absent from each group's subframe passed to the method, and should only be used to concatenate the results. Therefore, I expected resample to receive a df with only a date index rather than a MultiIndex. Am I thinking about this completely incorrectly?
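A minimal sketch of the workaround described above, assuming a recent pandas version. It rebuilds the same toy data, keeps only date in the index, groups on the id column and resamples each group. Two choices differ from the question and are assumptions for illustration only: a weekly rule ("W") replaces "M" so that the 15 days fall into several bins, and sqrt(sum/10) is computed as .sum() followed by vectorized arithmetic instead of through .apply.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question: 15 days x 2 ids
df = pd.DataFrame({
    "val": np.arange(30),
    "id": np.tile([1, 2], 15),
    "date": np.repeat(pd.date_range(start="2000-01-01", periods=15, name="date"), 2),
})

# Workaround: only "date" goes into the index; "id" stays a regular column
flat = df.set_index("date").sort_index()

# Group on the "id" column, then resample each group's DatetimeIndex.
# "W" (weekly, Sunday-anchored) is an illustrative choice, not the question's "M".
weekly_sum = flat.groupby("id")["val"].resample("W", closed="right", label="right").sum()

# Same quantity as the question's lambda: sqrt(sum / 10) per (id, bin)
result = np.sqrt(weekly_sum / 10)
print(result)
```

The result is a Series indexed by (id, date) with one row per id and weekly bin.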


gitgithan · 2018-12-07T04:21:47Z

It looks like this issue is not specific to resample; it affects any subsequently chained function that does not accept a MultiIndex. I guess groupby keeps the row labels as they are because that offers the most information and clarity to the user when working with the groups afterwards.
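A small sketch, assuming a recent pandas version, that makes this visible: iterating over df.groupby(level="id") shows that each group still carries the full (id, date) MultiIndex, so calling resample on a single group without its level argument fails with a TypeError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "val": np.arange(30),
    "id": np.tile([1, 2], 15),
    "date": np.repeat(pd.date_range(start="2000-01-01", periods=15, name="date"), 2),
}).set_index(["id", "date"]).sort_index()

# Each group keeps both index levels; "id" is not dropped
index_types = {key: type(group.index).__name__ for key, group in df.groupby(level="id")}
print(index_types)

# Resampling one group directly therefore sees a MultiIndex, not a DatetimeIndex
_, first_group = next(iter(df.groupby(level="id")))
try:
    first_group.resample("W")
    resample_error = None
except TypeError as exc:
    resample_error = exc
print(type(resample_error).__name__, resample_error)
```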
One possible question is whether groupby would need to apply something like pandas.MultiIndex.droplevel automatically when grouping on a MultiIndex level. (Is this what you are looking for?) That seems unnecessary, as the working answer you mentioned already shows: you can flexibly use the values of a column for grouping rather than moving them into the index first.
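For comparison, the droplevel idea can already be written by hand. A minimal sketch, assuming a recent pandas version: the helper resample_one_id (a hypothetical name) drops the id level from each group's Series with Series.droplevel, so resample sees a plain DatetimeIndex, and SeriesGroupBy.apply puts the id keys back as the outer index level. The weekly rule is again only an illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "val": np.arange(30),
    "id": np.tile([1, 2], 15),
    "date": np.repeat(pd.date_range(start="2000-01-01", periods=15, name="date"), 2),
}).set_index(["id", "date"]).sort_index()


def resample_one_id(vals: pd.Series) -> pd.Series:
    """Hypothetical helper: resample one id's values after dropping the id level."""
    by_date = vals.droplevel("id")  # now a plain DatetimeIndex named "date"
    weekly_sum = by_date.resample("W", closed="right", label="right").sum()
    return np.sqrt(weekly_sum / 10)


# SeriesGroupBy.apply concatenates the per-id results and prepends "id" as the outer level
result = df.groupby(level="id")["val"].apply(resample_one_id)
print(result)
```

This keeps the MultiIndex layout from the question, at the cost of an explicit per-group function.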

nealxm · 2022-02-20T12:27:51Z

Hi! I'm curious whether this is still a problem or something you want to work on.

jbrockmendel added the MultiIndex and Resample labels on Jul 30, 2018

mroeschke added the Bug label on May 11, 2020

mroeschke added the Enhancement label and removed the Bug label on Jun 20, 2021
