resampling on a MultiIndex #21283

Open
alex-git-rd opened this issue Jun 1, 2018 · 2 comments

Comments

@alex-git-rd

I posted this question on Stack Overflow but was hoping to get a more detailed explanation. I have a df with an (id, date) MultiIndex, and I would like to resample the df within each id. Intuitively, I thought the code should look something like:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={"val": np.arange(30),
                        "id": np.tile([1, 2], 15),
                        "date": np.repeat(pd.date_range(start="2000-01-01", periods=15, name="date"), 2)
                        })

df = df.set_index(["id", "date"]).sort_index()
# The next line raises an exception:
df.groupby("id")["val"].resample(rule="M", closed="right", label="right").apply(lambda x: np.sqrt(sum(x) / 10))

This raises an exception, and the working answer suggests keeping only date as the index and leaving id as a column, then grouping by that column and resampling. Namely, instead of df.set_index(["id", "date"]).sort_index(), use just df.set_index("date").sort_index(), and everything else works as is. I'm a bit confused about why my original attempt failed. My understanding of groupby is that it creates an object holding the grouping information, and that all methods on that object respect it by operating on each group only. Specifically, I assumed the index level or column that the df is grouped on would be absent from the group subframe passed to any method and would only be used to concatenate the results. I therefore expected resample to receive a df with only a date index rather than a MultiIndex. Am I thinking about this completely incorrectly?
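For reference, here is a minimal sketch of the situation described above. It first checks how many index levels each group carries when grouping the MultiIndexed frame, then applies the workaround from the working answer. The names `flat`, `multi`, `by_date`, and `levels_per_group` are illustrative, and `pd.offsets.MonthEnd()` is an assumption standing in for the "M" alias so the sketch does not depend on how a given pandas version spells that alias.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question: 15 consecutive days for each of two ids.
flat = pd.DataFrame({
    "val": np.arange(30),
    "id": np.tile([1, 2], 15),
    "date": pd.date_range(start="2000-01-01", periods=15, name="date").repeat(2),
})

# With an (id, date) MultiIndex, each group keeps both index levels,
# so resample does not see a plain DatetimeIndex.
multi = flat.set_index(["id", "date"]).sort_index()
levels_per_group = [group.index.nlevels for _, group in multi.groupby(level="id")]

# Workaround from the working answer: only `date` in the index, `id` as a column.
# Assumption: pd.offsets.MonthEnd() is used here in place of the "M" alias.
by_date = flat.set_index("date").sort_index()
result = (
    by_date.groupby("id")["val"]
    .resample(pd.offsets.MonthEnd(), closed="right", label="right")
    .apply(lambda x: np.sqrt(x.sum() / 10))
)

print(levels_per_group)
print(result)
```

Since all 15 days fall in January 2000, each id ends up with a single monthly bin.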

@gitgithan

gitgithan commented Dec 7, 2018

Looks like this issue is not specific to resample; it affects any subsequently chained function that does not accept a MultiIndex. I guess the implementation of groupby retains the row labels as they are because that gives the user the most information and clarity when they deal with the groups after groupby.
One possible question is whether it would be necessary to implement a pandas.MultiIndex.droplevel function when grouping on a MultiIndex. (Is this what you are looking for?) That seems unnecessary: as the working answer you mentioned already shows, you can flexibly use the values of a column for the grouping rather than setting them into the index.
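As a rough sketch of the point above, starting from a frame that already has the (id, date) MultiIndex, one option is to move id back into a column with `reset_index(level="id")` and then group and resample. A second option, not mentioned in this thread and offered only as an alternative, is to group on both index levels with `pd.Grouper`. The helper `scaled_root` and the variable names are hypothetical, and `pd.offsets.MonthEnd()` is assumed as a stand-in for the "M" alias.

```python
import numpy as np
import pandas as pd

# Rebuild the (id, date) MultiIndexed frame from the original post.
df = pd.DataFrame({
    "val": np.arange(30),
    "id": np.tile([1, 2], 15),
    "date": pd.date_range(start="2000-01-01", periods=15, name="date").repeat(2),
}).set_index(["id", "date"]).sort_index()

month_end = pd.offsets.MonthEnd()  # assumption: replaces the "M" alias


def scaled_root(x):
    """Hypothetical helper: the aggregation used in the original post."""
    return np.sqrt(x.sum() / 10)


# Option 1: turn the id level back into a column, then group and resample.
via_column = (
    df.reset_index(level="id")
    .groupby("id")["val"]
    .resample(month_end, closed="right", label="right")
    .apply(scaled_root)
)

# Option 2: keep the MultiIndex and bin the date level with pd.Grouper.
via_grouper = df.groupby([
    pd.Grouper(level="id"),
    pd.Grouper(level="date", freq=month_end, closed="right", label="right"),
])["val"].agg(scaled_root)

print(via_column)
print(via_grouper)
```

Both routes should produce one value per (id, month) pair; which one reads better likely depends on whether the rest of the pipeline expects id in the index or in a column.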

@nealxm
Contributor

nealxm commented Feb 20, 2022

Hi! I'm curious whether this is still a problem or something you wanted to work on.


5 participants