ENH: Preserve .names in df.set_index(df.index) #6459

Merged: 1 commit merged on Mar 4, 2014


@qwhelan (Contributor) commented Feb 24, 2014

closes #6452.

This causes a slight change in behavior in @jseabold's second example. Previously, df.set_index(df.index) would convert a MultiIndex into an Index of tuples:

In [7]: from statsmodels.datasets import grunfeld

In [8]: data = grunfeld.load_pandas().data

In [9]: data = data.set_index(['firm', 'year'])

In [10]: data.set_index(data.index).index.names
Out[10]: [None]

In [11]: data
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 220 entries, (General Motors, 1935.0) to (American Steel, 1954.0)
Data columns (total 3 columns):
invest     220  non-null values
value      220  non-null values
capital    220  non-null values
dtypes: float64(3)

In [13]: data.set_index(data.index)
Out[13]: 
<class 'pandas.core.frame.DataFrame'>
Index: 220 entries, (General Motors, 1935.0) to (American Steel, 1954.0)
Data columns (total 3 columns):
invest     220  non-null values
value      220  non-null values
capital    220  non-null values
dtypes: float64(3)

With this change, df.set_index(df.index) keeps the index as a MultiIndex and preserves its level names.
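A minimal sketch of the behavior described above, assuming a recent pandas release (this exercises current pandas, not the 2014 patch itself). A small hand-built frame with placeholder values stands in for the Grunfeld data:

```python
import pandas as pd

# Hand-built stand-in for the Grunfeld panel; the numbers are placeholders, not real data
data = pd.DataFrame(
    {
        "firm": ["General Motors", "General Motors", "American Steel"],
        "year": [1935.0, 1936.0, 1954.0],
        "invest": [1.0, 2.0, 3.0],
    }
).set_index(["firm", "year"])

# Re-setting the index from itself keeps the MultiIndex and its level names
roundtrip = data.set_index(data.index)
print(type(roundtrip.index).__name__, list(roundtrip.index.names))
# prints: MultiIndex ['firm', 'year']
```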

@jreback (Contributor) commented Feb 24, 2014

This is not the way to fix this.
Instead, catch the case of a passed Index and treat it like a Series rather than like the ndarray case.
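To illustrate the distinction, here is a minimal sketch assuming a recent pandas release (current behavior, not the 2014 code path): a bare ndarray carries no label, while a Series or an Index carries a .name that set_index can reuse. The frame and the index name "name" are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.Index([10, 20, 30], name="name"))

# ndarray path: raw values have no label, so the new index name is None
via_ndarray = df.set_index(np.asarray(df.index))

# Series path: the Series' .name becomes the new index name
via_series = df.set_index(df.index.to_series())

# Passing the Index itself is handled like the Series case
via_index = df.set_index(df.index)

print(via_ndarray.index.name, via_series.index.name, via_index.index.name)
# prints: None name name
```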

@qwhelan (Contributor, Author) commented Feb 24, 2014

Treating it like a Series doesn't fix the issue described above. What should df.set_index([df.index, df.index]) return when the index is a MultiIndex? Currently (and in this patch), it would create an Index whose entries are pairs of tuples.

The more correct behavior, in my opinion, would be to return a 4-level MultiIndex. This would require either treating the MultiIndex as a DataFrame here or modifying from_arrays to detect this case (presumably undesirable).
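A minimal sketch of the proposed 4-level result, assuming a recent pandas release, whose set_index expands each MultiIndex key into its levels. The second copy of the index is renamed to the hypothetical names "firm2"/"year2" only so that all four level names stay distinct:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("GM", 1935), ("GM", 1936), ("US Steel", 1935)], names=["firm", "year"]
)
df = pd.DataFrame({"invest": [1.0, 2.0, 3.0]}, index=mi)

# Two 2-level MultiIndex keys; the copy is renamed to keep level names unique
keys = [df.index, df.index.rename(["firm2", "year2"])]
out = df.set_index(keys)

print(out.index.nlevels, list(out.index.names))
# prints: 4 ['firm', 'year', 'firm2', 'year2']
```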

@qwhelan (Contributor, Author) commented Feb 24, 2014

@jreback This newest commit is what I'm thinking. Let me know if there's a more elegant way to get the columns out of a MultiIndex.
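One way to "get the columns out of a MultiIndex" is to pull each level with get_level_values and rebuild with MultiIndex.from_arrays. The helper keys_to_index below is hypothetical and only sketches the idea; it is not necessarily what the merged commit does:

```python
import pandas as pd


def keys_to_index(keys):
    """Hypothetical helper: build a MultiIndex from a list of keys, expanding
    any MultiIndex into its individual levels and keeping every level's name."""
    arrays, names = [], []
    for key in keys:
        if isinstance(key, pd.MultiIndex):
            # One array per level, plus that level's name
            for level in range(key.nlevels):
                arrays.append(key.get_level_values(level))
            names.extend(key.names)
        else:
            # Plain Index/Series keys contribute one level; ndarrays have no name
            arrays.append(key)
            names.append(getattr(key, "name", None))
    return pd.MultiIndex.from_arrays(arrays, names=names)


mi = pd.MultiIndex.from_tuples([("GM", 1935), ("US Steel", 1935)], names=["firm", "year"])
flat = keys_to_index([mi, pd.Index(["a", "b"], name="tag")])
print(flat.nlevels, list(flat.names))
# prints: 3 ['firm', 'year', 'tag']
```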

@qwhelan (Contributor, Author) commented Feb 25, 2014

@jreback, thanks for the suggestions. Most recent commit has those changes.

@jreback jreback added Bug and removed Bug labels Feb 25, 2014
@jreback jreback added this to the 0.14.0 milestone Feb 25, 2014
@jreback (Contributor) commented Feb 25, 2014

Looks good. Can you add a note to release.rst and v0.14.0.txt, both in the API sections, referencing this issue with a short explanation? (Probably just a one-liner; if you think more is warranted, you can add an example in v0.14.0.txt, but only if it's not clear what the change does.)

@qwhelan (Contributor, Author) commented Mar 3, 2014

@jreback Added to notes and rebased. Sorry for the delay - I've been sick the last few days.

df = pd.util.testing.makeDataFrame()
df.index.name = 'name'

assert df.set_index(df.index).index.names == ['name']
Contributor


use self.assertEquals for these rather than a bare assert
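A minimal sketch of the same checks written as unittest assertions. It uses assertEqual, since assertEquals is only a deprecated alias (removed in Python 3.12). It also swaps in a hand-built frame because recent pandas releases no longer provide pd.util.testing; the class and test names are hypothetical:

```python
import unittest

import numpy as np
import pandas as pd


class SetIndexNamesTest(unittest.TestCase):
    def test_single_index_name_preserved(self):
        # Stand-in for pd.util.testing.makeDataFrame()
        df = pd.DataFrame(
            np.random.randn(4, 2),
            columns=["A", "B"],
            index=pd.Index(list("wxyz"), name="name"),
        )
        self.assertEqual(df.set_index(df.index).index.names, ["name"])

    def test_multiindex_kept(self):
        mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["k1", "k2"])
        df = pd.DataFrame({"v": [1.0, 2.0]}, index=mi)
        result = df.set_index(df.index)
        self.assertIsInstance(result.index, pd.MultiIndex)
        self.assertEqual(list(result.index.names), ["k1", "k2"])
```

Saved to a file, this runs with python -m unittest.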

@qwhelan (Contributor, Author) commented Mar 4, 2014

@jreback Added an example to whatsnew. Let me know if you'd like this branch squashed/rebased.

@jreback (Contributor) commented Mar 4, 2014

Looks good.
Please squash down to 1-2 commits and it's good to merge.

Preserve .names in df.set_index(df.index)

Check that df.set_index(df.index) doesn't convert a MultiIndex to an Index

Handle general case of df.set_index([df.index,...])

Cleanup

Add to release notes

Add equality checks

Fix issue on 2.6

Add example to whatsnew
@qwhelan (Contributor, Author) commented Mar 4, 2014

Alright, squashed and rebased.

jreback added a commit that referenced this pull request Mar 4, 2014
ENH: Preserve .names in df.set_index(df.index)
@jreback jreback merged commit 28f3af4 into pandas-dev:master Mar 4, 2014
@jreback (Contributor) commented Mar 4, 2014

thanks!

Labels: API Design, MultiIndex, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Linked issue (closed by merging this pull request): ENH: set_index drops index name

2 participants