DataFrame.reset_index deletes index, does not allow for ints as level arg #16263
Comments
This is related to passing the `level` argument to `reset_index`, as in:
In [3]: pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D']).reset_index(level='not present')
Out[3]:
   index         A         B         C         D
0      0  0.057457  0.065932  0.276079  0.305390
1      1 -0.562195 -0.385750 -0.228925 -0.426511
2      2  0.377559 -0.837031 -0.384840 -0.305262
3      3 -0.670057 -0.737446  0.561989  0.528754
In [4]: pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D']).reset_index(level=['not present'])
Out[4]:
   index         A         B         C         D
0      0  0.613373 -0.169316 -0.592379  1.050764
1      1  0.069762  0.995308  0.030434 -0.361300
2      2 -0.526487  0.165054  0.015452  0.954447
3      3  0.585677 -1.435712 -0.298280 -0.581473
but a7a0574 changed the behaviour so that now, vice versa, even valid level names/indices are not considered. I can provide a PR; the question is what we want to do with non-existent level names: raise or ignore?
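To make the "valid level names/indices" case concrete, here is a minimal sketch. It assumes a pandas release that already includes the fix, and the frame contents and the index name `idx` are made up for illustration. On such a release both calls should move the flat index into a column instead of dropping it.

```python
import pandas as pd

# Hypothetical frame: a flat (single-level) index that has a name,
# so it can be addressed either by name or by position 0.
df = pd.DataFrame(
    {"A": [1, 2], "B": [3, 4]},
    index=pd.Index(["x", "y"], name="idx"),
)

by_name = df.reset_index(level="idx")  # valid level name
by_pos = df.reset_index(level=0)       # valid level position

print(list(by_name.columns))
print(list(by_pos.columns))
```

If the printed column lists do not include `idx`, the installed version still shows the regression described in this issue.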
Sorry, the question is already answered by the behaviour when there is a MultiIndex:
In [5]: pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D']).set_index(['A', 'B']).reset_index(level=['A', 'E'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in _get_level_number(self, level)
    610                                  'level number' % level)
--> 611             level = self.names.index(level)
    612         except ValueError:

ValueError: 'E' is not in list

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-5-f1686d7d4dfc> in <module>()
----> 1 pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D']).set_index(['A', 'B']).reset_index(level=['A', 'E'])

/home/nobackup/repo/pandas/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   3016             if not isinstance(level, (tuple, list)):
   3017                 level = [level]
-> 3018             level = [self.index._get_level_number(lev) for lev in level]
   3019             if isinstance(self.index, MultiIndex):
   3020                 if len(level) < self.index.nlevels:

/home/nobackup/repo/pandas/pandas/core/frame.py in <listcomp>(.0)
   3016             if not isinstance(level, (tuple, list)):
   3017                 level = [level]
-> 3018             level = [self.index._get_level_number(lev) for lev in level]
   3019             if isinstance(self.index, MultiIndex):
   3020                 if len(level) < self.index.nlevels:

/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in _get_level_number(self, level)
    612         except ValueError:
    613             if not isinstance(level, int):
--> 614                 raise KeyError('Level %s not found' % str(level))
    615             elif level < 0:
    616                 level += self.nlevels

KeyError: 'Level E not found'
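The lookup visible in the traceback can be sketched in plain Python. This is not the pandas source: `get_level_number` is a hypothetical stand-in that mirrors only the lines shown above (try the name first, then fall back to an integer position), and the final bounds check is an assumption, since that part of the method is not visible in the traceback.

```python
def get_level_number(names, level):
    """Hypothetical stand-in for the level lookup shown in the traceback."""
    try:
        # First interpretation: `level` is a level name.
        return names.index(level)
    except ValueError:
        # Not a known name: only an integer position is still acceptable.
        if not isinstance(level, int):
            raise KeyError('Level %s not found' % str(level))
        if level < 0:
            # Negative positions count from the end.
            level += len(names)
        # Assumption: this bounds check is not visible in the traceback.
        if not 0 <= level < len(names):
            raise IndexError('Level %d is out of range for %d levels'
                             % (level, len(names)))
        return level


names = ['A', 'B']
resolved = [get_level_number(names, lev) for lev in ['A', 1, -1]]
print(resolved)  # [0, 1, 1]
```

Because the `KeyError` is raised inside the `except ValueError` block, Python reports both exceptions, which is why the traceback above contains "During handling of the above exception, another exception occurred".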
toobaz added a commit to toobaz/pandas that referenced this issue on May 6, 2017: d27ae91
toobaz referenced this issue on May 6, 2017: BUG: support for "level=" when reset_index() is called with a flat Index #16266 (Merged)
jorisvandenbossche added the Regression label on May 6, 2017
jorisvandenbossche added this to the 0.20.2 milestone on May 6, 2017
toobaz added a commit to toobaz/pandas that referenced this issue on May 6, 2017: fd83502
toobaz added a commit to toobaz/pandas that referenced this issue on May 6, 2017: f4c3121
toobaz added a commit to toobaz/pandas that referenced this issue on May 6, 2017: 7cc21c2
jreback added the Indexing label on May 6, 2017
schwab commented on May 6, 2017
Perhaps it was working by accident before, but the new behavior of completely dropping the index column when reset_index is called seems problematic. Also, the docs for reset_index say "For a standard index, the index name will be used...", so the behavior is now out of sync with the documented spec. That raises an important question: if we do want to keep this behavior going forward, what is the new "correct" way to remove a single-column index from a DataFrame while keeping its data?
schwab commented on May 6, 2017
@m4g005 To get this working the same in both versions, you can try it without the level name: run `data.reset_index(inplace=True)`, then inspect `data`.
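A minimal sketch of that workaround follows. The reporter's original `data` is not shown in this thread, so the frame below, its column `value`, and the index name `key` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the reporter's frame: one named, flat index.
data = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.Index(["a", "b", "c"], name="key"),
)

# No `level` argument: the whole index is moved back into the columns.
data.reset_index(inplace=True)

print(data.columns.tolist())
print(data.index.tolist())
```

Leaving `level` out avoids the code path affected by the regression, so the index data is kept as a regular column named after the index.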
This is going to be fixed, no doubt.
m4g005 commented on May 6, 2017
Code Sample, a copy-pastable example if possible
u'0.20.1'
Problem description
Between v0.19.2 and v0.20.1, the behavior of DataFrame.reset_index changed.
With a single set index:
Expected Output
v0.19.2 results:
Output of pd.show_versions()