Closed
Labels
Bug, Indexing (related to indexing on series/frames, not to indexes themselves)
Description
When set_index is called on a column that contains duplicate values, verify_integrity=True correctly detects the duplicates. However, when inplace=True is also passed, the check appears to run only after the original columns have already been dropped from the DataFrame, so the exception leaves the frame without those columns and the data is lost.
I believe it would be better if the original DataFrame were modified only when the set_index operation succeeds.
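Until that is the case, one possible workaround is to avoid inplace=True so that a failed integrity check never touches the original frame. The sketch below assumes Python 3 and a pandas version where the non-inplace set_index operates on a copy; set_index_checked is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def set_index_checked(df, keys):
    """Hypothetical helper: return a re-indexed copy, or raise without touching df."""
    # The non-inplace call builds a new frame, so if the integrity check
    # raises, the caller's frame still has all of its columns.
    return df.set_index(keys, verify_integrity=True)


df = pd.DataFrame({'one': [1, 1, 2], 'two': [1, 2, 3]})
try:
    df = set_index_checked(df, ['one'])
except Exception as exc:
    # pandas 0.8.1 raises a bare Exception here (see the traceback in this
    # report); other versions may raise a more specific subclass.
    print('set_index rejected:', exc)

# The original columns are still present after the failed call.
print(list(df.columns))
```

The name df is rebound only when the call succeeds, which gives the all-or-nothing behaviour requested above at the cost of one extra copy.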
Code to reproduce the problem:
In [189]: df = DataFrame({'one':[1, 1, 2], 'two':[1,2,3]})
In [190]: df
Out[190]:
   one  two
0    1    1
1    1    2
2    2    3
In [191]: df.set_index(['one'], inplace=True, verify_integrity=True)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/mnt/hgfs/fastdata/<ipython-input-191-e1c0e8c92f6c> in <module>()
----> 1 df.set_index(['one'], inplace=True, verify_integrity=True)

/home/tobias/code/envs/mac/local/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, append, inplace, verify_integrity)
   2328         if verify_integrity and not index.is_unique:
   2329             duplicates = index.get_duplicates()
-> 2330             raise Exception('Index has duplicate keys: %s' % duplicates)
   2331
   2332         # clear up memory usage

Exception: Index has duplicate keys: [1]
In [192]: df
Out[192]:
   two
0    1
1    2
2    3
In [202]: print sys.version
2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3]
In [203]: print pd.version.version
0.8.1
In [204]:
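The session above could be condensed into a small check of the behaviour being requested: after a failed set_index, the frame should still have every column it started with. This is only a sketch, assuming Python 3; the variable names are illustrative, and the result depends on the installed pandas version (on 0.8.1, as shown above, the 'one' column is gone after the exception).

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 1, 2], 'two': [1, 2, 3]})
columns_before = list(df.columns)

raised = False
try:
    # Same call as in the report: 'one' contains the duplicate key 1.
    df.set_index(['one'], inplace=True, verify_integrity=True)
except Exception:
    raised = True

# Requested behaviour: the call raises AND the frame keeps its columns.
columns_preserved = list(df.columns) == columns_before
print('raised:', raised)
print('columns preserved:', columns_preserved)
```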