DataFrame.drop() does nothing for non-unique MultiIndex when attempting to drop from a level with DatetimeIndex #12701

emsems · 2016-03-23T14:33:44Z

Hi,
there seems to be a bug in DataFrame.drop() when the DataFrame has a non-unique MultiIndex and one of its levels is a DatetimeIndex whose labels I would like to pass to drop(). The call leaves the frame unchanged instead of removing the matching rows.

Code Sample

import pandas as pd
from pandas import DataFrame
import numpy as np

# Prepare DataFrame
idx = pd.Index([0, 0, 1, 1, 1, 2, 3, 4, 4, 5], name='id')
idxdt = pd.to_datetime(['201603231200',
                        '201603231200',
                        '201603231300',
                        '201603231300',
                        '201603231400',
                        '201603231400',
                        '201603231500',
                        '201603231600',
                        '201603231600',
                        '201603231700'])
df = DataFrame(np.arange(30).reshape(10, 3), columns=list('abc'), index=idx)
df['tstamp'] = idxdt
df = df.set_index('tstamp', append=True)
print(df)

# Drop the following timestamp
ts = pd.Timestamp('201603231600')
df = df.drop(ts, level='tstamp')
print(df)
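For anyone checking whether their pandas version is affected, here is a minimal Python 3 sketch of the same reproduction written as a regression check. It assumes a recent pandas release (the issue was closed in 9f68a96) and spells the timestamps in ISO form instead of '201603231200'; the names `ids`, `stamps`, and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the report: an (id, tstamp) MultiIndex with duplicate entries
ids = pd.Index([0, 0, 1, 1, 1, 2, 3, 4, 4, 5], name="id")
stamps = pd.to_datetime([
    "2016-03-23 12:00", "2016-03-23 12:00", "2016-03-23 13:00",
    "2016-03-23 13:00", "2016-03-23 14:00", "2016-03-23 14:00",
    "2016-03-23 15:00", "2016-03-23 16:00", "2016-03-23 16:00",
    "2016-03-23 17:00",
])
df = pd.DataFrame(np.arange(30).reshape(10, 3), columns=list("abc"), index=ids)
df["tstamp"] = stamps
df = df.set_index("tstamp", append=True)

ts = pd.Timestamp("2016-03-23 16:00")
result = df.drop(ts, level="tstamp")

# On an affected version (0.18.0 in this report) result still has all 10 rows
# and these checks fail; with the fix, both 16:00 rows are gone.
assert len(result) == 8
assert ts not in result.index.get_level_values("tstamp")
print(result)
```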

Expected Output

                         a   b   c
id tstamp
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
4  2016-03-23 16:00:00  21  22  23
   2016-03-23 16:00:00  24  25  26
5  2016-03-23 17:00:00  27  28  29
                         a   b   c
id tstamp
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
5  2016-03-23 17:00:00  27  28  29

Current Output

                         a   b   c
id tstamp
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
4  2016-03-23 16:00:00  21  22  23
   2016-03-23 16:00:00  24  25  26
5  2016-03-23 17:00:00  27  28  29
                         a   b   c
id tstamp
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
4  2016-03-23 16:00:00  21  22  23
   2016-03-23 16:00:00  24  25  26
5  2016-03-23 17:00:00  27  28  29

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 0.6
Cython: 0.23.3
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.0.2
xlrd: None
xlwt: 1.0.0
xlsxwriter: None
lxml: 3.5.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: None

jreback · 2016-03-23T14:48:49Z

Yeah, this is broken. For the non-unique MultiIndex case it needs something like the following rather than the current approach (here `idx` is `pd.IndexSlice`, not the `idx` Index from the code sample):

In [21]: df.loc[idx[:,~df.index.get_level_values('tstamp').isin([ts])], :]
Out[21]: 
                         a   b   c
id tstamp                         
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
5  2016-03-23 17:00:00  27  28  29
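The `idx[...]` form in the transcript only works when `idx` is bound to `pd.IndexSlice`. The sketch below, assuming a recent pandas and building the frame with `MultiIndex.from_arrays` for brevity, checks that this form selects the same rows as passing the boolean mask directly to `.loc`.

```python
import numpy as np
import pandas as pd

# Same frame as the report, built directly as a MultiIndex
ids = [0, 0, 1, 1, 1, 2, 3, 4, 4, 5]
hours = [12, 12, 13, 13, 14, 14, 15, 16, 16, 17]
stamps = pd.Timestamp("2016-03-23") + pd.to_timedelta(hours, unit="h")
index = pd.MultiIndex.from_arrays([ids, stamps], names=["id", "tstamp"])
df = pd.DataFrame(np.arange(30).reshape(10, 3), columns=list("abc"), index=index)

ts = pd.Timestamp("2016-03-23 16:00")
# True for every row whose tstamp is NOT the one being dropped
keep = ~df.index.get_level_values("tstamp").isin([ts])

idx = pd.IndexSlice
via_slice = df.loc[idx[:, keep], :]  # form used in the comment above
via_mask = df.loc[keep]              # same selection, plain boolean mask

assert via_slice.equals(via_mask)
print(via_mask)
```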

So the indexer should be `~axis.get_level_values(level).isin(labels)`, where `axis` is the MultiIndex being dropped from.

Obviously not well tested :<

Pull requests welcome!

jreback added the Bug, Indexing, Difficulty Novice, and MultiIndex labels Mar 23, 2016

jreback added this to the 0.18.1 milestone Mar 23, 2016

homiziado added a commit to homiziado/pandas that referenced this issue Mar 25, 2016

Closes pandas-dev#12701

48158b8

homiziado added a commit to homiziado/pandas that referenced this issue Mar 25, 2016

Revert "Closes pandas-dev#12701"

d48bdd9

This reverts commit 48158b8.

jonaslb mentioned this issue Apr 3, 2016

BUG: DataFrame.drop() does nothing for non-unique Datetime MultiIndex #12783

Closed

jreback closed this as completed in 9f68a96 Apr 3, 2016
