DataFrame.drop() does nothing for non-unique MultiIndex when attempting to drop from a level with DatetimeIndex #12701

Closed
emsems opened this issue Mar 23, 2016 · 1 comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

emsems commented Mar 23, 2016

Hi,
There seems to be a bug in DataFrame.drop() when the DataFrame has a non-unique MultiIndex and one of its levels is a DatetimeIndex: passing a label from that level to drop() with the level argument removes nothing.

Code Sample

import pandas as pd
from pandas import DataFrame
import numpy as np

# Prepare DataFrame
idx = pd.Index([0, 0, 1, 1, 1, 2, 3, 4, 4, 5], name='id')
idxdt = pd.to_datetime(['201603231200',
                        '201603231200',
                        '201603231300',
                        '201603231300',
                        '201603231400',
                        '201603231400',
                        '201603231500',
                        '201603231600',
                        '201603231600',
                        '201603231700'])
df = DataFrame(np.arange(30).reshape(10, 3), columns=list('abc'), index=idx)
df['tstamp'] = idxdt
df = df.set_index('tstamp', append=True)
print(df)

# Drop the following timestamp
ts = pd.Timestamp('201603231600')
df = df.drop(ts, level='tstamp')
print(df)

Expected Output

                         a   b   c
id tstamp
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
4  2016-03-23 16:00:00  21  22  23
   2016-03-23 16:00:00  24  25  26
5  2016-03-23 17:00:00  27  28  29
                         a   b   c
id tstamp
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
5  2016-03-23 17:00:00  27  28  29

Current Output

                         a   b   c
id tstamp
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
4  2016-03-23 16:00:00  21  22  23
   2016-03-23 16:00:00  24  25  26
5  2016-03-23 17:00:00  27  28  29
                         a   b   c
id tstamp
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
4  2016-03-23 16:00:00  21  22  23
   2016-03-23 16:00:00  24  25  26
5  2016-03-23 17:00:00  27  28  29
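
The two conditions named in the report (a non-unique MultiIndex and a datetime level) can be checked directly on the frame. The following is a minimal sketch that rebuilds the same frame; the timestamps are written in ISO form here for clarity, and the names `has_duplicates`, `n_duplicates`, and `level_is_datetime` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the code sample (timestamps written in ISO form).
ids = pd.Index([0, 0, 1, 1, 1, 2, 3, 4, 4, 5], name='id')
stamps = pd.to_datetime(['2016-03-23 12:00', '2016-03-23 12:00',
                         '2016-03-23 13:00', '2016-03-23 13:00',
                         '2016-03-23 14:00', '2016-03-23 14:00',
                         '2016-03-23 15:00', '2016-03-23 16:00',
                         '2016-03-23 16:00', '2016-03-23 17:00'])
df = pd.DataFrame(np.arange(30).reshape(10, 3), columns=list('abc'), index=ids)
df['tstamp'] = stamps
df = df.set_index('tstamp', append=True)

# Condition 1: the (id, tstamp) pairs are not unique.
has_duplicates = not df.index.is_unique
n_duplicates = int(df.index.duplicated().sum())

# Condition 2: the 'tstamp' level holds datetimes.
level_is_datetime = isinstance(df.index.get_level_values('tstamp'),
                               pd.DatetimeIndex)

print(has_duplicates, n_duplicates, level_is_datetime)
```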

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 0.6
Cython: 0.23.3
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.0.2
xlrd: None
xlwt: 1.0.0
xlsxwriter: None
lxml: 3.5.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: None
Contributor

jreback commented Mar 23, 2016

Yeah, this is broken. For the non-unique MultiIndex case it needs something like the following, rather than what it currently does here:

In [21]: df.loc[pd.IndexSlice[:, ~df.index.get_level_values('tstamp').isin([ts])], :]
Out[21]: 
                         a   b   c
id tstamp                         
0  2016-03-23 12:00:00   0   1   2
   2016-03-23 12:00:00   3   4   5
1  2016-03-23 13:00:00   6   7   8
   2016-03-23 13:00:00   9  10  11
   2016-03-23 14:00:00  12  13  14
2  2016-03-23 14:00:00  15  16  17
3  2016-03-23 15:00:00  18  19  20
5  2016-03-23 17:00:00  27  28  29

So the indexer should be
~axis.get_level_values(level).isin(labels)

obviously not well tested :<

pull-requests welcomed!
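
Until a fix is available, the mask-based indexer described above can be applied from user code. The following is a minimal sketch, not the code of the eventual fix: `drop_by_level_mask` is a hypothetical helper (not a pandas API), and the frame is rebuilt with ISO-formatted timestamps so the block is self-contained.

```python
import numpy as np
import pandas as pd


def drop_by_level_mask(frame, labels, level):
    """Return `frame` without the rows whose `level` value is in `labels`.

    Hypothetical helper: builds the boolean mask suggested in this thread
    instead of relying on DataFrame.drop(..., level=...).
    """
    # Accept a single label as well as a collection of labels.
    if not isinstance(labels, (list, tuple, pd.Index)):
        labels = [labels]
    keep = ~frame.index.get_level_values(level).isin(labels)
    return frame.loc[keep]


# Same data as in the report.
ids = pd.Index([0, 0, 1, 1, 1, 2, 3, 4, 4, 5], name='id')
stamps = pd.to_datetime(['2016-03-23 12:00', '2016-03-23 12:00',
                         '2016-03-23 13:00', '2016-03-23 13:00',
                         '2016-03-23 14:00', '2016-03-23 14:00',
                         '2016-03-23 15:00', '2016-03-23 16:00',
                         '2016-03-23 16:00', '2016-03-23 17:00'])
df = pd.DataFrame(np.arange(30).reshape(10, 3), columns=list('abc'), index=ids)
df['tstamp'] = stamps
df = df.set_index('tstamp', append=True)

ts = pd.Timestamp('2016-03-23 16:00')
result = drop_by_level_mask(df, ts, level='tstamp')
print(result)
```

Because the mask is computed per row, duplicate index entries need no special handling: every row carrying the label is dropped.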

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves Difficulty Novice MultiIndex labels Mar 23, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 23, 2016
homiziado added a commit to homiziado/pandas that referenced this issue Mar 25, 2016
homiziado added a commit to homiziado/pandas that referenced this issue Mar 25, 2016
@jreback jreback closed this as completed in 9f68a96 Apr 3, 2016
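
A regression check for the reported case might look like the sketch below. It is not the test from the fix commit; the small frame and the variable names are made up for illustration, and it assumes a pandas version that includes the fix (0.18.1 or later, per the milestone above).

```python
import pandas as pd

# A small frame with a non-unique MultiIndex whose second level is datetime.
t12 = pd.Timestamp('2016-03-23 12:00')
t16 = pd.Timestamp('2016-03-23 16:00')
t17 = pd.Timestamp('2016-03-23 17:00')
index = pd.MultiIndex.from_arrays(
    [[0, 0, 1, 1, 2], [t12, t12, t16, t16, t17]],
    names=['id', 'tstamp'],
)
small = pd.DataFrame({'a': range(5)}, index=index)

# The call from the report: drop one timestamp from the 'tstamp' level.
dropped = small.drop(t16, level='tstamp')

# With the fix in place, both rows carrying t16 should be gone.
remaining = dropped.index.get_level_values('tstamp')
print(len(small), len(dropped), t16 in remaining)
```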