BUG: IndexSlice on MultiIndex includes out-of-range rows #12685

Closed
markroth8 opened this issue Mar 22, 2016 · 2 comments
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex, Timeseries
@markroth8
Contributor

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> dft = pd.DataFrame(np.random.randn(100000,1),columns=['A'],index=pd.date_range('20130101',periods=100000,freq='T'))
>>> dft2 = pd.DataFrame(np.random.randn(200000,1),columns=['A'],index=pd.MultiIndex.from_product([dft.index, ['a', 'b']]))
>>> dft2.loc[pd.IndexSlice['2013-03':'2013-03',:],:]
                              A
2013-01-01 00:00:00 a  0.563968
2013-01-01 00:01:00 a -0.439376
2013-01-01 00:02:00 a  1.785202
2013-01-01 00:03:00 a  0.376901
2013-01-01 00:04:00 a -0.977926
2013-01-01 00:05:00 a -0.738415
2013-01-01 00:06:00 a -2.156905
2013-01-01 00:07:00 a -0.016085
2013-01-01 00:08:00 a  1.035935
2013-01-01 00:09:00 a -1.198991
2013-01-01 00:10:00 a -0.867717
2013-01-01 00:11:00 a -0.596923
2013-01-01 00:12:00 a -1.256923
2013-01-01 00:13:00 a  1.042369
2013-01-01 00:14:00 a  1.161365
2013-01-01 00:15:00 a  0.853495
2013-01-01 00:16:00 a  1.398594
2013-01-01 00:17:00 a -0.431314
2013-01-01 00:18:00 a  2.630920
2013-01-01 00:19:00 a -1.031731
2013-01-01 00:20:00 a -0.891799
2013-01-01 00:21:00 a  0.075546
2013-01-01 00:22:00 a -0.163999
2013-01-01 00:23:00 a  0.678027
2013-01-01 00:24:00 a  1.427323
2013-01-01 00:25:00 a  1.426418
2013-01-01 00:26:00 a  0.492158
2013-01-01 00:27:00 a  0.477629
2013-01-01 00:28:00 a -0.703225
2013-01-01 00:29:00 a  0.864293
...                         ...
2013-03-11 10:25:00 a  1.585880
                    b -0.595927
2013-03-11 10:26:00 a  0.259137
                    b -0.718385
2013-03-11 10:27:00 a -0.143240
                    b -0.898806
2013-03-11 10:28:00 a -0.293221
                    b -1.645180
2013-03-11 10:29:00 a -0.790069
                    b -1.075649
2013-03-11 10:30:00 a  0.368399
                    b  2.206858
2013-03-11 10:31:00 a  1.125287
                    b -0.834588
2013-03-11 10:32:00 a  0.740703
                    b -0.587527
2013-03-11 10:33:00 a  0.765259
                    b -0.662232
2013-03-11 10:34:00 a  2.138155
                    b -0.030379
2013-03-11 10:35:00 a -1.510801
                    b  1.200521
2013-03-11 10:36:00 a  0.934974
                    b  1.340875
2013-03-11 10:37:00 a -0.251199
                    b  1.728432
2013-03-11 10:38:00 a  1.651664
                    b -1.032225
2013-03-11 10:39:00 a -0.352417
                    b  0.378273

[115040 rows x 1 columns]

Expected Output

Only rows dated in March 2013 should be returned. Data from 2013-01-01 through 2013-02-28 should not be present.
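
One rough way to state the expected result in code is to compare the `IndexSlice` selection against an explicit boolean mask on the first index level. This is only a sketch: it uses a smaller, daily-frequency frame than the one above, the variable names (`level0`, `mask`, `expected`, `result`) are illustrative, and the final comparison is expected to hold only on a pandas build where the slicing behaves correctly.

```python
import numpy as np
import pandas as pd

# Smaller frame than in the report: 100 daily timestamps, two labels per day.
dates = pd.date_range("2013-01-01", periods=100, freq="D")
index = pd.MultiIndex.from_product([dates, ["a", "b"]])
df = pd.DataFrame(np.arange(200.0), columns=["A"], index=index)

# Reference selection: an explicit mask on the datetime level (March 2013 only).
level0 = df.index.get_level_values(0)
mask = (level0 >= pd.Timestamp("2013-03-01")) & (level0 < pd.Timestamp("2013-04-01"))
expected = df[mask]

# Selection under test: partial-string slice on the first level, everything on the second.
result = df.loc[pd.IndexSlice["2013-03":"2013-03", :], :]

# On a build without the bug, both selections contain the same rows.
print(len(expected), len(result), result.equals(expected))
```

If the two lengths differ, the extra rows in `result` are the out-of-range ones described in this report.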

output of pd.show_versions()

INSTALLED VERSIONS

commit: 25cec3a
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0+31.g25cec3a
nose: 1.3.7
pip: 8.0.3
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: None
statsmodels: None
xarray: None
IPython: 4.1.1
sphinx: 1.3.5
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None

@markroth8 markroth8 changed the title IndexSlice on MultiIndex includes out-of-range rows BUG: IndexSlice on MultiIndex includes out-of-range rows Mar 22, 2016
@jreback jreback added Bug Timeseries Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Difficulty Intermediate labels Mar 22, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 22, 2016
@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 26, 2016
@jorisvandenbossche
Member

This seems to work correctly on master in the meantime:

In [12]: df = pd.DataFrame(np.random.randn(200,1), columns=['A'], index=pd.MultiIndex.from_product([pd.date_range('20130101',periods=100), ['a', 'b']]))

In [13]: df.loc[pd.IndexSlice['2013-03':'2013-03',:],:]
Out[13]:
                     A
2013-03-01 a -0.199156
           b  0.121741
2013-03-02 a  0.142018
           b  0.390804
2013-03-03 a -0.883441
           b  0.303635
2013-03-04 a -0.059659
           b  2.252698
...                ...
2013-03-28 a  1.232271
           b  0.735451
2013-03-29 a  0.519657
           b  0.469528
2013-03-30 a -0.814646
           b  0.149653
2013-03-31 a  0.758980
           b  0.089183

[62 rows x 1 columns]

@jorisvandenbossche
Member

Possibly fixed by #13117; I will add an explicit test for this issue.
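
For illustration, a regression test for this report could look roughly like the sketch below. It is not the test that was actually added to pandas. The function name is hypothetical, the data mirrors the original code sample (minute frequency, written with the `'min'` alias instead of `'T'`), and zeros replace the random values because only the index matters here.

```python
import numpy as np
import pandas as pd


def test_indexslice_partial_string_multiindex():
    # Hypothetical test name. 100000 minutes starting 2013-01-01 end at
    # 2013-03-11 10:39, so only part of March is present in the data.
    minutes = pd.date_range("2013-01-01", periods=100000, freq="min")
    index = pd.MultiIndex.from_product([minutes, ["a", "b"]])
    df = pd.DataFrame(np.zeros((200000, 1)), columns=["A"], index=index)

    result = df.loc[pd.IndexSlice["2013-03":"2013-03", :], :]

    timestamps = result.index.get_level_values(0)
    # No January or February rows may leak into the result.
    assert timestamps.min() == pd.Timestamp("2013-03-01 00:00")
    assert timestamps.max() == pd.Timestamp("2013-03-11 10:39")
    # 15040 March minutes, each paired with the labels 'a' and 'b'.
    assert len(result) == 30080
```

The expected row count follows from the calendar: January and February 2013 cover 59 days, or 84960 minutes, which leaves 15040 of the 100000 minutes in March. The 115040 rows in the original report equal those 30080 March rows plus 84960 extra rows, one per January or February minute.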
