Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Multiindexing slicing of dates #3843

Open
hayd opened this Issue · 8 comments

2 participants

@hayd
Collaborator

Should I be able to slice like follows:

In [1]: df
Out[1]:
          report_date  item_id  views category
0 2013-06-01 00:00:00        2      3        a
1 2013-06-01 00:00:00        2      2        b
2 2013-06-01 00:00:00        5     16        a
3 2013-06-01 00:00:00        2      4        c
4 2013-06-01 00:00:00        2      5        d

In [2]: pivot = df.pivot_table(values=['views'], rows=['report_date','item_id'], aggfunc='sum')

In [3]: pivot
Out[3]:
                     views
report_date item_id
2013-06-01  2           14
            5           16

In [4]: pivot.ix[('2013-6-01', 3):('2013-06-1', 6)]
Out[4]:
                     views
report_date item_id
2013-06-01  5           16

But if I try and slice from outside any of the ranges, I get an exception:

In [21]: pivot.ix[('2013-6-01',3):('2013-06-02', 6)]
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-21-390c2ec15fd2> in <module>()
----> 1 pivot.ix[('2013-6-01',3):('2013-06-02', 6)]

/Users/234BroadWalk/pandas/pandas/core/indexing.pyc in __getitem__(self, key)
     46             return self._getitem_tuple(key)
     47         else:
---> 48             return self._getitem_axis(key, axis=0)
     49
     50     def _get_label(self, label, axis=0):

/Users/234BroadWalk/pandas/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
    390         labels = self.obj._get_axis(axis)
    391         if isinstance(key, slice):
--> 392             return self._get_slice_axis(key, axis=axis)
    393         elif _is_list_like(key) and not (isinstance(key, tuple) and
    394                                          isinstance(labels, MultiIndex)):

/Users/234BroadWalk/pandas/pandas/core/indexing.pyc in _get_slice_axis(self, slice_obj, axis)
    652         else:
    653             try:
--> 654                 indexer = labels.slice_indexer(start, stop, slice_obj.step)
    655             except Exception:
    656                 if _is_index_slice(slice_obj):

/Users/234BroadWalk/pandas/pandas/core/index.pyc in slice_indexer(self, start, end, step)
   1199         This function assumes that the data is sorted, so use at your own peril
   1200         """
-> 1201         start_slice, end_slice = self.slice_locs(start, end)
   1202
   1203         # return a slice

/Users/234BroadWalk/pandas/pandas/core/index.pyc in slice_locs(self, start, end, strict)
   2276             if not isinstance(end, tuple):
   2277                 end = end,
-> 2278             end_slice = self._partial_tup_index(end, side='right')
   2279
   2280         return start_slice, end_slice

/Users/234BroadWalk/pandas/pandas/core/index.pyc in _partial_tup_index(self, tup, side)
   2293             if lab not in lev:
   2294                 if not lev.is_type_compatible(lib.infer_dtype([lab])):
-> 2295                     raise Exception('Level type mismatch: %s' % lab)
   2296
   2297                 # short circuit

Exception: Level type mismatch: 2013-06-02

see this SO question.

@jreback
Owner

why don't u reset index, subset on the dates and/or id?

the problem with selecting via a tuple range is a) index must be sorted and b) the code is complicated with dates

mark as a bug for. 0.12 I guess

@jreback
Owner

Can't seem to repru the isee you mention in the link just above

In [36]: df = DataFrame(1.,columns=['A'],index=date_range('2010-08-31 12:36:53',freq='5min',periods=10))

In [37]: df
Out[37]: 
                     A
2010-08-31 12:36:53  1
2010-08-31 12:41:53  1
2010-08-31 12:46:53  1
2010-08-31 12:51:53  1
2010-08-31 12:56:53  1
2010-08-31 13:01:53  1
2010-08-31 13:06:53  1
2010-08-31 13:11:53  1
2010-08-31 13:16:53  1
2010-08-31 13:21:53  1

In [38]: df['2010-07':'2010-10']
Out[38]: 
                     A
2010-08-31 12:36:53  1
2010-08-31 12:41:53  1
2010-08-31 12:46:53  1
2010-08-31 12:51:53  1
2010-08-31 12:56:53  1
2010-08-31 13:01:53  1
2010-08-31 13:06:53  1
2010-08-31 13:11:53  1
2010-08-31 13:16:53  1
2010-08-31 13:21:53  1

In [39]: df['2010-7':'2010-10']
Out[39]: 
                     A
2010-08-31 12:36:53  1
2010-08-31 12:41:53  1
2010-08-31 12:46:53  1
2010-08-31 12:51:53  1
2010-08-31 12:56:53  1
2010-08-31 13:01:53  1
2010-08-31 13:06:53  1
2010-08-31 13:11:53  1
2010-08-31 13:16:53  1
2010-08-31 13:21:53  1
@jreback
Owner

@hayd I think the example indicated (the pivot) is actually not a bug, but a misuse; you can't slice on labels that are not included in the index, whether date or not (if they ARE STRINGS)

In [76]: pivot.ix[('2013-06-01',2):(Timestamp('20130602'),2)]
Out[76]: 
                     views
report_date item_id       
2013-06-01  2           14
            5           16

so I guess it is a buglet then; here treating datetimes different from the strings....hmmm

@hayd
Collaborator

How fun! (I thought I'd played around with things like that.)

@jreback
Owner

@hayd close this? or make a bug/enhancement out of it?

@hayd
Collaborator

you can't slice on labels that are not included in the index, whether date or not (if they ARE STRINGS)

So you can do it with strings, even in a mi, just not when they are dates?

Should we pull out that as a someday feature?

@jreback
Owner

it's a bug/feature just have to track down
move to 0.14

@jreback jreback modified the milestone: 0.15.0, 0.14.0
@jreback jreback modified the milestone: 0.16.0, Next Major Release
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.