Partial indexing only valid for ordered time series #2437

hayd · 2012-12-06T12:09:25Z

This was mentioned on StackOverflow, I thought I ought to post it here. I'm not sure whether or not this is a bug:

You can select by a date string in an ordered time series, but not in an unordered one:

import pandas as pd
from numpy.random import randn
from random import shuffle
rng = pd.date_range(start='2011-01-01', end='2011-12-31')
rng2 = list(rng)
shuffle(rng2)

ts = pd.Series(randn(len(rng)), index=rng)
ts2 = pd.Series(randn(len(rng)), index=rng2)

ts.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 365, Freq: D, Timezone: None

ts['2011-01-01']
# -1.1454418070543406

ts2.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-04-16 00:00:00, ..., 2011-03-10 00:00:00]
Length: 365, Freq: None, Timezone: None

ts2['2011-01-01'] # same for ts2.ix['2011-01-01']

TimeSeriesError                           Traceback (most recent call last)
<ipython-input-112-051e81424a0d> in <module>()
----> 1 ts2['2011-01-01']

/Library/Python/2.7/site-packages/pandas-0.9.0-py2.7-macosx-10.7-intel.egg/pandas/core/series.pyc in __getitem__(self, key)
    462     def __getitem__(self, key):
    463         try:
--> 464             return self.index.get_value(self, key)
    465         except InvalidIndexError:
    466             pass

/Library/Python/2.7/site-packages/pandas-0.9.0-py2.7-macosx-10.7-intel.egg/pandas/tseries/index.pyc in get_value(self, series, key)
   1015 
   1016             try:
-> 1017                 loc = self._get_string_slice(key)
   1018                 return series[loc]
   1019             except (TypeError, ValueError, KeyError):

/Library/Python/2.7/site-packages/pandas-0.9.0-py2.7-macosx-10.7-intel.egg/pandas/tseries/index.pyc in _get_string_slice(self, key)
   1062         asdt, parsed, reso = parse_time_string(key, freq)
   1063         key = asdt
-> 1064         loc = self._partial_date_slice(reso, parsed)
   1065         return loc
   1066 

/Library/Python/2.7/site-packages/pandas-0.9.0-py2.7-macosx-10.7-intel.egg/pandas/tseries/index.pyc in _partial_date_slice(self, reso, parsed)
    977     def _partial_date_slice(self, reso, parsed):
    978         if not self.is_monotonic:
--> 979             raise TimeSeriesError('Partial indexing only valid for ordered time'
    980                                   ' series')
    981 

TimeSeriesError: Partial indexing only valid for ordered time series
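The last frame shows `_partial_date_slice` refusing a non-monotonic index. Below is a minimal standard-library sketch of a slice-based partial lookup of this kind. It is not pandas' implementation: `partial_bounds` and `sorted_slice` are hypothetical names, and only `YYYY`, `YYYY-MM` and `YYYY-MM-DD` keys are handled. The date string becomes a half-open interval at its own resolution, and binary search finds the matching positions. Because binary search returns wrong positions on unsorted data, the ordering check has to come first.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def partial_bounds(key):
    """Hypothetical helper: map 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to the
    half-open interval [start, end) covered at that string's resolution."""
    parts = [int(p) for p in key.split("-")]
    if len(parts) == 1:  # year resolution
        start = datetime(parts[0], 1, 1)
        end = datetime(parts[0] + 1, 1, 1)
    elif len(parts) == 2:  # month resolution
        year, month = parts
        start = datetime(year, month, 1)
        # first day of the following month (December rolls over to January)
        end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    elif len(parts) == 3:  # day resolution
        start = datetime(*parts)
        end = start + timedelta(days=1)
    else:
        raise ValueError(f"unsupported key: {key!r}")
    return start, end


def sorted_slice(stamps, key):
    """Hypothetical helper: positions of `stamps` inside `key`, found by
    binary search. bisect assumes ascending order, so unsorted input is
    rejected instead of silently producing a wrong slice."""
    if any(a > b for a, b in zip(stamps, stamps[1:])):
        raise ValueError("Partial indexing only valid for ordered time series")
    start, end = partial_bounds(key)
    return slice(bisect_left(stamps, start), bisect_left(stamps, end))
```

On a daily index covering 2011, `sorted_slice(stamps, "2011-02")` gives `slice(31, 59)`, i.e. the 28 days of February; the same call on the reversed list raises.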


hayd · 2012-12-06T12:19:22Z

Now that I think about it: it's not clear how to pick a single Timestamp from a date unless they are ordered! Which explains this behaviour...

In the ordered case the first result which satisfies the partial date is selected.

In the unordered case, sometimes returning a Series would be confusing, and picking a "closest" match wouldn't really make sense.

ghost · 2012-12-09T23:33:09Z

I respectfully disagree with aspects of this behavior as it stands. For example: whenever a user specifies a range of dates (in one of the several ways possible to do this), they can expect (quite reasonably) to receive a list of entries from an ordered time series. What's wrong with implementing this same behavior on an unordered series? Right now, for an unordered series, you get an error if you try to specify a date range. This can be fixed without impacting the question of which entry should be "first".

This may not fly with you guys, but I'd even take it one step further. If a user specifies a single date as an index, they should receive all entries with the matching date. It's better behavior because it's consistent: if you ask for a month, you get the whole month; a year, you get the whole year; what's so special about a day that the behavior should suddenly change?
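The proposal above does not actually need an ordered index if selection is done with a boolean mask rather than a slice. A minimal stdlib sketch, assuming the same three key formats; `key_fields`, `mask_select` and `mask_range` are hypothetical helpers, not pandas API. A year, a month or a single day all go through the same code path, so a day key returns every entry on that day, and range endpoints are inclusive at their own resolution. The trade-off is a full O(n) scan per lookup instead of a binary search.

```python
import random
from datetime import datetime, timedelta


def key_fields(key):
    """Hypothetical helper: '2011' -> (2011,), '2011-03' -> (2011, 3),
    '2011-03-15' -> (2011, 3, 15)."""
    fields = tuple(int(p) for p in key.split("-"))
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"unsupported key: {key!r}")
    return fields


def _prefix(stamp, n):
    # (year, month, day) truncated to the resolution of the key
    return (stamp.year, stamp.month, stamp.day)[:n]


def mask_select(stamps, values, key):
    """All (stamp, value) pairs whose date matches `key` at its resolution.
    The mask is a linear scan, so no ordering is needed and matches keep
    the order in which they appear in `stamps`."""
    fields = key_fields(key)
    mask = [_prefix(s, len(fields)) == fields for s in stamps]
    return [(s, v) for s, v, keep in zip(stamps, values, mask) if keep]


def mask_range(stamps, values, lo, hi):
    """All pairs between `lo` and `hi`, both inclusive at their own resolution."""
    lo_f, hi_f = key_fields(lo), key_fields(hi)
    return [
        (s, v)
        for s, v in zip(stamps, values)
        if lo_f <= _prefix(s, len(lo_f)) and _prefix(s, len(hi_f)) <= hi_f
    ]


stamps = [datetime(2011, 1, 1) + timedelta(days=i) for i in range(365)]
random.Random(0).shuffle(stamps)  # unordered, as in the original report
values = list(range(len(stamps)))

print(len(mask_select(stamps, values, "2011-01")))                # 31
print(len(mask_select(stamps, values, "2011-01-01")))             # 1
print(len(mask_range(stamps, values, "2011-01-15", "2011-02")))   # 45
```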

hayd · 2012-12-09T23:57:43Z

@jbrdly that's a convincing argument (at least, you've convinced me).

In fact, this also seems inconsistent when you compare to the behaviour of Series:

In [1]: s = Series(np.arange(5), index=[1, 1, 1, 2, 3])

In [2]: s
Out[2]: 
1    0
1    1
1    2
2    3
3    4

In [3]: s.ix[1] # same as s[1]
Out[3]: 
1    0
1    1
1    2

In [4]: s.ix[2]
Out[4]: 3
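The label semantics in that session can be loosely mimicked in plain Python to make the consistency argument concrete: a label that occurs once yields a scalar, while a repeated label yields all of its rows. `label_lookup` is a hypothetical helper, and it returns a list of `(label, value)` pairs where pandas would return a Series.

```python
def label_lookup(labels, values, key):
    """Hypothetical helper loosely mirroring the Series session above:
    one match -> the bare value, several matches -> every (label, value) pair."""
    hits = [(lab, val) for lab, val in zip(labels, values) if lab == key]
    if not hits:
        raise KeyError(key)
    if len(hits) == 1:
        return hits[0][1]
    return hits


labels, values = [1, 1, 1, 2, 3], list(range(5))
print(label_lookup(labels, values, 1))  # [(1, 0), (1, 1), (1, 2)]
print(label_lookup(labels, values, 2))  # 3
```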

wesm · 2013-01-19T22:10:48Z

I guess the error message should read "not yet supported". It would be nice to fix this in the future.

jreback · 2013-03-22T01:13:29Z

@hayd this is all fixed up; note that the stamps will be returned in the same order as the index of the series (which makes sense).
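The difference between getting stamps back in index order and in chronological order can be seen by comparing a mask-style selection with the older workaround of sorting first (in pandas, for example, `ts2.sort_index()['2011-01']`) and then slicing. A stdlib sketch with hypothetical helper names: both functions select the same stamps, and only the order of the result differs.

```python
import random
from bisect import bisect_left
from datetime import datetime, timedelta


def month_in_index_order(stamps, year, month):
    """Mask-style selection: no sorting, matches keep their original positions' order."""
    return [s for s in stamps if (s.year, s.month) == (year, month)]


def month_after_sorting(stamps, year, month):
    """Sort-then-slice workaround: sort a copy, then binary-search the
    half-open interval [first of month, first of next month)."""
    ordered = sorted(stamps)
    start = datetime(year, month, 1)
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return ordered[bisect_left(ordered, start):bisect_left(ordered, end)]


stamps = [datetime(2011, 1, 1) + timedelta(days=i) for i in range(365)]
random.Random(42).shuffle(stamps)  # simulate the shuffled index from the report

by_index = month_in_index_order(stamps, 2011, 1)
by_time = month_after_sorting(stamps, 2011, 1)
assert sorted(by_index) == by_time       # same stamps, different order
print([d.day for d in by_time[:3]])      # [1, 2, 3]
```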

jreback mentioned this issue Mar 22, 2013

ENH: extend selection semantics on ordered timeseries to unordered #3133

Closed

jreback mentioned this issue Mar 22, 2013

ENH: GH2437 added selection to an unordered timeseries #3136

Merged

jreback closed this as completed Mar 22, 2013

jreback mentioned this issue Apr 24, 2013

BUG: selection from unordered time-series has incorrect / odd behavior #3448

Closed

jbrockmendel mentioned this issue Jan 17, 2020

ENH: partial string indexing on non-monotonic PeriodIndex #31096

Merged
