Partial indexing only valid for ordered time series #2437

Closed
hayd opened this Issue Dec 6, 2012 · 5 comments

@hayd
Contributor

hayd commented Dec 6, 2012

This was mentioned on StackOverflow, I thought I ought to post it here. I'm not sure whether or not this is a bug:

You can select by a date string in an ordered tseries, but not in an unordered one:

import pandas as pd
from numpy.random import randn
from random import shuffle
rng = pd.date_range(start='2011-01-01', end='2011-12-31')
rng2 = list(rng)
shuffle(rng2)
ts = pd.Series(randn(len(rng)), index=rng)
ts2 = pd.Series(randn(len(rng)), index=rng2)
ts.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 365, Freq: D, Timezone: None

ts['2011-01-01']
# -1.1454418070543406
ts2.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-04-16 00:00:00, ..., 2011-03-10 00:00:00]
Length: 365, Freq: None, Timezone: None

ts2['2011-01-01'] # same for ts2.ix['2011-01-01']
TimeSeriesError                           Traceback (most recent call last)
<ipython-input-112-051e81424a0d> in <module>()
----> 1 ts2['2011-01-01']

/Library/Python/2.7/site-packages/pandas-0.9.0-py2.7-macosx-10.7-intel.egg/pandas/core/series.pyc in __getitem__(self, key)
    462     def __getitem__(self, key):
    463         try:
--> 464             return self.index.get_value(self, key)
    465         except InvalidIndexError:
    466             pass

/Library/Python/2.7/site-packages/pandas-0.9.0-py2.7-macosx-10.7-intel.egg/pandas/tseries/index.pyc in get_value(self, series, key)
   1015 
   1016             try:
-> 1017                 loc = self._get_string_slice(key)
   1018                 return series[loc]
   1019             except (TypeError, ValueError, KeyError):

/Library/Python/2.7/site-packages/pandas-0.9.0-py2.7-macosx-10.7-intel.egg/pandas/tseries/index.pyc in _get_string_slice(self, key)
   1062         asdt, parsed, reso = parse_time_string(key, freq)
   1063         key = asdt
-> 1064         loc = self._partial_date_slice(reso, parsed)
   1065         return loc
   1066 

/Library/Python/2.7/site-packages/pandas-0.9.0-py2.7-macosx-10.7-intel.egg/pandas/tseries/index.pyc in _partial_date_slice(self, reso, parsed)
    977     def _partial_date_slice(self, reso, parsed):
    978         if not self.is_monotonic:
--> 979             raise TimeSeriesError('Partial indexing only valid for ordered time'
    980                                   ' series')
    981 

TimeSeriesError: Partial indexing only valid for ordered time series
@hayd

Contributor

hayd commented Dec 6, 2012

Now I think about it: it's not clear how to pick a Timestamp from a date unless they are ordered! Which explains this behaviour...

In the ordered case the first result which satisfies the partial date is selected.

In the unordered case, sometimes returning a Series would be confusing, and picking a "closest" match wouldn't really make sense.

@ghost

ghost commented Dec 9, 2012

I respectfully disagree with aspects of this behavior as it stands. For example: whenever a user specifies a range of dates (in one of the several ways possible to do this), they can quite reasonably expect to receive a list of entries from an ordered time series. What's wrong with implementing this same behavior on an unordered series? Right now, for an unordered series, you get an error if you try to specify a date range. This can be fixed without impacting the question of which entry should be "first".

This may not fly with you guys, but I'd even take it one step further. If a user specifies a single date as an index, they should receive all entries with the matching date. It's better behavior, because it's consistent: if you ask for a month, you get the whole month; a year, you get the whole year; what's so special about a day that the behavior should suddenly change?
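
To make the proposal concrete, here is a minimal sketch in plain Python (not pandas internals) of order-independent partial-date selection: a partial date is expanded to a half-open interval, and every timestamp inside it is returned in its original position. The helper names `period_bounds` and `select_partial` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta


def period_bounds(year, month=None, day=None):
    """Return (start, end) of the half-open interval a partial date covers."""
    if month is None:
        # Year only: the whole calendar year.
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    if day is None:
        # Year and month: the whole calendar month.
        if month == 12:
            return datetime(year, 12, 1), datetime(year + 1, 1, 1)
        return datetime(year, month, 1), datetime(year, month + 1, 1)
    # Full date: the whole day.
    start = datetime(year, month, day)
    return start, start + timedelta(days=1)


def select_partial(stamps, values, year, month=None, day=None):
    """Select all (stamp, value) pairs inside the period, keeping index order."""
    start, end = period_bounds(year, month, day)
    # A plain mask over every entry; no ordering of `stamps` is assumed.
    return [(t, v) for t, v in zip(stamps, values) if start <= t < end]


# An unordered "index" with two entries on the same day.
stamps = [
    datetime(2011, 4, 16),
    datetime(2011, 1, 1, 9),
    datetime(2011, 3, 10),
    datetime(2011, 1, 1, 17),
    datetime(2011, 1, 20),
]
values = [0, 1, 2, 3, 4]

day_hits = select_partial(stamps, values, 2011, 1, 1)  # values 1 and 3
month_hits = select_partial(stamps, values, 2011, 1)   # values 1, 3 and 4
year_hits = select_partial(stamps, values, 2011)       # all five entries
```

A day, a month and a year all go through the same code path here, which is the consistency argued for above; the cost is a full scan instead of the binary search that an ordered index allows.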

@hayd

Contributor

hayd commented Dec 9, 2012

@jbrdly that's a convincing argument (at least, you've convinced me).

In fact, this also seems inconsistent when you compare to the behaviour of Series:

In [1]: s = Series(np.arange(5), index=[1, 1, 1, 2, 3])

In [2]: s
Out[2]: 
1    0
1    1
1    2
2    3
3    4

In [3]: s.ix[1] # same as s[1]
Out[3]: 
1    0
1    1
1    2

In [4]: s.ix[2]
Out[4]: 3
@wesm

Member

wesm commented Jan 19, 2013

I guess the error message should read "not yet supported". It would be nice to fix this in the future.

@jreback

Contributor

jreback commented Mar 22, 2013

@hayd this is all fixed up. Note that the stamps will be returned in the same order as the index of the series (which makes sense).
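
For readers landing here later, a small sketch of what this looks like in a recent pandas release. It assumes that partial string indexing through `.loc` on a non-monotonic `DatetimeIndex` behaves as described in the comment above; details may differ between versions, and the seed and variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the shuffled daily index from the original report.
rng = pd.date_range(start="2011-01-01", end="2011-12-31")
order = np.random.default_rng(0).permutation(len(rng))
ts2 = pd.Series(np.arange(len(rng)), index=rng[order])

# Partial string selection of a whole month on the unordered series.
jan = ts2.loc["2011-01"]

# Equivalent explicit mask, which keeps rows in their original index order.
in_january = (ts2.index >= pd.Timestamp("2011-01-01")) & (
    ts2.index < pd.Timestamp("2011-02-01")
)
expected = ts2[in_january]

print(jan.index.equals(expected.index))
```

The comparison against the explicit mask is there to check the ordering claim: the selected rows come back in the order they appear in `ts2`, not sorted by date.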
