KeyError on slicing with datetime or pandas.Timestamp #5821

vfilimonov · 2014-01-02T12:40:54Z

related #6066 (on Float64Index too)

The 'KeyError' on slicing has been discussed multiple times, but I'm still not sure whether the issue below is a bug or just my misunderstanding.

I sometimes get a 'KeyError' when I try to slice my DataFrames with datetime or Timestamp objects, whereas slicing with strings works perfectly. I was unable to construct a synthetic example with pandas.date_range, so I have uploaded a piece of the real data where the issue appears:
https://www.dropbox.com/s/ibzbwqs35tiydyc/tmp.h5

import pandas as pd
df = pd.read_hdf('tmp.h5', 'data')
print(df.head())

Slicing it with pandas.Timestamp objects raises a 'KeyError':

tt1 = pd.Timestamp('2006-11-16 18:30:00')
tt2 = pd.Timestamp('2006-11-16 18:40:00')
print(df.ix[tt1:tt2].head())

Same for datetime objects:

tt1 = pd.Timestamp('2006-11-16 18:30:00').to_datetime()
tt2 = pd.Timestamp('2006-11-16 18:40:00').to_datetime()
print(df.ix[tt1:tt2].head())

However, slicing with strings works perfectly:

tt1 = pd.Timestamp('2006-11-16 18:30:00').strftime('%Y-%m-%d %H:%M:%S')
tt2 = pd.Timestamp('2006-11-16 18:40:00').strftime('%Y-%m-%d %H:%M:%S')
print(df[tt1:tt2].head())

Numpy version 1.8.0
Pandas version 0.13.0


jreback · 2014-01-02T12:59:06Z

Your index is not monotonic (i.e. not sorted). This is more a problem with the error report.

If you sort it, it works (whether or not the bounds are exact index labels):

In [14]: df.sort_index().ix[Timestamp('2006-11-16 18:30:04.101497'):Timestamp('2006-11-16 18:30:06.075980')]
Out[14]: 
                                  val
2006-11-16 18:30:04.101497+00:00    0
2006-11-16 18:30:04.101497+00:00    0
2006-11-16 18:30:04.101497+00:00    0
2006-11-16 18:30:06.075980+00:00    0
2006-11-16 18:30:06.075980+00:00    0

[5 rows x 1 columns]

In [15]: df.sort_index().ix[Timestamp('2006-11-16 18:30:04.101497'):Timestamp('2006-11-16 18:30:06')]
Out[15]: 
                                  val
2006-11-16 18:30:04.101497+00:00    0
2006-11-16 18:30:04.101497+00:00    0
2006-11-16 18:30:04.101497+00:00    0

[3 rows x 1 columns]

In [16]: df.sort_index().ix[Timestamp('2006-11-16 18:30:04'):Timestamp('2006-11-16 18:30:06')]
Out[16]: 
                                  val
2006-11-16 18:30:04.101497+00:00    0
2006-11-16 18:30:04.101497+00:00    0
2006-11-16 18:30:04.101497+00:00    0

[3 rows x 1 columns]

I think this should be a ValueError, so the 'bug' is in the error report:

In [17]: df.ix[Timestamp('2006-11-16 18:30:04.101497'):Timestamp('2006-11-16 18:30:06.075980')]
KeyError: 'cannot peform a slice operation on a non-unique non-monotonic index'

@jtratner agree?
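A minimal sketch of the diagnosis and fix described above. The reporter's tmp.h5 data is not reproduced here; the small DataFrame below is hypothetical, with timestamps deliberately out of order. It uses .loc rather than .ix, since .ix has since been removed from pandas.

```python
import pandas as pd

# Hypothetical stand-in for the reporter's data: timestamps deliberately out of order.
idx = pd.DatetimeIndex([
    "2006-11-16 18:30:06",
    "2006-11-16 18:30:04",
    "2006-11-16 18:30:05",
    "2006-11-16 18:30:08",
    "2006-11-16 18:30:07",
])
df = pd.DataFrame({"val": range(5)}, index=idx)

# Diagnose: check whether the index is sorted before slicing by label.
print(df.index.is_monotonic_increasing)  # False

# Slice bounds that are not themselves index labels.
tt1 = pd.Timestamp("2006-11-16 18:30:04.5")
tt2 = pd.Timestamp("2006-11-16 18:30:07.5")

# Fix: sort first, then slice by label.
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)  # True
result = sorted_df.loc[tt1:tt2]
print(result)
```

The sorted slice keeps the rows at 18:30:05, 18:30:06 and 18:30:07, even though neither bound is an exact label.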

vfilimonov · 2014-01-02T18:24:26Z

@jreback thanks for pointing out the problem with my dataset! I was not aware of it.

jreback · 2014-04-09T13:00:23Z

closing as not a bug

KristianHolsheimer · 2016-08-23T09:30:02Z

Hi there, I know this issue is closed since it's not a bug. I would argue, though, that the error message could point the user in the right direction (I googled the error message and it led me here).

leonsas · 2016-08-31T15:54:59Z

I second Kristian, and here is a bit more context that may be helpful:

trimmed_series = series[event.startTime:event.endTime]

This throws KeyError: 1472641846000000000. Initially I thought that event.endTime might lie after the last index label of series, and that pandas might not handle out-of-bounds slice bounds well. In fact pandas handles those fine; the real cause was an unsorted index (the result of a bug elsewhere in my code), which I found out about here. But it definitely isn't intuitive.
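To illustrate the point above, a small sketch with hypothetical data: on a sorted DatetimeIndex, Timestamp bounds that fall outside the data are simply clipped to what is available, while the same bounds on a shuffled copy of the data raise a KeyError (the exact error text varies between pandas versions).

```python
import pandas as pd

# Hypothetical half-hourly data; the bounds below deliberately lie outside it.
idx = pd.DatetimeIndex([
    "2016-08-31 09:00", "2016-08-31 09:30", "2016-08-31 10:00", "2016-08-31 10:30",
])
series = pd.Series(range(4), index=idx)
start = pd.Timestamp("2016-08-31 08:00")  # before the first label
end = pd.Timestamp("2016-08-31 23:00")    # after the last label

# Sorted index: out-of-range bounds are fine, the slice covers all available data.
trimmed = series.loc[start:end]
print(len(trimmed))  # 4

# Same data in non-monotonic order: absent bounds cannot be located.
shuffled = series.iloc[[2, 0, 3, 1]]
error = None
try:
    shuffled.loc[start:end]
except KeyError as exc:
    error = exc
print(type(error).__name__)
```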

jreback · 2016-08-31T16:27:33Z

@leonsas @KristianHolsheimer

If you have a nice reproducible example, please open a new issue (and cross-reference this one). This is well-defined behavior. I closed this because it's not a bug, though it could/should be a ValueError with an even more informative error message.
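The more informative error suggested above can be approximated in user code. The sketch below uses a hypothetical helper, slice_by_label, which is not part of pandas: it checks Index.is_monotonic_increasing / is_monotonic_decreasing and whether the bounds are present before delegating to .loc, and raises a ValueError naming the offending bounds. It assumes scalar bounds such as Timestamp objects; partial date strings are not handled specially.

```python
import pandas as pd


def slice_by_label(obj, start=None, end=None):
    """Hypothetical helper (not a pandas API): label-slice a Series/DataFrame,
    raising a descriptive ValueError instead of an opaque KeyError when a
    bound is absent from a non-monotonic index."""
    index = obj.index
    if not (index.is_monotonic_increasing or index.is_monotonic_decreasing):
        missing = [key for key in (start, end) if key is not None and key not in index]
        if missing:
            raise ValueError(
                f"cannot slice with bound(s) {missing!r}: they are not in the index "
                "and the index is not monotonic; call sort_index() first"
            )
    return obj.loc[start:end]


# Usage on a deliberately unsorted index.
idx = pd.DatetimeIndex(["2016-08-31 10:00", "2016-08-31 09:00", "2016-08-31 11:00"])
s = pd.Series([1, 2, 3], index=idx)
bound = pd.Timestamp("2016-08-31 09:30")

message = ""
try:
    slice_by_label(s, start=bound)
except ValueError as exc:
    message = str(exc)
print(message)

sorted_result = slice_by_label(s.sort_index(), start=bound)
print(sorted_result.tolist())  # [1, 3]
```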

kdebrab · 2017-01-12T17:39:04Z

As requested by @jreback, here is a nice reproducible example:

import pandas as pd
index = pd.date_range('2016-10-29 23:00', '2016-10-30 3:00', freq='15T', tz='UTC')
index = index.tz_convert('Europe/Brussels').tz_localize(None)
ts = pd.Series(1, index=index)
ts.truncate(before='2016-10-30 2:10')

which raises:

KeyError: 1477793400000000000L

In the above example, the index is unsorted because it holds naive local times that span the switch from summer to winter time, so the wall-clock times from 02:00 to 02:45 occur twice.
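A short sketch that rebuilds the index from the example above and checks this explanation: after the time zone is dropped, the quarter-hours 02:00 to 02:45 appear twice and the index is no longer monotonic. The only change from the original snippet is freq="15min" in place of "15T", an alias that recent pandas versions deprecate.

```python
import pandas as pd

# Same construction as the example above, with the "15min" frequency alias.
index = pd.date_range("2016-10-29 23:00", "2016-10-30 3:00", freq="15min", tz="UTC")
index = index.tz_convert("Europe/Brussels").tz_localize(None)

# 02:45 (summer time) is followed by 02:00 (winter time), so the index is unsorted.
print(index.is_monotonic_increasing)  # False

# Second occurrences of the repeated wall-clock times.
print(index[index.duplicated()])
```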

Slicing the data as:

ts[pd.Timestamp('2016-10-30 2:10'):]

raises the same KeyError.

Truncating at a time that is present in the index works just fine (unless you choose a non-unique label):

ts.truncate(before='2016-10-30 1:30')

Interestingly:

ts['2016-10-30 2:10':]

raises no error and returns the correct (unsorted!) result, even when choosing a duplicate label!

All of these operations work on the sorted series, e.g.:

ts.sort_index().truncate(before='2016-10-30 2:10')

Of course, the result is then sorted as well.

I'm using pandas version 0.19.1
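Two workarounds consistent with the observations above, sketched on the same series: sort and then truncate (the result comes back sorted, as noted), or select with a boolean mask on the index, which does not depend on the index being sorted and preserves the original order. The mask approach is a general pandas idiom, not something proposed in this thread.

```python
import pandas as pd

index = pd.date_range("2016-10-29 23:00", "2016-10-30 3:00", freq="15min", tz="UTC")
index = index.tz_convert("Europe/Brussels").tz_localize(None)
ts = pd.Series(1, index=index)
cutoff = pd.Timestamp("2016-10-30 02:10")

# Option 1: sort first, then truncate (result is sorted).
truncated = ts.sort_index().truncate(before=cutoff)

# Option 2: boolean mask on the index; works on an unsorted index and keeps its order.
masked = ts[ts.index >= cutoff]

print(len(truncated), len(masked))  # 11 11
print(truncated.index.is_monotonic_increasing)  # True
print(masked.index.is_monotonic_increasing)  # False
```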

jreback · 2017-01-12T19:04:25Z

@kdebrab the only thing that would be nicer is the actual KeyError message for a label that is not found (it should show the key as a Timestamp). So I would take a fix for that.

you can open a new issue, or push a PR if that works for you.
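The integers in these KeyError messages are pandas' internal int64 representation of the missing key: nanoseconds since the Unix epoch (naive, i.e. UTC wall time). A minimal sketch of turning the raw keys quoted in this thread back into readable Timestamps by hand:

```python
import pandas as pd

# Raw integer keys quoted in the KeyError messages in this thread.
raw_keys = [1472641846000000000, 1477793400000000000, 1451606400000000000]

for raw in raw_keys:
    # An integer passed to pd.Timestamp is read as nanoseconds since 1970-01-01.
    print(raw, "->", pd.Timestamp(raw))

# 1472641846000000000 -> 2016-08-31 11:10:46
# 1477793400000000000 -> 2016-10-30 02:10:00
# 1451606400000000000 -> 2016-01-01 00:00:00
```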

jorisvandenbossche · 2017-01-12T21:15:16Z

truncate just does a slice under the hood, so the example of @kdebrab becomes:

In [12]: ts[pd.Timestamp('2016-1-1'):]
...
KeyError: 1451606400000000000

This raises an error because you are slicing with a label that is not present on an unsorted index.
A simpler example of this:

In [33]: s = pd.Series(range(5), index=[1,2,3,2,4])

# works because '1' is present (and is unique), so unsortedness does not matter
In [34]: s.loc[1:]
Out[34]: 
1    0
2    1
3    2
2    3
4    4
dtype: int64

# fails because '0' is not a label; since the index is unsorted, pandas cannot know where the slice should start
In [35]: s.loc[0:]
...
KeyError: 0

I think it would really be nice to have a better error message for this, as the difference between a sorted and an unsorted index can be very subtle, and with a sorted index, slicing with a non-present label is perfectly fine.
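One way to see why sortedness matters: on a sorted index, an absent bound has a well-defined position that a binary search (Index.searchsorted) can find, which is roughly how pandas locates missing slice bounds on a monotonic index; on an unsorted index no such position exists. A minimal sketch on the integer example above (a stable "mergesort" is used only so the printed order is deterministic):

```python
import pandas as pd

s = pd.Series(range(5), index=[1, 2, 3, 2, 4])
print(s.index.is_monotonic_increasing)  # False

# After sorting, the absent bound 0 has a well-defined insertion point.
sorted_s = s.sort_index(kind="mergesort")  # stable: the two '2' rows keep their order
print(sorted_s.index.searchsorted(0, side="left"))  # 0: '0' would go before every label
print(sorted_s.loc[0:].tolist())  # [0, 1, 3, 2, 4]
```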

@jreback The only thing I don't immediately understand is why the same example works with strings:

In [13]: ts['2016-1-1':]
Out[13]: 
2016-10-30 01:00:00    1
2016-10-30 01:15:00    1
...
2016-10-30 03:45:00    1
2016-10-30 04:00:00    1
Freq: 15T, dtype: int64

Shouldn't this be equivalent to ts[pd.Timestamp('2016-1-1'):]? Or at least in the case of ts['2016-1-1 00:00:00':] (which also works), where the string cannot be interpreted as a partial date string?

jreback · 2017-01-12T21:21:23Z

This works with strings because a string triggers partial timestamp indexing, which is turned into a slice, and hence works.

So maybe we ought to always make truncate a slice; then this will just work. (Internally a scalar is being passed, NOT a slice.) So this is pretty easy to 'fix'.

and I agree that this should work.

So let's create two new issues for this:

- a better, datetime-aware error message (i.e. it should show the converted key, not the internal repr)
- fix .truncate to always be slice-like

jorisvandenbossche · 2017-01-12T21:25:18Z

@jreback I thought partial strings were only turned into a slice when the string has a lower resolution than the index. In the case of ts['2016-1-1 00:00:00':] this is certainly not the case, so shouldn't it be treated as a slice with an exact Timestamp?

jreback · 2017-01-12T21:28:55Z

In [7]: ts.index.resolution
Out[7]: 'minute'

So I think it's always treated as a slice (in this example).

DanielHabenicht · 2023-02-13T00:10:35Z

So this was never settled? I think I will open a new issue for it, as it just cost me almost an hour.

jreback mentioned this issue Jan 24, 2014

df[0:1] row select fails with Float64Index on 0.13.0 #6066

Closed

jreback closed this as completed Apr 9, 2014

jreback added the Usage Question label Apr 9, 2014

aptiko added a commit to aptiko/loggertodb that referenced this issue May 27, 2019

Improve error message when file order is wrong (fixes openmeteo#12)

23ec7f5

This is essentially a workaround around pandas-dev/pandas#5821.
