Indexing into Series of tz-aware datetime64s fails using __getitem__ #12089
JackKelly referenced this issue in nilmtk/nilmtk on Jan 19, 2016: proportion_of_energy_submetered() raises an IndexError #471 (Closed)
jorisvandenbossche added the Bug, Timeseries, Regression and Timezones labels on Jan 19, 2016
jorisvandenbossche added this to the 0.18.0 milestone on Jan 19, 2016

I can confirm this bug, also with current master.

just need a @JackKelly want to do a PR?

tests can go in the same place as in #12054

sure, I'll give it a go now...

JackKelly added a commit to JackKelly/pandas that referenced this issue on Jan 19, 2016: 420c926

OK, I've attempted the fix. Here's the relevant commit on my fork of Pandas. However, this hasn't fixed the issue and I'm not sure what's best to do. My 'fix' has revealed a new issue. The problem appears to be that, now, when we do
In [5]: dates = pd.date_range("2011-01-01", periods=3, tz='utc')
In [6]: dates
Out[6]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns, UTC]', freq='D')
In [7]: series = pd.Series(dates, index=['a', 'b', 'c'])
# Note the lack of timezone:
In [8]: series['a']
Out[8]: Timestamp('2011-01-01 00:00:00')
# But using `loc` we do get the timezone:
In [9]: series.loc['a']
Out[9]: Timestamp('2011-01-01 00:00:00+0000', tz='UTC')
In [10]: series
Out[10]:
a 2011-01-01 00:00:00+00:00
b 2011-01-02 00:00:00+00:00
c 2011-01-03 00:00:00+00:00
dtype: datetime64[ns, UTC]
In [11]: series['a'] == series.loc['a']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-0de902e8919c> in <module>()
----> 1 series['a'] == series.loc['a']
/home/jack/workspace/python/pandas/pandas/tslib.pyx in pandas.tslib._Timestamp.__richcmp__ (pandas/tslib.c:19258)()
971 (type(self).__name__, type(other).__name__))
972
--> 973 self._assert_tzawareness_compat(other)
974 return _cmp_scalar(self.value, ots.value, op)
975
/home/jack/workspace/python/pandas/pandas/tslib.pyx in pandas.tslib._Timestamp._assert_tzawareness_compat (pandas/tslib.c:19638)()
1000 if self.tzinfo is None:
1001 if other.tzinfo is not None:
-> 1002 raise TypeError('Cannot compare tz-naive and tz-aware '
1003 'timestamps')
1004 elif other.tzinfo is None:
TypeError: Cannot compare tz-naive and tz-aware timestamps
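
The session above shows `series['a']` and `series.loc['a']` disagreeing on tz-awareness, which is what makes the `==` comparison raise. Below is a minimal sketch of a check that compares the two results without tripping over that mismatch. It assumes a pandas release in which both accessors return tz-aware values; `same_timestamp` is a hypothetical helper written for illustration, not part of pandas or of the commit discussed here.

```python
import pandas as pd

dates = pd.date_range("2011-01-01", periods=3, tz="UTC")
series = pd.Series(dates, index=["a", "b", "c"])


def same_timestamp(left, right):
    """Hypothetical helper (not part of pandas): compare two Timestamps
    without raising when only one of them is tz-aware."""
    left_aware = left.tzinfo is not None
    right_aware = right.tzinfo is not None
    if left_aware != right_aware:
        # One side has lost (or never had) its timezone, so report a mismatch
        # instead of letting the comparison itself fail.
        return False
    return left == right


via_getitem = series["a"]   # the accessor this issue is about
via_loc = series.loc["a"]   # the accessor that kept the timezone in the session above

print(repr(via_getitem), repr(via_loc))
print(same_timestamp(via_getitem, via_loc))
```

If the two accessors ever diverge again, the helper returns `False` rather than raising, which may be easier to assert on in a regression test.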

yeh, prob some issues down the path. lmk if you get stuck.

Hmm, I think this is way over my head to be honest. I'm really not very familiar with Pandas' internals. I have had a quick shot at getting to the bottom of it. Not sure if I've found any bugs or not. Here are my notes:

Set up a debugging session in IPython like this:
dates_with_tz = pd.date_range("2011-01-01", periods=3, tz="US/Eastern")
s_with_tz = pd.Series(dates_with_tz, index=['a', 'b', 'c'])
%debug s_with_tz['a']

we find that: In
['2011-01-01T05:00:00.000000000+0000' '2011-01-02T05:00:00.000000000+0000'
'2011-01-03T05:00:00.000000000+0000']
i.e. timezone is switched from "US/Eastern" to UTC. I've tried stepping into

In [32]: s_with_tz._values._values
Out[32]:
array(['2011-01-01T05:00:00.000000000+0000',
'2011-01-02T05:00:00.000000000+0000',
'2011-01-03T05:00:00.000000000+0000'], dtype='datetime64[ns]')

but I'm really not sure! Is

In [33]: s_with_tz._values
Out[33]:
DatetimeIndex(['2011-01-01 00:00:00-05:00', '2011-01-02 00:00:00-05:00',
'2011-01-03 00:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq='D')
In [34]: s_with_tz
Out[34]:
a 2011-01-01 00:00:00-05:00
b 2011-01-02 00:00:00-05:00
c 2011-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]
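
The notes above observe that the raw array behind a tz-aware Series holds UTC instants while the timezone lives on the dtype. A minimal sketch of that relationship follows. It uses the public `Series.to_numpy` call instead of the private `_values` attribute from the debugging session, so it is an illustration of the idea rather than a trace of the code path under investigation.

```python
import numpy as np
import pandas as pd

dates_with_tz = pd.date_range("2011-01-01", periods=3, tz="US/Eastern")
s_with_tz = pd.Series(dates_with_tz, index=["a", "b", "c"])

# Asking for plain datetime64 values converts to UTC and drops the timezone.
raw = s_with_tz.to_numpy(dtype="datetime64[ns]")
print(raw.dtype, raw[0])

# Re-attaching the timezone by hand: label the raw instant as UTC,
# then convert it to the timezone recorded on the Series dtype.
restored = pd.Timestamp(raw[0]).tz_localize("UTC").tz_convert(s_with_tz.dt.tz)
print(repr(restored))
print(restored == s_with_tz.loc["a"])
```

Any code path that reads the raw array directly has to perform that last step itself, which is one plausible way for a scalar lookup to end up tz-naive.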
jreback referenced this issue on Jan 27, 2016: BUG: getitem and a series with a non-ndarray values #12151 (Closed)
jreback added a commit to jreback/pandas that referenced this issue on Jan 27, 2016: 824ddbe
jreback closed this in 3152bdc on Jan 27, 2016

thank you @jreback :)

JackKelly commented Jan 19, 2016
I'm a huge fan of Pandas. Thanks for all the hard work!
I believe I have stumbled across a small bug in Pandas 0.17.1 which was not present in 0.16.2. Indexing into Series of timezone-aware datetime64s fails using __getitem__ but indexing succeeds if the datetime64s are timezone-naive. Here is a minimal code example and the exception produced by Pandas 0.17.1:
If the dates are timezone-aware then we can access them using loc but, as far as I'm aware, we should be able to use __getitem__ in this situation too:
However, if the dates are timezone-naive then indexing using __getitem__ works as expected:
So indexing into a Series using __getitem__ works if the data is a list of timezone-naive datetime64s but indexing fails if the datetime64s are timezone-aware.
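
The code blocks from the original report are not preserved in this copy of the thread. The sketch below is not the reporter's code; it is a small reconstruction of the comparison the report describes, built from the `pd.date_range` and `pd.Series` calls that appear elsewhere in the thread. The broad `except` is deliberate, because the exact exception raised by 0.17.1 is not shown here.

```python
import pandas as pd

labels = ["a", "b", "c"]
cases = {
    "tz-naive": pd.Series(pd.date_range("2011-01-01", periods=3), index=labels),
    "tz-aware": pd.Series(pd.date_range("2011-01-01", periods=3, tz="UTC"), index=labels),
}

results = {}
for name, series in cases.items():
    try:
        # Plain __getitem__ lookup, the access pattern the report is about.
        results[name] = series["a"]
    except Exception as exc:  # exception type is an unknown here, so catch broadly
        results[name] = exc
    print(name, "->", repr(results[name]))
```

On a release that includes the fix, both lookups should return a `Timestamp`, and the tz-aware one should keep its timezone.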