Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Indexing with UTC offset string no longer ignored #25263

Merged
merged 26 commits into from
Feb 24, 2019
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
d2716fd
BUG: Indexing with UTC offset string not longer ignored
Feb 11, 2019
f444b92
Add whatsnew
Feb 11, 2019
6994e77
Address failures
Feb 11, 2019
f02e669
isort
Feb 11, 2019
807a1f8
Add comment and move test
Feb 11, 2019
71b093e
lint
Feb 11, 2019
d48c0db
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 11, 2019
3bda5fa
Add test with mixed timestamp:string slice
Feb 11, 2019
c10bcff
Don't check if one end is None
Feb 12, 2019
6363f05
Be explicity with endpoints to check
Feb 12, 2019
9678ebb
Add datetime args in test
Feb 12, 2019
b24b773
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 12, 2019
77e1e4d
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 13, 2019
44ed2a4
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 14, 2019
c16e0cd
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 14, 2019
0ef291c
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 15, 2019
0a272a9
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 16, 2019
200cdbf
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 16, 2019
ea339b6
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 17, 2019
ed96516
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 19, 2019
6ac4598
Document change in own section and timeseries.rst
Feb 20, 2019
a99d9c4
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 20, 2019
6d66371
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 20, 2019
138ac7c
add versionadded tag to timeseries.rst
Feb 20, 2019
e0e3e25
Merge remote-tracking branch 'upstream/master' into datestring_offset…
Feb 21, 2019
bb4814a
fix formatting
Feb 23, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ Interval
Indexing
^^^^^^^^

-
- Bug when a date string with a UTC offset would get ignored during indexing. (:issue:`24076`, :issue:`16785`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a signficant enough change that it needs a sub-section in api breaking (yes its a bug, but may lead to incorrect results, so better if its highly visible)

-
-

Expand Down
23 changes: 19 additions & 4 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,11 @@

import numpy as np

from pandas._libs import (
Timedelta, algos as libalgos, index as libindex, join as libjoin, lib,
tslibs)
from pandas._libs import (algos as libalgos, index as libindex,
join as libjoin, lib)
from pandas._libs.lib import is_datetime_array
from pandas._libs.tslibs import Timedelta, Timestamp, OutOfBoundsDatetime
from pandas._libs.tslibs.timezones import tz_compare
import pandas.compat as compat
from pandas.compat import range, set_function_name, u
from pandas.compat.numpy import function as nv
Expand Down Expand Up @@ -447,7 +448,7 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
try:
return DatetimeIndex(subarr, copy=copy,
name=name, **kwargs)
except tslibs.OutOfBoundsDatetime:
except OutOfBoundsDatetime:
pass

elif inferred.startswith('timedelta'):
Expand Down Expand Up @@ -4866,6 +4867,20 @@ def slice_locs(self, start=None, end=None, step=None, kind=None):
# If it's a reverse slice, temporarily swap bounds.
start, end = end, start

# GH 16785: If start and end happen to be date strings with UTC offsets
# attempt to parse and check that the offsets are the same
if (isinstance(start, compat.string_types) and
isinstance(end, compat.string_types)):
try:
jreback marked this conversation as resolved.
Show resolved Hide resolved
ts_start = Timestamp(start)
ts_end = Timestamp(end)
except ValueError:
pass
else:
if not tz_compare(ts_start.tz, ts_end.tz):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this happen regardless if these are strings or not?

e.g. df[pd.Timestamp(...):'string']

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm didn't know we supported df[pd.Timestamp(...):'date_string']. If that's the case then definitely.

raise ValueError("Both date strings must have the same "
"UTC offset")

start_slice = None
if start is not None:
start_slice = self.get_slice_bound(start, 'left', kind)
Expand Down
76 changes: 37 additions & 39 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,9 +32,8 @@
from pandas.core.ops import get_op_result_name
import pandas.core.tools.datetimes as tools

from pandas.tseries import offsets
from pandas.tseries.frequencies import Resolution, to_offset
from pandas.tseries.offsets import CDay, prefix_mapping
from pandas.tseries.offsets import CDay, prefix_mapping, Nano


def _new_DatetimeIndex(cls, d):
Expand Down Expand Up @@ -826,54 +825,53 @@ def _parsed_string_to_bounds(self, reso, parsed):
lower, upper: pd.Timestamp

"""
valid_resos = {'year', 'month', 'quarter', 'day', 'hour', 'minute',
'second', 'minute', 'second', 'microsecond'}
if reso not in valid_resos:
raise KeyError
if reso == 'year':
return (Timestamp(datetime(parsed.year, 1, 1), tz=self.tz),
Timestamp(datetime(parsed.year, 12, 31, 23,
59, 59, 999999), tz=self.tz))
start = Timestamp(parsed.year, 1, 1)
end = Timestamp(parsed.year, 12, 31, 23, 59, 59, 999999)
elif reso == 'month':
d = ccalendar.get_days_in_month(parsed.year, parsed.month)
return (Timestamp(datetime(parsed.year, parsed.month, 1),
tz=self.tz),
Timestamp(datetime(parsed.year, parsed.month, d, 23,
59, 59, 999999), tz=self.tz))
start = Timestamp(parsed.year, parsed.month, 1)
end = Timestamp(parsed.year, parsed.month, d, 23, 59, 59, 999999)
elif reso == 'quarter':
qe = (((parsed.month - 1) + 2) % 12) + 1 # two months ahead
d = ccalendar.get_days_in_month(parsed.year, qe) # at end of month
return (Timestamp(datetime(parsed.year, parsed.month, 1),
tz=self.tz),
Timestamp(datetime(parsed.year, qe, d, 23, 59,
59, 999999), tz=self.tz))
start = Timestamp(parsed.year, parsed.month, 1)
end = Timestamp(parsed.year, qe, d, 23, 59, 59, 999999)
elif reso == 'day':
st = datetime(parsed.year, parsed.month, parsed.day)
return (Timestamp(st, tz=self.tz),
Timestamp(Timestamp(st + offsets.Day(),
tz=self.tz).value - 1))
start = Timestamp(parsed.year, parsed.month, parsed.day)
end = start + timedelta(days=1) - Nano(1)
elif reso == 'hour':
st = datetime(parsed.year, parsed.month, parsed.day,
hour=parsed.hour)
return (Timestamp(st, tz=self.tz),
Timestamp(Timestamp(st + offsets.Hour(),
tz=self.tz).value - 1))
start = Timestamp(parsed.year, parsed.month, parsed.day,
parsed.hour)
end = start + timedelta(hours=1) - Nano(1)
elif reso == 'minute':
st = datetime(parsed.year, parsed.month, parsed.day,
hour=parsed.hour, minute=parsed.minute)
return (Timestamp(st, tz=self.tz),
Timestamp(Timestamp(st + offsets.Minute(),
tz=self.tz).value - 1))
start = Timestamp(parsed.year, parsed.month, parsed.day,
parsed.hour, parsed.minute)
end = start + timedelta(minutes=1) - Nano(1)
elif reso == 'second':
st = datetime(parsed.year, parsed.month, parsed.day,
hour=parsed.hour, minute=parsed.minute,
second=parsed.second)
return (Timestamp(st, tz=self.tz),
Timestamp(Timestamp(st + offsets.Second(),
tz=self.tz).value - 1))
start = Timestamp(parsed.year, parsed.month, parsed.day,
parsed.hour, parsed.minute, parsed.second)
end = start + timedelta(seconds=1) - Nano(1)
elif reso == 'microsecond':
st = datetime(parsed.year, parsed.month, parsed.day,
parsed.hour, parsed.minute, parsed.second,
parsed.microsecond)
return (Timestamp(st, tz=self.tz), Timestamp(st, tz=self.tz))
else:
raise KeyError
start = Timestamp(parsed.year, parsed.month, parsed.day,
parsed.hour, parsed.minute, parsed.second,
parsed.microsecond)
end = start + timedelta(microseconds=1) - Nano(1)
if parsed.tzinfo is not None:
jreback marked this conversation as resolved.
Show resolved Hide resolved
if self.tz is None:
raise ValueError("The index must be timezone aware "
"when indexing with a date string with a "
"UTC offset")
start = start.tz_localize(parsed.tzinfo).tz_convert(self.tz)
end = end.tz_localize(parsed.tzinfo).tz_convert(self.tz)
elif self.tz is not None:
start = start.tz_localize(self.tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this case tested?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tested by test_resample_dst_anchor in pandas/tests/resample/test_datetime_index.py

end = end.tz_localize(self.tz)
return start, end

def _partial_date_slice(self, reso, parsed, use_lhs=True, use_rhs=True):
is_monotonic = self.is_monotonic
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/indexes/datetimes/test_datetime.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ def test_stringified_slice_with_tz(self):
# GH#2658
import datetime
start = datetime.datetime.now()
idx = date_range(start=start, freq="1d", periods=10)
idx = date_range(start=start, freq="1d", periods=10, tz='US/Eastern')
jreback marked this conversation as resolved.
Show resolved Hide resolved
df = DataFrame(lrange(10), index=idx)
df["2013-01-14 23:44:34.437768-05:00":] # no exception here

Expand Down
18 changes: 18 additions & 0 deletions pandas/tests/indexing/test_datetime.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

from dateutil import tz
import numpy as np
import pytest

import pandas as pd
from pandas import DataFrame, Index, Series, Timestamp, date_range
Expand Down Expand Up @@ -313,3 +314,20 @@ def test_loc_setitem_with_existing_dst(self):
columns=['value'],
dtype=object)
tm.assert_frame_equal(result, expected)

def test_getitem_with_datestring_with_UTC_offset(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think our partial timestamp indexing tests are all over the place, so put in pandas/tests/indexes/datetimes/test_partial_slicing.py (future PR to consolidate tests in test_datetime.py to there)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

# GH 24076
idx = pd.date_range(start='2018-12-02 14:50:00-07:00',
end='2018-12-03 15:00:00-07:00', freq='1min')
df = pd.DataFrame(1, index=idx, columns=['A'])
result = df['2018-12-02 21:50:00+00:00':'2018-12-02 21:52:00+00:00']
expected = df.iloc[0:3, :]
tm.assert_frame_equal(result, expected)

# GH 16785
with pytest.raises(ValueError, match="Both date strings"):
df['2018-12-02 21:50:00+00:00':'2018-12-02 21:52:00+01:00']

with pytest.raises(ValueError, match="The index must be timezone"):
df = df.tz_localize(None)
df['2018-12-02 21:50:00+00:00':'2018-12-02 21:52:00+00:00']