Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Merge timezone aware data with DST #22825

Merged
merged 11 commits into from
Oct 1, 2018

Conversation

mroeschke
Copy link
Member

When merging timezone aware data using where, make the comparison in UTC to avoid relocalizing on an ambiguous time.

@pep8speaks
Copy link

Hello @mroeschke! Thanks for submitting the PR.

" scalar")

@pytest.mark.xfail(reason="ToDo: do not ignore timezone, must be object")
def test_where_index_datetimetz(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was bizarrely structured above. It was given the wrong arguments for everything to pass before it hit the xfail...

Good news is that one of the cases is passing, so I am splitting the tz-aware and tz-naive case and properly xfailing the tz-aware case (which was already failing on master and is orthogonal to this change)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an issue for this case? if not can you create one and reference it here

@codecov
Copy link

codecov bot commented Sep 25, 2018

Codecov Report

Merging #22825 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22825      +/-   ##
==========================================
+ Coverage   92.18%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50830    50833       +3     
==========================================
+ Hits        46860    46863       +3     
  Misses       3970     3970
Flag Coverage Δ
#multiple 90.6% <100%> (ø) ⬆️
#single 42.37% <11.11%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimelike.py 98.11% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f1b842...e4491d8. Read the comment docs.

@@ -681,6 +681,7 @@ Timezones
- Bug in :meth:`DatetimeIndex.unique` that did not re-localize tz-aware dates correctly (:issue:`21737`)
- Bug when indexing a :class:`Series` with a DST transition (:issue:`21846`)
- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` where an ``AmbiguousTimeError`` or ``NonExistentTimeError`` would raise if a timezone aware timeseries ended on a DST transition (:issue:`19375`, :issue:`10117`)
- Bug in :func:`merge` when merging ``datetime64[ns, tz]`` data that contained a DST transition (:issue:`18885`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to reshaping

@@ -289,6 +289,7 @@ def _ensure_localized(self, result, ambiguous='raise'):
result : DatetimeIndex / i8 ndarray
ambiguous : str, bool, or bool-ndarray
default 'raise'
from_utc : bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the default here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add what this actually means (and similarly for to_utc below).

also maybe rename 'result' to array or input or something, I think these are holdover names and are confusing.

@@ -695,14 +699,17 @@ def astype(self, dtype, copy=True):
return super(DatetimeIndexOpsMixin, self).astype(dtype, copy=copy)


def _ensure_datetimelike_to_i8(other):
def _ensure_datetimelike_to_i8(other, to_utc=False):
""" helper for coercing an input scalar or array to i8 """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string here

""" helper for coercing an input scalar or array to i8 """
if is_scalar(other) and isna(other):
other = iNaT
elif isinstance(other, ABCIndexClass):
# convert tz if needed
if getattr(other, 'tz', None) is not None:
other = other.tz_localize(None).asi8
if to_utc:
other = other.tz_convert('UTC').asi8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do the .asi8 lower? (and just do it on other) to avoid repeating code

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype labels Sep 25, 2018
@mroeschke
Copy link
Member Author

Addressed your comments @jreback and all green.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very small comments. ping on green.

ambiguous : str, bool, or bool-ndarray
default 'raise'
from_utc : bool
default False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default goes on the line before.

----------
other : 1d array
to_utc : bool
default False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

" scalar")

@pytest.mark.xfail(reason="ToDo: do not ignore timezone, must be object")
def test_where_index_datetimetz(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an issue for this case? if not can you create one and reference it here

@jreback jreback added this to the 0.24.0 milestone Sep 26, 2018
@mroeschke
Copy link
Member Author

All green besides a unrelated, failed connection to a s3 parquet test.

df2 = pd.DataFrame([
pd.to_datetime('2017-10-29 03:00:00'),
pd.to_datetime('2017-10-29 04:00:00'),
pd.to_datetime('2017-10-29 05:00:00')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you create this one as well with date_range to shorten it a bit? (and if not, I think it is more typical to pass a list of strings to to_datetime instead of calling it on each string)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can maybe also directly include the 'value' column in the constructor

@mroeschke
Copy link
Member Author

All green besides a unrelated, failed connection to a s3 parquet test.

@jreback jreback merged commit a277e4a into pandas-dev:master Oct 1, 2018
@jreback
Copy link
Contributor

jreback commented Oct 1, 2018

thanks @mroeschke

@mroeschke mroeschke deleted the timezone_merge branch October 1, 2018 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AmbiguousTimeError merging two timezone-aware DataFrames with DST change
4 participants