API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option #22644

Merged
jreback merged 64 commits into pandas-dev:master from mroeschke:normalize_tz on Oct 25, 2018

mroeschke commented Sep 9, 2018

  • closes #8917
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Currently, users have no control over how nonexistent datetimes are handled when calling tz_localize, unlike ambiguous times. This adds a new keyword, nonexistent, to tz_localize so that users can now choose one of:

'raise': Raise an error (default)
'NaT': Replace nonexistent times with 'NaT'
'shift': Shift nonexistent times forward to the closest existing time
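
For illustration, a minimal sketch of the three behaviours on a DatetimeIndex. One caveat: the option is spelled 'shift' in this PR, but released pandas versions spell it 'shift_forward', so the sketch uses the released name. The example timestamps and the broad `except Exception` are choices made for this sketch, not taken from the PR.

```python
import pandas as pd

# On 2015-03-29 clocks in Europe/Warsaw jump from 02:00 to 03:00,
# so 02:30 does not exist there; 03:30 is an ordinary time.
naive = pd.DatetimeIndex(["2015-03-29 02:30:00", "2015-03-29 03:30:00"])

# Default behaviour (nonexistent='raise'): localizing fails.
# Exception is caught broadly so the sketch does not depend on the
# concrete exception class.
try:
    naive.tz_localize("Europe/Warsaw")
    raised = False
except Exception:
    raised = True

# Replace nonexistent times with NaT.
as_nat = naive.tz_localize("Europe/Warsaw", nonexistent="NaT")

# Shift nonexistent times forward to the closest existing time.
# Assumption: 'shift_forward' is the released spelling of this PR's 'shift'.
shifted = naive.tz_localize("Europe/Warsaw", nonexistent="shift_forward")

print(raised)
print(as_nat)
print(shifted)
```

Times that already exist in the target zone (03:30 here) are localized as usual under every option.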

pep8speaks commented Sep 9, 2018

Hello @mroeschke! Thanks for submitting the PR.

codecov bot commented Sep 9, 2018

Codecov Report

Merging #22644 into master will increase coverage by <.01%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           master   #22644      +/-   ##
==========================================
+ Coverage   92.22%   92.22%   +<.01%     
==========================================
  Files         169      169              
  Lines       50911    50922      +11     
==========================================
+ Hits        46954    46965      +11     
  Misses       3957     3957

Flag         Coverage Δ
#multiple    90.65% <94.11%> (ø) ⬆️
#single      42.28% <23.52%> (-0.01%) ⬇️

Impacted Files                     Coverage Δ
pandas/core/arrays/datetimes.py    97.45% <100%> (+0.05%) ⬆️
pandas/core/generic.py             96.79% <83.33%> (-0.05%) ⬇️
pandas/util/testing.py             86.73% <0%> (+0.09%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mroeschke mroeschke added this to the 0.24.0 milestone Sep 11, 2018

mroeschke commented Sep 19, 2018

@jreback re: overlap between errors and nonexistent

  1. The original issue (#8917 (comment)) asks for the ability to control ambiguous times and nonexistent times independently.
    1a) In this PR, nonexistent and ambiguous each handle their own errors independently.
  2. This PR can shift a nonexistent time to a real time (like how ambiguous can take True/False).

So I would propose that we eventually deprecate errors and keep both ambiguous and nonexistent.

jorisvandenbossche left a comment

Some minor comments on the new changes

@@ -312,9 +312,13 @@ def test_dti_tz_localize_nonexistent_raise_coerce(self):
index.tz_localize(tz=tz)

with pytest.raises(pytz.NonExistentTimeError):
index.tz_localize(tz=tz, errors='raise')
with tm.assert_produces_warning(FutureWarning,
check_stacklevel=False):

jorisvandenbossche Oct 11, 2018

is this check_stacklevel=False needed? (I would think this is always the same stacklevel)

(and idem for the other ones)

@@ -648,15 +648,24 @@ def tz_localize(self, tz, ambiguous='raise', errors='raise'):
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times
errors : {'raise', 'coerce'}, default 'raise'
nonexistent : 'shift', 'NaT' default 'raise'
- 'shift' will shift the nonexistent times forward to the closest

jorisvandenbossche Oct 11, 2018

Can you reuse the explanation about what a non-existent time is that is in the errors keyword below?

(and idem for the other occurrences of this in docstrings)


result = index.tz_localize(tz=tz, errors='coerce')
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False,
clear=FutureWarning):

jorisvandenbossche Oct 11, 2018

why is this clear needed?

mroeschke Oct 11, 2018

I test for another FutureWarning above this one, so as I understand it, I need clear=FutureWarning to test for another FutureWarning in the same test.

jorisvandenbossche Oct 15, 2018

I would expect the previous assert_produces_warning to take care of this so that it does not leak outside the context manager (but in my experience, expectations about testing warnings are often wrong .. :-))

Does it actually raise an error if you remove it?
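
As background for the question above, here is a minimal sketch using only the standard library warnings module, not pandas' own assert_produces_warning helper. It shows that when the filter action is "always", a warning emitted twice from the same code location is recorded both times, so a second check in the same test can still see it. The function names `deprecated_call` and `count_future_warnings` are hypothetical and exist only for this illustration.

```python
import warnings


def deprecated_call():
    # Hypothetical stand-in for a call that emits a deprecation warning.
    warnings.warn("the errors keyword is deprecated", FutureWarning)


def count_future_warnings(action):
    # Record warnings instead of printing them, with the given filter
    # action applied to FutureWarning only inside this context manager.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, FutureWarning)
        deprecated_call()
        deprecated_call()
    return sum(issubclass(w.category, FutureWarning) for w in caught)


# With "always", both calls are recorded even though they warn
# from the same source line.
always_count = count_future_warnings("always")
print(always_count)
```

Under other filter actions, Python may suppress a repeated warning from the same location, which is the situation that clearing the warning registry is meant to address.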

Matt Roeschke added some commits Oct 11, 2018

jreback commented Oct 14, 2018

lgtm @jorisvandenbossche any more comments?

@@ -565,6 +565,8 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift' will shift the nonexistent time forward to the closest

jorisvandenbossche Oct 15, 2018

Sorry, rst formatting nitpick: there needs to be a blank line between the first sentences, and the start of this list ... (getting rst right can be annoying ..)

mroeschke commented Oct 17, 2018

Thanks @jorisvandenbossche. Added those blank lines for rendering.

@pytest.mark.parametrize('tz', ['Europe/Warsaw', 'dateutil/Europe/Warsaw'])
@pytest.mark.parametrize('method, exp', [
['shift', '2015-03-29 03:00:00'],
['NaT', pd.NaT],

jreback Oct 18, 2018

do you have tests that exercise the assertion when you pass a nonexistent keyword that is invalid?

mroeschke Oct 18, 2018

Just added a test for an invalid nonexistent keyword.
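
A rough sketch of what such checks can look like outside the test suite, assuming a released pandas version (where this PR's 'shift' option is spelled 'shift_forward'). The invalid value "foo" is an arbitrary placeholder chosen for this sketch; the PR's actual test is not shown on this page.

```python
import pandas as pd

# 02:00 on 2015-03-29 does not exist in Europe/Warsaw.
ts = pd.Timestamp("2015-03-29 02:00:00")

# A valid option: shift forward to the closest existing time.
# Assumption: 'shift_forward' is the released spelling of this PR's 'shift'.
shifted = ts.tz_localize("Europe/Warsaw", nonexistent="shift_forward")

# A valid option: return NaT instead of raising.
as_nat = ts.tz_localize("Europe/Warsaw", nonexistent="NaT")

# An invalid option ("foo" is a placeholder) is rejected with a ValueError.
try:
    ts.tz_localize("Europe/Warsaw", nonexistent="foo")
    error = None
except ValueError as exc:
    error = exc

print(shifted)
print(as_nat)
print(type(error).__name__)
```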

@@ -978,14 +979,26 @@ class Timestamp(_Timestamp):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time
errors : 'raise', 'coerce', default 'raise'
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone

jorisvandenbossche Oct 19, 2018

Suggested change
- A nonexistent time doesn't not exist in a particular timezone
+ A nonexistent time does not exist in a particular timezone

@@ -639,15 +639,27 @@ def tz_localize(self, tz, ambiguous='raise', errors='raise'):
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times
errors : {'raise', 'coerce'}, default 'raise'
nonexistent : 'shift', 'NaT' default 'raise'
A nonexistent time doesn't not exist in a particular timezone

jorisvandenbossche Oct 19, 2018

Suggested change
- A nonexistent time doesn't not exist in a particular timezone
+ A nonexistent time does not exist in a particular timezone

@@ -8659,6 +8659,17 @@ def tz_localize(self, tz, axis=0, level=None, copy=True,
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time doesn't not exist in a particular timezone

jorisvandenbossche Oct 19, 2018

Suggested change
- A nonexistent time doesn't not exist in a particular timezone
+ A nonexistent time does not exist in a particular timezone

mroeschke commented Oct 19, 2018

Thanks for catching that typo @jorisvandenbossche

jreback commented Oct 24, 2018

one more rebase and I think ok to go

@jreback jreback merged commit 0a2d501 into pandas-dev:master Oct 25, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
jreback commented Oct 25, 2018

thanks @mroeschke

@mroeschke mroeschke deleted the mroeschke:normalize_tz branch Oct 25, 2018

thoo added a commit to thoo/pandas that referenced this pull request Oct 27, 2018

Merge remote-tracking branch 'repo_org/master' into Warning_prefix_pa…
…ndas

* repo_org/master: (23 commits)
  DOC: Add docstring validations for "See Also" section (pandas-dev#23143)
  TST: Fix test assertion (pandas-dev#23357)
  BUG: Handle Period in combine (pandas-dev#23350)
  REF: SparseArray imports (pandas-dev#23329)
  CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992)
  DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051)
  Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341)
  TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304)
  API: Add sparse Acessor (pandas-dev#23183)
  PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
  fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302)
  BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317)
  TST: re-enable gbq tests (pandas-dev#23303)
  Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311)
  isort imports-io (pandas-dev#23332)
  DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279)
  REF: Make PeriodArray an ExtensionArray (pandas-dev#22862)
  DOC: Added Examples for Series max (pandas-dev#23298)
  API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644)
  BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291)
  ...

thoo added a commit to thoo/pandas that referenced this pull request Oct 27, 2018

Merge remote-tracking branch 'repo_org/master' into check_np_pd_fromE…
…xamples

* repo_org/master: (83 commits)
  DOC: Add docstring validations for "See Also" section (pandas-dev#23143)
  TST: Fix test assertion (pandas-dev#23357)
  BUG: Handle Period in combine (pandas-dev#23350)
  REF: SparseArray imports (pandas-dev#23329)
  CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992)
  DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051)
  Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341)
  TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304)
  API: Add sparse Acessor (pandas-dev#23183)
  PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
  fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302)
  BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317)
  TST: re-enable gbq tests (pandas-dev#23303)
  Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311)
  isort imports-io (pandas-dev#23332)
  DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279)
  REF: Make PeriodArray an ExtensionArray (pandas-dev#22862)
  DOC: Added Examples for Series max (pandas-dev#23298)
  API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644)
  BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291)
  ...

@mroeschke mroeschke referenced this pull request Oct 29, 2018

Merged

BUG/ENH: Handle NonexistentTimeError in date rounding #23406

4 of 4 tasks complete

brute4s99 added a commit to brute4s99/pandas that referenced this pull request Nov 19, 2018
