Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/TST: Add more timezone fixtures and use is_utc more consistently #23807

Merged
merged 25 commits into from
Nov 23, 2018

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Nov 20, 2018

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Was discussed a while back with @jreback to add more timezones to the tz_aware_fixture, added some less tested ones like:

  • dateutil.tz.tzlocal
  • pytz.FixedOffset

This uncovered some bugs with tzlocal especially near DST transitions.

Additionally, using is_utc more consistently for UTC checking across the codebase and no longer passing the string 'UTC' around. Instead the UTC timezone object (i.e. pytz.UTC or dateutil.tz.tzutc()) is passed along. However, for single value conversion to/from UTC, pytz.UTC is used by default.

@pep8speaks
Copy link

Hello @mroeschke! Thanks for submitting the PR.

@mroeschke mroeschke changed the title BUG/TST: Add more timezone fixtures and use is_utc more consistently [WIP] BUG/TST: Add more timezone fixtures and use is_utc more consistently Nov 20, 2018
@@ -1248,6 +1248,8 @@ Timezones
- Bug when indexing a :class:`Series` with a DST transition (:issue:`21846`)
- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` where an ``AmbiguousTimeError`` or ``NonExistentTimeError`` would raise if a timezone aware timeseries ended on a DST transition (:issue:`19375`, :issue:`10117`)
- Bug in :meth:`DataFrame.drop` and :meth:`Series.drop` when specifying a tz-aware Timestamp key to drop from a :class:`DatetimeIndex` with a DST transition (:issue:`21761`)
- Bug in :class:`DatetimeIndex` constructor where :class:`NaT` and ``dateutil.tz.tzlocal`` would raise an ``OutOfBoundsDatetime`` error (:issue:`23807`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I'm guessing this happens for users for whom tzlocal is UTC-N but not those for whom tzlocal is UTC+N (@jorisvandenbossche can you check this when convenient?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually because NaT wasn't handled in tz_localize_to_utc so NaT was converted to its i8 value which was less than Timestamp.min

https://github.com/pandas-dev/pandas/pull/23807/files#diff-66d289312fd0b02e7721bf45fb281797R911

@@ -664,7 +670,8 @@ cdef inline int64_t[:] _tz_convert_dst(int64_t[:] values, tzinfo tz,


cdef inline int64_t _tz_convert_tzlocal_utc(int64_t val, tzinfo tz,
bint to_utc=True):
bint to_utc=True,
bint from_utc=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't from_utc just equal to not to_utc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it appears that way in general. I can try to remove the new keyword arg for not to_utc

@@ -354,7 +354,8 @@ def unique_nulls_fixture(request):


TIMEZONES = [None, 'UTC', 'US/Eastern', 'Asia/Tokyo', 'dateutil/US/Pacific',
'dateutil/Asia/Singapore']
'dateutil/Asia/Singapore', tzutc(), tzlocal(), FixedOffset(300),
FixedOffset(0), FixedOffset(-300)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brainstorm:

  • non-round numbers for FixedOffset
  • we have a small number of tests with tzinfos from psycopg2.tz and matplotlib (not sure where in the matplotlib namespace)
  • random subset of pytz.all_timezones (592 of them)
  • equivalent from dateutil.tz (not sure off the top of my head how to access that, but if we figure it out I'll make a PR there)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how much additional time do the tests take with these added fixtures? how many more tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add these in a follow up and further discuss.

@@ -8,6 +8,7 @@ from numpy cimport int64_t, int32_t, ndarray
cnp.import_array()

import pytz
from dateutil.tz import tzutc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you switching this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No switching, this is just used here for timezone conversion between tzlocal and tzutc (making sure they are both dateutil timezones so they play nice) https://github.com/pandas-dev/pandas/pull/23807/files/de3c83bb946a7791d29a18db5f004228b24c9b02#diff-66d289312fd0b02e7721bf45fb281797R705

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -354,7 +354,8 @@ def unique_nulls_fixture(request):


TIMEZONES = [None, 'UTC', 'US/Eastern', 'Asia/Tokyo', 'dateutil/US/Pacific',
'dateutil/Asia/Singapore']
'dateutil/Asia/Singapore', tzutc(), tzlocal(), FixedOffset(300),
FixedOffset(0), FixedOffset(-300)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how much additional time do the tests take with these added fixtures? how many more tests

@@ -409,8 +409,8 @@ def test_to_datetime_tz_pytz(self, cache):
hour=3, minute=0))],
dtype=object)
result = pd.to_datetime(arr, utc=True, cache=cache)
expected = DatetimeIndex(['2000-01-01 08:00:00+00:00',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this surprising that you change this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yeah this was the wrong fix. Addressed

@jreback jreback added the Timezones Timezone data dtype label Nov 21, 2018
@mroeschke mroeschke changed the title [WIP] BUG/TST: Add more timezone fixtures and use is_utc more consistently BUG/TST: Add more timezone fixtures and use is_utc more consistently Nov 21, 2018
@codecov
Copy link

codecov bot commented Nov 22, 2018

Codecov Report

Merging #23807 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23807      +/-   ##
==========================================
+ Coverage   92.28%   92.28%   +<.01%     
==========================================
  Files         161      161              
  Lines       51500    51501       +1     
==========================================
+ Hits        47528    47529       +1     
  Misses       3972     3972
Flag Coverage Δ
#multiple 90.68% <100%> (ø) ⬆️
#single 42.31% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 96.98% <ø> (ø) ⬆️
pandas/core/dtypes/dtypes.py 95.61% <100%> (ø) ⬆️
pandas/tseries/frequencies.py 97.08% <100%> (+0.01%) ⬆️
pandas/core/arrays/datetimes.py 98.51% <100%> (ø) ⬆️
pandas/io/pytables.py 92.3% <0%> (-0.05%) ⬇️
pandas/io/formats/printing.py 93.61% <0%> (+0.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e405a10...15ba5fb. Read the comment docs.

@jreback jreback added this to the 0.24.0 milestone Nov 23, 2018
@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

looks good. does this add a lot more testing? e.g. now that we are having a bigger tz matrix (and use this fixture a fair amount)

@mroeschke
Copy link
Member Author

Looking at two, 3.6 travis build, we will go from 36028 tests to 37606 tests (783.49 seconds to 931.50 seconds).

@jreback jreback merged commit 94ce05d into pandas-dev:master Nov 23, 2018
@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

thanks! of course if this does close any existing issues, pls update them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants