REF/API: DatetimeTZDtype #23990

TomAugspurger · 2018-11-29T03:12:23Z

A cleanup of DatetimeTZDtype

* Remove magic constructor from string. It seemed dangerous to overload the unit argument to accept either a unit or a full datetime64[ns, tz] string. This required changing DatetimeTZDtype.construct_from_string and changing a number of places to use construct_from_string rather than the main __new__.
* Change __new__ to __init__ and remove caching. It seemed to be causing more problems than it was worth. You could too easily create nonsense DatetimeTZDtypes like

  In [3]: t = pd.core.dtypes.dtypes.DatetimeTZDtype(unit=None, tz=None)

* Change .tz and .unit to properties instead of attributes. I've not provided setters. We could in theory, since we're getting rid of caching, but I'd rather wait till there's a demand.

The remaining changes in the DatetimeArray PR will be to

1. Inherit from ExtensionDtype
2. Implement construct_array_type
3. Register the dtype
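
For readers following along, the first item in the description (keeping string parsing out of the main constructor) can be sketched roughly as below. This is a toy stand-in, not the pandas class: `ToyTZDtype` is a hypothetical name, the regex is copied from the `_match` attribute quoted later in the review, and timezones are kept as plain strings.

```python
import re


class ToyTZDtype:
    """Toy stand-in for a tz-aware dtype; NOT the pandas implementation."""

    # Same pattern as the `_match` attribute quoted in the review diff.
    _match = re.compile(r"(datetime64|M8)\[(?P<unit>.+), (?P<tz>.+)\]")

    def __init__(self, unit="ns", tz=None):
        # The main constructor only accepts explicit fields, never a full alias.
        if unit != "ns":
            raise ValueError("only 'ns' is supported in this sketch")
        if tz is None:
            raise TypeError("A 'tz' is required.")
        self._unit = unit
        self._tz = tz

    @classmethod
    def construct_from_string(cls, string):
        # All string parsing lives here, separate from __init__.
        match = cls._match.fullmatch(string)
        if match is None:
            raise TypeError(f"could not construct from {string!r}")
        return cls(unit=match.group("unit"), tz=match.group("tz"))

    def __repr__(self):
        return f"datetime64[{self._unit}, {self._tz}]"


dtype = ToyTZDtype.construct_from_string("datetime64[ns, UTC]")
print(dtype)
```

With this split, a caller holding a string goes through `construct_from_string`, and a caller holding a unit and a timezone goes through `__init__`; neither path has to guess what the other argument means.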

pep8speaks · 2018-11-29T03:12:27Z

Hello @TomAugspurger! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/arrays/datetimelike.py !
There are no PEP8 issues in the file pandas/core/dtypes/common.py !
There are no PEP8 issues in the file pandas/core/dtypes/dtypes.py !
There are no PEP8 issues in the file pandas/tests/dtypes/test_common.py !
There are no PEP8 issues in the file pandas/tests/dtypes/test_dtypes.py !
There are no PEP8 issues in the file pandas/tests/dtypes/test_missing.py !

jreback · 2018-11-29T03:22:15Z

pls perf test things

caching is essential as these are created quite a lot and they are the same virtually every time

TomAugspurger · 2018-11-29T12:11:02Z

On microbenchmarks, things are fine

master:

In [2]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 39 µs, sys: 31 µs, total: 70 µs
Wall time: 75.6 µs

In [3]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 15 µs, sys: 0 ns, total: 15 µs
Wall time: 18.8 µs

PR:

In [2]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 29 µs, sys: 23 µs, total: 52 µs
Wall time: 56 µs

In [3]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 16 µs, sys: 0 ns, total: 16 µs
Wall time: 19.8 µs

ASV for the timeseries, timestamps, and offsets

       before           after         ratio
     [6b3490f4]       [982c169a]
-         143±9ms          127±4ms     0.89  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-     2.73±0.05μs      2.45±0.02μs     0.90  timestamp.TimestampOps.time_tz_convert(tzutc())

Line-profiling reveals that basically all the time is spent on the timezone check.


In [5]: %lprun -s -f DatetimeTZDtype.__init__ DatetimeTZDtype(tz='utc')
Timer unit: 1e-06 s

Total time: 4e-05 s
File: /Users/taugspurger/sandbox/pandas/pandas/core/dtypes/dtypes.py
Function: __init__ at line 494

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   494                                               def __init__(self, unit="ns", tz=None):
   495                                                   """
   496                                                   An ExtensionDtype for timezone-aware datetime data.
   497
   498                                                   Parameters
   499                                                   ----------
   500                                                   unit : str, default "ns"
   501                                                       The precision of the datetime data. Currently limited
   502                                                       to ``"ns"``.
   503                                                   tz : str, int, or datetime.tzinfo
   504                                                       The timezone.
   505
   506                                                   Raises
   507                                                   ------
   508                                                   pytz.UnknownTimeZoneError
   509                                                       When the requested timezone cannot be found.
   510
   511                                                   Examples
   512                                                   --------
   513                                                   >>> pd.core.dtypes.dtypes.DatetimeTZDtype(tz='UTC')
   514                                                   datetime64[ns, UTC]
   515
   516                                                   >>> pd.core.dtypes.dtypes.DatetimeTZDtype(tz='dateutil/US/Central')
   517                                                   datetime64[ns, tzfile('/usr/share/zoneinfo/US/Central')]
   518                                                   """
   519         1          7.0      7.0     17.5          if isinstance(unit, DatetimeTZDtype):
   520                                                       unit, tz = unit.unit, unit.tz
   521
   522         1          1.0      1.0      2.5          if unit != 'ns':
   523                                                       raise ValueError("DatetimeTZDtype only supports ns units")
   524
   525         1          0.0      0.0      0.0          if tz:
   526         1         27.0     27.0     67.5              tz = timezones.maybe_get_tz(tz)
   527                                                   elif tz is not None:
   528                                                       raise pytz.UnknownTimeZoneError(tz)
   529                                                   elif tz is None:
   530                                                       raise TypeError("A 'tz' is required.")
   531
   532         1          4.0      4.0     10.0          self._unit = unit
   533         1          1.0      1.0      2.5          self._tz = tz

I switched the properties to cache_readonly (which we can do, since we aren't using _cache for caching instances now).
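
As a rough illustration of "properties without setters", here is a minimal sketch using the built-in `property` rather than pandas' `cache_readonly`. The class name `ReadOnlyDtype` is hypothetical and the timezone is a plain string; the point is only that values stored in `_unit` and `_tz` during `__init__` cannot be reassigned through the public names.

```python
class ReadOnlyDtype:
    """Hypothetical example class; not part of pandas."""

    def __init__(self, unit, tz):
        # Values are stored on private attributes, as in the profiled __init__.
        self._unit = unit
        self._tz = tz

    @property
    def unit(self):
        # No setter is defined, so assignment raises AttributeError.
        return self._unit

    @property
    def tz(self):
        return self._tz


dtype = ReadOnlyDtype("ns", "UTC")
try:
    dtype.tz = "US/Central"
except AttributeError as exc:
    print("cannot reassign tz:", type(exc).__name__)
```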

jreback · 2018-11-29T12:53:12Z

pandas/core/dtypes/dtypes.py

    _metadata = ('unit', 'tz')
    _match = re.compile(r"(datetime64|M8)\[(?P<unit>.+), (?P<tz>.+)\]")
    _cache = {}
+    # TODO: restore caching? who cares though? It seems needlessly complex.
+    # np.dtype('datetime64[ns]') isn't a singleton


this is a huge performance penalty w/o caching

TomAugspurger · 2018-11-29

Did you see the perf numbers I posted in #23990 (comment)? It seems to be slightly faster without caching (though within noise).

jreback · 2018-11-29

try running a good part of the test suite. It's the repeated construction that's a problem, not the single construction, which is fine. W/o caching you end up creating a huge number of these

jreback · 2018-11-29

guess could remove the comment now

TomAugspurger · 2018-11-29T13:18:32Z

Are you worried about memory usage? The time to create a new one from scratch is identical or faster than looking it up from the cache.
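
The trade-off being debated here (a dictionary lookup in `__new__` versus building a fresh object each time) can be reproduced in miniature. The sketch below is an assumption-laden toy: `CachedDtype` and `PlainDtype` are hypothetical classes with no timezone validation, so the timings say nothing about pandas itself, only about the shape of the comparison.

```python
import timeit


class CachedDtype:
    """Toy class that returns one shared instance per (unit, tz) key."""

    _cache = {}

    def __new__(cls, unit="ns", tz="UTC"):
        key = (unit, tz)
        try:
            return cls._cache[key]
        except KeyError:
            obj = object.__new__(cls)
            obj.unit, obj.tz = unit, tz
            cls._cache[key] = obj
            return obj


class PlainDtype:
    """Toy class that builds a new, equal-by-value instance every time."""

    def __init__(self, unit="ns", tz="UTC"):
        self.unit, self.tz = unit, tz

    def __eq__(self, other):
        return isinstance(other, PlainDtype) and (
            (self.unit, self.tz) == (other.unit, other.tz)
        )

    def __hash__(self):
        return hash((self.unit, self.tz))


# Cached instances are identical objects; plain ones are equal but distinct.
assert CachedDtype() is CachedDtype()
assert PlainDtype() == PlainDtype() and PlainDtype() is not PlainDtype()

# Time repeated construction, which is the case raised in the review.
for cls in (CachedDtype, PlainDtype):
    seconds = timeit.timeit(cls, number=100_000)
    print(f"{cls.__name__}: {seconds:.4f}s for 100,000 constructions")
```

Caching trades a lookup per call for fewer live objects; without it, equality has to be defined by value because identity no longer holds.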

jreback · 2018-11-29T13:44:01Z

maybe things changed since I created this originally, but w/o caching perf was not great.

codecov · 2018-11-29T15:45:14Z

Codecov Report

Merging #23990 into master will increase coverage by 49.87%.
The diff coverage is 97.95%.

@@             Coverage Diff             @@
##           master   #23990       +/-   ##
===========================================
+ Coverage   42.38%   92.25%   +49.87%     
===========================================
  Files         161      161               
  Lines       51701    51689       -12     
===========================================
+ Hits        21914    47687    +25773     
+ Misses      29787     4002    -25785

Flag	Coverage Δ
#multiple	`90.66% <97.95%> (?)`
#single	`42.51% <63.26%> (+0.12%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/common.py	`95.62% <ø> (+25.23%)`	⬆️
pandas/core/arrays/datetimelike.py	`96.34% <100%> (+45.91%)`	⬆️
pandas/core/arrays/datetimes.py	`98.43% <100%> (+34.63%)`	⬆️
pandas/core/internals/blocks.py	`93.69% <100%> (+41.32%)`	⬆️
pandas/core/dtypes/missing.py	`93.1% <100%> (+35.63%)`	⬆️
pandas/core/dtypes/dtypes.py	`95.33% <97.14%> (+19.17%)`	⬆️
pandas/core/computation/pytables.py	`92.37% <0%> (+0.3%)`	⬆️
pandas/io/pytables.py	`92.3% <0%> (+0.92%)`	⬆️
pandas/util/_test_decorators.py	`93.24% <0%> (+4.05%)`	⬆️
pandas/compat/__init__.py	`58.36% <0%> (+8.17%)`	⬆️
... and 125 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08395af...5cde369.

TomAugspurger · 2018-11-29T16:35:08Z

I've removed _coerce_to_dtype.

jreback · 2018-11-29T16:41:07Z

dont you need to remove the _coerce_to_dtype tests?

TomAugspurger · 2018-11-29T19:01:42Z

7ab2a74 removed all the _coerce_to_dtype tests.

TomAugspurger · 2018-12-03T13:10:19Z

Do we typically add release notes for something like this (deprecations that aren't part of the public API, but are possibly in use downstream)?

TomAugspurger · 2018-12-03T13:11:28Z

Added to the deprecation log.

jreback · 2018-12-03T13:14:44Z

> Do we typically add release notes for something like this (deprecations that aren't part of the public API, but are possibly in use downstream)?

this IS certainly part of the public API as DatetimeTZDtype is public.

jreback · 2018-12-03T13:15:21Z

I know pyarrow uses this at the very least. (I don't know if the 2 form of the construction is used though). cc @wesm

TomAugspurger · 2018-12-03T13:26:06Z

Ah, I didn't realize it was exported in api.types, it's not in the API docs. I'll add it to api.rst at the top-level (as of #23581)

TomAugspurger · 2018-12-03T13:30:03Z

Added a release note. Going to follow up on api.rst later, since we don't have an array section for datetime array there yet.
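
For downstream readers wondering what "Deprecate passing alias to unit" might look like in practice, here is a hedged sketch. The helper name `normalize_unit_and_tz`, the warning text, and the choice of `FutureWarning` are all assumptions made for illustration; only the regex comes from the `_match` attribute quoted in the review.

```python
import re
import warnings

# Pattern copied from the `_match` attribute shown in the review diff.
_ALIAS = re.compile(r"(datetime64|M8)\[(?P<unit>.+), (?P<tz>.+)\]")


def normalize_unit_and_tz(unit="ns", tz=None):
    """Hypothetical helper: accept a legacy full alias in `unit`, but warn."""
    match = _ALIAS.fullmatch(unit) if isinstance(unit, str) else None
    if match is not None:
        warnings.warn(
            "Passing a full dtype alias as 'unit' is deprecated; "
            "pass unit and tz separately or parse the string explicitly.",
            FutureWarning,  # assumed category for this sketch
            stacklevel=2,
        )
        unit, tz = match.group("unit"), match.group("tz")
    return unit, tz


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = normalize_unit_and_tz("datetime64[ns, UTC]")

print(result, [w.category.__name__ for w in caught])
```

The old call style keeps working for one release cycle while telling callers, such as the downstream users mentioned above, to switch to the explicit form.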

jreback · 2018-12-03T13:31:53Z

thanks @TomAugspurger lgtm. let's merge on green. can you add a ref into #6581

TomAugspurger · 2018-12-03T14:44:22Z

@datapythonista any idea what's up with the azure CI? https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4597. Seems to have hit https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4596 too.

datapythonista · 2018-12-03T14:48:27Z

Haven't seen those errors before, I guess something is failing on their side. @davidstaheli do you mind taking a look at https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4597?

jreback · 2018-12-03T15:01:43Z

@TomAugspurger needs a rebase as well

datapythonista · 2018-12-03T15:40:31Z

@TomAugspurger I was trying to create an incident for azure, but the url does some weird redirect with the language and it crashes. Not sure if it's Linux specific, can you check if the "Create incident" button works for you: https://azure.microsoft.com/en-us/support/create-ticket/ (the Basic technical support one)

Or @vtbassmatt, maybe you can help with the above? See the error in https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=4597, we are getting that often

datapythonista · 2018-12-03T16:03:33Z

I was checking, and numba is getting the same error from azure. I guess something is down for the whole azure-pipelines.

https://dev.azure.com/numba/numba/_build/results?buildId=471

vtbassmatt · 2018-12-03T21:54:32Z

I suspect it's a little bit of abuse fighting that was turned up a little too high. I think we've addressed it now.

jreback · 2018-12-03T23:54:09Z

thanks @TomAugspurger

REF/API: DatetimeTZDtype

1ca7fa4

* Remove magic constructor from string
* Remove Caching

The remaining changes in the DatetimeArray PR will be to

1. Inherit from ExtensionDtype
2. Implement construct_array_type
3. Register

TomAugspurger added the Datetime, Dtype Conversions, and Timezones labels Nov 29, 2018

TomAugspurger added this to the 0.24.0 milestone Nov 29, 2018

unxfail test, remove caching bit

2fa4bb0

Restore construct_array_type

7e6d8ea

TomAugspurger closed this Nov 29, 2018

TomAugspurger reopened this Nov 29, 2018

TomAugspurger added 2 commits November 29, 2018 06:14

cache readonly

9e4faf8

Merge remote-tracking branch 'upstream/master' into dtype-only

ad2723c

jreback requested changes Nov 29, 2018

TomAugspurger added 2 commits November 29, 2018 08:18

Updates

e0b7b77

* Use pandas_dtype
* removed cache_readonly

Fixed tz name

6cc9ce5

jreback reviewed Nov 29, 2018: pandas/core/dtypes/common.py

jreback reviewed Nov 29, 2018: pandas/core/dtypes/dtypes.py

TomAugspurger added 3 commits November 29, 2018 10:16

Remove _coerce_to_dtype

7ab2a74

fix unpickling

c14b45f

refactor construct_from_string

10d2c8a

PeriodDtype needs freq

50e1aeb

Deprecate passing alias to unit

22699f1

jsexauer mentioned this pull request Dec 3, 2018

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

Added release note for DatetimeTZDtype.

d89a6cc

jreback approved these changes Dec 3, 2018

Merge remote-tracking branch 'upstream/master' into dtype-only

c82999d

Merge remote-tracking branch 'upstream/master' into dtype-only

a9b929a

TomAugspurger mentioned this pull request Dec 3, 2018

Implement DatetimeArray._from_sequence #24074

Merged

try ci

5cde369

jreback merged commit 3fe697f into pandas-dev:master Dec 3, 2018

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

REF/API: DatetimeTZDtype (pandas-dev#23990)

931d885

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

REF/API: DatetimeTZDtype (pandas-dev#23990)

2f09761

dargueta mentioned this pull request Mar 25, 2019

astype('datetime64[xx, UTC]') crashes if not nanoseconds #25869

Closed

jreback mentioned this pull request Nov 29, 2019

DEPR: deprecations log for removed issues #13777

Closed
