PERF: Datetime/Timestamp.normalize for timezone naive datetimes #23634

Merged
merged 19 commits into pandas-dev:master from mroeschke:normalize_performance on Nov 18, 2018

5 participants
@mroeschke
Member

commented Nov 12, 2018

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
       before           after         ratio
     [bb32564c]       [ff171f75]
-         281±2μs          197±2μs     0.70  timeseries.DatetimeIndex.time_normalize('dst')
-     4.26±0.05ms      2.64±0.04ms     0.62  timeseries.DatetimeAccessor.time_dt_accessor_normalize
-     3.78±0.01ms      2.28±0.01ms     0.60  timeseries.DatetimeIndex.time_normalize('repeated')
-     3.79±0.05ms      2.23±0.06ms     0.59  timeseries.DatetimeIndex.time_normalize('tz_naive')


       before           after         ratio
     [bb32564c]       [ff171f75]
-      11.9±0.9μs         1.56±0μs     0.13  timestamp.TimestampOps.time_normalize(None)
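For tz-naive data a datetime64[ns] value is just an int64 count of nanoseconds since the epoch, so normalizing to midnight can in principle be done with integer arithmetic instead of a per-element calendar conversion, which is presumably where speedups like the ones above come from. A minimal pure-Python sketch of that idea, assuming no timezone (so every day is exactly 86400 s); `to_ns` and `normalize_naive_ns` are hypothetical helpers, not pandas API:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
DAY_NS = 86_400 * 1_000_000_000  # nanoseconds in one tz-naive day


def to_ns(dt: datetime) -> int:
    # Epoch nanoseconds for a naive datetime (datetime only has microseconds).
    return (dt - EPOCH) // timedelta(microseconds=1) * 1_000


def normalize_naive_ns(value: int) -> int:
    # Floor to a multiple of DAY_NS. Python's % takes the sign of the
    # divisor, so pre-1970 (negative) values also round down to midnight.
    return value - value % DAY_NS


noon = to_ns(datetime(2013, 11, 30, 12))
midnight = normalize_naive_ns(noon)
print(EPOCH + timedelta(microseconds=midnight // 1_000))  # 2013-11-30 00:00:00
```

The floor-modulo behaviour is the detail to watch: a truncating remainder (as in C) would round negative timestamps toward the epoch instead of down to their own midnight.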
@pep8speaks


commented Nov 12, 2018

Hello @mroeschke! Thanks for submitting the PR.

Matt Roeschke
    if tz is not None:
        tz = maybe_get_tz(tz)
        result = _normalize_local(stamps, tz)
    else:

@jbrockmendel

jbrockmendel Nov 12, 2018

Member

Is this case never reached?
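Background for the question: the naive branch of `normalize_i8_timestamps` works field by field (roughly: convert each value to a datetime struct, zero the time-of-day fields, convert back), while the new tz-naive path in this PR floors with modulo arithmetic. For naive values the two should always agree. A rough pure-Python check of that equivalence, at microsecond resolution because `datetime` has no nanoseconds; both helper names are hypothetical:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ONE_US = timedelta(microseconds=1)
DAY_US = 86_400 * 1_000_000  # microseconds per tz-naive day


def normalize_by_fields(us: int) -> int:
    # Field route: decompose into a datetime, zero the time-of-day fields,
    # then convert back to an epoch offset.
    dt = EPOCH + us * ONE_US
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    return (midnight - EPOCH) // ONE_US


def normalize_by_modulo(us: int) -> int:
    # Arithmetic route: floor to a multiple of one day.
    return us - us % DAY_US


samples = [0, 1, -1, DAY_US - 1, -DAY_US, 1_385_812_800_000_000]
assert all(normalize_by_fields(s) == normalize_by_modulo(s) for s in samples)
print("both routes agree on", len(samples), "samples")
```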

@codecov


commented Nov 12, 2018

Codecov Report

Merging #23634 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #23634      +/-   ##
==========================================
+ Coverage   92.23%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51408    51414       +6     
==========================================
+ Hits        47416    47422       +6     
  Misses       3992     3992
Flag Coverage Δ
#multiple 90.62% <100%> (ø) ⬆️
#single 42.29% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimes.py 98.5% <100%> (+0.01%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24bce1a...fb11dcf.

Matt Roeschke
@@ -40,6 +40,7 @@ from timezones cimport (
# Constants
_zero_time = datetime_time(0, 0)
_no_input = object()
cdef int64_t DAY_NS = 86400000000000

@jreback

jreback Nov 12, 2018

Contributor

we have DAY_NS defined in lots of places, can you move it to one place?

(bamboo-dev) jreback@dev:~/pandas-dev$ grep -r 86400 pandas/_libs/ --include '*.pyx'
pandas/_libs/tslibs/period.pyx:        {1, 24, 1440, 86400, 86400000, 86400000000, 86400000000000},
pandas/_libs/tslibs/period.pyx:        seconds = unix_date * 86400 + dts.hour * 3600 + dts.min * 60 + dts.sec
pandas/_libs/tslibs/period.pyx:        abstime += 86400
pandas/_libs/tslibs/period.pyx:    while abstime >= 86400:
pandas/_libs/tslibs/period.pyx:        abstime -= 86400
pandas/_libs/tslibs/period.pyx:    # abstime >= 0.0 and abstime <= 86400
pandas/_libs/tslibs/conversion.pyx:cdef int64_t DAY_NS = 86400000000000LL
pandas/_libs/tslibs/timedeltas.pyx:cdef int64_t DAY_NS = 86400000000000LL
pandas/_libs/tslibs/timedeltas.pyx:        m = 1000000000L * 86400 * 7
pandas/_libs/tslibs/timedeltas.pyx:        m = 1000000000L * 86400
pandas/_libs/tslibs/timedeltas.pyx:        86400000000042
pandas/_libs/tslibs/fields.pyx:    micros = np.mod(dtindex, 86400000000000, dtype=np.int64) // 1000LL
pandas/_libs/tslibs/src/datetime/np_datetime.c:    npy_int64 DAY_NS = 86400000000000LL;

prob should be in np_datetime.pyx (and import from there)

@jreback

jreback Nov 14, 2018

Contributor

can you move to the same place you have DAY_SECONDS
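A sketch of the consolidation being asked for: spell the base constant out as DAY_SECONDS, define it once, and derive the nanosecond values from it rather than repeating literals. This is plain Python for illustration only; in the real code these would be `cdef int64_t` values in whichever shared module is chosen (the thread mentions np_datetime.pyx and ccalendar), and `HOUR_SECONDS`, `NS_PER_SECOND` and `WEEK_NS` are hypothetical names:

```python
# One place for time-unit constants (illustrative; names other than
# DAY_SECONDS, DAY_NS and HOURS_NS are hypothetical).
DAY_SECONDS = 86_400
HOUR_SECONDS = 3_600
NS_PER_SECOND = 1_000_000_000

DAY_NS = DAY_SECONDS * NS_PER_SECOND
HOURS_NS = HOUR_SECONDS * NS_PER_SECOND
WEEK_NS = 7 * DAY_NS

# The derived values match the hard-coded literals quoted in this thread.
assert DAY_NS == 86400000000000
assert HOURS_NS == 3600000000000
assert WEEK_NS == 1000000000 * 86400 * 7
```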

# --------------------------------------------------------------
# Timestamp.normalize

@pytest.mark.parametrize('arg', ['2013-11-30', '2013-11-30 12:00:00'])

@jreback

jreback Nov 12, 2018

Contributor

is there a normalize_nat test as well?

@mroeschke

mroeschke Nov 12, 2018

Author Member

We don't define normalize for NaT.

@mroeschke

mroeschke Nov 12, 2018

Author Member

We could add one to NaT for Timestamp mirroring (another issue). It would probably just return NaT.
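If a NaT.normalize were added later to mirror Timestamp.normalize, returning NaT unchanged seems the natural contract, since there is no time of day to strip. A toy sketch of that contract with a stand-in singleton; `MissingTimestamp` is hypothetical and is not how pandas implements NaT:

```python
class MissingTimestamp:
    """Toy not-a-time singleton (hypothetical stand-in for NaT)."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object, like a singleton sentinel.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def normalize(self):
        # Nothing to floor: a missing timestamp normalizes to itself.
        return self

    def __repr__(self):
        return "NaT"


NAT = MissingTimestamp()
print(NAT.normalize())  # NaT
```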

@jreback jreback added this to the 0.24.0 milestone Nov 12, 2018

@mroeschke

Member Author

commented Nov 13, 2018

@jreback gathered some of the DAY_NS usage into a constant in np_datetime.pyx and all green.

@@ -41,7 +42,6 @@ from nattype cimport NPY_NAT, checknull_with_nat
# ----------------------------------------------------------------------
# Constants

cdef int64_t DAY_NS = 86400000000000LL
cdef int64_t HOURS_NS = 3600000000000

@jreback

jreback Nov 14, 2018

Contributor

should prob move this one too (future ok)

@jreback

jreback Nov 16, 2018

Contributor

can you move this one as well

@@ -22,7 +22,8 @@ from np_datetime cimport (check_dts_bounds,
npy_datetime,
dt64_to_dtstruct, dtstruct_to_dt64,
get_datetime64_unit, get_datetime64_value,
-pydatetime_to_dt64, NPY_DATETIMEUNIT, NPY_FR_ns)
+pydatetime_to_dt64, NPY_DATETIMEUNIT, NPY_FR_ns,

@jreback

jreback Nov 14, 2018

Contributor

what is DAY_S? you mean DAY_NS, right? let's write out these constants.

# ----------------------------------------------------------------------
# time constants

cdef int64_t DAY_S = 86400

@jreback

jreback Nov 14, 2018

Contributor

let's write this out to DAY_SECONDS

@jbrockmendel

jbrockmendel Nov 14, 2018

Member

I think the place for these may be ccalendar

@@ -832,7 +832,14 @@ def normalize(self):
                        '2014-08-01 00:00:00+05:30'],
                       dtype='datetime64[ns, Asia/Calcutta]', freq=None)
         """
-        new_values = conversion.normalize_i8_timestamps(self.asi8, self.tz)
+        if self.tz is None:
+            not_null = self.notnull()

@jreback

jreback Nov 14, 2018

Contributor

same

@jbrockmendel

jbrockmendel Nov 14, 2018

Member

should this be notna? (does DatetimeArray even have notna or notnull?)

@mroeschke

mroeschke Nov 14, 2018

Author Member

It (DatetimeIndex) apparently has notnull, but I'm not sure whether I should be using notna or notnull.

@jreback

jreback Nov 17, 2018

Contributor

can you use notna
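Why the `not_null` mask in this diff matters: NaT is stored as the int64 minimum, which is not a multiple of DAY_NS, so flooring it like a real timestamp would push it below the int64 range (an overflow) instead of leaving the sentinel intact. A pure-Python sketch of the masked update, assuming a plain list of epoch nanoseconds; `NAT_I8` mirrors the sentinel value and `normalize_naive_array` is a hypothetical helper:

```python
from typing import List

DAY_NS = 86_400 * 1_000_000_000
NAT_I8 = -(2 ** 63)  # int64 minimum, the sentinel pandas uses for NaT


def normalize_naive_array(values: List[int]) -> List[int]:
    # Work on a copy and floor only the non-NaT entries (the notna() mask).
    out = list(values)
    for i, v in enumerate(out):
        if v != NAT_I8:
            out[i] = v - v % DAY_NS
    return out


vals = [DAY_NS + 5, NAT_I8, -1]
print(normalize_naive_array(vals))
# [86400000000000, -9223372036854775808, -86400000000000]

# Without the mask, flooring the sentinel would fall below the int64 range.
print(NAT_I8 - NAT_I8 % DAY_NS < -(2 ** 63))  # True
```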

@mroeschke

Member Author

commented Nov 16, 2018

All green after some flaky Azure tests.

Matt Roeschke
@jreback
Contributor

left a comment

tiny change. ping on green.

Matt Roeschke added some commits Nov 18, 2018

@mroeschke

Member Author

commented Nov 18, 2018

Ping all green @jreback

@jreback jreback merged commit 3702de2 into pandas-dev:master Nov 18, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
continuous-integration/travis-ci/pr The Travis CI build passed
pandas-dev.pandas Build #20181118.6 succeeded
@jreback

Contributor

commented Nov 18, 2018

thanks @mroeschke

@mroeschke mroeschke deleted the mroeschke:normalize_performance branch Nov 18, 2018

thoo added a commit to thoo/pandas that referenced this pull request Nov 19, 2018

Merge remote-tracking branch 'upstream/master' into io_csv_docstring_fixed

* upstream/master: (46 commits)
  DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774)
  BUG: Don't warn if default conflicts with dialect (pandas-dev#23775)
  BUG: Fixing memory leaks in read_csv (pandas-dev#23072)
  TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771)
  STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370)
  ENH: between_time, at_time accept axis parameter (pandas-dev#21799)
  PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772)
  CLN: io/formats/html.py: refactor (pandas-dev#22726)
  API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466)
  TST: Add test case for GH14080 for overflow exception (pandas-dev#23762)
  BUG: Don't extract header names if none specified (pandas-dev#23703)
  BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618)
  DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621)
  PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634)
  TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761)
  DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650)
  ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662)
  DOC: Improve error message to show correct order (pandas-dev#23652)
  ENH: Improve error message for empty object array (pandas-dev#23718)
  ...

@mroeschke mroeschke referenced this pull request Dec 23, 2018

Merged

CLN: tslibs imports and unused variables #24401

1 of 1 task complete

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
