PERF: Datetime/Timestamp.normalize for timezone naive datetimes #23634

Merged: 19 commits, Nov 18, 2018

mroeschke
Member

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
       before           after         ratio
     [bb32564c]       [ff171f75]
-         281±2μs          197±2μs     0.70  timeseries.DatetimeIndex.time_normalize('dst')
-     4.26±0.05ms      2.64±0.04ms     0.62  timeseries.DatetimeAccessor.time_dt_accessor_normalize
-     3.78±0.01ms      2.28±0.01ms     0.60  timeseries.DatetimeIndex.time_normalize('repeated')
-     3.79±0.05ms      2.23±0.06ms     0.59  timeseries.DatetimeIndex.time_normalize('tz_naive')


       before           after         ratio
     [bb32564c]       [ff171f75]
-      11.9±0.9μs         1.56±0μs     0.13  timestamp.TimestampOps.time_normalize(None)
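
As a reading aid for the tables above: the ratio column appears to be the "after" timing divided by the "before" timing, so values below 1.0 mean the new commit is faster. A minimal sketch that recomputes it from the numbers quoted above (the dictionary layout and the `ratio` helper are illustrative, not part of the PR):

```python
# (before, after) timings copied from the tables above, in the units shown there.
timings = {
    "DatetimeIndex.time_normalize('dst')": (281, 197),            # μs
    "DatetimeAccessor.time_dt_accessor_normalize": (4.26, 2.64),  # ms
    "DatetimeIndex.time_normalize('repeated')": (3.78, 2.28),     # ms
    "DatetimeIndex.time_normalize('tz_naive')": (3.79, 2.23),     # ms
    "TimestampOps.time_normalize(None)": (11.9, 1.56),            # μs
}


def ratio(before, after):
    """Return after/before rounded to two decimals (assumed definition)."""
    return round(after / before, 2)


ratios = {name: ratio(before, after) for name, (before, after) in timings.items()}

for name, value in ratios.items():
    print(f"{value:.2f}  {name}")
```

The recomputed values agree with the ratio column, which supports reading it as after/before.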

@pep8speaks

Hello @mroeschke! Thanks for submitting the PR.

if tz is not None:
    tz = maybe_get_tz(tz)
    result = _normalize_local(stamps, tz)
else:
Member

Is this case never reached?

@codecov

codecov bot commented Nov 12, 2018

Codecov Report

Merging #23634 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23634      +/-   ##
==========================================
+ Coverage   92.23%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51408    51414       +6     
==========================================
+ Hits        47416    47422       +6     
  Misses       3992     3992
Flag        Coverage Δ
#multiple   90.62% <100%> (ø) ⬆️
#single     42.29% <0%> (-0.01%) ⬇️

Impacted Files                    Coverage Δ
pandas/core/arrays/datetimes.py   98.5% <100%> (+0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 24bce1a...fb11dcf.

@gfyoung added the Datetime, Performance, Internals, and Clean labels Nov 12, 2018
@@ -40,6 +40,7 @@ from timezones cimport (
 # Constants
 _zero_time = datetime_time(0, 0)
 _no_input = object()
+cdef int64_t DAY_NS = 86400000000000
Contributor

We have DAY_NS defined in lots of places; can you move it to one place?

(bamboo-dev) jreback@dev:~/pandas-dev$ grep -r 86400 pandas/_libs/ --include '*.pyx'
pandas/_libs/tslibs/period.pyx:        {1, 24, 1440, 86400, 86400000, 86400000000, 86400000000000},
pandas/_libs/tslibs/period.pyx:        seconds = unix_date * 86400 + dts.hour * 3600 + dts.min * 60 + dts.sec
pandas/_libs/tslibs/period.pyx:        abstime += 86400
pandas/_libs/tslibs/period.pyx:    while abstime >= 86400:
pandas/_libs/tslibs/period.pyx:        abstime -= 86400
pandas/_libs/tslibs/period.pyx:    # abstime >= 0.0 and abstime <= 86400
pandas/_libs/tslibs/conversion.pyx:cdef int64_t DAY_NS = 86400000000000LL
pandas/_libs/tslibs/timedeltas.pyx:cdef int64_t DAY_NS = 86400000000000LL
pandas/_libs/tslibs/timedeltas.pyx:        m = 1000000000L * 86400 * 7
pandas/_libs/tslibs/timedeltas.pyx:        m = 1000000000L * 86400
pandas/_libs/tslibs/timedeltas.pyx:        86400000000042
pandas/_libs/tslibs/fields.pyx:    micros = np.mod(dtindex, 86400000000000, dtype=np.int64) // 1000LL
pandas/_libs/tslibs/src/datetime/np_datetime.c:    npy_int64 DAY_NS = 86400000000000LL;

prob should be in np_datetime.pyx (and import from there)

Contributor

can you move to the same place you have DAY_SECONDS
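
A minimal sketch of the consolidation requested here, written in plain Python rather than Cython. `DAY_SECONDS` and `DAY_NS` are the names used in this thread; `NS_PER_SECOND` and `HOUR_NS` are illustrative names, not identifiers from the PR.

```python
# Define the base constant once and derive the rest from it,
# instead of repeating the literal 86400000000000 in several modules.
DAY_SECONDS = 86_400                   # seconds per day
NS_PER_SECOND = 1_000_000_000          # hypothetical helper name
DAY_NS = DAY_SECONDS * NS_PER_SECOND   # nanoseconds per day
HOUR_NS = 3_600 * NS_PER_SECOND        # nanoseconds per hour

# The derived day value matches the literal found by the grep above.
assert DAY_NS == 86400000000000
assert HOUR_NS == 3600000000000
```

Other modules would then import these names from the single defining module rather than redeclaring them.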

# --------------------------------------------------------------
# Timestamp.normalize

@pytest.mark.parametrize('arg', ['2013-11-30', '2013-11-30 12:00:00'])
Contributor

is there a normalize_nat test as well?

Member Author

We don't define normalize for NaT.

Member Author

We could have one for Timestamp mirroring (another issue). Probably would just return NaT.

@jreback added this to the 0.24.0 milestone Nov 12, 2018
@mroeschke
Member Author

@jreback gathered some of the DAY_NS usage into a constant in np_datetime.pyx and all green.

@@ -41,7 +42,6 @@ from nattype cimport NPY_NAT, checknull_with_nat
 # ----------------------------------------------------------------------
 # Constants

-cdef int64_t DAY_NS = 86400000000000LL
 cdef int64_t HOURS_NS = 3600000000000
Contributor

should prob move this one too (future ok)

Contributor

can you move this one as well

@@ -22,7 +22,8 @@ from np_datetime cimport (check_dts_bounds,
npy_datetime,
dt64_to_dtstruct, dtstruct_to_dt64,
get_datetime64_unit, get_datetime64_value,
-pydatetime_to_dt64, NPY_DATETIMEUNIT, NPY_FR_ns)
+pydatetime_to_dt64, NPY_DATETIMEUNIT, NPY_FR_ns,
Contributor

what is DAY_S, you mean DAY_NS right? let's write out these constants.

# ----------------------------------------------------------------------
# time constants

cdef int64_t DAY_S = 86400
Contributor

let's write this out to DAY_SECONDS

Member

I think the place for these may be ccalendar

@@ -832,7 +832,14 @@ def normalize(self):
 '2014-08-01 00:00:00+05:30'],
 dtype='datetime64[ns, Asia/Calcutta]', freq=None)
 """
-new_values = conversion.normalize_i8_timestamps(self.asi8, self.tz)
+if self.tz is None:
+    not_null = self.notnull()
Contributor

same

Member

should this be notna? (does DatetimeArray even have notna or notnull?)

Member Author

It (DatetimeIndex) apparently has notnull, but I'm not sure whether I should be using notna or notnull.

Contributor

can you use notna
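
For readers following the tz-naive branch discussed in this thread, here is a minimal pure-Python sketch of the idea, under stated assumptions. It treats each timestamp as integer nanoseconds since the epoch and floors it to midnight with a modulo, leaving missing values untouched. The function name `normalize_naive_ns` and the list-based input are hypothetical; pandas operates on int64 arrays, and `NAT` here assumes the int64 minimum is the missing-value sentinel. This is not the code merged in the PR.

```python
from datetime import datetime, timezone

DAY_NS = 86_400 * 1_000_000_000  # nanoseconds per day
NAT = -2**63  # assumption: int64 minimum marks a missing timestamp


def normalize_naive_ns(values):
    """Floor each epoch-nanosecond value to midnight, leaving NAT untouched."""
    return [v if v == NAT else v - v % DAY_NS for v in values]


def to_ns(dt):
    """Hypothetical helper: whole-second UTC datetime -> epoch nanoseconds."""
    return int(dt.timestamp()) * 1_000_000_000


noon = to_ns(datetime(2013, 11, 30, 12, tzinfo=timezone.utc))
midnight = to_ns(datetime(2013, 11, 30, tzinfo=timezone.utc))

# Noon floors to midnight, midnight is unchanged, NAT passes through.
assert normalize_naive_ns([noon, midnight, NAT]) == [midnight, midnight, NAT]
```

Because Python's `%` returns a non-negative remainder for a positive divisor, values before 1970 also floor to the preceding midnight rather than rounding toward zero.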

@mroeschke
Member Author

All green after some flaky azure tests.

Contributor

@jreback left a comment

tiny change. ping on green.

@mroeschke
Member Author

Ping all green @jreback

@jreback merged commit 3702de2 into pandas-dev:master Nov 18, 2018
@jreback
Contributor

jreback commented Nov 18, 2018

thanks @mroeschke

@mroeschke deleted the normalize_performance branch November 18, 2018 18:29
thoo added a commit to thoo/pandas that referenced this pull request Nov 19, 2018
…fixed

* upstream/master: (46 commits)
  DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774)
  BUG: Don't warn if default conflicts with dialect (pandas-dev#23775)
  BUG: Fixing memory leaks in read_csv (pandas-dev#23072)
  TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771)
  STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370)
  ENH: between_time, at_time accept axis parameter (pandas-dev#21799)
  PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772)
  CLN: io/formats/html.py: refactor (pandas-dev#22726)
  API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466)
  TST: Add test case for GH14080 for overflow exception (pandas-dev#23762)
  BUG: Don't extract header names if none specified (pandas-dev#23703)
  BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618)
  DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621)
  PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634)
  TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761)
  DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650)
  ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662)
  DOC: Improve error message to show correct order (pandas-dev#23652)
  ENH: Improve error message for empty object array (pandas-dev#23718)
  ...
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels: Clean, Datetime, Internals, Performance

5 participants