New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start porting offsets to cython #17830

Merged
merged 10 commits into from Oct 28, 2017

Conversation

Projects
None yet
4 participants
@jbrockmendel
Member

jbrockmendel commented Oct 10, 2017

This is mostly cut/paste, avoiding moving any of the classes for the time being. There is a small change in _to_dt64 that will be described in a comment below.

Defined dtstrings for flake8 cleanup in setup.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 13, 2017

CircleCI test errors look unrelated. Are they showing up elsewhere?

@codecov

This comment has been minimized.

codecov bot commented Oct 17, 2017

Codecov Report

Merging #17830 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17830      +/-   ##
==========================================
- Coverage   91.23%   91.21%   -0.03%     
==========================================
  Files         163      163              
  Lines       50105    49909     -196     
==========================================
- Hits        45715    45523     -192     
+ Misses       4390     4386       -4
Flag Coverage Δ
#multiple 89.01% <100%> (-0.02%) ⬇️
#single 40.2% <60%> (-0.18%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 97.15% <100%> (-0.01%) ⬇️
pandas/tseries/frequencies.py 96% <100%> (-0.11%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/common.py 91.42% <0%> (-1.39%) ⬇️
pandas/core/dtypes/concat.py 98.26% <0%> (-0.87%) ⬇️
pandas/core/dtypes/missing.py 90.44% <0%> (-0.18%) ⬇️
pandas/core/generic.py 92.03% <0%> (-0.17%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.11%) ⬇️
pandas/core/series.py 94.89% <0%> (-0.1%) ⬇️
pandas/core/internals.py 94.38% <0%> (-0.07%) ⬇️
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bf7f9a...6118488. Read the comment docs.

@codecov

This comment has been minimized.

codecov bot commented Oct 17, 2017

Codecov Report

Merging #17830 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17830      +/-   ##
==========================================
- Coverage   91.24%   91.22%   -0.03%     
==========================================
  Files         163      163              
  Lines       50168    50083      -85     
==========================================
- Hits        45777    45686      -91     
- Misses       4391     4397       +6
Flag Coverage Δ
#multiple 89.03% <100%> (-0.02%) ⬇️
#single 40.23% <60%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/frequencies.py 96% <100%> (-0.11%) ⬇️
pandas/tseries/offsets.py 97.15% <100%> (-0.01%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.5% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4489389...ced2c8d. Read the comment docs.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 18, 2017

This can be cut down if the diff is too large for easy review.

Looking forward to getting the dtstrings alias in setup.py --> fix a bunch of flake8 warnings.

@@ -312,7 +312,7 @@ def _get_freq_str(base, mult=1):
# ---------------------------------------------------------------------
# Offset names ("time rules") and related functions
from pandas._libs.tslibs.offsets import _offset_to_period_map

This comment has been minimized.

@jreback

jreback Oct 27, 2017

Contributor

can de-privatize later

@@ -14,6 +14,13 @@
from pandas._libs import tslib, Timestamp, OutOfBoundsDatetime, Timedelta
from pandas.util._decorators import cache_readonly
from pandas._libs.tslib import _delta_to_nanoseconds
from pandas._libs.tslibs.offsets import ( # noqa
ApplyTypeError, CacheableOffset,

This comment has been minimized.

@jreback

jreback Oct 27, 2017

Contributor

you shouldn't need the noqa, that means you are importing extra things that are not needed. IOW some of these are just used internally (now in offsets.pyx)

This comment has been minimized.

@jbrockmendel

jbrockmendel Oct 27, 2017

Member

The two unused imports are WeekDay and CacheableOffset. They're both used in the tests (though CacheableOffset not really). I'll update the imports and get rid of the noqa.

dt.microsecond != 0 or getattr(dt, 'nanosecond', 0) != 0):
return False
return True
# ---------------------------------------------------------------------
# DateOffset

This comment has been minimized.

@jreback

jreback Oct 27, 2017

Contributor

can you add a note in other api changes, that CacheableOffset & Weekday are no longer public (prob not ever used publicly but who knowns).

cimport numpy as np
np.import_array()

This comment has been minimized.

@jreback

jreback Oct 27, 2017

Contributor

ok with de-privatizing things internally here (can be later as well)

pass
# TODO: unused. remove?

This comment has been minimized.

@jreback

jreback Oct 27, 2017

Contributor

is this used?

This comment has been minimized.

@jbrockmendel

jbrockmendel Oct 28, 2017

Member

No. This came up recently in #17914.

@jreback

This comment has been minimized.

Contributor

jreback commented Oct 27, 2017

also pls run the perf for timeseries & offsets and report any issues. should slightly speed things up. be careful with the cacheable stuff though.

@pep8speaks

This comment has been minimized.

pep8speaks commented Oct 28, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 28, 2017 at 06:04 Hours UTC
@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 28, 2017

asv continuous -E virtualenv -f 1.1 master HEAD -b timeseries
[...]
       before           after         ratio
     [3ba2cfff]       [92d7e290]
+        544±30ms         665±50ms     1.22  timeseries.ToDatetime.time_format_no_exact
+     16.6±0.02μs      19.6±0.09μs     1.18  timeseries.AsOf.time_asof_single_early
-     3.75±0.07ms         3.38±0ms     0.90  timeseries.DatetimeIndex.time_normalize
-        449±20ms        400±0.6ms     0.89  timeseries.SemiMonthOffset.time_end_decr_rng
-        20.7±3μs      18.5±0.07μs     0.89  timeseries.Offsets.time_timeseries_day_apply
-        17.9±3μs       15.7±0.1μs     0.88  timeseries.Offsets.time_custom_bday_apply
-      3.70±0.2ms       3.05±0.1ms     0.82  timeseries.AsOf.time_asof_nan
-        29.1±7μs      23.2±0.05μs     0.80  timeseries.Offsets.time_custom_bday_cal_incr
-         141±2μs       112±0.02μs     0.80  timeseries.DatetimeIndex.time_unique
-        604±10μs        473±0.6μs     0.78  timeseries.DatetimeIndex.time_reset_index_tz
-         469±5μs        362±0.2μs     0.77  timeseries.DatetimeIndex.time_reset_index
-       429±0.8μs        331±0.8μs     0.77  timeseries.DatetimeIndex.time_timeseries_is_month_start
-      12.2±0.7ms      9.18±0.02ms     0.75  timeseries.ResampleSeries.time_period_downsample_mean
-       199±0.9μs          149±2μs     0.75  timeseries.Offsets.time_custom_bmonthend_incr
-        224±10ms         164±10ms     0.73  timeseries.ToDatetime.time_iso8601_tz_spaceformat
-      3.12±0.2ms         2.20±0ms     0.71  timeseries.ResampleDataFrame.time_min_string
-        2.86±0ms      2.01±0.05ms     0.70  timeseries.ResampleDataFrame.time_mean_numpy
-      8.48±0.5μs      5.88±0.01μs     0.69  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
-        44.1±3μs       30.3±0.2μs     0.69  timeseries.Offsets.time_custom_bday_cal_incr_neg_n
-        41.0±1μs       28.2±0.1μs     0.69  timeseries.Offsets.time_custom_bday_cal_decr
-     6.12±0.05ms      4.19±0.05ms     0.68  timeseries.ResampleSeries.time_timestamp_downsample_mean
-           2.37s            1.57s     0.66  timeseries.Iteration.time_iter_periodindex
-       237±0.4μs        154±0.2μs     0.65  timeseries.Offsets.time_custom_bmonthbegin_incr_n
-     8.34±0.01ms      5.40±0.08ms     0.65  timeseries.DatetimeIndex.time_infer_freq_daily
-        608±20ms        388±0.4ms     0.64  timeseries.SeriesArithmetic.time_add_offset_slow
-      43.5±0.8μs       27.6±0.1μs     0.63  timeseries.Offsets.time_custom_bday_decr
-        23.3±3ms      14.6±0.09ms     0.63  timeseries.Iteration.time_iter_periodindex_preexit
-        717±50ms        446±0.7ms     0.62  timeseries.Iteration.time_iter_datetimeindex
-       85.5±10μs       52.5±0.4μs     0.61  timeseries.SemiMonthOffset.time_begin_decr
-        87.3±3μs      53.2±0.05μs     0.61  timeseries.SemiMonthOffset.time_begin_decr_n
-        753±50ms        419±0.6ms     0.56  timeseries.SemiMonthOffset.time_begin_apply_index
-       769±0.6ms        398±0.9ms     0.52  timeseries.SemiMonthOffset.time_begin_incr_rng
-      16.9±0.1ms      8.40±0.01ms     0.50  timeseries.Iteration.time_iter_datetimeindex_preexit
-     5.31±0.02ms         2.51±0ms     0.47  timeseries.AsOf.time_asof_nan_single
-        248±60ms       8.65±0.2ms     0.03  timeseries.SeriesArithmetic.time_add_offset_fast
-        307±40ms      3.81±0.01ms     0.01  timeseries.SeriesArithmetic.time_add_offset_delta

About to re-run with CPU affinity to be on the safe side.

@jreback

This comment has been minimized.

Contributor

jreback commented Oct 28, 2017

what were these doing to be so before?

  •    248±60ms       8.65±0.2ms     0.03  timeseries.SeriesArithmetic.time_add_offset_fast
    
  •    307±40ms      3.81±0.01ms     0.01  timeseries.SeriesArithmetic.time_add_offset_delta
    
@jreback

This comment has been minimized.

Contributor

jreback commented Oct 28, 2017

@jbrockmendel pls always always rebase on master before you run a benchmark. this is against a commit from 10/9. Pretty much anytime you do anything you should rebase.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 28, 2017

Woops, good catch. Re-running now.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 28, 2017

taskset 8 asv continuous -E virtualenv -f 1.1 master HEAD -b timeseries
[...]
       before           after         ratio
     [1977362f]       [92d7e290]
+      27.6±0.1μs       32.0±0.2μs     1.16  timeseries.Offsets.time_custom_bday_cal_decr
-     8.95±0.03μs      8.07±0.05μs     0.90  timeseries.Offsets.time_timeseries_year_apply
-     11.4±0.06μs      10.2±0.06μs     0.90  timeseries.Offsets.time_timeseries_year_incr
-         175±1μs        153±0.7μs     0.87  timeseries.Offsets.time_custom_bmonthbegin_incr_n
-     6.80±0.03μs         5.84±0μs     0.86  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
-        109±20ms           40.9ms     0.37  timeseries.DatetimeIndex.time_dti_factorize
-      105±0.04ms         36.0±3ms     0.34  timeseries.DatetimeIndex.time_dti_tz_factorize
-           1.31s          299±3ms     0.23  timeseries.AsOfDataFrame.time_asof_nan
-           1.99s          290±3ms     0.15  timeseries.AsOfDataFrame.time_asof

(re-running)

The most commonly-called function that got moved to cython is _is_normalized, which gets called in onOffset, which in turn gets called in FooOffset.apply and offsets.generate_range. I'm not sure about that factorize and asofs, but that would explain (all but one of) the Offsets timings here.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 28, 2017

Same results again. I'll dig into the custom_bday_apply to see what the deal is.

@jreback jreback added this to the 0.22.0 milestone Oct 28, 2017

@jreback

This comment has been minimized.

Contributor

jreback commented Oct 28, 2017

ok this looks fine. ping after perf review (though that looks like a minor issue).

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 28, 2017

Using cProfile I get a tiny speedup with the new version. With %timeit I get results similar to asv but a narrower difference. This is a mystery I'm OK with, given that follow-ons are going to optimize the tar out of these.

@jreback jreback merged commit b2d0d1b into pandas-dev:master Oct 28, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jbrockmendel jbrockmendel deleted the jbrockmendel:tslibs-offsets2 branch Oct 28, 2017

@jreback

This comment has been minimized.

Contributor

jreback commented Oct 28, 2017

thanks @jbrockmendel

Using cProfile I get a tiny speedup with the new version. With %timeit I get results similar to asv but a narrower difference. This is a mystery I'm OK with, given that follow-ons are going to optimize the tar out of these.

I wouldn't have expected much as these are mostly used in scalar type things which are 1-offs; and these are not typed much either (so that may help)

wooyekim added a commit to wooyekim/pandas that referenced this pull request Oct 29, 2017

Dobatymo added a commit to Dobatymo/pandas that referenced this pull request Oct 30, 2017

peterpanmj added a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Nov 10, 2017

@jbrockmendel Is there a reason that CacheableOffset and WeekDay are no longer public in tseries.offsets ?
They are not useful? For those using it, what is the alternative?

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 10, 2017

WeekDay is just an enum map and CacheableOffest is not usable on its own
both of these are private classes

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Nov 10, 2017

OK (note they are not private in the sense they are non-underscored classes in a module that is publicly exposed, but indeed seems like deprecation is not really needed in this case)

No-Stream added a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment