New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tslibs resolution #18034

Merged
merged 24 commits into from Nov 13, 2017

Conversation

Projects
None yet
4 participants
@jbrockmendel
Member

jbrockmendel commented Oct 30, 2017

One more step towards getting tseries.frequencies into cython (and getting rid of non-cython dependencies of tslib/period). Also moves towards de-coupling period from tslib.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@gfyoung

This comment has been minimized.

Member

gfyoung commented Oct 30, 2017

@jbrockmendel : I presume this is largely copy+paste? Also, a quick asv check might be good just to make sure that nothing slowed down.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 30, 2017

I presume this is largely copy+paste?

Yes.

Also, a quick asv check might be good just to make sure that nothing slowed down.

Will do.

@codecov

This comment has been minimized.

codecov bot commented Oct 30, 2017

Codecov Report

Merging #18034 into master will decrease coverage by 0.07%.
The diff coverage is 76.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18034      +/-   ##
==========================================
- Coverage   91.43%   91.36%   -0.08%     
==========================================
  Files         164      164              
  Lines       50090    49869     -221     
==========================================
- Hits        45798    45561     -237     
- Misses       4292     4308      +16
Flag Coverage Δ
#multiple 89.16% <76.92%> (-0.07%) ⬇️
#single 40.23% <53.84%> (-0.2%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/frequencies.py 94.09% <100%> (-1.93%) ⬇️
pandas/core/indexes/period.py 92.9% <60%> (+0.01%) ⬆️
pandas/plotting/_converter.py 63.44% <66.66%> (-1.77%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.48% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e3ad63...7b374d3. Read the comment docs.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 31, 2017

taskset 3 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
       before           after         ratio
     [6a945180]       [4b6080da]
+     42.3±0.07μs       47.1±0.3μs     1.12  timeseries.SemiMonthOffset.time_begin_apply
+        5.75±0μs      6.41±0.01μs     1.11  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
-        1.10±0ms          997±1μs     0.91  timeseries.ResampleSeries.time_1min_5min_mean
-     20.7±0.05μs      18.7±0.05μs     0.90  timeseries.Offsets.time_custom_bday_apply_dt64
-         153±3μs        133±0.5μs     0.87  timeseries.Offsets.time_custom_bmonthbegin_decr_n
-      33.9±0.3μs       28.6±0.1μs     0.84  timeseries.Offsets.time_custom_bday_cal_decr
-      6.15±0.5ms      5.19±0.01ms     0.84  timeseries.DatetimeIndex.time_infer_freq_daily
-        14.7±1ms         11.9±0ms     0.81  timeseries.DatetimeIndex.time_infer_freq_none
-     18.7±0.04μs      14.6±0.04μs     0.78  timeseries.Offsets.time_custom_bday_apply
-      16.3±0.8ms      10.6±0.04ms     0.65  timeseries.AsOfDataFrame.time_asof_single
-           140ms         40.4±1ms     0.29  timeseries.DatetimeIndex.time_dti_factorize
-         159±5ms         37.0±3ms     0.23  timeseries.DatetimeIndex.time_dti_tz_factorize
-           1.96s          270±7ms     0.14  timeseries.AsOfDataFrame.time_asof_nan
-           1.95s          265±4ms     0.14  timeseries.AsOfDataFrame.time_asof
@jreback

This comment has been minimized.

Contributor

jreback commented Oct 31, 2017

your benchmarks look odd

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 31, 2017

your benchmarks look odd

No argument there. I'll rebase and re-run.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 31, 2017

taskset 8 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
       before           after         ratio
     [4578a039]       [58ced564]
+      28.4±0.1μs       32.1±0.9μs     1.13  timeseries.Offsets.time_custom_bday_cal_incr_neg_n
+           132ms            148ms     1.12  timeseries.DatetimeIndex.time_dti_factorize
-       152±0.6μs        137±0.8μs     0.90  timeseries.Offsets.time_custom_bmonthend_incr
-     22.3±0.08μs      19.8±0.07μs     0.89  timeseries.Offsets.time_custom_bday_incr
-      18.9±0.1μs       16.0±0.1μs     0.85  timeseries.AsOf.time_asof_single_early
-           2.00s         424±40ms     0.21  timeseries.AsOfDataFrame.time_asof
-           2.02s          422±7ms     0.21  timeseries.AsOfDataFrame.time_asof_nan

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Oct 31, 2017

Clipboard again, is there a game plan for this error?

from pandas._libs.tslib import Timedelta
from pandas._libs.tslibs.frequencies import ( # noqa
get_freq_code, _base_and_stride, _period_str_to_code,
_INVALID_FREQ_ERROR, opattern, _lite_rule_alias, _dont_uppercase,
_period_code_map, _reverse_period_code_map)
from pandas._libs.tslibs.resolution import ( # noqa

This comment has been minimized.

@jreback

jreback Nov 1, 2017

Contributor

why the noqa?

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 1, 2017

Member

It's public-facing so I didn't want to break anything downstream. Remove unused imports?

This comment has been minimized.

@jreback

jreback Nov 1, 2017

Contributor

hmm, what exactly is public facing of these? I am ok with simply removing anything _leading, what is not here?

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 1, 2017

Member

I think the noqa is for DAYS, _maybe_add_count, _weekday_rule_aliases, get_freq_group. I think the private ones here can be removed without breaking anything. DAYS and get_freq_group are imported by and a couple other places.

This comment has been minimized.

@jreback

jreback Nov 2, 2017

Contributor

ok let's remove any non-used, you can put the non-private ones in a whatsnew note

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 2, 2017

Member

Just pushed. Wasn't sure about wording in whatsnew note for the list DAYS.

return (freq // 1000) * 1000
class Resolution(object):

This comment has been minimized.

@jreback

jreback Nov 1, 2017

Contributor

this is because this is not a cdef class; make a _Resolution class then provide the public interface via Resolution

This comment has been minimized.

@jreback

jreback Nov 1, 2017

Contributor

alternately you can just move the constants here. note should this class be a Singleton pattern? do we even instantiate it?

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 1, 2017

Member

note should this class be a Singleton pattern?

Haven't given it any thought. Maybe the idea was to make it easily subclassable?

do we even instantiate it?

Nope, looks like everywhere it shows up its just Resolution.get_freq_group(...) or similar.

This comment has been minimized.

@jreback

jreback Nov 1, 2017

Contributor

ok so it was for grouping things / sharing code.

maybe add to the list to either make this a Singleton, or better yet use instances of these.

This comment has been minimized.

@jbrockmendel
# Frequency Inference
def unique_deltas(ndarray[int64_t] arr):

This comment has been minimized.

@jreback

jreback Nov 1, 2017

Contributor

is this actually used elsewhere? if so should fix that, this should be a cdef method of _FrequencyInferer (or at the very lease cdef only)

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 1, 2017

Member

Nope, just in resolution. Will cdef for now.

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 1, 2017

Member

Related: there's an import of pandas.core.algorithms.unique. Do you know if there is a cython version of that we can use instead? The uses are all for fairly small integer indexes:

if len(unique(self.fields['M'])) > 1:
weekdays = unique(self.index.weekday)
week_of_months = unique((self.index.day - 1) // 7)

In fact, if these are all being called on Index objects, then the Int64Index.unique method might make sense.

@jbrockmendel jbrockmendel referenced this pull request Nov 1, 2017

Open

tslibs TODO list #17652

28 of 59 tasks complete
from pandas._libs.tslib import Timedelta
from pandas._libs.tslibs.frequencies import ( # noqa
get_freq_code, _base_and_stride, _period_str_to_code,
_INVALID_FREQ_ERROR, opattern, _lite_rule_alias, _dont_uppercase,
_period_code_map, _reverse_period_code_map)
from pandas._libs.tslibs.resolution import ( # noqa

This comment has been minimized.

@jreback

jreback Nov 2, 2017

Contributor

ok let's remove any non-used, you can put the non-private ones in a whatsnew note

cdef int RESO_HR = 5
cdef int RESO_DAY = 6
_ONE_MICRO = 1000L

This comment has been minimized.

@jreback

jreback Nov 2, 2017

Contributor

can be de-privatized (add to list or do here ok)

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 2, 2017

Member

to the list it goes

jbrockmendel added some commits Nov 2, 2017

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 3, 2017

rebase

jbrockmendel added some commits Nov 3, 2017

@pep8speaks

This comment has been minimized.

pep8speaks commented Nov 5, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 13, 2017 at 00:01 Hours UTC

@jbrockmendel jbrockmendel referenced this pull request Nov 6, 2017

Closed

Remove out-of-date numpy.pxd; remove unused float16_t #18101

0 of 4 tasks complete

jbrockmendel added some commits Nov 8, 2017

@@ -45,7 +45,7 @@ Other API Changes
- :class:`Timestamp` will no longer silently ignore unused or invalid `tz` or `tzinfo` keyword arguments (:issue:`17690`)
- :class:`Timestamp` will no longer silently ignore invalid `freq` arguments (:issue:`5168`)
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the `tseries.offsets` module (:issue:`17830`)
- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` can now be accessed in `_libs.tslibs.resolution` (:issue:`18034`)

This comment has been minimized.

@jreback

jreback Nov 8, 2017

Contributor

I would just say they are removed from the private API, no need to point to a private file

@@ -13,7 +13,8 @@
from pandas._libs import tslib, period as libperiod
from pandas._libs.tslibs.parsing import DateParseError
from pandas import Period, Timestamp, offsets
from pandas.tseries.frequencies import DAYS, MONTHS
from pandas._libs.tslibs.resolution import DAYS
from pandas.tseries.frequencies import MONTHS

This comment has been minimized.

@jreback

jreback Nov 8, 2017

Contributor

why doesn't MONTHS simply exist in resolution?

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 9, 2017

Member

alias naming, will change

@@ -21,7 +21,8 @@
from pandas.core.base import SpecificationError, AbstractMethodError
from pandas.errors import UnsupportedFunctionCall
from pandas.core.groupby import DataError
from pandas.tseries.frequencies import MONTHS, DAYS
from pandas._libs.tslibs.resolution import DAYS
from pandas.tseries.frequencies import MONTHS

This comment has been minimized.

@jreback

jreback Nov 8, 2017

Contributor

same

assert frequencies.get_freq_group(offsets.Week()) == 4000
assert frequencies.get_freq_group(offsets.Week(weekday=1)) == 4000
assert frequencies.get_freq_group(offsets.Week(weekday=5)) == 4000
assert resolution.get_freq_group('A') == 1000

This comment has been minimized.

@jreback

jreback Nov 8, 2017

Contributor

screaming for parametization (later is ok)

@jbrockmendel jbrockmendel referenced this pull request Nov 10, 2017

Merged

Move frequencies functions to cython #17746

2 of 4 tasks complete
@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Nov 11, 2017

If we want to finish the day with a win, this and #18086 are probably about ready.

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 12, 2017

rebase this

@@ -45,7 +45,7 @@ Other API Changes
- :class:`Timestamp` will no longer silently ignore unused or invalid `tz` or `tzinfo` keyword arguments (:issue:`17690`)
- :class:`Timestamp` will no longer silently ignore invalid `freq` arguments (:issue:`5168`)
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the `tseries.offsets` module (:issue:`17830`)
- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` are removed from the private API (:issue:`18034`)

This comment has been minimized.

@jreback

jreback Nov 12, 2017

Contributor

private -> public

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 12, 2017

I think next to go in.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Nov 12, 2017

I think next to go in.

Awesome. I'll exercise some restraint and not immediately replace.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Nov 12, 2017

Travis failure is a timeout in TestClipboard. I can only imagine the maintainers find this even more annoying than I do. Still: argh

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 12, 2017

Travis failure is a timeout in TestClipboard. I can only imagine the maintainers find this even more annoying than I do. Still: argh

actually I merged a commit, which may fix this (but you rebase after). don't push again. let me review.

@jreback

small changes

@@ -45,6 +45,7 @@ Other API Changes
- :class:`Timestamp` will no longer silently ignore unused or invalid ``tz`` or ``tzinfo`` keyword arguments (:issue:`17690`)
- :class:`Timestamp` will no longer silently ignore invalid ``freq`` arguments (:issue:`5168`)
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the ``pandas.tseries.offsets`` module (:issue:`17830`)
- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` are removed from the public API (:issue:`18034`)

This comment has been minimized.

@jreback

jreback Nov 12, 2017

Contributor

these tags are not going to work, nor are these in the api.rst so they won't reference anything, just use tseries.frequencies.get_freq_group() etc

This comment has been minimized.

@jbrockmendel

jbrockmendel Nov 12, 2017

Member

OK. Does this take precedence over "don't push again"?

This comment has been minimized.

@jreback

jreback Nov 12, 2017

Contributor

no actually push again with this fix
everything else ok

# ----------------------------------------------------------------------
cpdef resolution(ndarray[int64_t] stamps, tz=None):
cdef:

This comment has been minimized.

@jreback

jreback Nov 12, 2017

Contributor

add to list to doc-string things

@jreback jreback added this to the 0.22.0 milestone Nov 12, 2017

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 12, 2017

ok ping on green.

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 12, 2017

needs a rebase.

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 12, 2017

note that if you actually rebase on top of master (as opposed to merge master), then this becomes really simple to see where you are. you prob want to make a single commit when you do this as the conflicts are easier to resolve. or you can use an ide. It is much much easier to see what is going on w/o all of these constant merges. I would highly recommend you follow this workflow. since you rebase fairly frequently this is actually pretty easy to do. you don't actually need to squash (though again would recommend having logical commits).

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 13, 2017

lgtm. ping on green.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Nov 13, 2017

Ping

@jreback jreback merged commit c3cfe90 into pandas-dev:master Nov 13, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Contributor

jreback commented Nov 13, 2017

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the jbrockmendel:tslibs-resolution branch Nov 17, 2017

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Nov 20, 2017

put a note - this is non performiant logic here (and duplicative) and this simply should call unique_1d directly
plus no reason to depend on khash directly;

Note: "here" refers to tslibs.resolution.unique_deltas.

So I tried using unique_1d directly and performance was hurt noticeably. Not a huge shock since unique_1d is in py-land. So as an experiment I took unique_1d, cut out a bunch of type-checking because the use case here is always a ndarray[int64_t], and implemented what was left directly in tslibs.resolution. The surprise was that performance was still noticeably worse than under the status quo.

So is there a chance that this is performant logic here after all?

No-Stream added a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment