Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tslibs resolution #18034

Merged
merged 24 commits into from
Nov 13, 2017
Merged

Tslibs resolution #18034

merged 24 commits into from
Nov 13, 2017

Conversation

jbrockmendel
Copy link
Member

One more step towards getting tseries.frequencies into cython (and getting rid of non-cython dependencies of tslib/period). Also moves towards de-coupling period from tslib.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@gfyoung gfyoung added Clean Internals Related to non-user accessible pandas implementation labels Oct 30, 2017
@gfyoung
Copy link
Member

gfyoung commented Oct 30, 2017

@jbrockmendel : I presume this is largely copy+paste? Also, a quick asv check might be good just to make sure that nothing slowed down.

@jbrockmendel
Copy link
Member Author

I presume this is largely copy+paste?

Yes.

Also, a quick asv check might be good just to make sure that nothing slowed down.

Will do.

@codecov
Copy link

codecov bot commented Oct 30, 2017

Codecov Report

Merging #18034 into master will decrease coverage by 0.07%.
The diff coverage is 76.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18034      +/-   ##
==========================================
- Coverage   91.43%   91.36%   -0.08%     
==========================================
  Files         164      164              
  Lines       50090    49869     -221     
==========================================
- Hits        45798    45561     -237     
- Misses       4292     4308      +16
Flag Coverage Δ
#multiple 89.16% <76.92%> (-0.07%) ⬇️
#single 40.23% <53.84%> (-0.2%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/frequencies.py 94.09% <100%> (-1.93%) ⬇️
pandas/core/indexes/period.py 92.9% <60%> (+0.01%) ⬆️
pandas/plotting/_converter.py 63.44% <66.66%> (-1.77%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.48% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e3ad63...7b374d3. Read the comment docs.

@jbrockmendel
Copy link
Member Author

taskset 3 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
       before           after         ratio
     [6a945180]       [4b6080da]
+     42.3±0.07μs       47.1±0.3μs     1.12  timeseries.SemiMonthOffset.time_begin_apply
+        5.75±0μs      6.41±0.01μs     1.11  timeseries.DatetimeIndex.time_timestamp_tzinfo_cons
-        1.10±0ms          997±1μs     0.91  timeseries.ResampleSeries.time_1min_5min_mean
-     20.7±0.05μs      18.7±0.05μs     0.90  timeseries.Offsets.time_custom_bday_apply_dt64
-         153±3μs        133±0.5μs     0.87  timeseries.Offsets.time_custom_bmonthbegin_decr_n
-      33.9±0.3μs       28.6±0.1μs     0.84  timeseries.Offsets.time_custom_bday_cal_decr
-      6.15±0.5ms      5.19±0.01ms     0.84  timeseries.DatetimeIndex.time_infer_freq_daily
-        14.7±1ms         11.9±0ms     0.81  timeseries.DatetimeIndex.time_infer_freq_none
-     18.7±0.04μs      14.6±0.04μs     0.78  timeseries.Offsets.time_custom_bday_apply
-      16.3±0.8ms      10.6±0.04ms     0.65  timeseries.AsOfDataFrame.time_asof_single
-           140ms         40.4±1ms     0.29  timeseries.DatetimeIndex.time_dti_factorize
-         159±5ms         37.0±3ms     0.23  timeseries.DatetimeIndex.time_dti_tz_factorize
-           1.96s          270±7ms     0.14  timeseries.AsOfDataFrame.time_asof_nan
-           1.95s          265±4ms     0.14  timeseries.AsOfDataFrame.time_asof

@jreback
Copy link
Contributor

jreback commented Oct 31, 2017

your benchmarks look odd

@jbrockmendel
Copy link
Member Author

your benchmarks look odd

No argument there. I'll rebase and re-run.

@jbrockmendel
Copy link
Member Author

taskset 8 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
       before           after         ratio
     [4578a039]       [58ced564]
+      28.4±0.1μs       32.1±0.9μs     1.13  timeseries.Offsets.time_custom_bday_cal_incr_neg_n
+           132ms            148ms     1.12  timeseries.DatetimeIndex.time_dti_factorize
-       152±0.6μs        137±0.8μs     0.90  timeseries.Offsets.time_custom_bmonthend_incr
-     22.3±0.08μs      19.8±0.07μs     0.89  timeseries.Offsets.time_custom_bday_incr
-      18.9±0.1μs       16.0±0.1μs     0.85  timeseries.AsOf.time_asof_single_early
-           2.00s         424±40ms     0.21  timeseries.AsOfDataFrame.time_asof
-           2.02s          422±7ms     0.21  timeseries.AsOfDataFrame.time_asof_nan

@jbrockmendel
Copy link
Member Author

Clipboard again, is there a game plan for this error?

from pandas._libs.tslib import Timedelta
from pandas._libs.tslibs.frequencies import ( # noqa
get_freq_code, _base_and_stride, _period_str_to_code,
_INVALID_FREQ_ERROR, opattern, _lite_rule_alias, _dont_uppercase,
_period_code_map, _reverse_period_code_map)
from pandas._libs.tslibs.resolution import ( # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the noqa?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's public-facing so I didn't want to break anything downstream. Remove unused imports?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, what exactly is public facing of these? I am ok with simply removing anything _leading, what is not here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the noqa is for DAYS, _maybe_add_count, _weekday_rule_aliases, get_freq_group. I think the private ones here can be removed without breaking anything. DAYS and get_freq_group are imported by and a couple other places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let's remove any non-used, you can put the non-private ones in a whatsnew note

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed. Wasn't sure about wording in whatsnew note for the list DAYS.

return (freq // 1000) * 1000


class Resolution(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is because this is not a cdef class; make a _Resolution class then provide the public interface via Resolution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternately you can just move the constants here. note should this class be a Singleton pattern? do we even instantiate it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note should this class be a Singleton pattern?

Haven't given it any thought. Maybe the idea was to make it easily subclassable?

do we even instantiate it?

Nope, looks like everywhere it shows up its just Resolution.get_freq_group(...) or similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so it was for grouping things / sharing code.

maybe add to the list to either make this a Singleton, or better yet use instances of these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Frequency Inference


def unique_deltas(ndarray[int64_t] arr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this actually used elsewhere? if so should fix that, this should be a cdef method of _FrequencyInferer (or at the very lease cdef only)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, just in resolution. Will cdef for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: there's an import of pandas.core.algorithms.unique. Do you know if there is a cython version of that we can use instead? The uses are all for fairly small integer indexes:

if len(unique(self.fields['M'])) > 1:
weekdays = unique(self.index.weekday)
week_of_months = unique((self.index.day - 1) // 7)

In fact, if these are all being called on Index objects, then the Int64Index.unique method might make sense.

@jbrockmendel jbrockmendel mentioned this pull request Nov 1, 2017
59 tasks
from pandas._libs.tslib import Timedelta
from pandas._libs.tslibs.frequencies import ( # noqa
get_freq_code, _base_and_stride, _period_str_to_code,
_INVALID_FREQ_ERROR, opattern, _lite_rule_alias, _dont_uppercase,
_period_code_map, _reverse_period_code_map)
from pandas._libs.tslibs.resolution import ( # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let's remove any non-used, you can put the non-private ones in a whatsnew note

cdef int RESO_HR = 5
cdef int RESO_DAY = 6

_ONE_MICRO = 1000L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be de-privatized (add to list or do here ok)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to the list it goes

@jreback
Copy link
Contributor

jreback commented Nov 3, 2017

rebase

@pep8speaks
Copy link

pep8speaks commented Nov 5, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 13, 2017 at 00:01 Hours UTC

@@ -45,7 +45,7 @@ Other API Changes
- :class:`Timestamp` will no longer silently ignore unused or invalid `tz` or `tzinfo` keyword arguments (:issue:`17690`)
- :class:`Timestamp` will no longer silently ignore invalid `freq` arguments (:issue:`5168`)
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the `tseries.offsets` module (:issue:`17830`)

- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` can now be accessed in `_libs.tslibs.resolution` (:issue:`18034`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just say they are removed from the private API, no need to point to a private file

@@ -13,7 +13,8 @@
from pandas._libs import tslib, period as libperiod
from pandas._libs.tslibs.parsing import DateParseError
from pandas import Period, Timestamp, offsets
from pandas.tseries.frequencies import DAYS, MONTHS
from pandas._libs.tslibs.resolution import DAYS
from pandas.tseries.frequencies import MONTHS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why doesn't MONTHS simply exist in resolution?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alias naming, will change

@@ -21,7 +21,8 @@
from pandas.core.base import SpecificationError, AbstractMethodError
from pandas.errors import UnsupportedFunctionCall
from pandas.core.groupby import DataError
from pandas.tseries.frequencies import MONTHS, DAYS
from pandas._libs.tslibs.resolution import DAYS
from pandas.tseries.frequencies import MONTHS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

assert frequencies.get_freq_group(offsets.Week()) == 4000
assert frequencies.get_freq_group(offsets.Week(weekday=1)) == 4000
assert frequencies.get_freq_group(offsets.Week(weekday=5)) == 4000
assert resolution.get_freq_group('A') == 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

screaming for parametization (later is ok)

@jbrockmendel
Copy link
Member Author

If we want to finish the day with a win, this and #18086 are probably about ready.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

rebase this

@@ -45,7 +45,7 @@ Other API Changes
- :class:`Timestamp` will no longer silently ignore unused or invalid `tz` or `tzinfo` keyword arguments (:issue:`17690`)
- :class:`Timestamp` will no longer silently ignore invalid `freq` arguments (:issue:`5168`)
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the `tseries.offsets` module (:issue:`17830`)

- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` are removed from the private API (:issue:`18034`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private -> public

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

I think next to go in.

@jbrockmendel
Copy link
Member Author

I think next to go in.

Awesome. I'll exercise some restraint and not immediately replace.

@jbrockmendel
Copy link
Member Author

Travis failure is a timeout in TestClipboard. I can only imagine the maintainers find this even more annoying than I do. Still: argh

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

Travis failure is a timeout in TestClipboard. I can only imagine the maintainers find this even more annoying than I do. Still: argh

actually I merged a commit, which may fix this (but you rebase after). don't push again. let me review.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small changes

@@ -45,6 +45,7 @@ Other API Changes
- :class:`Timestamp` will no longer silently ignore unused or invalid ``tz`` or ``tzinfo`` keyword arguments (:issue:`17690`)
- :class:`Timestamp` will no longer silently ignore invalid ``freq`` arguments (:issue:`5168`)
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the ``pandas.tseries.offsets`` module (:issue:`17830`)
- :function:`tseries.frequencies.get_freq_group` and :list:`tseries.frequencies.DAYS` are removed from the public API (:issue:`18034`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tags are not going to work, nor are these in the api.rst so they won't reference anything, just use tseries.frequencies.get_freq_group() etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Does this take precedence over "don't push again"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no actually push again with this fix
everything else ok

# ----------------------------------------------------------------------

cpdef resolution(ndarray[int64_t] stamps, tz=None):
cdef:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to list to doc-string things

@jreback jreback added this to the 0.22.0 milestone Nov 12, 2017
@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

ok ping on green.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

needs a rebase.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

note that if you actually rebase on top of master (as opposed to merge master), then this becomes really simple to see where you are. you prob want to make a single commit when you do this as the conflicts are easier to resolve. or you can use an ide. It is much much easier to see what is going on w/o all of these constant merges. I would highly recommend you follow this workflow. since you rebase fairly frequently this is actually pretty easy to do. you don't actually need to squash (though again would recommend having logical commits).

@jreback
Copy link
Contributor

jreback commented Nov 13, 2017

lgtm. ping on green.

@jbrockmendel
Copy link
Member Author

Ping

@jreback jreback merged commit c3cfe90 into pandas-dev:master Nov 13, 2017
@jreback
Copy link
Contributor

jreback commented Nov 13, 2017

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the tslibs-resolution branch November 17, 2017 23:12
@jbrockmendel
Copy link
Member Author

put a note - this is non performiant logic here (and duplicative) and this simply should call unique_1d directly
plus no reason to depend on khash directly;

Note: "here" refers to tslibs.resolution.unique_deltas.

So I tried using unique_1d directly and performance was hurt noticeably. Not a huge shock since unique_1d is in py-land. So as an experiment I took unique_1d, cut out a bunch of type-checking because the use case here is always a ndarray[int64_t], and implemented what was left directly in tslibs.resolution. The surprise was that performance was still noticeably worse than under the status quo.

So is there a chance that this is performant logic here after all?

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants