New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix Period and PeriodIndex support of combined offsets aliases #13874

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@agraboso
Contributor

agraboso commented Aug 1, 2016

  • closes #13730
  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

Essentially, makes sure any freq string passed to a Period or PeriodIndex goes through _Period._maybe_convert_freq(), which calls to_offset(), which is where the logic for combining aliases is.

All the examples in #13730 result in the correct output, and all existing tests pass. I have not written any new ones yet — I first wanted to get the opinion of the maintainers.

This PR builds on #13868 (without it, some existing tests fail).

@jreback

This comment has been minimized.

Contributor

jreback commented Aug 1, 2016

this seems to be on top of #13868 was that intentional? (usually we try to do independent PR's). if they are not its ok, but then should just put in one / indicate at the top that is the case.

@sinhrks

@jreback jreback added the Frequency label Aug 1, 2016

@jreback

This comment has been minimized.

Contributor

jreback commented Aug 1, 2016

can you run an asv on this, make sure looks good for perf.

@jreback

This comment has been minimized.

Contributor

jreback commented Aug 1, 2016

also would need to add tests from the issue. I dont' think we are actually testing this (and that's why it doesn't fail).

@sinhrks

View changes

pandas/src/period.pyx Outdated
@@ -1203,6 +1208,7 @@ class Period(_Period):
def _ordinal_from_fields(year, month, quarter, day,
hour, minute, second, freq):
freq = Period._maybe_convert_freq(freq)

This comment has been minimized.

@sinhrks

sinhrks Aug 1, 2016

Member

this isn't needed, as _ordinal_from_field is only called from here (after 1130th line)?

This comment has been minimized.

@agraboso

agraboso Aug 2, 2016

Contributor

That's right. I was being overly cautious (maybe some other part of the code might need to call this in the future?) but I can remove it.

@sinhrks sinhrks added the Period label Aug 1, 2016

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 2, 2016

I ran the asv benchmarks. Here are the results:

    before     after       ratio
  [59f25575] [01c55da7]
+    3.73ms    16.80ms      4.51  reindex.frame_drop_dup_na_inplace.time_frame_drop_dup_na_inplace
+  898.72ms      3.12s      3.47  plotting.plot_andrews_curves.time_plot_andrews_curves
-   45.25ms    13.87ms      0.31  period.create_period_index_from_date_range.time_period_index

I have no experience whatsoever with these, so I don't know how to interpret the results above, but I find it perplexing that the two tests that run slower have absolutely nothing to do with time series...

@sinhrks

This comment has been minimized.

Member

sinhrks commented Aug 2, 2016

Thanks for the test. The slowness looks to be caused by regex search. Maybe we should try dict lookup first like get_offset does (related to #4205, merging to_offset and get_offset to make fast/simple impl).

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 2, 2016

@sinhrks I'm confused. The asv test that has to do with periods is faster with this PR...

@sinhrks

This comment has been minimized.

Member

sinhrks commented Aug 2, 2016

Ah sorry, I looked first 2 ratios (looks unrelated to the period). Pls ignore my comment.

@codecov-io

This comment has been minimized.

codecov-io commented Aug 2, 2016

Current coverage is 85.31% (diff: 96.29%)

Merging #13874 into master will increase coverage by <.01%

@@             master     #13874   diff @@
==========================================
  Files           139        139          
  Lines         50143      50146     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42777      42780     +3   
  Misses         7366       7366          
  Partials          0          0          

Powered by Codecov. Last update 7e15923...49a3783

@jreback jreback modified the milestone: 0.19.0 Aug 2, 2016

@jreback

This comment has been minimized.

Contributor

jreback commented Aug 2, 2016

pls rebase. Need tests to close #13730 ; and add to whatsnew.

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 2, 2016

Rebased. I have also added tests for Period, Period.asfreq, PeriodIndex, PeriodIndex.asfreq, PeriodIndex.to_datetime and period_range with combined aliases.

There is one standing issue: the function get_standard_freq() does not accepted combined aliases as of now. Apart from tests, it is only called from three places in the whole codebase:

  • Lines period.py#L609 and period.pyx#L742 both read

    freqstr = frequencies.get_standard_freq(other)
    

    They can — and should — be changed to

    freqstr = other.rule_code
    
  • period.py#L488-L489

    freq = Period._maybe_convert_freq(freq)
    freq = frequencies.get_standard_freq(freq)
    

    can — and should — be changed to

    freq = Period._maybe_convert_freq(freq).freqstr
    

In view of this, I propose deprecating get_standard_freq() (as is already done with get_offset_name()) — or removing it completely if possible.

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 2, 2016

I implemented the changes and deprecation I proposed in my last comment in the last commit, including tests for the deprecation warning anywhere get_standard_freq() was tested.

@sinhrks

View changes

doc/source/whatsnew/v0.19.0.txt Outdated
@@ -864,3 +865,4 @@ Bug Fixes
- Bug in ``.to_excel()`` when DataFrame contains a MultiIndex which contains a label with a NaN value (:issue:`13511`)
- Bug in ``pd.read_csv`` in Python 2.x with non-UTF8 encoded, multi-character separated data (:issue:`3404`)
- Bug in ``Index`` raises ``KeyError`` displaying incorrect column when column is not in the df and columns contains duplicate values (:issue:`13822`)
- Bug in ``Period`` and ``PeriodIndex`` creating wrong dates when frequency has multiple offsets (:issue:`13874`)

This comment has been minimized.

@sinhrks

sinhrks Aug 2, 2016

Member

"combined offsets" or "combined aliases" should be better.

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 2, 2016

@sinhrks Is "combined offset aliases" okay?

@agraboso agraboso changed the title from BUG: Fix Period and PeriodIndex support of combined alias offsets to BUG: Fix Period and PeriodIndex support of combined offsets aliases Aug 2, 2016

@sinhrks

This comment has been minimized.

Member

sinhrks commented Aug 2, 2016

@agraboso Thx, it sounds ok.

@sinhrks

View changes

pandas/tseries/period.py Outdated
@@ -478,7 +485,7 @@ def asfreq(self, freq=None, how='E'):
"""
how = _validate_end_alias(how)
freq = frequencies.get_standard_freq(freq)
freq = Period._maybe_convert_freq(freq).freqstr

This comment has been minimized.

@sinhrks

This comment has been minimized.

@agraboso

agraboso Aug 3, 2016

Contributor

Do you mean stripping the .freqstr at the end? Sure, that works (tests pass)

This comment has been minimized.

@sinhrks

sinhrks Aug 3, 2016

Member

yes, thx to confirm.

@jreback

View changes

doc/source/whatsnew/v0.19.0.txt Outdated
@@ -684,6 +684,7 @@ Deprecations
- ``pd.tseries.util.pivot_annual`` is deprecated. Use ``pivot_table`` as alternative, an example is :ref:`here <cookbook.pivot>` (:issue:`736`)
- ``pd.tseries.util.isleapyear`` has been deprecated and will be removed in a subsequent release. Datetime-likes now have a ``.is_leap_year`` property. (:issue:`13727`)
- ``Panel4D`` and ``PanelND`` constructors are deprecated and will be removed in a future version. The recommended way to represent these types of n-dimensional data are with the `xarray package <http://xarray.pydata.org/en/stable/>`__. Pandas provides a :meth:`~Panel4D.to_xarray` method to automate this conversion. (:issue:`13564`)
- ``pd.tseries.frequencies.get_standard_freq`` is deprecated. Use `freq.rule_code` if `freq` is a `DateOffset` object, or `pd.tseries.frequencies.to_offset(freq).rule_code` if it is a string or tuple. (:issue:`13874`)

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

was this ever an advertised function? I think we can just remove it.

actually since this is easy, then ok

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

change to match revised msg below

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

write it out as: pandas.tseries.frequencies.get_standard_freq

@jreback

View changes

pandas/tseries/frequencies.py Outdated
return None
msg = ("get_standard_freq(freq) is deprecated. "
"Use freq.rule_code instead if freq is a DateOffset object, "

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

just recommend .to_offset. (which accepts all of these). BTW, can we enhance the doc-string of .to_offset? (add Parameters / more examples)

@jsexauer jsexauer referenced this pull request Aug 3, 2016

Open

DEPR: deprecations from prior versions #6581

0 of 84 tasks complete
@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 3, 2016

@jreback How does it look?

@jreback

View changes

pandas/tseries/frequencies.py Outdated
Parameters
----------
freq : str or tuple or datetime.timedelta or DateOffset or None

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

use commas rather than or

@jreback

View changes

pandas/tseries/frequencies.py Outdated
Parameters
----------
freq : str or tuple or datetime.timedelta or DateOffset or None

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

add a Raises section

@jreback

View changes

pandas/tseries/frequencies.py Outdated
-------
delta : DateOffset
None if freq is None
ValueError if freq is an invalid frequency

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

e.g. this ValueError goes in Raises

@jreback

View changes

pandas/tseries/frequencies.py Outdated
Returns
-------
delta : DateOffset

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

can also add a See Also section to: DateOffset

@jreback

View changes

pandas/tseries/frequencies.py Outdated
@@ -386,51 +386,76 @@ def get_period_alias(offset_str):
_INVALID_FREQ_ERROR = "Invalid frequency: {0}"
def to_offset(freqstr):
def to_offset(freq):

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

this is technically a change, e.g. someone passes by name.

so we can either change it back or deprecate it (deprecating it is fine). you can use the deprecate decorator

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 3, 2016

@jreback Done.

@jreback

View changes

pandas/tseries/frequencies.py Outdated
See Also
--------
DateOffset

This comment has been minimized.

@jreback

jreback Aug 3, 2016

Contributor

make this pandas.DateOffset

@jreback

This comment has been minimized.

Contributor

jreback commented Aug 3, 2016

you are going to have a flake8 minor issue (long lines). otherwise lgtm.

can you run an asv vs master (just -b period) is fine, and show the results.

@jreback jreback added this to the 0.19.0 milestone Aug 3, 2016

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 3, 2016

you are going to have a flake8 minor issue (long lines). otherwise lgtm.

flake8 does not complain. Which are the long lines you refer to? Found and fixed them.

can you run an asv vs master (just -b period) is fine, and show the results.

    before     after       ratio
  [768bf495] [81262a57]
-   54.20ms    14.96ms      0.28  period.create_period_index_from_date_range.time_period_index
@jreback

This comment has been minimized.

Contributor

jreback commented Aug 3, 2016

are u just showing the single change in asv?
there are more period benchmarks

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 3, 2016

Here is the whole output:

· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
·· Installing into conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 24 total benchmarks (2 commits * 1 environments * 12 benchmarks)
[  0.00%] · For pandas commit hash 81262a57:
[  0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt....................................
[  0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  4.17%] ··· Running gil.nogil_datetime_fields.time_datetime_to_period                                                                                                    18.29ms
[  8.33%] ··· Running gil.nogil_datetime_fields.time_period_to_datetime                                                                                                    55.16ms
[ 12.50%] ··· Running period.create_period_index_from_date_range.time_period_index                                                                                         14.96ms
[ 16.67%] ··· Running period.period_algorithm.time_period_index_drop_duplicates                                                                                            30.93μs
[ 20.83%] ··· Running period.period_algorithm.time_period_index_value_counts                                                                                                1.12ms
[ 25.00%] ··· Running period.period_algorithm.time_period_series_drop_duplicates                                                                                           16.60ms
[ 29.17%] ··· Running period.period_algorithm.time_period_series_value_counts                                                                                               6.94ms
[ 33.33%] ··· Running period.period_setitem.time_period_setitem                                                                                                             failed
[ 37.50%] ··· Running plotting.plot_timeseries_period.time_plot_timeseries_period                                                                                         308.25ms
[ 41.67%] ··· Running timeseries.timeseries_iter_periodindex.time_timeseries_iter_periodindex                                                                                6.57s
[ 45.83%] ··· Running timeseries.timeseries_iter_periodindex_preexit.time_timeseries_iter_periodindex_preexit                                                              55.71ms
[ 50.00%] ··· Running timeseries.timeseries_period_downsample_mean.time_timeseries_period_downsample_mean                                                                  26.72ms
[ 50.00%] · For pandas commit hash 768bf495:
[ 50.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt....
[ 50.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 54.17%] ··· Running gil.nogil_datetime_fields.time_datetime_to_period                                                                                                    19.20ms
[ 58.33%] ··· Running gil.nogil_datetime_fields.time_period_to_datetime                                                                                                    51.12ms
[ 62.50%] ··· Running period.create_period_index_from_date_range.time_period_index                                                                                         54.20ms
[ 66.67%] ··· Running period.period_algorithm.time_period_index_drop_duplicates                                                                                            33.60μs
[ 70.83%] ··· Running period.period_algorithm.time_period_index_value_counts                                                                                                1.07ms
[ 75.00%] ··· Running period.period_algorithm.time_period_series_drop_duplicates                                                                                           16.89ms
[ 79.17%] ··· Running period.period_algorithm.time_period_series_value_counts                                                                                               6.42ms
[ 83.33%] ··· Running period.period_setitem.time_period_setitem                                                                                                             failed
[ 87.50%] ··· Running plotting.plot_timeseries_period.time_plot_timeseries_period                                                                                         294.17ms
[ 91.67%] ··· Running timeseries.timeseries_iter_periodindex.time_timeseries_iter_periodindex                                                                                4.01s
[ 95.83%] ··· Running timeseries.timeseries_iter_periodindex_preexit.time_timeseries_iter_periodindex_preexit                                                              40.75ms
[100.00%] ··· Running timeseries.timeseries_period_downsample_mean.time_timeseries_period_downsample_mean                                                                  24.24ms
    before     after       ratio
  [768bf495] [81262a57]
-   54.20ms    14.96ms      0.28  period.create_period_index_from_date_range.time_period_index

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

By the way, I just realized that period.period_setitem.time_period_setitem fails even on master.

@agraboso

This comment has been minimized.

Contributor

agraboso commented Aug 8, 2016

@jreback jreback closed this in 81819b7 Aug 8, 2016

@jreback

This comment has been minimized.

Contributor

jreback commented Aug 8, 2016

thanks!

@jreback jreback referenced this pull request Aug 8, 2016

Closed

BUG: freqstr may be parsed incorrectly #13930

3 of 3 tasks complete

@agraboso agraboso deleted the agraboso:fix-13730 branch Aug 9, 2016

gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 1, 2018

CLN: Remove tseries v0.19.0 deprecations
* Remove frequencies.get_standard_freq
* Drop the "freqstr" keyword from frequencies.to_offset

Deprecated in v0.19.0

xref pandas-devgh-13874

gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 1, 2018

CLN: Remove tseries v0.19.0 deprecations
* Remove frequencies.get_standard_freq
* Drop the "freqstr" keyword from frequencies.to_offset

Deprecated in v0.19.0

xref pandas-devgh-13874

gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 1, 2018

CLN: Remove tseries v0.19.0 deprecations
* Remove frequencies.get_standard_freq
* Drop the "freqstr" keyword from frequencies.to_offset

Deprecated in v0.19.0

xref pandas-devgh-13874

gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 1, 2018

CLN: Remove tseries v0.19.0 deprecations
* Remove frequencies.get_standard_freq
* Drop the "freqstr" keyword from frequencies.to_offset

Deprecated in v0.19.0

xref pandas-devgh-13874

@jreback jreback referenced this pull request Jan 1, 2018

Open

DEPR: deprecations log for removed issues #13777

115 of 115 tasks complete

jreback added a commit that referenced this pull request Jan 2, 2018

CLN: Remove tseries v0.19.0 deprecations (#19023)
* Remove frequencies.get_standard_freq
* Drop the "freqstr" keyword from frequencies.to_offset

Deprecated in v0.19.0

xref gh-13874
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment