Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: 'D' and offset.Day always represents calendar day #22867

Closed
wants to merge 65 commits into from

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Sep 28, 2018

@pep8speaks
Copy link

pep8speaks commented Sep 28, 2018

Hello @mroeschke! Thanks for updating the PR.

Line 3229:66: W292 no newline at end of file

Comment last updated on December 13, 2018 at 23:43 Hours UTC

@mroeschke mroeschke changed the title WIP: 'D' and offset.Day always represents calendar day [WIP] API: 'D' and offset.Day always represents calendar day Sep 28, 2018
See the :ref:`documentation here <timeseries.dayvscalendarday>` for more information.
:class:`Day` and associated frequency alias ``'D'`` will always respect
calendar day arithmetic
(:issue:`22864`, :issue:`20596`, :issue:`16980`, :issue:`8774`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can add this PR number as well

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Dec 7, 2018

@mroeschke I will try to give this a look (finally, sorry about that). Can you give a short status update? Are there still problems you are encountering, .. ?

@jorisvandenbossche
Copy link
Member

One of the example I used in the discussion is now errorring:

In [1]: idx = pd.date_range("2016-10-30", freq='H', periods=4*24, tz='Europe/Helsinki')                                                                                         

In [2]: pd.Series(index=idx).resample('D').count()                                                                                                                              
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-aaf4c03a8f51> in <module>
----> 1 pd.Series(index=idx).resample('D').count()

~/scipy/pandas/pandas/core/resample.py in f(self, _method)
    864 for method in ['count']:
    865     def f(self, _method=method):
--> 866         return self._downsample(_method)
    867     f.__doc__ = getattr(GroupBy, method).__doc__
    868     setattr(Resampler, method, f)

~/scipy/pandas/pandas/core/resample.py in _downsample(self, how, **kwargs)
    990         **kwargs : kw args passed to how function
    991         """
--> 992         self._set_binner()
    993         how = self._is_cython_func(how) or how
    994         ax = self.ax

~/scipy/pandas/pandas/core/resample.py in _set_binner(self)
    170         """
    171         if self.binner is None:
--> 172             self.binner, self.grouper = self._get_binner()
    173 
    174     def _get_binner(self):

~/scipy/pandas/pandas/core/resample.py in _get_binner(self)
    178         """
    179 
--> 180         binner, bins, binlabels = self._get_binner_for_time()
    181         bin_grouper = BinGrouper(bins, binlabels, indexer=self.groupby.indexer)
    182         return binner, bin_grouper

~/scipy/pandas/pandas/core/resample.py in _get_binner_for_time(self)
    979         if self.kind == 'period':
    980             return self.groupby._get_time_period_bins(self.ax)
--> 981         return self.groupby._get_time_bins(self.ax)
    982 
    983     def _downsample(self, how, **kwargs):

~/scipy/pandas/pandas/core/resample.py in _get_time_bins(self, ax)
   1428         # general version, knowing nothing about relative frequencies
   1429         bins = lib.generate_bins_dt64(
-> 1430             ax_values, bin_edges, self.closed, hasnans=ax.hasnans)
   1431 
   1432         if self.closed == 'right':

~/scipy/pandas/pandas/_libs/lib.pyx in pandas._libs.lib.generate_bins_dt64()

ValueError: Values falls after last bin

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you see the way forward here with _Day. Is the idea that Day will be replaced by that, after deprecating the behaviours that are different?
(but I don't see any deprecations right now?)

@@ -1136,6 +1096,7 @@ Deprecations
- :func:`pandas.types.is_datetimetz` is deprecated in favor of `pandas.types.is_datetime64tz` (:issue:`23917`)
- Creating a :class:`TimedeltaIndex` or :class:`DatetimeIndex` by passing range arguments `start`, `end`, and `periods` is deprecated in favor of :func:`timedelta_range` and :func:`date_range` (:issue:`23919`)
- Passing a string alias like ``'datetime64[ns, UTC]'`` as the `unit` parameter to :class:`DatetimeTZDtype` is deprecated. Use :class:`DatetimeTZDtype.construct_from_string` instead (:issue:`23990`).
- Operations where the offset alias ``'D'`` or :class:`Day` acts as a fixed, absolute duration of 24 hours are deprecated. This includes :class:`Timestamp` arithmetic and all operations with :class:`Timedelta`. :class:`Day` will always represent a calendar day in a future version (:issue:`22864`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give some examples here?

For example, with the following I don't see a change or warning:

In [11]: pd.Timestamp("2012-10-28", tz='Europe/Brussels') + pd.offsets.Day()                                                                                                    
Out[11]: Timestamp('2012-10-28 23:00:00+0100', tz='Europe/Brussels')

@mroeschke
Copy link
Member Author

Sorry, I have been primarily working on reverting things and getting the test to pass again by rolling back CalendarDay.

TODO
Add all the depreciation warnings (I anticipate there being quite a few of these). This needs to be included in the following:

  • Day tick arithmetic with other Ticks, Timedeltas, and DatetimeTZ
  • DatetimeIndex.shift (tz-aware only)

Migration
The plan is for _Day (formerly CalendarDay) to just replace Day once the prior Day behavior is replaced.

Concern
During the last dev chat, there was interest for 'D' to be compatible freq argument/offset with both Timedelta and Datetime. I don't see a clear way to make this possible without adding a lot of monkeypatching.

Example: timedelta_range(..., freq='D'); to_offset('D') will return _Day in the future and this offset will need to increment a Timedelta, but _Day + Timedelta is an invalid operation.

@jreback jreback added the Frequency DateOffsets label Dec 9, 2018
@mroeschke mroeschke mentioned this pull request Dec 10, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Dec 10, 2018
@mroeschke
Copy link
Member Author

Closing in favor of #24330 for now (reverts behavior to pre-CalendarDay). We can revisit this after the 0.24.0 release.

@mroeschke mroeschke closed this Dec 18, 2018
@mroeschke mroeschke removed this from the 0.24.0 milestone Dec 18, 2018
@mroeschke mroeschke deleted the make_D_calendar_day branch May 19, 2022 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Make 'D' & offsets.Day always operate as calendar day instead of 24 Hour
4 participants