Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Assorted DatetimeIndex bugfixes #24157

Merged
merged 32 commits into from Dec 14, 2018
Merged

BUG: Assorted DatetimeIndex bugfixes #24157

merged 32 commits into from Dec 14, 2018

Conversation

@jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Dec 8, 2018

Bit of a hodge-podge.

  • Deprecate time_rule in offsets.generate_range
  • Improve error messages if NaT gets passed to date_range, or in validate_frequency (closes #11587)
  • Preserve timezone in DatetimeIndex->CategoricalIndex->DatetimeIndex (closes #18664)
  • Preserve freq when indexing DTI/TDI with Ellipsis (closes #21282)
  • closes #11587

@TomAugspurger nothing here is urgent, so if this overlaps with DTA it can stay on the backburner.

@pep8speaks
Copy link

@pep8speaks pep8speaks commented Dec 8, 2018

Hello @jbrockmendel! Thanks for submitting the PR.

@codecov
Copy link

@codecov codecov bot commented Dec 8, 2018

Codecov Report

Merging #24157 into master will decrease coverage by 49.17%.
The diff coverage is 50%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24157       +/-   ##
===========================================
- Coverage    92.2%   43.02%   -49.18%     
===========================================
  Files         162      162               
  Lines       51700    51714       +14     
===========================================
- Hits        47670    22251    -25419     
- Misses       4030    29463    +25433
Flag Coverage Δ
#multiple ?
#single 43.02% <50%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimes.py 65.01% <25%> (-33.55%) ⬇️
pandas/tseries/offsets.py 46.43% <50%> (-50.41%) ⬇️
pandas/core/arrays/datetimelike.py 48.98% <83.33%> (-47.38%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-98.62%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
... and 122 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c911151...06d0a8e. Read the comment docs.

@codecov
Copy link

@codecov codecov bot commented Dec 8, 2018

Codecov Report

Merging #24157 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24157      +/-   ##
==========================================
- Coverage   92.22%   92.22%   -0.01%     
==========================================
  Files         162      162              
  Lines       51785    51796      +11     
==========================================
+ Hits        47758    47767       +9     
- Misses       4027     4029       +2
Flag Coverage Δ
#multiple 90.62% <100%> (-0.01%) ⬇️
#single 43.01% <70.58%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/datetimelike.py 96.41% <100%> (+0.05%) ⬆️
pandas/tseries/offsets.py 96.84% <100%> (-0.01%) ⬇️
pandas/core/arrays/datetimes.py 98.22% <100%> (-0.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01afc42...b927925. Read the comment docs.

@jbrockmendel
Copy link
Member Author

@jbrockmendel jbrockmendel commented Dec 8, 2018

Azure fail is in developer.rst, unrelated.

@jbrockmendel
Copy link
Member Author

@jbrockmendel jbrockmendel commented Dec 8, 2018

@datapythonista is the azure fail caused by something in this PR? It looks unrelated but re-pushing didn’t fix it.

@datapythonista
Copy link
Member

@datapythonista datapythonista commented Dec 8, 2018

master was broken (I merged an old PR that didn't have some linting), but it's been fixed in #24160

If you merge master and push again, the CI should pass now.

Copy link
Contributor

@jreback jreback left a comment

pls add to #6581 for the deprecation

@@ -327,6 +327,10 @@ def __getitem__(self, key):
"numpy.newaxis (`None`) and integer or boolean "
"arrays are valid indices")

if key is Ellipsis:
Copy link
Contributor

@jreback jreback Dec 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make a more explicit comment. This is likely a more general fix (IOW this is likely broken for all __getitem__ on EA)

Copy link
Contributor

@jreback jreback Dec 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't this just work as is? why is this not a view (and not a copy)?

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't this just work as is?

as-is DatetimeIndex[...] loses its freq attribute

why is this not a view (and not a copy)?

Since the copy doesn't have deep=True, self.copy() effectively a view?

Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more explicit in your comment, this is not obvious

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a hopefully clearer spot a few lines down

pandas/core/arrays/datetimelike.py Outdated Show resolved Hide resolved
pandas/core/arrays/datetimes.py Outdated Show resolved Hide resolved
pandas/core/arrays/datetimes.py Outdated Show resolved Hide resolved
ci = pd.CategoricalIndex(dti)
carr = pd.Categorical(dti)
cser = pd.Series(ci)

Copy link
Contributor

@jreback jreback Dec 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parametrize

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will take a look

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without adding casting code that makes the verbosity a wash.

Once Categorical(DatetimeArray) works (currently raises bc DatetimeArray doesn't have _constructor) then this test can be parametrized over DatetimeIndex/DatetimeArray

@jbrockmendel
Copy link
Member Author

@jbrockmendel jbrockmendel commented Dec 9, 2018

pls add to #6581 for the deprecation

Will do. Actually, is this private enough that a deprecation cycle is unnecessary? If so we should just remove the arg now and move the whole func to arrays.datetimes, since that is the only place it is used

@jbrockmendel
Copy link
Member Author

@jbrockmendel jbrockmendel commented Dec 11, 2018

@TomAugspurger does this step on your toes?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

LGTM.

I think it'll apply cleanly aside from imports.

@@ -327,6 +327,10 @@ def __getitem__(self, key):
"numpy.newaxis (`None`) and integer or boolean "
"arrays are valid indices")

if key is Ellipsis:
Copy link
Contributor

@jreback jreback Dec 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't this just work as is? why is this not a view (and not a copy)?

@@ -327,6 +327,10 @@ def __getitem__(self, key):
"numpy.newaxis (`None`) and integer or boolean "
"arrays are valid indices")

if key is Ellipsis:
Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more explicit in your comment, this is not obvious

pandas/core/arrays/datetimelike.py Outdated Show resolved Hide resolved
pandas/core/arrays/datetimes.py Outdated Show resolved Hide resolved
@@ -2487,6 +2488,9 @@ def generate_range(start=None, end=None, periods=None,
"""
if time_rule is not None:
Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you need to change any existing tests to account for this?

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, tests.indexes.datetimes.test_date_range test_generate and test_generate_cday

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this sufficiently internal that we don't need to do a deprecation cycle?

Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let's just drop it entirely (fix the tests). I don't think generate_range is exposed, though maybe add in a note in api breaking changes just to cover.

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. OK with moving this to arrays.datetimes while we're at it?

@@ -1133,7 +1133,7 @@ Deprecations
- Passing a string alias like ``'datetime64[ns, UTC]'`` as the `unit` parameter to :class:`DatetimeTZDtype` is deprecated. Use :class:`DatetimeTZDtype.construct_from_string` instead (:issue:`23990`).
- In :meth:`Series.where` with Categorical data, providing an ``other`` that is not present in the categories is deprecated. Convert the categorical to a different dtype or add the ``other`` to the categories first (:issue:`24077`).
- :meth:`Series.clip_lower`, :meth:`Series.clip_upper`, :meth:`DataFrame.clip_lower` and :meth:`DataFrame.clip_upper` are deprecated and will be removed in a future version. Use ``Series.clip(lower=threshold)``, ``Series.clip(upper=threshold)`` and the equivalent ``DataFrame`` methods (:issue:`24203`)

- Passing a `time_rule` to `pandas.tseries.offsets.generate_range` is deprecated and will raise a ``TypeError`` in a future version. Pass an ``offset`` instead (:issue:`24157`)
Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these don't jive with what's at the top of the PR. missing 2 notes?

Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about #11587 ?

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it merit a note? All that is changed is an error message. But yes, this does close 11587, I'll update this above.

pandas/core/arrays/datetimelike.py Show resolved Hide resolved
# non-fixed frequencies are not meaningful for timedelta64;
# we retain that error message
raise e
# GH#11587 if index[0] is NaT _generate_range will raise
Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too cryptic, pls make more readable if you don't actually know what's happening here

@@ -1652,6 +1654,13 @@ def maybe_convert_dtype(data, copy):
raise TypeError("Passing PeriodDtype data is invalid. "
"Use `data.to_timestamp()` instead")

elif is_categorical_dtype(data):
# GH#18664 preserve tz in going DTI->Categorical->DTI
# TODO: cases where we need to do another pass through this func,
Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the TODO case here, what is an example? do you have an xfailed test?

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example is Categorical(TimedeltaIndex) where we should do another pass through maybe_convert_dtype in order to issue the FutureWarning. I'm assuming there are other corner cases here that will need to be caught/tested.

@jreback jreback added this to the 0.24.0 milestone Dec 13, 2018
Copy link
Contributor

@jreback jreback left a comment

small comments, ping on green.

@@ -380,6 +380,7 @@ Backwards incompatible API changes
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
- The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`)
- :func:`pd.offsets.generate_range` argument ``time_rule`` has been removed; use ``offset`` instead (:issue:`24157`)
Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably won't render as its not in the api.rst but ok

@@ -1310,6 +1310,8 @@ Datetimelike
- Bug in :class:`Index` where calling ``np.array(dtindex, dtype=object)`` on a timezone-naive :class:`DatetimeIndex` would return an array of ``datetime`` objects instead of :class:`Timestamp` objects, potentially losing nanosecond portions of the timestamps (:issue:`23524`)
- Bug in :class:`Categorical.__setitem__` not allowing setting with another ``Categorical`` when both are undordered and have the same categories, but in a different order (:issue:`24142`)
- Bug in :func:`date_range` where using dates with millisecond resolution or higher could return incorrect values or the wrong number of values in the index (:issue:`24110`)
- Bug in :class:`DatetimeIndex` where constructing a :class:`DatetimeIndex` from a :class:`Categorical` or :class:`CategoricalIndex` would incorrectly drop timezone information (:issue:`18664`)
Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the note for #11587?

Copy link
Member Author

@jbrockmendel jbrockmendel Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed, should be just below this line.

@@ -2,6 +2,7 @@
from datetime import date, datetime, timedelta
import functools
import operator
import warnings
Copy link
Contributor

@jreback jreback Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob going to fail the build as this is not needed

@jreback
Copy link
Contributor

@jreback jreback commented Dec 13, 2018

linting issue. ping on green.

@jbrockmendel
Copy link
Member Author

@jbrockmendel jbrockmendel commented Dec 13, 2018

Ping

@jreback jreback merged commit a1cf3f6 into pandas-dev:master Dec 14, 2018
3 checks passed
@jreback
Copy link
Contributor

@jreback jreback commented Dec 14, 2018

thanks!

@jbrockmendel jbrockmendel deleted the deets branch Dec 14, 2018
Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019
Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment