Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate passing range-like arguments to DatetimeIndex, TimedeltaIndex #23919

Merged
merged 21 commits into from
Nov 28, 2018

Conversation

jbrockmendel
Copy link
Member

Also verify_integrity since allowing that to be False complicates things, and internally if its False we should be using simple_new anyway

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@gfyoung gfyoung added Datetime Datetime data dtype API Design Deprecate Functionality to remove in pandas labels Nov 26, 2018
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me, a minor comment.

Further, can you add an assertion for the warning on verify_integrity ?

Also, small suggestion: you now left one DatetimeIndex in there with an assert_produces_warning. I would maybe do such an assertion in a separate test (now that assertion is a bit buried in a bigger test that is for the rest using date_range)

doc/source/whatsnew/v0.24.0.rst Outdated Show resolved Hide resolved
@@ -385,7 +385,7 @@ def test_groupby_groups_datetimeindex(self):
groups = grouped.groups
assert isinstance(list(groups.keys())[0], datetime)

# GH 11442
# GH#11442
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not important, but was just wondering: why are you adding the '#' everywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly for internal consistency (not a big deal, but for grepping purposes). A little bit because I was curious how long it would take before someone asked about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have have the gh-xxxx style as well, slight prefernce for that

with warnings.catch_warnings():
# we ignore warnings from passing verify_integrity=False
# TODO: If we knew what was going in to **d, we might be able to
# go through _simple_new instead
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can check __reduce__, d should be _data and the attributes dict

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, that seems to work; just pushed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this broke the legacy pickle tests; reverted.

@codecov
Copy link

codecov bot commented Nov 26, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@30c1290). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #23919   +/-   ##
=========================================
  Coverage          ?   92.31%           
=========================================
  Files             ?      161           
  Lines             ?    51489           
  Branches          ?        0           
=========================================
  Hits              ?    47533           
  Misses            ?     3956           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.71% <100%> (?)
#single 42.43% <53.33%> (?)
Impacted Files Coverage Δ
pandas/util/testing.py 86.09% <100%> (ø)
pandas/core/resample.py 96.99% <100%> (ø)
pandas/io/packers.py 88.08% <100%> (ø)
pandas/core/indexes/timedeltas.py 89.44% <100%> (ø)
pandas/tseries/holiday.py 93.17% <100%> (ø)
pandas/core/indexes/datetimes.py 96.51% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 30c1290...cc40717. Read the comment docs.

@@ -131,10 +132,18 @@ def __new__(cls, data=None, unit=None, freq=None, start=None, end=None,
periods=None, closed=None, dtype=None, copy=False,
name=None, verify_integrity=True):

if verify_integrity is not True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't the default of verify_integrity None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because by default we do verify the integrity of a passed frequency. Best guess as to initial motivation is that verify_integrity=False is kind of like fastpath=True and was never really intended to be user-facing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right but aren't you deprecating it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In the future it will just be set to verify_integrity=True at the top of __new__. (and when the time comes, in the TDA/DTA constructors it will always be True.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need the variable to exist because there are cases in which we can determine it is not necessary, in which case we can skip a potentially-expensive check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall that (and dont see how it would work), but yah, not passing freq would make verify_integrity unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need the variable to exist because there are cases in which we can determine it is not necessary, in which case we can skip a potentially-expensive check.

@jbrockmendel Isn't the end goal to actually remove the verify_integrity keyword from the Index constructors? (not just keep it with a fixed True value). Because otherwise, why are you deprecating verify_integrity=False if we actually want to use it ourselves?
In cases that we determine the check is not necessary, we can use _simple_new ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually verify-integrity will not be in the signature. The first line of the method will just define it to be True.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but then as @jreback said, we should put it at None so we can also deprecate the case for somebody setting it to True explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -385,7 +385,7 @@ def test_groupby_groups_datetimeindex(self):
groups = grouped.groups
assert isinstance(list(groups.keys())[0], datetime)

# GH 11442
# GH#11442
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have have the gh-xxxx style as well, slight prefernce for that

@jreback
Copy link
Contributor

jreback commented Nov 27, 2018

lgtm

@jbrockmendel
Copy link
Member Author

circle fail is Hypothesis; de-facto green

@mroeschke
Copy link
Member

Shouldn't we do the same for PeriodIndex as well?

@jbrockmendel
Copy link
Member Author

Shouldn't we do the same for PeriodIndex as well?

Possibly. There are also fields kwargs there we might want to deprecate. Prefer to do this separately.

@jreback
Copy link
Contributor

jreback commented Nov 28, 2018

@jorisvandenbossche

@jreback jreback added this to the 0.24.0 milestone Nov 28, 2018
@jorisvandenbossche jorisvandenbossche merged commit 6b3490f into pandas-dev:master Nov 28, 2018
@jorisvandenbossche
Copy link
Member

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Datetime Datetime data dtype Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: deprecate range-generating keywords in DatetimeIndex
7 participants