Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use range in RangeIndex instead of _start etc. #26581

Merged
merged 4 commits into from Jun 5, 2019

Conversation

@topper-123
Copy link
Contributor

commented May 30, 2019

  • xref #26565
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Make RangeIndex use python's range internally, rather than the three scalars _start, _stop and _step.

Python3's range has several nice properties, that were not available in xrange in Python2. For example it's sliceable, and has .index and count methods. These can be used in Pandas, rather than maintaining Pandas-specific code, offering cleaner code and possibly faster operations.

@topper-123 topper-123 added this to the 0.25.0 milestone May 30, 2019

.. deprecated:: 0.25.0
Use ._range.start or .start instead.
"""
return self._range.start

This comment has been minimized.

Copy link
@topper-123

topper-123 May 30, 2019

Author Contributor

Unfortunately, fast_parquet uses _start etc. which I hadn't predicted, when I started this PR. So this proposes deprecating the attributes _start, _stop and _step, and removing them in 1.0.

This can be bit of a nuisance for downstream maintainers, so don't know if there's opposition to this, but this PR has several benefits as mentioned, so there's a trade-off.

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger May 30, 2019

Contributor

I think pyarrow uses them as well. IIRC recently had a discussion that they're semi-officially public for advanced users (do you recall that @jorisvandenbossche)

This comment has been minimized.

Copy link
@jreback

jreback Jun 1, 2019

Contributor

don't recommend ._range.start, just say .start

This comment has been minimized.

Copy link
@jreback

jreback Jun 1, 2019

Contributor

I think its ok to deprecate these, the downstream will have to adjust

@topper-123 topper-123 force-pushed the topper-123:clean_RangeIndex branch from 6763b22 to d947faa May 30, 2019

@codecov

This comment has been minimized.

Copy link

commented May 30, 2019

Codecov Report

Merging #26581 into master will decrease coverage by 50.17%.
The diff coverage is 40%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #26581       +/-   ##
===========================================
- Coverage   91.84%   41.67%   -50.18%     
===========================================
  Files         174      174               
  Lines       50644    50625       -19     
===========================================
- Hits        46516    21099    -25417     
- Misses       4128    29526    +25398
Flag Coverage Δ
#multiple ?
#single 41.67% <40%> (-0.13%) ⬇️
Impacted Files Coverage Δ
pandas/io/packers.py 14.53% <ø> (-73.55%) ⬇️
pandas/core/series.py 45.31% <ø> (-48.31%) ⬇️
pandas/core/frame.py 34.99% <ø> (-62.13%) ⬇️
pandas/core/indexes/range.py 49.84% <35.38%> (-48.13%) ⬇️
pandas/core/dtypes/concat.py 52.19% <70%> (-44.38%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
... and 133 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c54dd2...d947faa. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented May 30, 2019

Codecov Report

Merging #26581 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26581      +/-   ##
==========================================
- Coverage   91.87%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50692    50672      -20     
==========================================
- Hits        46575    46554      -21     
- Misses       4117     4118       +1
Flag Coverage Δ
#multiple 90.41% <100%> (ø) ⬆️
#single 41.78% <42.99%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/io/packers.py 88.08% <ø> (ø) ⬆️
pandas/core/series.py 93.61% <ø> (ø) ⬆️
pandas/core/frame.py 97% <ø> (-0.12%) ⬇️
pandas/core/dtypes/concat.py 96.58% <100%> (+0.01%) ⬆️
pandas/core/dtypes/common.py 96% <100%> (+0.14%) ⬆️
pandas/core/indexes/range.py 98.47% <100%> (+0.42%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/util/testing.py 90.91% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 454b8c5...228b560. Read the comment docs.

@topper-123 topper-123 force-pushed the topper-123:clean_RangeIndex branch 5 times, most recently from 8a0750c to 93d281d May 30, 2019

@jreback
Copy link
Contributor

left a comment

can you add a test for the deprecations?

.. deprecated:: 0.25.0
Use ._range.start or .start instead.
"""
return self._range.start

This comment has been minimized.

Copy link
@jreback

jreback Jun 1, 2019

Contributor

don't recommend ._range.start, just say .start

.. deprecated:: 0.25.0
Use ._range.start or .start instead.
"""
return self._range.start

This comment has been minimized.

Copy link
@jreback

jreback Jun 1, 2019

Contributor

I think its ok to deprecate these, the downstream will have to adjust

The value of the `stop` parameter
.. deprecated:: 0.25.0
Use ._range.stop or .stop instead.

This comment has been minimized.

Copy link
@jreback

jreback Jun 1, 2019

Contributor

same as above (you don't need to put the issue numbers in the code)

The value of the `step` parameter (or ``1`` if this was not supplied)
.. deprecated:: 0.25.0
Use ._range.step or .step instead.

This comment has been minimized.

Copy link
@jreback

jreback Jun 1, 2019

Contributor

same


return RangeIndex._simple_new(start, stop, step, name=self.name)
new_range = self._range[key]
return RangeIndex.from_range(new_range, name=self.name)

This comment has been minimized.

Copy link
@jreback

jreback Jun 1, 2019

Contributor

are you using cls everywhere? (or self.class)? (can be either just be consistent)

This comment has been minimized.

Copy link
@topper-123

topper-123 Jun 1, 2019

Author Contributor

I wasn't, but I've changed it. For constructor method like this I've done self._from_range(...).

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 1, 2019

also a note in the whatsnew for the deprecation

@jsexauer jsexauer referenced this pull request Jun 1, 2019

Open

DEPR: deprecations from prior versions #6581

0 of 89 tasks complete

@topper-123 topper-123 force-pushed the topper-123:clean_RangeIndex branch from 93d281d to 123013c Jun 1, 2019

@pep8speaks

This comment has been minimized.

Copy link

commented Jun 1, 2019

Hello @topper-123! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-04 09:18:53 UTC

@topper-123 topper-123 force-pushed the topper-123:clean_RangeIndex branch 13 times, most recently from dee54ed to f14384c Jun 1, 2019

@jreback

jreback approved these changes Jun 2, 2019

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 2, 2019

lgtm. @TomAugspurger @jorisvandenbossche any objections?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 2, 2019

Do we even need to deprecate these or can we just remove? They technically were private before so don't want to set a precedent that have to deprecate internal code going forward as well if we don't have to

@WillAyd see comments above, they are using in pyarrow & fastparquet use them. So best to deprecate.

@WillAyd

This comment has been minimized.

Copy link
Member

commented Jun 2, 2019

Whoops sorry - thanks for clarifying

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

What's the plan on warning? Should we throw a (not visible) DeprecationWarning for now, and promote that to a FutureWarning? Or just add a FutureWarning after pyarrow / fastparquet can update?

@topper-123

This comment has been minimized.

Copy link
Contributor Author

commented Jun 3, 2019

My plan is to just post an issue on the issue lists for pyarrow & fastparquet with a notice that the attributes will be removed in 1.0 (unless they want us to hold on longer). I think that is sufficient as these are not publicly advertised.

EDIT: My cercern for adding a code message was that it would be visible and therefore annoying for users of those packages. I could add a DeprecationWarning, if that one is more quiet than the usual FutureWarning.

@topper-123 topper-123 referenced this pull request Jun 3, 2019

Merged

PERF: custom ops for RangeIndex.[all|any|__contains__] #26617

3 of 4 tasks complete
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

ok let’s do a DeprecationWarning now
and then post issues should be good

@topper-123

This comment has been minimized.

Copy link
Contributor Author

commented Jun 3, 2019

I actually see now that DeprecationWarnings are also displayed by default. I thought they were only shown when starting Python with -Wall or similar. Am I missing something?

Uploading anyway now I've done it and there's a demand for the warnings.

@topper-123 topper-123 force-pushed the topper-123:clean_RangeIndex branch from f14384c to 233aad0 Jun 3, 2019

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jun 3, 2019

I actually see now that DeprecationWarnings are also displayed by default

Only if you directly use the deprecated method or attribute, but they should not show up if you eg call a function that uses them (which should be the case for libraries such as pyarrow / fastparquet).

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jun 3, 2019

Personally, I am -1 on deprecating them (pyarrow/fastparquet only just started to use them, we literally said them it was fine, it's more annoying for them to deal with an attribute name change than for us to just keep it a couple of releases more). But if we're using a deprecation instead of future warning, I won't object if the majority is fine with deprecating.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

Personally, I am -1 on deprecating them (pyarrow/fastparquet only just started to use them, we literally said them it was fine, it's more annoying for them to deal with an attribute name change than for us to just keep it a couple of releases more). But if we're using a deprecation instead of future warning, I won't object if the majority is fine with deprecating.

right this is the purpose of the DeprecationWarning (rather than FutureWarning) as it won't show-up too much; but the deprecation should happen.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

@topper-123 can you file an issue on fastparquet and a JIRA on pyarrow.

@topper-123

This comment has been minimized.

Copy link
Contributor Author

commented Jun 3, 2019

No problem. I'd like to have this merged before I write the issues, so they can see the issue by just pulling pandas master.

@topper-123 topper-123 added Refactor and removed Clean labels Jun 3, 2019

@topper-123

This comment has been minimized.

Copy link
Contributor Author

commented Jun 4, 2019

Just added a small clean-up, so __new__ won't have a custom & internal ensure_int, but uses the same ensure_python_int as proposed in #26617. This simplifies the code a bit further.

If this gets accepted before #26617, I'll rebase that and add the requested type hints.

@topper-123 topper-123 force-pushed the topper-123:clean_RangeIndex branch from 98d1881 to 228b560 Jun 4, 2019

@jreback jreback merged commit 6ce7fc7 into pandas-dev:master Jun 5, 2019

11 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20190604.6 succeeded
Details
pandas-dev.pandas (Checks) Checks succeeded
Details
pandas-dev.pandas (Linux py35_compat) Linux py35_compat succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow) Linux py36_locale_slow succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow_old_np) Linux py36_locale_slow_old_np succeeded
Details
pandas-dev.pandas (Linux py37_locale) Linux py37_locale succeeded
Details
pandas-dev.pandas (Linux py37_np_dev) Linux py37_np_dev succeeded
Details
pandas-dev.pandas (Windows py36_np15) Windows py36_np15 succeeded
Details
pandas-dev.pandas (Windows py37_np141) Windows py37_np141 succeeded
Details
pandas-dev.pandas (macOS py35_macos) macOS py35_macos succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

thanks @topper-123

@topper-123 topper-123 deleted the topper-123:clean_RangeIndex branch Jun 5, 2019

vaibhavhrt added a commit to vaibhavhrt/pandas that referenced this pull request Jun 6, 2019

cbrnr added a commit to cbrnr/pandas that referenced this pull request Jun 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.