New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow drop bins when using the cut function #20947

Merged
merged 16 commits into from May 10, 2018

Conversation

Projects
None yet
3 participants
@FANGOD
Contributor

FANGOD commented May 4, 2018

  • if cut(x, bins=[0, 1, 2, 3, 4, 5, 6, 6, 8, 8, 10], labels=False, retbins=True, right=False) will raise ValueError: You can drop duplicate edges by setting the 'duplicates' kwarg, so add 'duplicates' parameters to the cut function.
@codecov

This comment has been minimized.

codecov bot commented May 4, 2018

Codecov Report

❗️ No coverage uploaded for pull request base (master@8e2a4a9). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #20947   +/-   ##
=========================================
  Coverage          ?   91.82%           
=========================================
  Files             ?      153           
  Lines             ?    49488           
  Branches          ?        0           
=========================================
  Hits              ?    45443           
  Misses            ?     4045           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.22% <ø> (?)
#single 41.85% <ø> (?)
Impacted Files Coverage Δ
pandas/core/reshape/tile.py 93.37% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e2a4a9...46c9bca. Read the comment docs.

@jreback

ok would need:

  • a doc-strings update
  • tests, see how testing is done for qcut

FANGOD added some commits May 5, 2018

@pep8speaks

This comment has been minimized.

pep8speaks commented May 5, 2018

Hello @FANGOD! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 10, 2018 at 18:27 Hours UTC

FANGOD added some commits May 5, 2018

@FANGOD

This comment has been minimized.

Contributor

FANGOD commented May 5, 2018

duplicates=drop drop non-uniques

>>> pd.cut(s, [0, 2, 4, 6, 10, 10], labels=False, retbins=True,
...    right=False, duplicates='drop')
... # doctest: +ELLIPSIS
(a    0.0
 b    1.0
 c    2.0
 d    3.0
 e    3.0
 dtype: float64, array([0, 2, 4, 6, 8]))
@jreback

the doc-string is fine, but we need an actual test (checking the duplicates kw in test_tile.py), follow the pattern of how qcut is tested

also need a whatsnew entry

@@ -65,6 +65,8 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
The precision at which to store and display the bins labels.
include_lowest : bool, default False
Whether the first interval should be left-inclusive or not.
duplicates : {default 'raise', 'drop'}, optional
If bin edges are not unique, raise ValueError or drop non-uniques.

This comment has been minimized.

@jreback

jreback May 5, 2018

Contributor

add a versionadded tag

add a Raises part of the doc-string

FANGOD added some commits May 7, 2018

@FANGOD

This comment has been minimized.

Contributor

FANGOD commented May 9, 2018

I don't know what to do. Your ci is too 666.

@@ -65,6 +65,12 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
The precision at which to store and display the bins labels.
include_lowest : bool, default False
Whether the first interval should be left-inclusive or not.
duplicates : {default 'raise', 'drop'}, optional
If bin edges are not unique, raise ValueError("Bin edges must be

This comment has been minimized.

@jreback

jreback May 9, 2018

Contributor

you dont need to show the error message here.

This comment has been minimized.

@jreback

jreback May 9, 2018

Contributor

instead add a Returns section in the doc-string, showing that a ValueError can be raised

e 4.0
dtype: float64, array([0, 2, 4, 6, 8]))
``duplicates=drop`` drop non-uniques

This comment has been minimized.

@jreback

jreback May 9, 2018

Contributor

make this a sentence

class TestCut(object):

This comment has been minimized.

@jreback

jreback May 9, 2018

Contributor

don't add a new file, put in pandas/tests/reshaping/test_tile.py

This comment has been minimized.

@FANGOD

FANGOD May 10, 2018

Contributor

Sorry, I didn't find it before.

FANGOD and others added some commits May 10, 2018

@jreback

This comment has been minimized.

Contributor

jreback commented May 10, 2018

ok linted and added a test. ping on green.

@jreback jreback added this to the 0.23.0 milestone May 10, 2018

@jreback jreback merged commit c30f0bb into pandas-dev:master May 10, 2018

0 of 3 checks passed

ci/circleci Your tests are queued behind your running builds
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@jreback

This comment has been minimized.

Contributor

jreback commented May 10, 2018

thanks @FANGOD

@FANGOD

This comment has been minimized.

Contributor

FANGOD commented May 11, 2018

Finally. Good experience.

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 13, 2018

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment