Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fixed wrong order of ordered labels in pd.cut() #16466

Closed
wants to merge 2 commits into from

Conversation

economy
Copy link
Contributor

@economy economy commented May 23, 2017

Changes to how cut instantiates Categorical when a list of labels are passed to it, and the test_cut_pass_labels test for it. If labels are passed in as a Categorical already, behavior isn't altered at all, but if the user specifies a list of labels, the rank ordering of those labels is preserved in cut.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add:

  • a whatsnew note (0.20.2 reshaping bug fix)
  • the test from the original issue

@jreback jreback added Bug Categorical Categorical Data Type labels May 23, 2017
@economy
Copy link
Contributor Author

economy commented May 23, 2017

@jreback happy to do that, can you tell me how/where to add the what's new, and what you're looking for in terms of the original test? all i know is it passes.

@jreback
Copy link
Contributor

jreback commented May 23, 2017

doc/source/whatsnew/v0.20.2.txt

you copy the test from the original issue and assert that it is correct now (you construct the expected) and compare vs running pd.cut. it should fail w/o this fix and pass with it.

@economy
Copy link
Contributor Author

economy commented May 23, 2017

Ok, the what's new is straightforward, in terms of the test, do you just want me to include the output in the original issue? or an actual test in test_tile.py? The test that was failing no longer fails.

@codecov
Copy link

codecov bot commented May 23, 2017

Codecov Report

Merging #16466 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #16466   +/-   ##
=======================================
  Coverage   90.42%   90.42%           
=======================================
  Files         161      161           
  Lines       51027    51027           
=======================================
  Hits        46142    46142           
  Misses       4885     4885
Flag Coverage Δ
#multiple 88.26% <100%> (ø) ⬆️
#single 40.17% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/reshape/tile.py 90.25% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04356a8...d8bda87. Read the comment docs.

@codecov
Copy link

codecov bot commented May 23, 2017

Codecov Report

Merging #16466 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16466      +/-   ##
==========================================
+ Coverage   90.75%   90.79%   +0.04%     
==========================================
  Files         161      161              
  Lines       51073    51063      -10     
==========================================
+ Hits        46350    46365      +15     
+ Misses       4723     4698      -25
Flag Coverage Δ
#multiple 88.63% <100%> (+0.04%) ⬆️
#single 40.15% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/tile.py 90.25% <100%> (ø) ⬆️
pandas/core/indexes/multi.py 96.56% <0%> (-0.09%) ⬇️
pandas/core/generic.py 92.26% <0%> (-0.05%) ⬇️
pandas/io/pytables.py 93.06% <0%> (-0.05%) ⬇️
pandas/core/window.py 96.48% <0%> (-0.02%) ⬇️
pandas/io/parsers.py 95.66% <0%> (-0.01%) ⬇️
pandas/core/groupby.py 92.04% <0%> (ø) ⬆️
pandas/core/frame.py 97.66% <0%> (ø) ⬆️
pandas/core/reshape/pivot.py 95.08% <0%> (ø) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee8346d...29128b3. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented May 23, 2017

in the original issue? or an actual test in test_tile.py? The test that was failing no longer fails.

that's exactly the point. the test should succeed now! and failed before the fix.

@economy
Copy link
Contributor Author

economy commented May 23, 2017

Pre-fix:

In [2]: labels = ['okay', 'good', 'bad']
In [3]: pd.cut([.2, .4, .6, 1, 2, 3, 5], 3, labels=labels)

Out [3]:
[okay, okay, okay, okay, good, good, bad]
Categories (3, object): [bad < good < okay]

Post-fix:

In [4]: labels = ['okay', 'good', 'bad']
In [5]: pd.cut([.2, .4, .6, 1, 2, 3, 5], 3, labels=labels)

Out [5]:
[okay, okay, okay, okay, good, good, bad]
Categories (3, object): [okay < good < bad]
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /Users/economy/Documents/pycon-2017/pandas/pandas, inifile: setup.cfg
collected 35 items

reshape/test_tile.py ...................................

======================================================================================== 35 passed in 0.58 seconds =========================================================================================

@jreback
Copy link
Contributor

jreback commented May 23, 2017

@economy so make a test out of that.

@economy
Copy link
Contributor Author

economy commented May 23, 2017

Gotcha, sorry for the confusion.

@@ -37,6 +37,7 @@ Bug Fixes
~~~~~~~~~

- Bug in using ``pathlib.Path`` or ``py.path.local`` objects with io functions (:issue:`16291`)
- Bug in ``pd.cut`` when ``labels`` are set (:issue:`16459`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in pd.cut does not preserve label ordering when labels are passed

@@ -211,12 +211,18 @@ def test_cut_pass_labels(self):

result = cut(arr, bins, labels=labels)
exp = Categorical(['Medium'] + 4 * ['Small'] + ['Medium', 'Large'],
categories=labels,
Copy link
Contributor

@jreback jreback May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this causes a lint error (needs an indent)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. minor comments. rebase on master, ping when green.

@jreback jreback added this to the 0.20.2 milestone May 24, 2017
@economy
Copy link
Contributor Author

economy commented May 26, 2017

@jreback finally had a chance to get to this. should be green now

@TomAugspurger
Copy link
Contributor

@economy looks like a conflict in the whatsnew file. Can you rebase?

@TomAugspurger TomAugspurger added Blocker Blocking issue or pull request for an upcoming release Needs Backport labels May 30, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor doc comments.

- Fixed a compatibility issue with IPython 6.0's tab completion showing deprecation warnings on Categoricals (:issue:`16409`)

- Bug in ``pd.cut`` when ``labels`` are set (:issue:`16459`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add incorrect ordering resulted

@@ -219,6 +220,12 @@ def test_cut_pass_labels(self):
exp = Categorical.from_codes([1] + 4 * [0] + [1, 2], labels)
tm.assert_categorical_equal(result, exp)

labels = ['Good', 'Medium', 'Bad']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add issue reference here

@jreback
Copy link
Contributor

jreback commented Jun 1, 2017

thanks!

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 1, 2017
closes pandas-dev#16459

Author: economy <the.economy@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Jeff Reback <jeff.reback@twosigma.com>

Closes pandas-dev#16466 from economy/fix_cut and squashes the following commits:

29128b3 [economy] comments and whatsnew edits
3898b72 [economy] BUG: fixed wrong order of ordered labels in pd.cut()

(cherry picked from commit d419be4)
TomAugspurger pushed a commit that referenced this pull request Jun 4, 2017
closes #16459

Author: economy <the.economy@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Jeff Reback <jeff.reback@twosigma.com>

Closes #16466 from economy/fix_cut and squashes the following commits:

29128b3 [economy] comments and whatsnew edits
3898b72 [economy] BUG: fixed wrong order of ordered labels in pd.cut()

(cherry picked from commit d419be4)
Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017
closes pandas-dev#16459

Author: economy <the.economy@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Jeff Reback <jeff.reback@twosigma.com>

Closes pandas-dev#16466 from economy/fix_cut and squashes the following commits:

29128b3 [economy] comments and whatsnew edits
3898b72 [economy] BUG: fixed wrong order of ordered labels in pd.cut()
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
closes pandas-dev#16459

Author: economy <the.economy@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Jeff Reback <jeff.reback@twosigma.com>

Closes pandas-dev#16466 from economy/fix_cut and squashes the following commits:

29128b3 [economy] comments and whatsnew edits
3898b72 [economy] BUG: fixed wrong order of ordered labels in pd.cut()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocker Blocking issue or pull request for an upcoming release Bug Categorical Categorical Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: cut does not respect order of passed labels
3 participants