rank incorrectly orders ordered categories #15420

Closed
dfd opened this Issue Feb 16, 2017 · 1 comment


dfd commented Feb 16, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
a = pd.DataFrame(['first', 'second', 'third', 'fourth', 'fifth', 'sixth'], columns=['A'])
a['A'] = a['A'].astype('category').cat.set_categories(
    ['first', 'second', 'third', 'fourth', 'fifth', 'sixth'], ordered=True)
a['A'].rank()
# outputs:
# 0    2.0
# 1    4.0
# 2    6.0
# 3    3.0
# 4    1.0
# 5    5.0

Problem description

rank seems to be ignoring the order of ordered categories.

Expected Output

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
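
A quick check, offered as a sketch rather than a diagnosis of the pandas internals: the ranks reported above are exactly what you get from ranking the labels as plain strings, which suggests `rank` is falling back to alphabetical order. The name `labels` is only for illustration.

```python
import pandas as pd

labels = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']

# Rank the raw strings (object dtype, no categorical information at all).
lexical_ranks = pd.Series(labels, dtype=object).rank()

# Alphabetical order is fifth < first < fourth < second < sixth < third,
# so 'first' ranks 2nd, 'second' 4th, 'third' 6th, and so on.
print(lexical_ranks.tolist())  # [2.0, 4.0, 6.0, 3.0, 1.0, 5.0]
```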

Output of pd.show_versions()

<details>

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.2.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
```

</details>

@dfd Thanks for the report! That is indeed clearly a bug.
For example, sort_values takes the category order into account, but rank was apparently missed.

In [6]: a.A.sort_values()
Out[6]: 
0     first
1    second
2     third
3    fourth
4     fifth
5     sixth
Name: A, dtype: category
Categories (6, object): [first < second < third < fourth < fifth < sixth]

I think this should be a rather easy fix: in pd.core.algorithms.rank, we would need to check for a categorical and then pass the underlying integer codes. If you are interested in doing a pull request with a fix, that is always welcome!
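
A minimal sketch of that idea from the user side, assuming nothing about how the eventual fix is implemented inside pandas: rank the integer codes of the categorical instead of its labels. The helper name `rank_by_codes` is hypothetical.

```python
import pandas as pd

def rank_by_codes(series):
    """Rank an ordered categorical Series by category position (hypothetical helper)."""
    codes = series.cat.codes          # integer position of each value in the categories
    codes = codes.where(codes != -1)  # pandas encodes missing values as -1; turn them into NaN
    return codes.rank()               # NaN stays NaN under the default na_option='keep'

order = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
a = pd.DataFrame(order, columns=['A'])
a['A'] = a['A'].astype('category').cat.set_categories(order, ordered=True)

ranks = rank_by_codes(a['A'])
print(ranks.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

On an affected version such as 0.19.2, something along these lines should also work as a stopgap until a fix is released.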

jorisvandenbossche added this to the Next Major Release milestone Feb 16, 2017

ikilledthecat referenced this issue Feb 16, 2017

Rank categorical #15422 (Closed, 0 of 4 tasks complete)

jreback modified the milestone: 0.20.0, Next Major Release Feb 24, 2017

jreback closed this in 3fe85af Feb 24, 2017

AnkurDedania added a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017

Prasanjit Prakash + AnkurDedania
BUG: incorrect ranking in an ordered categorical
check for categorical, and then pass the underlying integer codes.
closes #15420

Author: Prasanjit Prakash <jeet@gmail.com>

Closes #15422 from ikilledthecat/rank_categorical and squashes the following commits:

a7e573b [Prasanjit Prakash] moved test for categorical, in rank, to top
3ba4e3a [Prasanjit Prakash] corrections after rebasing
c43a029 [Prasanjit Prakash] using if/else construct to pick sorting function for categoricals
f8ec019 [Prasanjit Prakash] ask Categorical for ranking function
40d88c1 [Prasanjit Prakash] return values for rank from categorical object
049c0fc [Prasanjit Prakash] GH#15420 added support for na_option when ranking categorical
5e5bbeb [Prasanjit Prakash] BUG: GH#15420 rank for categoricals
ef999c3 [Prasanjit Prakash] merged with upstream master
fbaba1b [Prasanjit Prakash] return values for rank from categorical object
fa0b4c2 [Prasanjit Prakash] BUG: GH15420 - _rank private method on Categorical
9a6b5cd [Prasanjit Prakash] BUG: GH15420 - _rank private method on Categorical
4220e56 [Prasanjit Prakash] BUG: GH15420 - _rank private method on Categorical
6b70921 [Prasanjit Prakash] GH#15420 move rank inside categoricals
bf4e36c [Prasanjit Prakash] GH#15420 added support for na_option when ranking categorical
ce90207 [Prasanjit Prakash] BUG: GH#15420 rank for categoricals
85b267a [Prasanjit Prakash] Added support for categorical datatype in rank - issue#15420
28f8c8f