mergesort is not stable when sorting by a categorical column #16793

Closed
has2k1 opened this Issue Jun 28, 2017 · 2 comments

has2k1 commented Jun 28, 2017

import pandas as pd
import numpy as np

n = 5  # not a problem for n < 5

df = pd.DataFrame({
    'x': pd.Categorical(np.repeat([1, 2, 3, 4], n), ordered=True)
})

df.sort_values('x', kind='mergesort')

output:

        x
0	1
1	1
2	1
3	1
4	1
8	2
7	2
9	2
5	2
6	2
10	3
11	3
12	3
13	3
14	3
18	4
15	4
16	4
17	4
19	4

Problem description

When sorting a dataframe by an ordered categorical column with kind='mergesort', the sort should be stable. In the example above, the rows for x==2 and x==4 have been scrambled.

Expected Output

The index should remain in order since the column is already sorted.
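
One way to state that expectation as a check, as a minimal sketch reusing the reproduction above (the name `is_stable` is only illustrative): since the column is already sorted, a stable sort should hand back the index unchanged.

```python
import numpy as np
import pandas as pd

n = 5
df = pd.DataFrame({
    'x': pd.Categorical(np.repeat([1, 2, 3, 4], n), ordered=True)
})

result = df.sort_values('x', kind='mergesort')

# The input is already sorted, so a stable sort must not move any row:
# the index after sorting should equal the index before sorting.
is_stable = result.index.tolist() == df.index.tolist()
print(is_stable)
```

On the affected version this would be expected to print False, given the output shown above.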

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.1-gentoo
machine: x86_64
processor: Intel(R)
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: 3.1.2
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

chris-b1 added this to the Next Major Release milestone Jun 29, 2017

chris-b1 commented Jun 29, 2017

Yeah, looks like we throw away the kind argument on Categoricals, shouldn't be too hard to trace through, PR welcome!

return items.argsort(ascending=ascending)
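
Until the kind argument is passed through, a possible workaround is to sort on the integer category codes with NumPy directly. This is only a sketch of the idea, not what pandas does internally: it assumes an ordered Categorical with no missing values (missing entries have code -1 and would sort first), and the names `codes`, `order` and `stable` are illustrative.

```python
import numpy as np
import pandas as pd

n = 5
df = pd.DataFrame({
    'x': pd.Categorical(np.repeat([1, 2, 3, 4], n), ordered=True)
})

# For an ordered Categorical the integer codes follow the category order,
# so sorting the codes sorts the values.
codes = np.asarray(df['x'].cat.codes)

# NumPy's mergesort is stable: equal codes keep their original order.
order = np.argsort(codes, kind='mergesort')

stable = df.iloc[order]
print(stable.index.tolist())
```

With the already-sorted input from the report, the positions in `order` come out in their original sequence, so the index is left untouched.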

ri938 commented Jul 2, 2017

Code currently forbids anything but the default kind='quicksort' from being passed to categorical argsort. See validate_argsort_with_ascending.

chris-b1 referenced this issue Jul 6, 2017

Merged

BUG: kind parameter on categorical argsort #16834

3 of 4 tasks complete

jreback modified the milestones: 0.20.3, Next Major Release Jul 6, 2017

jreback closed this in #16834 Jul 7, 2017
