
Passing sort=True to pandas.concat does not always keep behavior without sort passed #21174

Closed
mjalkio opened this issue May 22, 2018 · 2 comments

mjalkio commented May 22, 2018

Code Sample

>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2], "b": [1, 2]}, columns=['b', 'a'])
>>> pd.concat([df])
   b  a
0  1  1
1  2  2
>>> pd.concat([df], sort=True)
   a  b
0  1  1
1  2  2

Problem description

According to the 0.23.0 What's New, we should be using the sort parameter to specify sorting behavior for pandas.concat. The documentation says:

To keep the previous behavior (sorting) and silence the warning, pass sort=True

As this example shows, passing sort=True does not always preserve the current behavior of pandas.concat. Intuitively, the result with sort=True makes sense (the columns are sorted), but it is neither the v0.23.0 behavior when sort is omitted nor the v0.22.0 behavior.
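The distinction can be sketched with a current pandas release. This assumes pandas 1.0 or later, where concat defaults to sort=False; the 0.23.0 sort=None default and its warning are not reproduced here, and the frame names are illustrative. With sort=True the column labels are sorted even when every input already has the same columns. With sort=False the first-seen order is kept, and any new labels are appended at the end.

```python
import pandas as pd

# Assumes pandas >= 1.0 (concat defaults to sort=False there).
aligned_1 = pd.DataFrame({"b": [1, 2], "a": [1, 2]})
aligned_2 = pd.DataFrame({"b": [3], "a": [3]})

# Same columns in the same order: sort=False keeps ['b', 'a'] ...
kept = pd.concat([aligned_1, aligned_2], sort=False)
# ... but sort=True sorts the column labels anyway.
forced = pd.concat([aligned_1, aligned_2], sort=True)

# Different columns: the result's columns are the union of all labels.
unaligned = pd.DataFrame({"c": [4], "a": [4]})
union_unsorted = pd.concat([aligned_1, unaligned], sort=False)
union_sorted = pd.concat([aligned_1, unaligned], sort=True)

print(list(kept.columns))            # ['b', 'a']
print(list(forced.columns))          # ['a', 'b']
print(list(union_unsorted.columns))  # ['b', 'a', 'c']
print(list(union_sorted.columns))    # ['a', 'b', 'c']
```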

Expected Output

In v0.23.0, pandas.concat should return the same output whether it is called with sort=None (the default) or with sort=True.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 9.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: 0.8.1
psycopg2: 2.7.3 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger (Contributor)

The warning message is only shown when the old implementation would have sorted, i.e. when the non-concatenation axis is not already aligned. Both the warning message and the whatsnew entry should be read in that context.

See #21101 for an issue with the warning message.

The old behavior of sorting if and only if the non-concatenation axis is not already aligned isn't very useful IMO. Do you have a use case where it was useful?
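The rule described in this comment (sort if and only if the non-concatenation axis is not already aligned) can be emulated on top of a current pandas. This is a minimal sketch that assumes pandas 1.0 or later. `concat_like_022` is a hypothetical helper, not part of pandas. It handles only row-wise (axis=0) concatenation and follows the rule as stated in this thread rather than reproducing the actual 0.22 code path.

```python
import pandas as pd


def concat_like_022(objs):
    """Hypothetical helper emulating the pre-0.23 rule described in this thread.

    Sorts the column labels only when the inputs' columns are not all identical
    (same labels in the same order). Row-wise concatenation (axis=0) only.
    Assumes pandas >= 1.0; this is a sketch, not the actual 0.22 code path.
    """
    objs = list(objs)
    first = objs[0].columns
    aligned = all(obj.columns.equals(first) for obj in objs[1:])
    # Aligned inputs keep their column order; unaligned inputs get sorted labels.
    return pd.concat(objs, sort=not aligned)


df = pd.DataFrame({"a": [1, 2], "b": [1, 2]}, columns=["b", "a"])
other = pd.DataFrame({"c": [3], "a": [3]})

print(list(concat_like_022([df]).columns))         # ['b', 'a']
print(list(concat_like_022([df, df]).columns))     # ['b', 'a']
print(list(concat_like_022([df, other]).columns))  # ['a', 'b', 'c']
```

The helper only shows which calls the old default would have sorted. Passing sort=False or sort=True explicitly at each call site keeps the output predictable.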

mjalkio (Author) commented May 22, 2018

Ah, okay. We started seeing the warning in our logs, so I added the sort argument everywhere because I misinterpreted the What's New.

I agree that the sorting behavior is not useful; I was just trying to silence the warnings without changing any behavior of our code. Thanks for the clarification!

mjalkio closed this as completed May 22, 2018