BUG: concat column order behavior changes after 1.4 #47127

Yikun · 2022-05-26T09:28:53Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

After 1.4:
>>> import pandas as pd
>>> pdf = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5], "C": [6, 7, 8]})
>>> pd.concat([pdf, pdf["C"], pdf["A"]], ignore_index=True, join='outer', sort=True)
     A    B    C    0
0  0.0  1.0  6.0  NaN
1  2.0  3.0  7.0  NaN
2  4.0  5.0  8.0  NaN
3  NaN  NaN  NaN  6.0
4  NaN  NaN  NaN  7.0
5  NaN  NaN  NaN  8.0
6  NaN  NaN  NaN  0.0
7  NaN  NaN  NaN  2.0
8  NaN  NaN  NaN  4.0


Before 1.4:
>>> import pandas as pd
>>> pdf = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5], "C": [6, 7, 8]})
>>> pd.concat([pdf, pdf["C"], pdf["A"]], ignore_index=True, join='outer', sort=True)
     0    A    B    C
0  NaN  0.0  1.0  6.0
1  NaN  2.0  3.0  7.0
2  NaN  4.0  5.0  8.0
3  6.0  NaN  NaN  NaN
4  7.0  NaN  NaN  NaN
5  8.0  NaN  NaN  NaN
6  0.0  NaN  NaN  NaN
7  2.0  NaN  NaN  NaN
8  4.0  NaN  NaN  NaN

Issue Description

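The column order returned by pd.concat with sort=True changed after 1.4: columns with mixed int and str labels are no longer sorted, so 0 now comes after A, B, C instead of before them.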

Expected Behavior

>>> pd.concat([pdf, pdf["C"], pdf["A"]], ignore_index=True, join='outer', sort=False)
     A    B    C    0
0  0.0  1.0  6.0  NaN
1  2.0  3.0  7.0  NaN
2  4.0  5.0  8.0  NaN
3  NaN  NaN  NaN  6.0
4  NaN  NaN  NaN  7.0
5  NaN  NaN  NaN  8.0
6  NaN  NaN  NaN  0.0
7  NaN  NaN  NaN  2.0
8  NaN  NaN  NaN  4.0
>>> pd.concat([pdf, pdf["C"], pdf["A"]], ignore_index=True, join='outer', sort=True)
     0    A    B    C
0  NaN  0.0  1.0  6.0
1  NaN  2.0  3.0  7.0
2  NaN  4.0  5.0  8.0
3  6.0  NaN  NaN  NaN
4  7.0  NaN  NaN  NaN
5  8.0  NaN  NaN  NaN
6  0.0  NaN  NaN  NaN
7  2.0  NaN  NaN  NaN
8  4.0  NaN  NaN  NaN
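
A minimal sketch of a user-side workaround, assuming the goal is the same column order on every pandas version: concatenate with sort=False and reorder the columns explicitly. The helper `order_mixed_labels` is hypothetical (it is not part of pandas); it places non-string labels before string labels so that int and str are never compared directly.

```python
import pandas as pd


def order_mixed_labels(labels):
    # Hypothetical helper, not a pandas API: the (is_str, label) key sorts
    # non-string labels first and only compares labels of the same kind.
    return sorted(labels, key=lambda label: (isinstance(label, str), label))


pdf = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5], "C": [6, 7, 8]})

# sort=False keeps the order of appearance; the reordering is done by hand.
combined = pd.concat(
    [pdf, pdf["C"], pdf["A"]], ignore_index=True, join="outer", sort=False
)
combined = combined[order_mixed_labels(combined.columns)]

print(list(combined.columns))
```

With the column labels shown in the outputs above (A, B, C and 0), this should put 0 first, matching the pre-1.4 result for sort=True.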

Installed Versions

1.4+

Yikun · 2022-05-28T03:07:03Z

Related first commit: 01b8d2a

@CloseChoice @jreback

simonjayhawkins · 2022-05-28T13:29:32Z

Thanks @Yikun for the report.

> Related first commit: 01b8d2a

In #43833, _get_combined_index in pandas/core/indexes/api.py was changed from index = union_indexes(indexes, sort=sort) to index = union_indexes(indexes, sort=False).

(Pdb) a
indexes = [Index(['A', 'B', 'C'], dtype='object'), Int64Index([0], dtype='int64'), Int64Index([0], dtype='int64')]
intersect = False
sort = True
copy = True
(Pdb) union_indexes(indexes, sort=False)
Index(['A', 'B', 'C', 0], dtype='object')
(Pdb) union_indexes(indexes, sort=True)
Index([0, 'A', 'B', 'C'], dtype='object')
(Pdb)

and the code a few lines later

    if sort:
        try:
            index = index.sort_values()
        except TypeError:
            pass

index.sort_values() raises TypeError: '<' not supported between instances of 'int' and 'str', because it uses neither _sort_mixed from pandas/core/algorithms.py nor the sort logic implemented in union_indexes(indexes, sort=True). The TypeError is swallowed by the except clause, so the index is left unsorted.
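
The failure mode can be reproduced without pandas. The following is a rough sketch on plain Python lists, not the pandas implementation: `sort_like_get_combined_index` mirrors the try/except shown above, and `sort_numbers_before_strings` is a hypothetical mixed-type sort that orders numbers before strings, which is the order union_indexes(indexes, sort=True) returned in the pdb session.

```python
def sort_like_get_combined_index(labels):
    """Mirror the try/except from the snippet above on a plain list."""
    try:
        return sorted(labels)
    except TypeError:
        # int and str cannot be compared with '<', so the error is
        # swallowed and the original order is returned unchanged.
        return list(labels)


def sort_numbers_before_strings(labels):
    """Hypothetical mixed-type sort: numbers first, then strings."""
    numbers = sorted(x for x in labels if not isinstance(x, str))
    strings = sorted(x for x in labels if isinstance(x, str))
    return numbers + strings


columns = ["A", "B", "C", 0]
print(sort_like_get_combined_index(columns))  # ['A', 'B', 'C', 0]
print(sort_numbers_before_strings(columns))   # [0, 'A', 'B', 'C']
```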


* Fix #200
* Set namd parser column sorting to False

Previous `pandas` behavior prior to 1.4.3 [did not sort numeric column names](pandas-dev/pandas#47127), but this now occurs. We don't sort within other parsers, so switching this flag to be consistent with previous behavior. There is no clear reason sorting is necessary here.

Co-authored-by: David Dotson <dotsdl@gmail.com>

…low 1.4.3 behavior

### What changes were proposed in this pull request?
Respect Series.concat sort parameter when `num_series == 1` to follow 1.4.3 behavior.

### Why are the changes needed?
In #36711, we followed the pandas 1.4.2 behavior and respected the Series.concat sort parameter except in the `num_series == 1` case. [pandas 1.4.3](https://github.com/pandas-dev/pandas/releases/tag/v1.4.3) fixed the issue pandas-dev/pandas#47127, which also fixed the `num_series == 1` bug, so this PR follows the pandas 1.4.3 behavior.

### Does this PR introduce _any_ user-facing change?
Yes, we already cover this case in: https://github.com/apache/spark/blob/master/python/docs/source/migration_guide/pyspark_3.3_to_3.4.rst
```
In Spark 3.4, the Series.concat sort parameter will be respected to follow pandas 1.4 behaviors.
```

### How was this patch tested?
- CI passed
- test_concat_index_axis passed with pandas 1.3.5, 1.4.2, 1.4.3.

Closes #37217 from Yikun/SPARK-39807.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

Yikun added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels May 26, 2022

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue May 28, 2022

code sample for pandas-dev#47127

7224941

simonjayhawkins added this to the 1.4.3 milestone May 28, 2022

simonjayhawkins added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Regression (Functionality that used to work in a prior pandas version) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label May 28, 2022

phofl mentioned this issue Jun 3, 2022

REGR: concat not sorting columns for mixed column names #47206

Merged

4 tasks

jreback closed this as completed in #47206 Jun 5, 2022

dotsdl mentioned this issue Jun 30, 2022

[CI] account for changed pandas.concat sorting behavior alchemistry/alchemlyb#201

Merged

Yikun mentioned this issue Jul 18, 2022

[SPARK-39807][PYTHON][PS] Respect Series.concat sort parameter to follow 1.4.3 behavior apache/spark#37217

Closed
