DEPR: join_axes-kwarg for pd.concat #21951

Closed
h-vetinari opened this issue Jul 17, 2018 · 3 comments · Fixed by #22318
Labels
Deprecate Functionality to remove in pandas
Comments

@h-vetinari
Contributor

h-vetinari commented Jul 17, 2018

The join_axes kwarg of pd.concat is not clearly documented (it took me several tries to get it to work), and its name is misleading as well: it actually restricts the axes that are not being concatenated (i.e. the ones that would normally be 'outer'-joined).

In particular, it is basically irrelevant with the deprecation of Panel, since there are no longer several non-concatenation axes (plural), only a single non-concatenation axis.

Finally, with reindex and reindex_like, it is redundant as well:

import pandas as pd

one = pd.DataFrame([[0, 1], [2, 3]], columns=list('ab'))
two = pd.DataFrame([[10, 11], [12, 13]], index=[1, 2], columns=list('bc'))

## simulating 'right'-join for the non-concatenation axis
pd.concat([one, two], join='outer', axis=1, join_axes=two.index) # cryptic error message!
# AssertionError: length of join_axes must not be equal to 1

## only works with list-like join_axes
pd.concat([one, two], join='outer', axis=1, join_axes=[two.index])
#      a    b   b   c
# 1  2.0  3.0  10  11
# 2  NaN  NaN  12  13

## cleaner with reindex?
pd.concat([one, two], join='outer', axis=1).reindex(two.index)
#      a    b     b     c
# 1  2.0  3.0  10.0  11.0
# 2  NaN  NaN  12.0  13.0

Note that the dtype changes because the intermediate object has NaNs in the rows, but this will be fixed by #21160 anyway. The only open question is whether performance would be much worse when concatenating huge Series/DataFrames before selecting a small subset of the index.
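To make the dtype remark concrete, here is a minimal sketch that reuses the frames from the example above and inspects the dtypes after the concat-then-reindex route. The variable name `after` is only illustrative, and the exact integer width of the original columns may differ by platform.

```python
import pandas as pd

one = pd.DataFrame([[0, 1], [2, 3]], columns=list('ab'))
two = pd.DataFrame([[10, 11], [12, 13]], index=[1, 2], columns=list('bc'))

# Outer-join on the row index first, then keep only the rows of `two`.
after = pd.concat([one, two], join='outer', axis=1).reindex(two.index)

# The intermediate frame has a NaN in every column (row 2 for `one`,
# row 0 for `two`), so all columns are upcast to float before the selection.
print(after.dtypes.tolist())

# The columns of `two` were integers to begin with.
print(two.dtypes.tolist())
```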

@h-vetinari h-vetinari mentioned this issue Jul 17, 2018
@WillAyd
Member

WillAyd commented Jul 17, 2018

Not terribly familiar with this keyword and agreed it seems odd. I'm +1 for deprecating

@WillAyd WillAyd added the Deprecate Functionality to remove in pandas label Jul 17, 2018
@WillAyd WillAyd added this to the Contributions Welcome milestone Jul 17, 2018
@gfyoung
Member

gfyoung commented Jul 20, 2018

cc @jreback

@jorisvandenbossche
Member

Some remarks:

  • "but this will be fixed by #21160 anyway" -> the integer NA extension array (#21160) is a long way off from being the default integer type, so I don't think we should use that as an argument now
  • "with reindex and reindex_like, it is redundant as well" -> I don't know the implementation, but I would assume that reindexing after the fact can be less performant? (assuming that with join_axes it reindexes each input before concatenating)
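A minimal sketch of the alternative assumed in the second remark: reindex each input to the target index first, then concatenate. This only illustrates the idea and is not taken from the pandas implementation; `target` and `aligned` are illustrative names, and the frames are the ones from the original example.

```python
import pandas as pd

one = pd.DataFrame([[0, 1], [2, 3]], columns=list('ab'))
two = pd.DataFrame([[10, 11], [12, 13]], index=[1, 2], columns=list('bc'))

# Index to keep on the non-concatenation axis (here: the rows of `two`).
target = two.index

# Align every input to `target` before concatenating, so no larger
# outer-joined intermediate frame is built.
aligned = pd.concat([df.reindex(target) for df in (one, two)], axis=1)

print(aligned)
print(aligned.dtypes.tolist())
```

With this ordering the columns coming from `two` contain no NaN and keep their integer dtype, while the columns from `one` still become float because row 2 is missing there.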

It is certainly true that the way it is explained and spelled (e.g. the fact that you need to pass a list) is outdated now that Panel is removed. But we could also consider improving it.

For example, you can currently use it to do what is basically a 'left' join on the non-concatenation axis.
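As a rough illustration of that 'left' join, the sketch below keeps only the rows of the first frame. It is written with reindex rather than join_axes so that it does not depend on the keyword being available; `left` is an illustrative name, and the frames are the ones from the example at the top of the issue.

```python
import pandas as pd

one = pd.DataFrame([[0, 1], [2, 3]], columns=list('ab'))
two = pd.DataFrame([[10, 11], [12, 13]], index=[1, 2], columns=list('bc'))

# 'left' join on the row index: align `two` to the rows of `one`,
# then concatenate along the columns.
left = pd.concat([one, two.reindex(one.index)], axis=1)

# Row 0 has no match in `two`, so its 'b' and 'c' values from `two` are NaN.
print(left)
```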

(Note that I am not married to the keyword and have never used it myself; I just think we should have a bit more discussion about it. It would be interesting to search for usage on GitHub/SO.)
