DEPR: join_axes-kwarg for pd.concat #21951

Closed
h-vetinari opened this issue Jul 17, 2018 · 3 comments

Comments

@h-vetinari
Contributor

commented Jul 17, 2018

The join_axes kwarg of pd.concat is not very clearly documented (took me several tries to get it to work), and its name is not very clear either -- it's actually about restricting the axes that are not being concatenated (i.e. would be 'outer'-joined normally).

In particular, it is basically irrelevant with the deprecation of Panel, since there are no more axes (plural), only one non-concatenation axis (singular).

Finally, with reindex and reindex_like, it is redundant as well:

import pandas as pd

one = pd.DataFrame([[0, 1], [2, 3]], columns=list('ab'))
two = pd.DataFrame([[10, 11], [12, 13]], index=[1, 2], columns=list('bc'))

## simulating 'right'-join for the non-concatenation axis
pd.concat([one, two], join='outer', axis=1, join_axes=two.index) # cryptic error message!
# AssertionError: length of join_axes must not be equal to 1

## only works with list-like join_axes
pd.concat([one, two], join='outer', axis=1, join_axes=[two.index])
#      a    b   b   c
# 1  2.0  3.0  10  11
# 2  NaN  NaN  12  13

## cleaner with reindex?
pd.concat([one, two], join='outer', axis=1).reindex(two.index)
#      a    b     b     c
# 1  2.0  3.0  10.0  11.0
# 2  NaN  NaN  12.0  13.0

Note that the dtype changes because the intermediate object has NaNs in its rows, but this will be fixed by #21160 anyway. The only question is whether performance would be much worse when concatenating huge Series/DataFrames before selecting a small subset of the index.
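To illustrate the dtype note above, here is a minimal sketch that compares reindexing after the concatenation with reindexing each input first. It reuses the `one` and `two` frames from the example; the names `after` and `before` are only illustrative, and this is not a claim about how `join_axes` is implemented internally.

```python
import pandas as pd

one = pd.DataFrame([[0, 1], [2, 3]], columns=list('ab'))
two = pd.DataFrame([[10, 11], [12, 13]], index=[1, 2], columns=list('bc'))

# Reindex after concatenating: the outer-joined intermediate has a NaN
# in every column, so all columns are upcast to float before the reindex.
after = pd.concat([one, two], join='outer', axis=1).reindex(two.index)

# Reindex each input first: only the columns of `one` gain a NaN,
# so the columns coming from `two` keep their integer dtype.
before = pd.concat([df.reindex(two.index) for df in (one, two)], axis=1)

print(after.dtypes)
print(before.dtypes)
```

Reindexing the inputs first avoids the upcast for the frame whose index is already the target, which is the behaviour shown in the `join_axes=[two.index]` output above.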

@h-vetinari h-vetinari referenced this issue Jul 17, 2018

Open

DEPR: let's deprecate #18262

22 of 35 tasks complete
@WillAyd

Member

commented Jul 17, 2018

Not terribly familiar with this keyword and agreed it seems odd. I'm +1 for deprecating

@WillAyd WillAyd added the Deprecate label Jul 17, 2018

@WillAyd WillAyd added this to the Contributions Welcome milestone Jul 17, 2018

@gfyoung

Member

commented Jul 20, 2018

@h-vetinari h-vetinari referenced this issue Aug 13, 2018

Merged

DEPR: join_axes-kwarg in pd.concat #22318

4 of 4 tasks complete
@jorisvandenbossche

Member

commented Aug 23, 2018

Some remarks:

  • "but this will be fixed by #21160 anyway." -> that is a long way off from being the default integer type, so I don't think we should use it as an argument now
  • "with reindex and reindex_like, it is redundant as well" -> I don't know the implementation, but I would assume that reindexing after the fact can be less performant (assuming that join_axes reindexes each input before concatenating)
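One way to check the performance question empirically is a rough timing sketch like the one below. The frame sizes, the target index `small`, and the function names are arbitrary assumptions chosen for illustration; the numbers it prints depend on the machine and the pandas version, and it says nothing about how `join_axes` itself is implemented.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # arbitrary size, chosen only for illustration

big_one = pd.DataFrame({'a': np.arange(n), 'b': np.arange(n)})
# Shift the second index so the two frames only partially overlap.
big_two = pd.DataFrame({'c': np.arange(n)}, index=np.arange(n // 2, n // 2 + n))

# A small subset of the index to restrict the result to.
small = big_two.index[:10]


def reindex_after():
    # Concatenate everything, then select the small subset.
    return pd.concat([big_one, big_two], axis=1).reindex(small)


def reindex_before():
    # Select the small subset from each input, then concatenate.
    return pd.concat([df.reindex(small) for df in (big_one, big_two)], axis=1)


t_after = timeit.timeit(reindex_after, number=5)
t_before = timeit.timeit(reindex_before, number=5)
print(f"reindex after concat:  {t_after:.4f}s")
print(f"reindex before concat: {t_before:.4f}s")
```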

It is certainly true that the way it is explained and spelled (e.g. the fact that you need to pass a list) is outdated now that Panel is removed. But we could also consider improving it.

For example, you can currently use it to do what is basically a 'left' join.

(Note that I am not married to the keyword and have never used it myself; I just think we should have a bit more discussion about it. It would be interesting to search for usage on GitHub/SO.)
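As a sketch of the 'left' join use case mentioned above, the same effect can be had without `join_axes` by reindexing each input to the first object's index before concatenating. The helper `concat_on_index` is hypothetical (it is not part of pandas) and is only meant to show the idea.

```python
import pandas as pd


def concat_on_index(frames, index):
    """Hypothetical helper: column-wise concat restricted to `index`."""
    return pd.concat([f.reindex(index) for f in frames], axis=1)


one = pd.DataFrame([[0, 1], [2, 3]], columns=list('ab'))
two = pd.DataFrame([[10, 11], [12, 13]], index=[1, 2], columns=list('bc'))

# 'left' join: keep exactly the rows of the first frame.
left = concat_on_index([one, two], one.index)

# 'right' join: keep exactly the rows of the last frame.
right = concat_on_index([one, two], two.index)

print(left)
print(right)
```

Passing `one.index` gives the 'left' behaviour and `two.index` the 'right' behaviour from the original example; any other index could be passed in the same way.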
