@lukemanley
Member

@lukemanley lukemanley commented Feb 16, 2022

Similar to #45838, but for DataFrame.join.

DataFrame.join already had a fast path for joining with an empty frame, but it only covered a limited set of cases. This PR makes the fast path faster and extends it to more cases, covering joins where the left and/or right side is empty.

import pandas as pd 
import numpy as np

N = 10_000_000

df = pd.DataFrame({'A': np.arange(N)})
df_empty = pd.DataFrame(columns=['B', 'C'], dtype='int64')

%timeit df.join(df_empty, how='inner')
932 ms ± 83.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)       <- main
285 µs ± 4.29 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  <- PR
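A minimal pure-Python sketch of the idea: when one side of a join is empty, the result follows from how and the two lengths alone, so no keys have to be hashed or compared. The function name empty_join_fastpath, the use of plain lists, and the (join_keys, left_indexer, right_indexer) return shape are assumptions for illustration, loosely modeled on what Index.join(..., return_indexers=True) returns. This is not the code merged in this PR.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical convention: None means "take that side's rows as-is",
# and -1 in an indexer marks "no matching row on this side".
Indexer = Optional[List[int]]
JoinResult = Tuple[list, Indexer, Indexer]


def empty_join_fastpath(left: Sequence, right: Sequence, how: str) -> Optional[JoinResult]:
    """Hypothetical shortcut: resolve a join without matching keys when a side is empty.

    Returns (join_keys, left_indexer, right_indexer), or None when neither
    side is empty (or `how` is not handled) and a full join is required.
    """
    if len(right) == 0:
        if how in ("left", "outer"):
            # Every left row survives and none has a right-hand match.
            return list(left), None, [-1] * len(left)
        if how in ("right", "inner"):
            # Empty result: take no left rows, keep the (empty) right side as-is.
            return list(right), [], None
    if len(left) == 0:
        if how in ("right", "outer"):
            # Mirror image: every right row survives with no left-hand match.
            return list(right), [-1] * len(right), None
        if how in ("left", "inner"):
            # Empty result: keep the (empty) left side, take no right rows.
            return list(left), None, []
    # Neither side is empty: fall back to a full join.
    return None
```

Only the branches that keep the non-empty side do work proportional to its length (copying keys and building the -1 indexer); the branches that produce an empty result return immediately.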

@lukemanley lukemanley added Performance (Memory or execution speed performance) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Feb 16, 2022
@jreback jreback added this to the 1.5 milestone Feb 16, 2022
Contributor

@jreback jreback left a comment

do we have sufficient asv's to cover this?

if level is not None and (self._is_multi or other._is_multi):
    return self._join_level(other, level, how=how)

if len(other) == 0 and how in ("left", "outer"):
Contributor

Can you make an if len(other) clause and an if len(self) clause here, with nested ifs for the individual cases?

Member Author

Done.

@lukemanley
Member Author

> do we have sufficient asv's to cover this?

Added an asv to cover this.
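The added benchmark isn't shown in this thread. As a hedged sketch, an airspeed velocity (asv) benchmark for this case could look like the class below. The class name JoinEmpty, the method names, and the smaller N are assumptions, not necessarily what was added to pandas' asv_bench suite. asv calls setup() before timing each method whose name starts with time_.

```python
import numpy as np
import pandas as pd


class JoinEmpty:
    """Hypothetical asv benchmark: DataFrame.join where one side is empty."""

    def setup(self):
        # N is an assumption; smaller than the 10M rows in the PR description
        N = 100_000
        self.df = pd.DataFrame({"A": np.arange(N)})
        # Empty frame with two int64 columns, as in the PR description
        self.df_empty = pd.DataFrame(columns=["B", "C"], dtype="int64")

    def time_inner_join_left_empty(self):
        # Empty caller, non-empty other
        self.df_empty.join(self.df, how="inner")

    def time_inner_join_right_empty(self):
        # Non-empty caller, empty other (the case timed in the PR description)
        self.df.join(self.df_empty, how="inner")
```

With asv, passing a regex such as -b JoinEmpty to asv continuous restricts a run to just these benchmarks.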

Contributor

@jreback jreback left a comment

LGTM, ping on green.

@lukemanley
Member Author

@jreback - CI is greenish; the errors look unrelated.

@mroeschke
Member

Could you merge in main one more time? (failures look unrelated but good to be sure)

@lukemanley
Member Author

@mroeschke - merged main. CI is greenish again; let me know if you think the error is related.

@mroeschke mroeschke merged commit aafa7a9 into pandas-dev:main Feb 22, 2022
@mroeschke
Member

Thanks @lukemanley. The failure was unrelated.

@lukemanley lukemanley deleted the join-empty-fastpath branch March 2, 2022 01:13
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* faster joins when left and/or right is empty

* whatsnew

* cleanup

* add asv for joining with empty frame

* asv