Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/PERF: merge_asof with multiple "by" keys #55580

Merged
merged 9 commits into from
Oct 22, 2023

Conversation

lukemanley
Copy link
Member

@lukemanley lukemanley commented Oct 18, 2023

> asv continuous -f 1.1 upstream/main merge-as-of-multi-by -b join_merge.MergeAsof
.       before           after         ratio
     [59d4e841]       [bfe776d7]
     <main>           <merge-as-of-multi-by>
-      1.66±0.03s         602±50ms     0.36  join_merge.MergeAsof.time_multiby('nearest', None)
-      1.56±0.03s         560±40ms     0.36  join_merge.MergeAsof.time_multiby('nearest', 5)
-      1.48±0.03s         449±50ms     0.30  join_merge.MergeAsof.time_multiby('forward', None)
-      1.60±0.07s         472±30ms     0.29  join_merge.MergeAsof.time_multiby('forward', 5)
-      1.25±0.03s         314±10ms     0.25  join_merge.MergeAsof.time_multiby('backward', None)
-       1.35±0.1s         295±10ms     0.22  join_merge.MergeAsof.time_multiby('backward', 5)

@lukemanley lukemanley added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 18, 2023
@lukemanley lukemanley added this to the 2.2 milestone Oct 18, 2023
rbv = ensure_object(rbv)

mapped = [
_factorize_keys(
Copy link
Member

@mroeschke mroeschke Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this work alright with ExtensionArrays? Do we have tests for merging on by that is a extension column?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will work with EAs. I've parameterized an existing test to include EA dtypes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an additional test as this seems to fix #43541

left_by_values[n],
right_by_values[n],
sort=False,
how="outer",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super familiar with this code but does hard coding "outer" not have any impact? Wondering if there was a case where two dataframes get joined with very few matches

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, we can use how=left here. Updated.

@lukemanley lukemanley changed the title PERF: merge_asof with multiple "by" keys BUG/PERF: merge_asof with multiple "by" keys Oct 21, 2023
@lukemanley lukemanley added the Bug label Oct 21, 2023
@mroeschke mroeschke merged commit ad3f3f7 into pandas-dev:main Oct 22, 2023
42 of 43 checks passed
@mroeschke
Copy link
Member

Thanks @lukemanley

@lukemanley lukemanley deleted the merge-as-of-multi-by branch November 16, 2023 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
3 participants