lukemanley
Member

Follow-up to #55670, which was a targeted regression fix for datelike dtypes.

`merge_asof` currently raises if the dtype of `by` is anything other than int64, uint64, or object. This PR removes that limitation.

| Change   | Before [e48df1cf] <main>   | After [a53a5533] <merge-asof-by-dtypes>   |   Ratio | Benchmark (Parameter)                                |
|----------|----------------------------|-------------------------------------------|---------|------------------------------------------------------|
| -        | 299±30ms                   | 199±20ms                                  |    0.67 | join_merge.MergeAsof.time_multiby('backward', 5)     |
| -        | 311±30ms                   | 202±20ms                                  |    0.65 | join_merge.MergeAsof.time_multiby('backward', None)  |
| -        | 292±20ms                   | 158±20ms                                  |    0.54 | join_merge.MergeAsof.time_by_object('forward', None) |
| -        | 302±30ms                   | 157±10ms                                  |    0.52 | join_merge.MergeAsof.time_by_object('forward', 5)    |
| -        | 411±10ms                   | 200±5ms                                   |    0.49 | join_merge.MergeAsof.time_multiby('forward', 5)      |
| -        | 420±30ms                   | 202±20ms                                  |    0.48 | join_merge.MergeAsof.time_by_object('nearest', None) |
| -        | 457±20ms                   | 215±8ms                                   |    0.47 | join_merge.MergeAsof.time_multiby('forward', None)   |
| -        | 515±10ms                   | 241±6ms                                   |    0.47 | join_merge.MergeAsof.time_multiby('nearest', 5)      |
| -        | 519±10ms                   | 242±4ms                                   |    0.47 | join_merge.MergeAsof.time_multiby('nearest', None)   |
| -        | 419±40ms                   | 185±20ms                                  |    0.44 | join_merge.MergeAsof.time_by_object('nearest', 5)    |
| -        | 159±20ms                   | 64.5±8ms                                  |    0.41 | join_merge.MergeAsof.time_by_int('backward', 5)      |
| -        | 154±10ms                   | 63.5±6ms                                  |    0.41 | join_merge.MergeAsof.time_by_int('backward', None)   |
| -        | 437±20ms                   | 123±20ms                                  |    0.28 | join_merge.MergeAsof.time_by_int('nearest', 5)       |
| -        | 328±30ms                   | 83.9±3ms                                  |    0.26 | join_merge.MergeAsof.time_by_int('forward', None)    |
| -        | 470±30ms                   | 124±10ms                                  |    0.26 | join_merge.MergeAsof.time_by_int('nearest', None)    |
| -        | 324±30ms                   | 81.7±3ms                                  |    0.25 | join_merge.MergeAsof.time_by_int('forward', 5)       |

@lukemanley added the Bug, Performance, and Reshaping labels on Oct 25, 2023
@lukemanley added this to the 2.2 milestone on Oct 25, 2023
@lukemanley requested a review from WillAyd as a code owner on October 25, 2023 02:35
@mroeschke merged commit 2f4c93e into pandas-dev:main on Oct 25, 2023
@mroeschke
Member

Thanks @lukemanley

@lukemanley deleted the merge-asof-by-dtypes branch on November 16, 2023 12:56
Successfully merging this pull request may close these issues.

merge_asof can't handle floats in by column?