perf(python): improve performance of align_frames, and add new alignment option #8899

Merged

alexander-beedie (Collaborator) commented May 17, 2023

Closes #8896.

The newer (correct) strategy for aligning frames, which handles duplicate keys, had a bigger performance impact than expected when aligning a larger number of frames. This patch addresses that by...

  • ...recovering a lot of the earlier performance (instead of being ~12x slower than the earlier code, the newer/correct version is now ~2x slower).
  • ...offering a new opt-in how parameter (the alignment strategy) that can be faster than the original alignment code; if all frames can be aligned against the first frame, you can set how="left" for a very significant speedup.

@github-actions (bot) added the "performance" and "python" labels May 17, 2023
@alexander-beedie added the "fix" label May 17, 2023
@alexander-beedie merged commit b2e13a5 into pola-rs:main May 17, 2023
16 checks passed
alexander-beedie (Collaborator, Author) commented:

Will follow up with a new no_duplicate_keys parameter for a further speedup (fully recovering the earlier performance if we know that the input data does not have duplicate keys).

@alexander-beedie deleted the enhance-align-frames branch May 17, 2023 19:16
@alexander-beedie added the "enhancement" label and removed the "feature" label May 17, 2023
ritchie46 pushed a commit that referenced this pull request May 20, 2023
alexander-beedie added a commit to alexander-beedie/polars that referenced this pull request May 20, 2023
c-peters pushed a commit to c-peters/polars that referenced this pull request Jul 14, 2023
Labels
  • enhancement: New feature or an improvement of an existing feature
  • fix: Bug fix
  • performance: Performance issues or improvements
  • python: Related to Python Polars
Development

Successfully merging this pull request may close these issues.

Large performance regression in align_frames in 0.17.14