Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(rust, python): Improve explodes: offsets_to_indexes performance #8964

Merged
merged 3 commits into from
May 22, 2023

Conversation

kpberry
Copy link
Contributor

@kpberry kpberry commented May 21, 2023

I refactored offsets_to_indexes to remove the manual pointer arithmetic and value_count counter, which results in a typical 8-15% speedup for large arrays (1,000 to 10,000,000 elements) and reasonable numbers of randomly generated offsets (between 1% and 50% of the capacity). For small arrays and small numbers of offsets, performance is not significantly impacted. I also added a criterion to the main loop which exits early when the indexes array is already larger than capacity, which results in an arbitrarily large speedup when capacity is exceeded and does not impact performance in typical cases.

I have uploaded my benchmark script here, in case you want to verify the performance improvement.

@github-actions github-actions bot added performance Performance issues or improvements rust Related to Rust Polars labels May 21, 2023
while value_count < *offset {
value_count += 1;
idx.push(last_idx)
for (offset_start, offset_end) in offsets.iter().zip(offsets.iter().skip(1)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace offsets.iter().skip(1) with offsets[1..].iter() that should be faster as this removes a branch in the hot loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go ahead and make the change, though it doesn't appear to significantly impact performance. If I'm not wrong, offsets.iter().skip(1) only gets called once in this function at the start of the loop, so the difference should be small, if there is one. Re-running the benchmark with this change suggests the performance difference is ±1%, which I would guess is mostly due to noise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the skip will be called every iteration and has an unlikely branch. The branch prediction will correctly predict the correct branch, but it can prevent future compiler optimizations to have that branch.

@ritchie46 ritchie46 changed the title perf(rust): Improve offsets_to_indexes performance perf(rust, python): Improve explodes: offsets_to_indexes performance May 22, 2023
@github-actions github-actions bot added the python Related to Python Polars label May 22, 2023
@ritchie46
Copy link
Member

Looks great! Thank you @kpberry

@ritchie46 ritchie46 merged commit 9588767 into pola-rs:main May 22, 2023
11 checks passed
@kpberry
Copy link
Contributor Author

kpberry commented May 22, 2023

No problem, happy to help!

c-peters pushed a commit to c-peters/polars that referenced this pull request Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants