perf(rust, python): Improve explodes: offsets_to_indexes
performance
#8964
Conversation
```rust
-while value_count < *offset {
-    value_count += 1;
-    idx.push(last_idx)
+for (offset_start, offset_end) in offsets.iter().zip(offsets.iter().skip(1)) {
```
Can you replace offsets.iter().skip(1) with offsets[1..].iter()? That should be faster, as it removes a branch in the hot loop.
I can go ahead and make the change, though it doesn't appear to significantly impact performance. If I'm not wrong, offsets.iter().skip(1)
only gets called once in this function at the start of the loop, so the difference should be small, if there is one. Re-running the benchmark with this change suggests the performance difference is ±1%, which I would guess is mostly due to noise.
No, the skip will be evaluated on every iteration and introduces an unlikely branch. Branch prediction will predict it correctly, but the extra branch can prevent the compiler from applying further optimizations around it.
Looks great! Thank you @kpberry
No problem, happy to help!
I refactored offsets_to_indexes to remove the manual pointer arithmetic and the value_count counter, which yields a typical 8-15% speedup for large arrays (1,000 to 10,000,000 elements) with reasonable numbers of randomly generated offsets (between 1% and 50% of the capacity). For small arrays and small numbers of offsets, performance is not significantly impacted. I also added an early-exit check to the main loop that stops once the indexes array is already larger than the capacity, which gives an arbitrarily large speedup when the capacity is exceeded and does not affect performance in typical cases.
I have uploaded my benchmark script here, in case you want to verify the performance improvement.
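The refactor described above can be sketched roughly as follows. This is a hedged reconstruction, not the crate's actual implementation: the function name matches the PR, but the signature, index type, and exact semantics are assumptions. Each consecutive offset pair (start, end) contributes the current row index (end − start) times, and the loop exits early once the capacity has been filled.

```rust
// Sketch of the windowed-pair refactor: no manual pointer arithmetic,
// no running value_count; consecutive offsets are paired directly.
fn offsets_to_indexes(offsets: &[i64], capacity: usize) -> Vec<u32> {
    let mut idx = Vec::with_capacity(capacity);
    // Pair each offset with its successor via a slice, avoiding skip(1)'s branch.
    for (row, (start, end)) in offsets.iter().zip(offsets[1..].iter()).enumerate() {
        // Early exit: once `capacity` indexes have been produced, stop iterating
        // instead of walking the remaining offsets.
        if idx.len() >= capacity {
            break;
        }
        // A row covering (end - start) values repeats its index that many times;
        // empty rows (start == end) contribute nothing.
        for _ in 0..(end - start) as usize {
            idx.push(row as u32);
        }
    }
    idx
}

fn main() {
    // offsets [0, 2, 2, 5]: row 0 has 2 elements, row 1 is empty, row 2 has 3.
    assert_eq!(offsets_to_indexes(&[0, 2, 2, 5], 5), vec![0, 0, 2, 2, 2]);
    // Early exit: capacity reached after row 0, so row 1 is never visited.
    assert_eq!(offsets_to_indexes(&[0, 3, 6], 2), vec![0, 0, 0]);
}
```

Note that the early-exit check runs between rows, so the last row pushed may overshoot the capacity slightly; it bounds the work without truncating mid-row.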