vectorize `find_first_of` for long needle 🪡 #4557

AlexGuteniev · 2024-04-02T19:22:58Z

In benchmark results only the last line changed:

before

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
bm<uint8_t, 2, 3>            2.70 ns         2.35 ns    597333333
bm<uint16_t, 2, 3>           2.71 ns         2.13 ns    689230769
bm<uint8_t, 7, 4>            14.1 ns         10.9 ns    112000000
bm<uint16_t, 7, 4>           12.4 ns         10.6 ns    100000000
bm<uint8_t, 9, 3>            9.77 ns         7.70 ns    190638298
bm<uint16_t, 9, 3>           10.4 ns         8.06 ns    162909091
bm<uint8_t, 22, 5>           10.1 ns         8.02 ns    208372093
bm<uint16_t, 22, 5>          11.0 ns         8.64 ns    151864407
bm<uint8_t, 3056, 7>          273 ns          233 ns      5973333
bm<uint16_t, 3056, 7>         561 ns          425 ns      3089655
bm<uint8_t, 1011, 11>        95.5 ns         79.2 ns     17568627
bm<uint16_t, 1011, 11>       2872 ns         2448 ns       497778

after

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
bm<uint8_t, 2, 3>            2.71 ns         1.80 ns    746666667
bm<uint16_t, 2, 3>           2.71 ns         1.55 ns   1000000000
bm<uint8_t, 7, 4>            14.1 ns         7.47 ns    175686275
bm<uint16_t, 7, 4>           12.3 ns         5.82 ns    190638298
bm<uint8_t, 9, 3>            9.78 ns         6.75 ns    208372093
bm<uint16_t, 9, 3>           10.7 ns         7.54 ns    190638298
bm<uint8_t, 22, 5>           10.3 ns         7.21 ns    169056604
bm<uint16_t, 22, 5>          11.1 ns         8.63 ns    175686275
bm<uint8_t, 3056, 7>          276 ns          205 ns     11200000
bm<uint16_t, 3056, 7>         552 ns          432 ns      2890323
bm<uint8_t, 1011, 11>        94.4 ns         76.2 ns     19478261
bm<uint16_t, 1011, 11>        481 ns          365 ns      3895652

AlexGuteniev · 2024-04-03T07:33:52Z

There are real test failures here.

There were multiple errors.

After fixing all of them, I thought it would be better to preserve the <= 16 version, as the generic becomes noticeable slower.

AlexGuteniev · 2024-04-03T07:35:52Z

Review with "hide whitespace difference" to see the <= 16 branch preserved.

StephanTLavavej · 2024-04-08T21:11:32Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej · 2024-04-09T22:26:30Z

Thanks for improving your vectorized implementation even further! 🚀 🚀 🚀

vectorize find_first_of for long needle

ee91035

AlexGuteniev requested a review from a team as a code owner April 2, 2024 19:22

format

d39748a

StephanTLavavej added the performance Must go faster label Apr 2, 2024

This comment was marked as resolved.

Sign in to view

AlexGuteniev added 2 commits April 3, 2024 10:19

best of parts

a30f11a

format

dc3a1cf

StephanTLavavej self-assigned this Apr 3, 2024

StephanTLavavej approved these changes Apr 3, 2024

View reviewed changes

StephanTLavavej assigned StephanTLavavej and unassigned StephanTLavavej Apr 3, 2024

StephanTLavavej merged commit 5ced5d2 into microsoft:main Apr 9, 2024
35 checks passed

AlexGuteniev deleted the long_needle branch April 10, 2024 03:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vectorize `find_first_of` for long needle 🪡 #4557

vectorize `find_first_of` for long needle 🪡 #4557

AlexGuteniev commented Apr 2, 2024 •

edited

This comment was marked as resolved.

AlexGuteniev commented Apr 3, 2024

AlexGuteniev commented Apr 3, 2024

StephanTLavavej commented Apr 8, 2024

StephanTLavavej commented Apr 9, 2024

vectorize find_first_of for long needle 🪡 #4557

vectorize find_first_of for long needle 🪡 #4557

Conversation

AlexGuteniev commented Apr 2, 2024 • edited

This comment was marked as resolved.

AlexGuteniev commented Apr 3, 2024

AlexGuteniev commented Apr 3, 2024

StephanTLavavej commented Apr 8, 2024

StephanTLavavej commented Apr 9, 2024

vectorize `find_first_of` for long needle 🪡 #4557

vectorize `find_first_of` for long needle 🪡 #4557

AlexGuteniev commented Apr 2, 2024 •

edited