Remove internal memcpy implementation #29

hkratz · 2021-04-27T15:31:56Z

LLVM is unrolling it for the 128+ bytes case, which never happens. The speedup with this patch is negligible, but code size is reduced quite a bit.

It is a shame that all attempts to do this the Rust way (using copy_from_slice() or read_unaligned()/write_unaligned()) have resulted in non-inlined calls to memcpy with a noticeable slowdown.

This implementation is optimally auto-vectorized and inlined. The compiler can even prove that len is less than 32 at the first call site and optimizes that check away.
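For readers following along, here is a minimal sketch of the two approaches being compared. It is not the code from this PR: the function names `copy_small` and `copy_idiomatic` and the 64-byte destination buffer are assumptions chosen for illustration. Whether the compiler inlines or vectorizes either version depends on the compiler version and target.

```rust
/// Hypothetical hand-rolled copy for short inputs (illustrative only,
/// not the implementation discussed in this PR). A plain byte loop with
/// a small, known upper bound gives LLVM the chance to inline and
/// vectorize the copy instead of emitting a call to memcpy.
#[inline(always)]
pub fn copy_small(dest: &mut [u8; 64], src: &[u8]) {
    // Clamp the length so the loop bound is provably at most 64.
    let len = src.len().min(dest.len());
    for i in 0..len {
        dest[i] = src[i];
    }
}

/// The idiomatic version. For lengths not known at compile time this
/// usually lowers to a memcpy call.
#[inline]
pub fn copy_idiomatic(dest: &mut [u8; 64], src: &[u8]) {
    let len = src.len().min(dest.len());
    dest[..len].copy_from_slice(&src[..len]);
}
```

Both functions produce the same bytes in `dest`; the difference under discussion is only in the generated code (inlined loop versus library call) and its effect on speed and code size.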

lemire · 2021-04-27T16:07:02Z

> non-inlined calls to memcpy with noticeable slowdown

I find it surprising that a call to memcpy would be inefficient. It is not as if the function call itself ought to be expensive.

hkratz · 2021-04-27T19:01:37Z

> I find it surprising that a call to memcpy would be inefficient. It is not like the function call itself ought to be expensive.

That was a decision I made early on when I was also testing it on inputs < 64 bytes. I will benchmark it again against the memcpy call and throw the custom memcpy out if there is no clear win.
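A rough sketch of how such a comparison could be timed, assuming a 64-byte destination buffer. The helper `time_copy` and the function `copy_via_memcpy` are hypothetical names, not part of this crate. A real measurement would use a proper benchmark harness with warm-up and statistical analysis, since differences this small can easily fall within noise.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical candidate under test: the idiomatic slice copy, which
/// typically becomes a memcpy call for runtime-determined lengths.
pub fn copy_via_memcpy(dest: &mut [u8; 64], src: &[u8]) {
    let len = src.len().min(dest.len());
    dest[..len].copy_from_slice(&src[..len]);
}

/// Hypothetical helper: time `iters` calls of `f` copying `src` into a
/// 64-byte buffer. `black_box` keeps the optimizer from removing the copies.
pub fn time_copy(f: fn(&mut [u8; 64], &[u8]), src: &[u8], iters: u32) -> Duration {
    let mut buf = [0u8; 64];
    let start = Instant::now();
    for _ in 0..iters {
        f(black_box(&mut buf), black_box(src));
    }
    black_box(&buf);
    start.elapsed()
}
```

Calling `time_copy` once per candidate with the same `src` and `iters`, across several input lengths below 64 bytes, gives a first impression of whether a custom copy routine is a clear win.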


hkratz marked this pull request as draft April 27, 2021 16:19

hkratz marked this pull request as ready for review April 28, 2021 16:02

hkratz marked this pull request as draft April 28, 2021 16:02

hkratz added 2 commits April 30, 2021 09:16

todo

83ecb55

optimize internal memcpy impl.

9f00561

hkratz force-pushed the opt-memcpy branch from efb28d7 to 9f00561 Compare April 30, 2021 07:22

remove custom memcpy, benefits are minimally over noise threshold, not worth the added i-cache consumption

1934deb

hkratz marked this pull request as ready for review April 30, 2021 07:53

hkratz changed the title ~~Optimize internal memcpy impl. for < 64 bytes case~~ Remove internal memcpy implementation Apr 30, 2021

hkratz merged commit e99f60d into main Apr 30, 2021

hkratz deleted the opt-memcpy branch April 30, 2021 12:49

hkratz restored the opt-memcpy branch April 30, 2021 12:49

hkratz deleted the opt-memcpy branch April 30, 2021 12:50
