Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove internal memcpy implementation #29

Merged
merged 3 commits into from Apr 30, 2021
Merged

Remove internal memcpy implementation #29

merged 3 commits into from Apr 30, 2021

Conversation

hkratz
Copy link
Contributor

@hkratz hkratz commented Apr 27, 2021

LLVM is unrolling it for the 128+ bytes case which never happens. The speedup with this patch is negligable but code size is reduced quite a bit.

It is a shame that all attempts to do this the Rust way (using copy_from_slice() or read_unaligned()/write_unaligned()) have resulted in non-inlined calls to memcpy with noticable slowdown.

This implementation is optimally auto-vectorized and inlined. The compiler can even prove that the len is less < 32 at the first call site and optimizes that check away.

@lemire
Copy link
Contributor

lemire commented Apr 27, 2021

non-inlined calls to memcpy with noticable slowdown

I find it surprising that call to memcpy would be inefficient. It is not like the function call itself ought to be expensive.

@hkratz hkratz marked this pull request as draft April 27, 2021 16:19
@hkratz
Copy link
Contributor Author

hkratz commented Apr 27, 2021

I find it surprising that call to memcpy would be inefficient. It is not like the function call itself ought to be expensive.

That was a decision I made early on when I was also testing it on inputs < 64 bytes. I will benchmark it again against the memcpy call and throw the custom memcpy out if there is no clear win.

@hkratz hkratz marked this pull request as ready for review April 28, 2021 16:02
@hkratz hkratz marked this pull request as draft April 28, 2021 16:02
@hkratz hkratz marked this pull request as ready for review April 30, 2021 07:53
@hkratz hkratz changed the title Optimize internal memcpy impl. for < 64 bytes case Remove internal memcpy implementation Apr 30, 2021
@hkratz hkratz merged commit e99f60d into main Apr 30, 2021
@hkratz hkratz deleted the opt-memcpy branch April 30, 2021 12:49
@hkratz hkratz restored the opt-memcpy branch April 30, 2021 12:49
@hkratz hkratz deleted the opt-memcpy branch April 30, 2021 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants