GCC 10.2.1 Results #6
The benchmark is not quite correctly implemented for the following reasons:
Bottom line: the library is probably OK, but the benchmark is nonsense.
Could you just post the correct benchmark code?
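For readers following along, here is a minimal sketch of what a memcpy timing harness might look like. It is not the repository's benchmark and not the fix the reviewer has in mind (the specific objections are not reproduced in this thread). The function name `time_memcpy`, the buffer fill pattern, and the GCC/Clang inline-asm barrier are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: times `iters` calls of memcpy on `size`-byte buffers.
   Returns elapsed seconds, or -1.0 on invalid input or allocation failure. */
double time_memcpy(size_t size, int iters) {
    if (size == 0 || iters <= 0) return -1.0;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (!src || !dst) { free(src); free(dst); return -1.0; }

    /* Touch both buffers up front so page faults are not timed. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, size);
        /* Compiler barrier (GCC/Clang syntax, an assumption): stops the
           optimizer from dropping or merging the copies. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(src);
    free(dst);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling `time_memcpy(4096, 1000000)` and dividing the bytes copied by the returned seconds gives a rough throughput figure. A single size with fixed alignment and hot caches is only one data point, so a fuller harness would likely vary all three.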
Another variable here is the false assumption (one I had myself) that standard libraries don't use vector instructions. I read some of the libc source code, and it uses handwritten AVX2 for memcpy, memcmp, and a few others when the architecture supports it. I tested this on a machine that maxed out at AVX2 instructions, so that could easily explain these results. (There were also comments in the source saying that they don't implement AVX512 because they experimented and determined that the frequency downgrade is detrimental to overall application performance.)

Also, even if the benchmark might not be ideal, it's still legitimate and shows head-to-head performance for at least one subset of all possible implementations (whether it captures a realistic pattern or not, I don't know). But yeah, as @zhanglistar basically said, we'd all love to flip the tables on libc again!
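To check whether a given test machine would take an AVX2 path at all, a small runtime probe can help. The sketch below uses the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. The helper names `cpu_has_avx2` and `pick_copy_strategy` are hypothetical, and this is not how glibc itself selects its memcpy variant.

```c
#include <stdio.h>

/* Returns 1 if the running CPU reports AVX2 support, 0 otherwise.
   Relies on GCC/Clang x86 builtins; on other targets it reports 0. */
int cpu_has_avx2(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical helper: label for the copy strategy a dispatcher might pick. */
const char *pick_copy_strategy(void) {
    return cpu_has_avx2() ? "avx2" : "baseline";
}

/* Prints the probe result, e.g. alongside benchmark numbers. */
void report_copy_strategy(void) {
    printf("copy strategy: %s\n", pick_copy_strategy());
}
```

Recording this next to each benchmark run makes it easier to tell whether two machines are even exercising comparable code paths.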
I have run the ClickHouse performance tests and can confirm that glibc's memcpy is better than FastMemcpy (at least on one machine), although the mean difference is very small: 0.5%. The maximum speedup that I'm confident in is about 16%, on the following query:

We have to continue using a custom memcpy instead of glibc's to maintain compatibility with old glibc versions.
gcc version 10.2.1 20201007 releases/gcc-10.2.0-350-g136256c32d (Clear Linux OS for Intel Architecture)