-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Runtime SSE4.2 detection: #20
Conversation
Detect presence of SSE4.2 as early as possible and choose between SSE and non-SSE string hashing functions at runtime. This allows to build luajit without -msse4.2, but still get a performace gain on supported hardware.
What motivate this change? SSE 4.1 is 10+ years old. I thought most intel-CPU based (including Atom) servers are supporting SSE4.1 I personally deeply hate inline asm for couple of reasons:
Being portable is not a matter of black-and-white, it's matter of degree. Although the original code isn't very portable, I would say it's more portable than your change... While I used to make a living by playing with asm instruction, it does not prevent me to view inline-asm as an eyesore .... I guess maybe we can avoid using inline-asm by multi-versioning the hash function. The concept is sketched bellow: if is_sse41_supported() The hash_sse41_func() is compiled with sse4.1 support, while the rest src code don't. I think they have no problem to be lined together (as long as you are not using O4/LTO optimization). In short, I personally hate the idea of inline-asm. I would leave the trade-off |
Keywords being: most, intel and servers. Also, we need SSE4.2, not 4.1. My dev computer is AMD-based without SSE4.2 support, and, looking at original issue - i'm not alone.
And yeah, you right: proper way would be to separate sse4.2-optimized hashing into stand-alone source (not header like now) file. And build it and only it with -msse4.2 and only on x86. |
I have some improvements for this (e.g. using gcc intrinsics instead of asm), but sadly my test box with sse4.2 is down atm, so it'll have to wait. |
Okay, sorry for the delay. I've revorked runtime detection to be closer to the original code and to use gcc intrinsics. |
BTW, i've noticed pull-request for arm crc support. We can use the same technique there (just set an additional flag for arm >= 8.1) |
It would be nice to at least hear some feedback on this. |
It's on our TODO list. Sorry for the delay. @thibaultcha will you follow up this PR please? Thanks! |
@isage I really appreciate your work on this and this is indeed very important. I really like to get this merged as soon as we can manage. Hopefully the next OpenResty release can have this. |
If you look at the review pages on NewEgg for the old AMD Phenom chips, you'll see we aren't the only people who are still using these after nearly a decade. Phenoms are the microprocessor equivalent of a 1995 Toyota Corolla with 300K miles on the odometer. This change will be valuable to many developers who use old equipment, and I hope it makes it into the mainline. |
Yeah, sure. It will be part of the next next OpenResty release for sure. The next OpenResty release's merge window is now already closed. We'll get OpenResty 1.15.8.1 out first, in the next couple of weeks. |
@siddhesh It'll be great if you can have a look at this one too :) Thanks! |
I think the right way to do this would be to always build
|
@siddhesh Do you mind creating a separate PR to incorporate all the changes you proposed? Many thanks! |
Obsoleted by PR #75. |
Hooray! |
As per discussion in #39
Detect presence of SSE4.2 as early as possible
and choose between SSE and non-SSE string hashing functions at runtime.
This allows to build luajit without -msse4.2, but still get a performace gain
on supported hardware.
(benchmark.cxx is currently broken, and travis.yml needs changing too)