Don't we need "vzeroupper" after calling AVX code? #24
If your code switches between AVX and SSE code frequently, then yes, it could help. You can measure the number of transitions with a tool such as Intel SDE and estimate what penalty you are getting.
@gbtucker After calling ISA-L, I may or may not call into libraries that contain SSE code. To avoid potential AVX-SSE transition penalties, should I add VZEROUPPER at the end of any function that uses 256-bit AVX instructions? In section 11.3.1, "Mixing Intel® AVX and Intel SSE in Function Calls", I found "Assembly/Compiler Coding Rule 71", which says: "Add VZEROUPPER instruction after 256-bit AVX instructions are executed and before any function call that might execute SSE code. Add VZEROUPPER at the end of any function that uses 256-bit AVX instructions." Section 11.3 of the manual also says "This instruction has zero latency." Its throughput may still be slow, though: section 15.2.7.1 says "VZEROUPPER instruction throughput is slow, and is not recommended to preface a transition to AVX code after SSE code execution. The throughput of VZEROALL is also slow. Using either the VZEROUPPER or the VZEROALL instruction is likely to result in performance loss." But that section is about "KNIGHTS LANDING MICROARCHITECTURE AND SOFTWARE OPTIMIZATION", so I'm not sure whether the same applies to other microarchitectures.
It's true that it is not as much of an issue on newer architectures. We avoided putting it in all functions by default and haven't seen significant issues from conflicts.
On Sun, Sep 10, 2017 at 11:29 PM, Temple3x wrote:
@gbtucker
In section 11.3 of the optimization manual:
"In Skylake microarchitecture, the SSE block can be executed from a Clean state without the penalty of upper-bits dependency and blend operation."
Does that mean we don't need vzeroupper when running on a CPU with the Skylake microarchitecture?
This instruction is recommended when transitioning between AVX and legacy SSE code: it eliminates performance penalties caused by false dependencies.
But I can't find vzeroupper anywhere in ISA-L.
Maybe I should add vzeroupper myself?