don't we need "vzeroupper" after call avx codes? #24

Closed
templexxx opened this issue Aug 26, 2017 · 3 comments


@templexxx

This instruction is recommended when transitioning between AVX and legacy SSE code - it will eliminate performance penalties caused by false dependencies.

But I can't find vzeroupper anywhere in the code.

Should I add vzeroupper myself?

@gbtucker
Contributor

If your code switches between AVX and SSE code frequently, then yes, it could help. You can measure the number of transitions with tools such as sde and estimate the penalty you are incurring.

@templexxx
Author

@gbtucker
Thank you for your reply. sde is a really cool tool.

After calling ISA-L, I may or may not call into libraries that include SSE code. So, to avoid potential AVX-SSE transition penalties, should I add VZEROUPPER at the end of any function that uses 256-bit AVX instructions? In section 11.3.1, "Mixing Intel® AVX and Intel SSE in Function Calls", I found "Assembly/Compiler Coding Rule 71", which says: "Add VZEROUPPER instruction after 256-bit AVX instructions are executed and before any function call that might execute SSE code. Add VZEROUPPER at the end of any function that uses 256-bit AVX instructions."

The manual also says "This instruction has zero latency." in section 11.3.

On the other hand, VZEROUPPER throughput may be slow. In section 15.2.7.1 I found: "VZEROUPPER instruction throughput is slow, and is not recommended to preface a transition to AVX code after SSE code execution. The throughput of VZEROALL is also slow. Using either the VZEROUPPER or the VZEROALL instruction is likely to result in performance loss." But that section is about "KNIGHTS LANDING MICROARCHITECTURE AND SOFTWARE OPTIMIZATION", so I'm not sure whether the same happens on other microarchitectures?
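
For illustration, here is a minimal C sketch of what the rule quoted above ("Add VZEROUPPER at the end of any function that uses 256-bit AVX instructions") could look like with compiler intrinsics. The function names `sum_avx`, `sum_scalar`, and `sum_floats` are hypothetical and are not part of ISA-L. The sketch assumes GCC or Clang, since it relies on `__attribute__((target("avx")))` and `__builtin_cpu_supports`. In hand-written assembly the counterpart would be a `vzeroupper` instruction before `ret`.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper (not ISA-L code): sums n floats with 256-bit AVX. */
__attribute__((target("avx")))
static float sum_avx(const float *src, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;

    /* Main loop: 8 floats per iteration in a YMM register. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(src + i));

    /* Spill the accumulator so no live value remains in a YMM register. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);

    /* Clear the upper 128 bits of all YMM registers, so that legacy SSE
     * code executed by a later callee or by the caller does not pay an
     * AVX-SSE transition penalty. */
    _mm256_zeroupper();

    /* Horizontal sum plus the tail elements that did not fill a vector. */
    float total = 0.0f;
    for (int k = 0; k < 8; k++)
        total += lanes[k];
    for (; i < n; i++)
        total += src[i];
    return total;
}
#endif

/* Portable fallback for CPUs without AVX or non-x86 targets. */
static float sum_scalar(const float *src, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += src[i];
    return total;
}

/* Hypothetical entry point: picks the AVX path only when the CPU has it. */
float sum_floats(const float *src, size_t n)
{
    assert(src != NULL || n == 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx"))
        return sum_avx(src, n);
#endif
    return sum_scalar(src, n);
}
```

A caller would simply use `sum_floats(values, count)` and could then call into SSE-only libraries without worrying about dirty upper YMM state. Whether the extra instruction costs anything measurable depends on the microarchitecture, so it is worth benchmarking on the target hardware.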

@gbtucker
Contributor

gbtucker commented Sep 11, 2017 via email
