Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVX v256_cleanup() fixed #12964

Merged
merged 1 commit into from
Oct 29, 2018
Merged

Conversation

savuor
Copy link
Contributor

@savuor savuor commented Oct 26, 2018

This pullrequest changes

_mm256_zeroupper is replaced by _mm256_zeroall

This should fix an incorrect behavior of vx_cleanup on AVX+ baselines on some compilers (for example GCC 7.3.0, Ubuntu 18.04).

Details

_mm256_zeroupper unpredictably spoils local variables even when there's no other intrinsics used in the code.
_mm256_zeroall doesn't cause such errors at the same time having similar purpose and performance.

@alalek
Copy link
Member

alalek commented Oct 26, 2018

https://software.intel.com/en-us/articles/avoiding-avx-sse-transition-penalties

  1. Problem is observed with AVX2 baseline only.
  2. There are many of these vx_cleanup() calls in the middle of functions. This doesn't look correct, make no sense and not working.
  3. Also "transitions" should be handled before AVX code too: SSE->AVX switch.

I believe we should follow these rules instead of blind hacking:

  1. Any "exported" OpenCV function should make transitions calls before and after executing any code (CPU_BASELINE is AVX+). Ideal place for that is CV_INSTRUMENT_REGION(); macro.
  2. AVX+ "dispatched" OpenCV functions should make transitions calls before and after executing any code (CPU_BASELINE is SSE*). Partially handled by __CV_AVX_GUARD macro (via CV_INSTRUMENT_REGION() - but it is missing in new updates). Perhaps this should be moved into CV_CPU_CALL_*() macros for dispatched calls.
  3. Inline functions/operators should not call this at all.

@vpisarev
Copy link
Contributor

@alalek, vx_cleanup() should be much cheaper (maybe even much cheaper) than CV_INSTRUMENT_REGION() call. So, I suggest to use vx_cleanup() explicitly where it's needed. Of course, CV_INSTRUMENT_REGION() may be changed as well, to play on the safe side. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants