OE-27 "Wide Universal Intrinsics" discussion #11022

vpisarev · 2018-03-06T15:21:23Z

The feature request for evolution proposal OE-27.

vpisarev · 2018-03-07T11:40:40Z

@seiko2plus, I've added a link to your #10708

seiko2plus · 2018-03-08T02:49:29Z

@vpisarev, #10708 is going to map all universal intrinsics to AVX2. I also made some changes to make it compatible with OE-27.

* core:OE-27 prepare universal intrinsics to expand (#11022)
* core:OE-27 prepare universal intrinsics to expand (#11022)
* core: Add universal intrinsics for AVX2
* updated implementation of wide univ. intrinsics; converted several OpenCV HAL functions: sqrt, invsqrt, magnitude, phase, exp to the wide universal intrinsics.
* converted log to universal intrinsics; cleaned up the code a bit; added v_lut_deinterleave intrinsics.
* core: Add universal intrinsics for AVX2
* fixed multiple compile errors
* fixed many more compile errors and hopefully some test failures
* fixed some more compile errors
* temporarily disabled IPP to debug exp & log; hopefully fixed Doxygen complains
* fixed some more compile errors
* fixed v_store(short*, v_float16&) signatures
* trying to fix the test failures on Linux
* fixed some issues found by alalek
* restored IPP optimization after the patch with AVX wide intrinsics has been properly tested
* restored IPP optimization after the patch with AVX wide intrinsics has been properly tested

pemmanuelviel · 2020-07-03T23:10:23Z

@vpisarev I plan to port my assembly SSEx and intrinsic AVX-2 implementations of some distances for FLANN to the OpenCV repo.
Are the wide universal intrinsics fully functional, in particular the ones for CV_SIMD256? Is there any doc other than the one for "not wide" universal intrinsics? Thanks

alalek · 2020-07-03T23:16:21Z

@pemmanuelviel There is a page about universal intrinsics in the OpenCV documentation.

pemmanuelviel · 2020-07-04T08:50:13Z

@alalek Thank you for the link. This is the doc for the "not-wide" universal intrinsics I was mentioning.
But as I didn't see details on the 256-bit AVX registers, I would like to know whether there is any doc on "wide" universal intrinsics, and whether they are fully functional.
I would prefer porting directly to the 256-bit wide universal intrinsics equivalent to AVX-2 rather than to the equivalents of the SSEx intrinsics. Going from SSEx intrinsics to AVX-x intrinsics on Intel architectures is not only a matter of register size. Both SSE-x and AVX-x intrinsics are written in the form
C = A Op B,
but the assembly instructions for SSE-x mostly work with only two registers and take the form
A = A Op B
A single SSE-x intrinsic may therefore map to a sequence of several assembly instructions, which might be the reason why the performance difference between SIMD assembly and intrinsic code is quite noticeable with SSE-x.
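To make the operand-form point concrete, here is a minimal sketch (not code from the thread). The function name `add_then_scale` is hypothetical. Because `va` is still needed after the addition, a compiler targeting legacy SSE encodings will typically have to copy a register before the destructive two-operand `addps`, whereas with AVX enabled the three-operand `vaddps` can write to a separate destination. The exact instructions depend on the compiler and flags, so treat this as an illustration only. A scalar fallback is included so the snippet also builds on non-x86 targets.

```cpp
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE/SSE2 intrinsics (_mm_loadu_ps, _mm_add_ps, ...)
#endif

// Hypothetical helper: out[i] = (a[i] + b[i]) * a[i] for exactly 4 floats.
void add_then_scale(const float* a, const float* b, float* out)
{
#if defined(__SSE2__)
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    // Intrinsic form: C = A Op B. Both inputs remain usable afterwards.
    __m128 vc = _mm_add_ps(va, vb);
    // 'va' is read again here, so it must survive the addition above.
    // With legacy SSE encoding (A = A Op B) that usually costs a register copy.
    __m128 vd = _mm_mul_ps(vc, va);
    _mm_storeu_ps(out, vd);
#else
    // Scalar fallback for targets without SSE2.
    for (int i = 0; i < 4; ++i)
        out[i] = (a[i] + b[i]) * a[i];
#endif
}
```

Comparing the generated assembly with and without AVX enabled (for example `-msse2` versus `-mavx` on GCC or Clang) is one way to check whether the extra copy actually appears for a given compiler.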

terfendail · 2020-07-09T16:08:28Z

Wide universal intrinsics are implemented for AVX2 and AVX512 architectures and are already used in the core and imgproc modules.

Unfortunately, there is no dedicated documentation for wide universal intrinsics. However, they were implemented in accordance with OE-27.

There are just a few changes to the universal intrinsics idea:

* vector types don't contain the length anymore (e.g. v_uint8 instead of v_uint8x16)
* an intrinsic's name should start with vx_ if it's impossible to deduce the vector size from the input values (e.g. vx_load instead of v_load, but v_fma retains the same name)
* address evaluation, loop steps, etc. MUST use type::nlanes (e.g. v_uint8::nlanes) instead of an explicit vector length value

WUI always use the widest vector size available for the selected instruction set (i.e. if AVX512 support is enabled, the vector length for WUI will be 512 bits)
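The rules above can be sketched as a small loop. This is an illustration under assumptions, not code from OpenCV. The function name `scale_add` is hypothetical. It assumes the OpenCV header `opencv2/core/hal/intrin.hpp` is on the include path and that the build exposes `v_float32::nlanes` as described in this thread. When the header is missing or `CV_SIMD` is not enabled, only the scalar loop runs.

```cpp
// Pull in OpenCV's universal intrinsics only when the header is available.
#if defined(__has_include)
#  if __has_include(<opencv2/core/hal/intrin.hpp>)
#    include <opencv2/core/hal/intrin.hpp>
#  endif
#endif

// Hypothetical helper: dst[i] = alpha * x[i] + y[i] for i in [0, n).
void scale_add(const float* x, const float* y, float* dst, int n, float alpha)
{
    int i = 0;
#if defined(CV_SIMD) && CV_SIMD
    // Length-agnostic type: v_float32, not v_float32x4 or v_float32x8.
    // The loop step comes from nlanes, never from a hard-coded lane count.
    const int step = cv::v_float32::nlanes;
    // vx_ prefix: the vector width cannot be deduced from a scalar argument.
    const cv::v_float32 valpha = cv::vx_setall_f32(alpha);
    for (; i <= n - step; i += step)
    {
        // vx_ prefix again: a raw pointer carries no width information.
        cv::v_float32 vx = cv::vx_load(x + i);
        cv::v_float32 vy = cv::vx_load(y + i);
        // v_fma keeps its name because the width follows from its arguments.
        cv::v_store(dst + i, cv::v_fma(valpha, vx, vy));
    }
#endif
    // Scalar tail (or the whole range when no SIMD path was compiled in).
    for (; i < n; ++i)
        dst[i] = alpha * x[i] + y[i];
}
```

The same source is intended to compile to 128-bit, 256-bit, or 512-bit code depending on the enabled instruction set, which is why nothing in the loop refers to a concrete vector length.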

vpisarev added the evolution label Mar 6, 2018

vpisarev added this to the 4.0 milestone Mar 6, 2018

seiko2plus added a commit to seiko2plus/opencv that referenced this issue Mar 8, 2018

core:OE-27 prepare universal intrinsics to expand (opencv#11022)

7ba3459

seiko2plus mentioned this issue Mar 8, 2018

REF: Remove raw intrinsics from arithmetic operations #10708

Closed

8 tasks

kinchungwong mentioned this issue Apr 5, 2018

Remove integer-only restriction from universal intrinsic v_extract. Affects SSE2 only. #11242

Closed

vpisarev pushed a commit to vpisarev/opencv that referenced this issue Jul 11, 2018

core:OE-27 prepare universal intrinsics to expand (opencv#11022)

6c9e1a7

vpisarev mentioned this issue Jul 12, 2018

Wide univ intrinsics #11953

Merged

6 tasks
