Power8 VSX optimizations for core module #9763

seiko2plus · 2017-10-02T23:15:00Z

Power8 VSX optimizations for core module

Issue:
PowerPC Power8 VSX SIMD optimizations #7207

Architecture:
Power8 PPC64LE (64 bit PowerPC Little Endian mode) and above

What about Big-endian support?
Big-endian is basically dead on modern Power and I'm not sure about adding a support for it , maybe I should ! IDK.

Tested Compilers:
GCC (4.9.2, 5.4.1, 6.4.0, 7.2.0)
CLANG(4.0.1-6, 6.0.0-svn310776-1)

Missings:
Why there's no support for Half-precision?
there's no VSX instructions for Half-precision convert in Power8 ISA but Power9 has it and I'm planning to add it as feature.

barkovv · 2017-10-03T10:47:54Z

@seiko2plus
It's funny but I've worked to this issue to and my code fails to compile with GCC or XLC (clang-4 is fine). Should we combine code to get compiler independent code?

https://github.com/barkovv/opencv/tree/vsx
My code passes "opencv_test_core" tests. Compiled with clang-4.

Anyway I would like to list the clang-specific problems here (if you have plans to adapt your code for clang):

clang-5 has regression with vec_xl (load from unaligned adress). This intrin maps to vec_ld for some reason. clang-4 is fine here.
inline asm with VSX arguments ("wa", "wd", "wf" arguments) is buggy (value of result register isn't stored in __vector ) (this bug exists either in clang-4 or clang-5)
all uint64 related intrins from <altivec.h> (I mean vector unsigned long long and vector signed long long) want "unsigned long long" and "signed long long" arguments while uint64 maps to "unsigned long". There is no silent conversion from uint64 to "unsigned long" so all calls are ambious.

Have you got floating point comparation errors in hal_intrin.float32x4 & hal_intrin.float64x2 tests? ( 1.0 != 0.99998....)

Nice work

mshabunin · 2017-10-03T14:55:50Z

@seiko2plus , can you provide an instruction describing how to build OpenCV for powerpc in Ubuntu x86_64? It would be great to have a cmake toolchain file similar to those we have for ARM (https://github.com/opencv/opencv/tree/master/platforms/linux).

seiko2plus · 2017-10-06T03:52:13Z

Sorry for the delayed response,
@barkovv During working on supports clang , I found these issues and many more
I almost covered all these issues and solutions in vsx_utils.hpp

Have you got floating point comparation errors in hal_intrin.float32x4 & hal_intrin.float64x2 tests? ( 1.0 != 0.99998....)

No but I think this error caused by vec_rsqrte, you should use vec_rsqrt instead for accuracy

It's funny but I've worked to this issue to and my code fails to compile with GCC or XLC (clang-4 is fine). Should we combine code to get compiler independent code?

I don't know what I'm supposed to say but I didn't mean to disappoint you.
I looked at your code, you did well too but it would be better to look at the vsx_utils.hpp and check how I worked around clang issues.

@mshabunin

can you provide an instruction describing how to build OpenCV for powerpc in Ubuntu x86_64?
It would be great to have a cmake toolchain file similar to those we have for ARM (https://github.com/opencv/opencv/tree/master/platforms/linux).

I'm using power8 vm provided by osuosl, I didn't try to build opencv for power8 in GNU/Linux x86_64 before, but I'm willing to give it a try and maybe I would be able to make cmake toolchain for powerpc

vpisarev · 2017-10-09T13:33:30Z

@seiko2plus, thank you very much, good support for another architecture is a very valuable addition to the library! I have one (but relatively big) request regarding this patch. As I see, besides providing another implementation of universal intrinsics, there is also a bunch of other functions, implementing directly in VSX intrinsics. I wonder if you could convert the corresponding loops to universal intrinsics (and then in most cases SSE/NEON branches can be deleted)? The reason is simple. VSX branches will be much less tested than SSE/NEON, and we ourselves could not test or even build this code on our test machines. If you convert those branches to universal intrinsics then this code is much better tested in real life.

seiko2plus · 2017-10-09T20:49:30Z

@vpisarev I agree with you, I removed VSX raw from the branch and going to replace it later with universal intrinsics in separate branches but that may limit VSX abilities and affect the desirable performance, so not everything is going to implement by universal intrinsics.

Recently I implemented a toolchain for powerpc using "IBM Advance Toolchain" because of these reasons that you already mentioned and I'm going to release it once I finish testing.

vpisarev · 2017-10-10T10:02:41Z

great! thank you once again! 👍

seiko2plus changed the title ~~Add vsx core~~ Power8 VSX optimizations for core module Oct 3, 2017

barkovv mentioned this pull request Oct 7, 2017

core: Increased maximum abs_error for hal_intrin.float* tests #9758

Closed

seiko2plus added 2 commits October 9, 2017 00:32

Added support for VSX

d077778

Added universal intrinsic for VSX

4b968d1

seiko2plus force-pushed the addVsxCore branch 2 times, most recently from 3958e32 to 772fbb1 Compare October 9, 2017 10:24

vpisarev self-assigned this Oct 9, 2017

seiko2plus force-pushed the addVsxCore branch from 772fbb1 to 4b968d1 Compare October 9, 2017 20:49

opencv-pushbot merged commit 4b968d1 into opencv:master Oct 10, 2017

opencv-pushbot pushed a commit that referenced this pull request Oct 10, 2017

Merge pull request #9763 from seiko2plus:addVsxCore

7d55c09

seiko2plus mentioned this pull request Dec 22, 2017

Optimize 2 opencv 2 files (merge.cpp/split.cpp) with PowerPC VSX intrinsics. #10371

Closed

vpisarev mentioned this pull request Aug 6, 2018

[$] PowerPC Power8 VSX SIMD optimizations #7207

Closed

alalek mentioned this pull request Aug 30, 2019

Added MSA implementations for mips platforms. #15422

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Power8 VSX optimizations for core module #9763

Power8 VSX optimizations for core module #9763

Uh oh!

seiko2plus commented Oct 2, 2017 •

edited

Loading

Uh oh!

barkovv commented Oct 3, 2017 •

edited

Loading

Uh oh!

mshabunin commented Oct 3, 2017

Uh oh!

seiko2plus commented Oct 6, 2017 •

edited

Loading

Uh oh!

vpisarev commented Oct 9, 2017

Uh oh!

seiko2plus commented Oct 9, 2017

Uh oh!

vpisarev commented Oct 10, 2017

Uh oh!

Uh oh!

Uh oh!

Power8 VSX optimizations for core module #9763

Power8 VSX optimizations for core module #9763

Uh oh!

Conversation

seiko2plus commented Oct 2, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!