Power8 VSX optimizations for core module #9763
@seiko2plus https://github.com/barkovv/opencv/tree/vsx Anyway I would like to list the clang-specific problems here (if you have plans to adapt your code for clang):
Have you got floating point comparison errors in the hal_intrin.float32x4 & hal_intrin.float64x2 tests (1.0 != 0.99998....)? Nice work.
@seiko2plus, can you provide instructions describing how to build OpenCV for PowerPC on Ubuntu x86_64? It would be great to have a CMake toolchain file similar to those we have for ARM (https://github.com/opencv/opencv/tree/master/platforms/linux).
Sorry for the delayed response,
No, but I think this error is caused by vec_rsqrte; you should use vec_rsqrt instead for accuracy.
I don't know what I'm supposed to say but I didn't mean to disappoint you.
I'm using a Power8 VM provided by OSUOSL. I haven't tried building OpenCV for Power8 on GNU/Linux x86_64 before, but I'm willing to give it a try, and maybe I'll be able to make a CMake toolchain file for PowerPC.
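To illustrate the accuracy point about vec_rsqrte versus vec_rsqrt, here is a minimal portable sketch of why a reciprocal square root estimate can give 0.99998... where 1.0 is expected, and how one Newton-Raphson step tightens it. It does not use VSX: `rsqrt_estimate_sim` is a hypothetical stand-in that imitates a coarse estimate by truncating mantissa bits, and the number of bits dropped is an assumption chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware estimate instruction (NOT a VSX call):
// computes 1/sqrt(x), then clears the low 12 mantissa bits to mimic a coarse result.
inline float rsqrt_estimate_sim(float x)
{
    assert(x > 0.0f);
    float y = 1.0f / std::sqrt(x);
    std::uint32_t bits;
    std::memcpy(&bits, &y, sizeof bits);
    bits &= 0xFFFFF000u;              // keep only the top 11 mantissa bits
    std::memcpy(&y, &bits, sizeof bits);
    return y;
}

// One Newton-Raphson step for f(y) = 1/y^2 - x:
//   y' = y * (1.5 - 0.5 * x * y * y)
// Each step roughly squares the relative error of the estimate.
inline float rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}
```

Calling `rsqrt_refine(x, rsqrt_estimate_sim(x))` gives a value much closer to `1.0f / std::sqrt(x)` than the estimate alone, which is the kind of gap an exact-comparison test can trip over.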
@seiko2plus, thank you very much, good support for another architecture is a very valuable addition to the library! I have one (but relatively big) request regarding this patch. As I see, besides providing another implementation of universal intrinsics, there is also a bunch of other functions implemented directly in VSX intrinsics. I wonder if you could convert the corresponding loops to universal intrinsics (and then in most cases the SSE/NEON branches can be deleted)? The reason is simple. VSX branches will be much less tested than SSE/NEON, and we ourselves could not test or even build this code on our test machines. If you convert those branches to universal intrinsics, then this code will be much better tested in real life.
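The request above can be pictured with a small sketch: the loop is written once against a thin vector wrapper, and only the wrapper changes per architecture. Everything here is hypothetical and for illustration only. `vec4f_sketch`, `load4`, `store4` and `add4` are not OpenCV's universal intrinsics, and the scalar bodies stand in for what would be SSE, NEON or VSX instructions in a real backend.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane float vector (not OpenCV's v_float32x4).
struct vec4f_sketch { float lane[4]; };

// Scalar fallback backend; a real backend would map these to SIMD instructions.
inline vec4f_sketch load4(const float* p)
{
    vec4f_sketch v;
    for (int k = 0; k < 4; ++k) v.lane[k] = p[k];
    return v;
}

inline void store4(float* p, const vec4f_sketch& v)
{
    for (int k = 0; k < 4; ++k) p[k] = v.lane[k];
}

inline vec4f_sketch add4(const vec4f_sketch& a, const vec4f_sketch& b)
{
    vec4f_sketch r;
    for (int k = 0; k < 4; ++k) r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

// The kernel itself is architecture-neutral: one vector loop plus a scalar tail.
inline void add_arrays(const float* a, const float* b, float* dst, std::size_t n)
{
    assert(n == 0 || (a && b && dst));
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        store4(dst + i, add4(load4(a + i), load4(b + i)));
    for (; i < n; ++i)               // leftover elements
        dst[i] = a[i] + b[i];
}
```

Because `add_arrays` never names a platform, the same loop is exercised on every architecture that provides the three helpers, which is the testing benefit described in the comment.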
@vpisarev I agree with you. I removed the raw VSX code from this branch and am going to replace it later with universal intrinsics in separate branches. That may limit what VSX can do and hurt performance, though, so not everything is going to be implemented with universal intrinsics. For the reasons you mentioned, I recently implemented a toolchain for PowerPC using the "IBM Advance Toolchain", and I'm going to release it once I finish testing.
great! thank you once again! 👍
Power8 VSX optimizations for core module
Issue:
PowerPC Power8 VSX SIMD optimizations #7207
Architecture:
Power8 PPC64LE (64 bit PowerPC Little Endian mode) and above
What about Big-endian support?
Big-endian is basically dead on modern Power, and I'm not sure about adding support for it. Maybe I should; I don't know.
Tested Compilers:
GCC (4.9.2, 5.4.1, 6.4.0, 7.2.0)
Clang (4.0.1-6, 6.0.0-svn310776-1)
Missing:
Why is there no support for half-precision?
There are no VSX instructions for half-precision conversion in the Power8 ISA, but Power9 has them, and I'm planning to add this as a feature.