[SVE] Add SVE support to DFT #182
Conversation
Please update the changelog file with a new "Next Release" section, noting that you have added DFT support for SVE and listing the vector lengths for which you have done it. Thank you!
Please also add a paste of the armie output for the SVE runs of the tests related to the SVE DFT.
Below is the output of the testing programs. SVE is not selected when testing the DFT with the current settings, since the vector length is specified as 128 bits. I am planning to add better testing of the DFT subroutines.
@@ -4,6 +4,13 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## Next Release
- SVE target support is added to libm.
fpetrogalli
Apr 24, 2018
Collaborator
Please mention these two items under the section ### Added, as required by the "Keep a Changelog" format.
#elif CONFIG == 8
// 256-bit vector length
#define ISANAME "AArch64 SVE 256-bit"
#define LOG2VECTLENDP 2
fpetrogalli
Apr 24, 2018
Collaborator
This is misleading to me. LOG2VECTLENDP is not the log-size of the full vectors, but of the partial vectors you are using. I think you should add a comment saying that these VLENs are used for the DFT on partial vectors.
#ifdef LOG2VECTLENDP
#define LOG2VECTLENSP (LOG2VECTLENDP+1)
#define VECTLENDP (1 << LOG2VECTLENDP)
fpetrogalli
Apr 24, 2018
Collaborator
I also want to make sure that VECTLENDP = svcntd() and VECTLENSP = svcntw() don't get overwritten when building libsleef and libsleefgnuabi. Could you please raise an #error at this point if either VECTLENDP or VECTLENSP is already defined?
// Operations for DFT

static INLINE vdouble vposneg_vd_vd(vdouble d) {
  vmask pnmask = svreinterpret_s32_u64(svlsl_n_u64_x(ptrue, svindex_u64(0, 1), 63));
fpetrogalli
Apr 24, 2018
Collaborator
I understand you are creating positive/negative patterns on even/odd lanes here. Any chance you could avoid using vmask for these operations (and the v*subadd operation) and use native predication instead, by building the repeated predicate patterns with DUPQ?
See 6.21.4.4 of https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_00_en.pdf
Something like:
vsubadd(x, y) = vadd(ptrue, vadd(dupq(true,false), x, y),
                            vsub(dupq(false,true), x, y))
shibatch
Apr 25, 2018
Author
Owner
I think that's not a good idea, since it involves three (or two) FP add operations, and FP operations are considered expensive and slow. Those operations also depend on the output of the previous instruction. We assume that the ALUs for the inactive elements are not used, but that might not be the case, since power-gating may take some time to kick in.
shibatch
Apr 25, 2018
Author
Owner
I came up with a good way to remove vmask without any additional FP operations. This should also reduce register pressure.
static INLINE vdouble vposneg_vd_vd(vdouble d) {
  return svneg_f64_m(d, svdupq_n_b64(false, true), d);
}
* An #error is now raised if VECTLENDP or VECTLENSP is already defined in helpersve.h.
* Added an explanation of the meaning of VECTLENDP and VECTLENSP to helpersve.h.
This patch adds SVE support to the DFT library.
The DFT library strongly depends on the vector length, and thus it does not mix well with vector-length agnosticism. In this patch, the vector length to be used is specified by the CONFIG macro in helpersve.h. The DFT subroutine uses a dispatcher to choose a kernel from those compiled for 2048-, 1024-, 512- and 256-bit vector lengths.