[SVE] Add SVE support to DFT #182
Conversation
Please update the changelog file with a new "Next Release" section, noting that you have added DFT support for SVE and listing the vector lengths for which you have done it. Thank you!
Please also add a paste of the armie output for the SVE runs of the tests related to the SVE DFT.
Below is the output of the testing programs. SVE is not selected when testing the DFT with the current settings, since the vector length is specified as 128 bits. I am planning to add better testing of the DFT subroutines.
@@ -4,6 +4,13 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## Next Release
- SVE target support is added to libm.
fpetrogalli
Apr 24, 2018
Collaborator
Please mention these two items under the section ### Added, as required by the "Keep a Changelog" format.
#elif CONFIG == 8
// 256-bit vector length
#define ISANAME "AArch64 SVE 256-bit"
#define LOG2VECTLENDP 2
fpetrogalli
Apr 24, 2018
Collaborator
This is misleading to me. LOG2VECTLENDP is not the log-size of the full vectors, but of the partial vectors you are using. I think you should add a comment saying that these VLENs are used for the DFT on partial vectors.
#ifdef LOG2VECTLENDP
#define LOG2VECTLENSP (LOG2VECTLENDP+1)
#define VECTLENDP (1 << LOG2VECTLENDP)
fpetrogalli
Apr 24, 2018
Collaborator
I also want to make sure that VECTLENDP = svcntd() and VECTLENSP = svcntw() don't get overwritten when building libsleef and libsleefgnuabi. Could you please raise an #error at this point if either VECTLENDP or VECTLENSP is already defined?
// Operations for DFT

static INLINE vdouble vposneg_vd_vd(vdouble d) {
  vmask pnmask = svreinterpret_s32_u64(svlsl_n_u64_x(ptrue, svindex_u64(0, 1), 63));
fpetrogalli
Apr 24, 2018
Collaborator
I understand you are creating positive/negative patterns on even/odd lanes here. Any chance you could avoid using vmask for these operations (and the v*subadd operation) and use native predication instead, by building the repeated predicate patterns with DUPQ?
See 6.21.4.4 of https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_00_en.pdf
Something like:
vsubadd(x, y) = vadd(ptrue, vadd(dupq(true,false), x, y),
                            vsub(dupq(false,true), x, y))
shibatch
Apr 25, 2018
Author
Owner
I think that's not a good idea, since it involves three (or two) FP add operations, and FP operations are considered expensive and slow. Those operations also depend on the output of the previous instruction. We assume that the ALUs for the inactive elements are not used, but that might not be the case, since power-gating may take some time to kick in.
shibatch
Apr 25, 2018
Author
Owner
I came up with a good way to remove vmask without any additional FP operations. This should also reduce register pressure.
static INLINE vdouble vposneg_vd_vd(vdouble d) {
  return svneg_f64_m(d, svdupq_n_b64(false, true), d);
}
* An #error is now raised if VECTLENDP or VECTLENSP is already defined in helpersve.h.
* Added an explanation of the meaning of VECTLENDP and VECTLENSP to helpersve.h.
This patch adds SVE support to the DFT library.
The DFT library strongly depends on the vector length, and thus it does not mix well with vector-length agnosticism. In this patch, the vector length to be used is specified by the CONFIG macro in helpersve.h. The DFT subroutine uses a dispatcher to choose a kernel from those compiled for 2048-, 1024-, 512- and 256-bit vector lengths.