Intrinsic for the high-performance templated operator [x86] #1047
…-peel-times option. Brought bp1p from github.com/CEED/benchmarks/blob/master/tests/mfem_bps to test the kernels.
Some updates to the SIMD intrinsics branch
Merged in
There are errors from this PR in the autotest runs on
I'm not sure why I did not get this error on my desktop. It may be okay to remove lines 300 to 301 in 069ad59.
Resolved conflicts: CHANGELOG
in class TVector -- this was causing compilation errors when CUDA is enabled.
Re-merged in
The tux429 runs look OK now, but there are new errors on tux426.
It's good that we are testing AVX512 too. 😄 It looks like this particular intrinsic is available only when
I guess we can always fall back on:

    // Negate every lane as 0.0 - v
    AutoSIMD<double,8,64> r;
    r.m512d = _mm512_sub_pd(_mm512_set1_pd(0.0),v.m512d);
    return r;

However, there may be a better or faster alternative.
Here is one alternative:
which, expanded, becomes:

    // Flip the sign bit of each 64-bit lane
    _mm512_castsi512_pd(
      _mm512_xor_epi32(
        _mm512_castpd_si512(a),
        _mm512_set1_epi64(0x8000000000000000)));

I'll push this in a moment. @tzanio, can you try it on tux426?
Re-merged in
Re-merged in
Enables the performance templated classes to use specific vector intrinsics on the following architectures: Lassen with xlc++, and Vulcan. It can be enabled with MFEM_USE_SIMD=YES.
Results on Vulcan, master: [figure not included]
Vulcan, x86, with SIMD enabled: [figure not included]
Vulcan, x86, with SIMD and JIT, using vectorization only when needed: [figure not included]