Greetings, could someone please explain what sort of compiler optimization happens behind the scenes when one uses seq_exec?
For a simple 3D finite-difference nested loop, I see that the RAJA version is about four times faster than the C code.
Here is the C code. It does not show the body of the loop for brevity, but it is the same as the RAJA version:
for (int k = 0; k < nz; ++k) {
  for (int j = 0; j < ny; ++j) {
    for (int i = 0; i < nx; ++i) {
      A[i + nx * (j + ny * k)] = ...
    }
  }
}
And here is the RAJA version of the same loop:
using EXEC_POLICY_3D =
  RAJA::KernelPolicy<
    RAJA::statement::For<2, RAJA::seq_exec,      // k
      RAJA::statement::For<1, RAJA::seq_exec,    // j
        RAJA::statement::For<0, RAJA::seq_exec,  // i
          RAJA::statement::Lambda<0>
        >
      >
    >
  >;
RAJA::kernel<EXEC_POLICY_3D>(
  RAJA::make_tuple( RAJA::TypedRangeSegment<int>(0, nz),
                    RAJA::TypedRangeSegment<int>(0, ny),
                    RAJA::TypedRangeSegment<int>(0, nx) ),
  [=] RAJA_DEVICE (int k, int j, int i) {
    A[i + nx * (j + ny * k)] = ...
  }
);
Note that I use the same compiler flags for both codes:
That is, there are no pragmas or other annotations applied in RAJA internals. That said, we often observe cases where RAJA code runs faster than native C-style code, but it is not clear why. However, 4x faster seems extraordinary. Have you compared the assembly code for the two versions?
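One way to set up the assembly comparison suggested above is a small reproducer that keeps both loop forms in one translation unit. This is only a sketch under stated assumptions: `for_each_3d` is a hypothetical stand-in for a sequential kernel wrapper, not RAJA's implementation, and `placeholder_value` is an invented body, since the real stencil is elided in the post.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a sequential kernel wrapper.
// This is NOT RAJA's implementation, only a plain nested loop calling a lambda.
template <typename Body>
inline void for_each_3d(int nz, int ny, int nx, Body&& body) {
  for (int k = 0; k < nz; ++k)
    for (int j = 0; j < ny; ++j)
      for (int i = 0; i < nx; ++i)
        body(k, j, i);
}

// Placeholder loop body (assumption): the original post elides the real stencil.
inline double placeholder_value(int k, int j, int i) {
  return 0.5 * i + 0.25 * j + 0.125 * k;
}

// C-style version: raw nested loops with the same flattened index.
void fill_raw(double* A, int nz, int ny, int nx) {
  for (int k = 0; k < nz; ++k)
    for (int j = 0; j < ny; ++j)
      for (int i = 0; i < nx; ++i)
        A[i + nx * (j + ny * k)] = placeholder_value(k, j, i);
}

// Lambda version: A, nx and ny are captured by value, as in the RAJA snippet.
void fill_lambda(double* A, int nz, int ny, int nx) {
  for_each_3d(nz, ny, nx, [=](int k, int j, int i) {
    A[i + nx * (j + ny * k)] = placeholder_value(k, j, i);
  });
}

// Wall-clock time of a single call, in seconds.
template <typename F>
double seconds(F&& f) {
  const auto t0 = std::chrono::steady_clock::now();
  f();
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}

// Sanity check: both versions must write identical arrays before timing them.
bool same_result(int nz, int ny, int nx) {
  assert(nz >= 0 && ny >= 0 && nx >= 0);
  const std::size_t n = static_cast<std::size_t>(nz) * ny * nx;
  std::vector<double> a(n), b(n);
  fill_raw(a.data(), nz, ny, nx);
  fill_lambda(b.data(), nz, ny, nx);
  return a == b;
}
```

Building this file with the same flags as the real code and asking the compiler for assembly (for example with `-S` on Clang or GCC) puts `fill_raw` and `fill_lambda` side by side, so differences in vectorization or in how often `nx` and `ny` are reloaded are easy to spot. Timing each function with `seconds` is only meaningful once `same_result` confirms that both versions write the same values.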
And I am using an M3 MacBook.
Thanks