Possible translation for OpenACC loop seq #24

Lyphion · 2024-06-18T11:20:38Z

OpenMP currently has no equivalent of the OpenACC loop seq directive, so no direct translation is possible.
One possible translation would be to use the bind(thread) clause of the OpenMP loop construct instead. According to this paper and my own tests, the following code snippets produce correct results with comparable performance.

OpenACC:

!$acc parallel
!$acc loop seq
do j = 1, n
!$acc loop
  do i = 1, n
    b(i) = b(i) / j + a(i,j)
  end do
end do
!$acc end parallel

OpenMP:

!$omp target teams
!$omp loop bind(thread)
do j = 1, n
!$omp loop
  do i = 1, n
    b(i) = b(i) / j + a(i,j)
  end do
end do
!$omp end target teams

For transparency, this translation should be enabled through a feature flag.


Lyphion · 2024-06-23T12:28:53Z

After further investigation, it turns out that the correctness of the translation depends on the compiler and the hardware used. With Nvidia tools and hardware, the translation produces correct results; with Intel, the result doesn't match the expected one.
For that reason, this translation should be placed in the experimental section.

hservatg · 2024-06-25T11:05:37Z

Hey @Lyphion, do you mind sharing which Intel compiler you tried? Thanks

I'm a bit swamped these days, but I'll try to work on this when I have some time.

Lyphion · 2024-06-25T18:00:23Z

All my tests were done in Fortran, using:

ifx 2024.1.2 or 2024.2 for Intel (the older Intel Fortran Compiler Classic doesn't support my hardware)
nvfortran 24.3 for Nvidia

This was just an idea. If you like it but don't have much time, I could also draft an implementation.

hservatg · 2024-07-01T22:45:55Z

Hello,

I'm not sure about this proposal. This is what the OpenACC specification says about the seq clause of the loop construct:

2.9.5 seq clause
The seq clause specifies that the associated loop or loops are to be executed sequentially by the accelerator. This clause will override any automatic parallelization or vectorization.

However, !$omp loop bind(thread) would parallelize the loop over the threads, which would not honor the OpenACC semantics of the original code.

The example you posted works because the parallel region does not spawn threads (workers, in OpenACC terminology). But what if threads/workers are spawned? I'm not sure the suggested translation would still be valid in that case.

Lyphion · 2024-07-02T10:31:39Z

I know this is more of a shortcut, or a hack, and as I already mentioned, that is why it doesn't work on all platforms. But in some cases it really helps performance, and with the Nvidia compiler it prints the same debug log at compile time as the OpenACC version. Expressing the outer sequential loop any other way in OpenMP would require launching a new kernel on each iteration, which hurts performance.
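To make that cost concrete, here is a minimal Fortran sketch of the alternative, assuming the outer j loop is moved to the host: a and b stay resident on the device inside a target data region, but every j iteration still launches its own target teams loop kernel. The program name, the problem size n = 64 and the serial reference check are illustrative additions, not part of the original code.

```fortran
! Hypothetical illustration: the outer j loop runs on the host, so every
! iteration launches a separate target kernel.
program seq_on_host
  implicit none
  integer, parameter :: n = 64
  real :: a(n, n), b(n), ref(n)
  integer :: i, j

  call random_number(a)
  b = 1.0
  ref = b

  ! Serial reference computation
  do j = 1, n
    do i = 1, n
      ref(i) = ref(i) / j + a(i, j)
    end do
  end do

  ! Keep a and b on the device for the whole j loop
  !$omp target data map(to: a) map(tofrom: b)
  do j = 1, n
    ! One kernel launch per j iteration
    !$omp target teams loop
    do i = 1, n
      b(i) = b(i) / j + a(i, j)
    end do
  end do
  !$omp end target data

  if (maxval(abs(b - ref)) > 1.0e-4) error stop "mismatch"
  print *, "ok"
end program seq_on_host
```

The number of kernel launches grows linearly with n, which is the overhead the bind(thread) translation tries to avoid.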

Thanks for looking into my idea. The OpenMP and OpenACC documentation is a bit confusing and leaves some points open.

If you are skeptical about it, we can leave things as they are, and I'll refactor my code on my side without tool support.

hservatg · 2024-07-25T11:14:36Z

I've been thinking about this topic and discussing it with some colleagues. I think the appropriate solution is to translate !$acc loop seq into a no-op (currently it is translated into !$omp loop, which is wrong). Essentially, !$acc loop seq prevents the OpenACC compiler from parallelizing the loop, so the loop must run serially on a given thread.
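Applied to the example from the first comment, the no-op translation drops the directive on the outer loop and keeps !$omp loop for the inner one. Below is a minimal sketch with a hypothetical program wrapper, random input and a serial reference check added so it can be run; without OpenMP the directives are simply ignored. There is no synchronization between teams across j iterations, so on a device the result is only correct if each i is handled by the same team on every j; the check at the end tests this rather than assuming it.

```fortran
! Sketch of the no-op translation of "!$acc loop seq" applied to the
! example from the first comment (program wrapper is hypothetical).
program seq_as_noop
  implicit none
  integer, parameter :: n = 64
  real :: a(n, n), b(n), ref(n)
  integer :: i, j

  call random_number(a)
  b = 1.0
  ref = b

  ! Serial reference computation
  do j = 1, n
    do i = 1, n
      ref(i) = ref(i) / j + a(i, j)
    end do
  end do

  !$omp target teams map(to: a) map(tofrom: b)
  ! The "!$acc loop seq" directive was here; it is translated to nothing,
  ! so this is a plain sequential Fortran loop.
  do j = 1, n
    ! The inner "!$acc loop" keeps its existing translation
    !$omp loop
    do i = 1, n
      b(i) = b(i) / j + a(i, j)
    end do
  end do
  !$omp end target teams

  if (maxval(abs(b - ref)) > 1.0e-4) error stop "mismatch"
  print *, "ok"
end program seq_as_noop
```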

Sorry if this doesn't match your expectations, but this should be the most semantically equivalent translation.

Lyphion · 2024-07-25T15:35:30Z

I totally agree with the solution. In my own testing I also tried translating it into a no-op, and it works well enough for me. Users must keep in mind that all statements between the outer sequential loop (!$acc loop seq) and an inner parallel loop are most likely executed by all threads, so nothing should be computed or stored there.
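A hypothetical illustration of that caveat, again using the no-op translation: a per-j update placed between the two loops would most likely be executed once per team rather than once overall, so this sketch instead attaches it to a single iteration (i == 1) of the inner parallel loop. The divisor array and the i == 1 guard are assumptions made for this example, not something the tool generates.

```fortran
! Hypothetical example: per-j bookkeeping under the no-op translation
! of "!$acc loop seq".
program per_j_bookkeeping
  implicit none
  integer, parameter :: n = 64
  real :: a(n, n), b(n), divisor(n)
  integer :: i, j

  call random_number(a)
  b = 1.0
  divisor = 0.0

  !$omp target teams map(to: a) map(tofrom: b, divisor)
  do j = 1, n
    ! Code placed here (outside the inner loop) is most likely executed
    ! redundantly by every team, so an update such as
    ! "divisor(j) = divisor(j) + j" would be applied once per team.
    !$omp loop
    do i = 1, n
      b(i) = b(i) / j + a(i, j)
      ! Tie the per-j update to one iteration so it happens exactly once
      if (i == 1) divisor(j) = divisor(j) + j
    end do
  end do
  !$omp end target teams

  ! Each divisor(j) must have been updated exactly once
  do j = 1, n
    if (divisor(j) /= real(j)) error stop "divisor updated more than once"
  end do
  print *, "ok"
end program per_j_bookkeeping
```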

Thanks again for checking and researching this. Your tool and feedback really helped me.

hservatg self-assigned this Jul 25, 2024
