Possible translation for OpenACC loop seq #24
Comments
After further investigation, the correctness of the translation depends on the compiler and the hardware used. With Nvidia tools and hardware the translation is correct. With Intel, the result doesn't match the expected one.
Hey @Lyphion, do you mind sharing which Intel compiler you tried? Thanks. I'm a bit swamped these days, but I'll try to work on this when I have some time.
All my tests are done with Fortran.
This was just an idea. If you like it but don't have much time, I could also draft an implementation.
Hello, I'm not sure about this proposal. According to the OpenACC spec for loop construct / seq:
The example you posted works because the parallel region does not spawn threads (or workers, in OpenACC jargon). But what if threads/workers are spawned? I'm not sure the translation using your suggestion would still be valid.
I know that this is more of a shortcut or hack. As I already mentioned, it doesn't work on all platforms for that reason. But in some instances it really helps performance, and with the Nvidia compiler it prints the same debug log when compiling. Converting an outer sequential loop into an OpenMP construct would require spawning a new kernel on each iteration, which hurts performance. Thanks for investigating my idea. The OpenMP and OpenACC documentation is a bit confusing and open to interpretation in some parts. If you are skeptical about it, we can leave it as it is and I'll refactor my code on my side without tool support.
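To make the kernel-per-iteration cost concrete, here is a minimal sketch in C (the Fortran directives are analogous). The kernel `a[i][j] += a[i-1][j]`, the function name `column_prefix_sum_per_kernel` and the array layout are assumptions made for illustration, not code from this thread. The sequential outer loop stays on the host, so each iteration opens its own `target` region:

```c
#include <assert.h>

/* Hypothetical kernel, not from the thread: a[i][j] += a[i-1][j].
 * Row i depends on row i-1, so the outer loop must run in order. */
void column_prefix_sum_per_kernel(double *a, int rows, int cols)
{
    assert(a != 0 && rows > 0 && cols > 0);
    int n = rows * cols;

    /* Keep the array resident on the device for the whole loop nest. */
    #pragma omp target data map(tofrom: a[0:n])
    {
        /* The sequential loop runs on the host ... */
        for (int i = 1; i < rows; ++i) {
            /* ... so every iteration launches a separate target region. */
            #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
            for (int j = 0; j < cols; ++j) {
                a[i * cols + j] += a[(i - 1) * cols + j];
            }
        }
    }
}
```

With `rows` iterations this launches `rows - 1` target regions, which is the overhead described above. Without offloading support the pragmas fall back to the host or are ignored, and the result is the same.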
I've been thinking about the topic and discussing it with some colleagues. I think the appropriate solution would be to translate the `loop seq` directive into a no-op. Sorry if this does not align with your expectations, but this should be the most semantically equivalent translation.
I totally agree with you about the solution. For my own testing I also tried translating it into a no-op, and it worked well enough for me. The user must keep in mind that all instructions between the outer sequential loop (`!$acc loop seq`) and an inner parallel one are most likely run by all threads, so nothing should be calculated or saved there. I'd like to thank you again for checking and researching. Your tool and feedback really helped me.
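A minimal sketch of the no-op translation, written in C with a host-side `parallel` region instead of target offload so that it stays self-contained. The kernel and the name `column_prefix_sum_noop` are hypothetical, and this is not the tool's actual output. The outer loop carries no directive, so every thread executes it redundantly, and only the inner loop is work-shared:

```c
#include <assert.h>

/* Hypothetical kernel, not from the thread: a[i][j] += a[i-1][j]. */
void column_prefix_sum_noop(double *a, int rows, int cols)
{
    assert(a != 0 && rows > 0 && cols > 0);

    #pragma omp parallel
    {
        /* "loop seq" becomes a no-op: a plain loop that every thread
         * runs in full. i is declared inside the region, so it is
         * private to each thread. */
        for (int i = 1; i < rows; ++i) {
            /* Anything placed here would be executed by all threads,
             * so nothing should be computed or stored at this point. */

            /* Only the inner loop is divided among the threads. */
            #pragma omp for
            for (int j = 0; j < cols; ++j) {
                a[i * cols + j] += a[(i - 1) * cols + j];
            }
            /* The implicit barrier at the end of "omp for" ensures row i
             * is complete before any thread starts row i + 1. */
        }
    }
}
```

The region is entered once and the ordering of the outer iterations comes from the barrier, rather than from launching a new region per iteration.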
Currently OpenMP has no direct counterpart to the OpenACC `loop seq` construct, so no direct translation is present or possible. A possible translation could be to use the `bind(thread)` clause instead. According to this paper and my own tests, the following code snippets produce correct results with comparable performance.
OpenACC:
OpenMP:
For better transparency, a feature flag to enable this translation would be useful and appropriate.