Add Aggressive Vectorization Compilation mode #72

Closed
crtrott opened this issue Aug 19, 2015 · 7 comments
Labels
Feature Request Create new capability; will potentially require voting

Comments

@crtrott
Member

crtrott commented Aug 19, 2015

In principle, Intel's interpretation of #pragma ivdep can always be added to Kokkos code, since our parallel_for kernels have to be outer-loop vectorizable, and Intel's #pragma ivdep applies only to the immediately following loop, not to loops nested inside it. That said, this is the behavior of the current compiler and it is ill-defined, so we don't want to add the pragma by default. This compile option would add the #pragma ivdep and thus help considerably with vectorization.
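To make the outer-loop argument concrete, here is a minimal sketch, not the actual Kokkos dispatch code. `range_for_sketch` and `row_prefix_sums` are hypothetical names, and the `__INTEL_COMPILER` guard is an assumption about how one might restrict the pragma to the Intel compiler. The outer iterations are independent (each one owns a row), while the inner loop carries a dependence that the hint on the outer loop is not meant to cover.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a range dispatch loop (not Kokkos code).
// The ivdep hint is attached to the outer loop only.
template <class Functor>
void range_for_sketch(std::size_t begin, std::size_t end, const Functor& f) {
#if defined(__INTEL_COMPILER)  // assumption: only emit the pragma for the Intel compiler
#pragma ivdep
#endif
  for (std::size_t i = begin; i < end; ++i) {
    f(i);
  }
}

// Each outer iteration i computes a running sum over its own row.
// Rows do not overlap, so outer iterations are independent; the inner
// loop over j has a genuine loop-carried dependence.
void row_prefix_sums(std::vector<double>& a, std::size_t rows, std::size_t cols) {
  double* p = a.data();
  range_for_sketch(0, rows, [=](std::size_t i) {
    for (std::size_t j = 1; j < cols; ++j) {
      p[i * cols + j] += p[i * cols + j - 1];
    }
  });
}
```

With a 2x3 input stored row by row, every row is replaced by its own running sum, regardless of whether the pragma is active.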

@crtrott crtrott added the Feature Request Create new capability; will potentially require voting label Aug 19, 2015
@nmhamster
Contributor

As we discussed, I'm nervous about the potential errors this may introduce into code, which will be hard to find and fix, particularly in the initial ports of code to Kokkos. It's something for us to consider, but we should prioritize user experience right now, IMHO.

@crtrott
Member Author

crtrott commented Aug 19, 2015

Yeah, I know ;-). I only want to open an issue to track this and provide a place to leave comments. I got some feedback from the Fluid Dynamics folks, who liked the idea of getting that mode (which is not on by default). By the way, I would be interested in an example where it actually fails; so far I couldn't contrive one (which does not mean that I believe there is none ...).

@nmhamster
Contributor

Are you asking for an example that is runnable on the GPU under "Kokkos rules", or for code that you are actually going to get in the OpenMP-to-Kokkos transition phase? These two things are not created equal.

@crtrott
Member Author

crtrott commented Aug 19, 2015

I was thinking more about the OpenMP-to-Kokkos transition phase, but if you have an example that actually runs correctly with the Cuda backend and fails with the OpenMP backend when loops are marked with ivdep, that would be even more interesting.

@crtrott
Member Author

crtrott commented Sep 9, 2015

After discussions today at the Kokkos meeting, I pushed this change in. To enable it, either define KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION to 1 or, when building with the Makefile system, use KOKKOS_OPTIONS=aggressive_vectorization as an option. There is no CMake option right now. It also currently affects only the OpenMP backend with the Intel compiler.
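A rough sketch of what such an opt-in guard could look like follows. Only the macro name KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION comes from the comment above; the function names, the use of `defined(...)` rather than a value check, and the `__INTEL_COMPILER` test are assumptions made for illustration and may differ from what Kokkos actually does.

```cpp
#include <cstddef>
#include <vector>

// Reports at compile time whether the opt-in macro is defined.
// (Hypothetical helper, for illustration only.)
constexpr bool aggressive_vectorization_requested() {
#if defined(KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION)
  return true;
#else
  return false;
#endif
}

// Hypothetical range loop: the ivdep hint is emitted only when the user
// opted in and the Intel compiler is in use (both checks are assumptions).
template <class Functor>
void guarded_range_sketch(std::size_t begin, std::size_t end, const Functor& f) {
#if defined(KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION) && defined(__INTEL_COMPILER)
#pragma ivdep
#endif
  for (std::size_t i = begin; i < end; ++i) {
    f(i);
  }
}

// Example kernel with independent iterations: out[i] = i * i.
std::vector<int> squares_sketch(std::size_t n) {
  std::vector<int> out(n);
  int* p = out.data();
  guarded_range_sketch(0, n, [=](std::size_t i) {
    p[i] = static_cast<int>(i * i);
  });
  return out;
}
```

In this sketch, passing `-DKOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION=1` on the compiler command line would switch the guard on; the computed result is the same either way, only the vectorization hint changes.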

@crtrott crtrott closed this as completed Sep 9, 2015
@Slaedr

Slaedr commented Mar 8, 2019

@crtrott What are your thoughts on similarly adding an omp simd option so that other compilers can also vectorize chunks? I tried a very simple loop body that clang 7 would not vectorize unless I added the omp simd pragma. See this.
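For illustration, an omp simd variant of a range loop might look like the sketch below. `simd_range_sketch` and `scale_sketch` are hypothetical names, not Kokkos functions, and the `_OPENMP >= 201307` check (OpenMP 4.0, where the simd construct was introduced) is an assumption about how to keep the code compiling when OpenMP is disabled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical range loop annotated with omp simd instead of ivdep.
// Without OpenMP 4.0 support the pragma is skipped and this is a plain loop.
template <class Functor>
void simd_range_sketch(std::size_t begin, std::size_t end, const Functor& f) {
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd
#endif
  for (std::size_t i = begin; i < end; ++i) {
    f(i);
  }
}

// Simple loop body with independent iterations: v[i] *= s.
std::vector<float> scale_sketch(std::vector<float> v, float s) {
  float* p = v.data();
  simd_range_sketch(0, v.size(), [=](std::size_t i) { p[i] *= s; });
  return v;
}
```

Unlike a dependence hint, omp simd asserts that the iterations may be executed concurrently in SIMD lanes, so it is only safe for functors whose iterations really are independent.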

@Slaedr

Slaedr commented Mar 8, 2019

Since not all functors will be vectorizable like that, perhaps it's better to add another execution space, OpenMPSIMD or something, which has omp simd instead of ivdep? Then the appropriate space could be used depending on the functor.
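One way to picture the "pick the variant per functor" idea is tag dispatch, sketched below. This is not a Kokkos execution space; `PlainLoopTag`, `OmpSimdLoopTag`, `run_range`, and `doubled_indices` are all hypothetical names invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tags standing in for the choice of execution space.
struct PlainLoopTag {};    // no vectorization hint
struct OmpSimdLoopTag {};  // loop annotated with omp simd

// Plain variant: safe for any functor.
template <class Functor>
void run_range(PlainLoopTag, std::size_t n, const Functor& f) {
  for (std::size_t i = 0; i < n; ++i) {
    f(i);
  }
}

// SIMD variant: only for functors whose iterations are independent.
template <class Functor>
void run_range(OmpSimdLoopTag, std::size_t n, const Functor& f) {
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd
#endif
  for (std::size_t i = 0; i < n; ++i) {
    f(i);
  }
}

// The caller selects the variant per kernel by passing a tag.
template <class Tag>
std::vector<int> doubled_indices(Tag tag, std::size_t n) {
  std::vector<int> out(n);
  int* p = out.data();
  run_range(tag, n, [=](std::size_t i) { p[i] = static_cast<int>(2 * i); });
  return out;
}
```

A kernel known to be vectorizable would be launched with `OmpSimdLoopTag{}`, while anything else falls back to `PlainLoopTag{}`; for an independent body like this one, both produce the same result.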
