[OpenMP] Disable early vectorization of loads/stores in the runtime

We are having a hard time optimizing some vectorized loads/stores later on, which causes early vectorization to degrade performance.

Differential Revision: https://reviews.llvm.org/D158656
llvm · Aug 23, 2023 · 80906ce
1 parent 283998d
Showing 1 changed file with 8 additions and 2 deletions.
diff --git a/openmp/libomptarget/DeviceRTL/CMakeLists.txt b/openmp/libomptarget/DeviceRTL/CMakeLists.txt
@@ -109,8 +109,14 @@ set(src_files
   ${source_directory}/Workshare.cpp
 )
 
-set(clang_opt_flags -O3 -mllvm -openmp-opt-disable -DSHARED_SCRATCHPAD_SIZE=512)
-set(link_opt_flags  -O3        -openmp-opt-disable -attributor-enable=module)
+# We disable the slp vectorizer during the runtime optimization to avoid
+# vectorized accesses to the shared state. Generally, those are "good" but
+# the optimizer pipeline (esp. Attributor) does not fully support vectorized
+# instructions yet and we end up missing out on way more important constant
+# propagation. That said, we will run the vectorizer again after the runtime 
+# has been linked into the user program.
+set(clang_opt_flags -O3 -mllvm -openmp-opt-disable -DSHARED_SCRATCHPAD_SIZE=512 -mllvm -vectorize-slp=false )
+set(link_opt_flags  -O3        -openmp-opt-disable -attributor-enable=module -vectorize-slp=false )
 set(link_export_flag -passes=internalize -internalize-public-api-file=${source_directory}/exports)
 
 # Prepend -I to each list element
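
To illustrate the access pattern the new CMake comment describes, here is a minimal sketch. The `TeamState` struct and all function names are hypothetical and are not taken from the DeviceRTL sources. It shows adjacent scalar stores of the kind an SLP vectorizer may fuse into one vector store, followed by a scalar load that constant propagation would ideally fold.

```cpp
#include <cstdint>

// Hypothetical stand-in for runtime state. This is an assumption made for
// illustration only; it is not the DeviceRTL's actual state layout.
struct TeamState {
  uint32_t NumThreads;
  uint32_t Level;
  uint32_t ActiveLevel;
  uint32_t HasThreadState;
};

// Four adjacent 32-bit stores of constants. When the SLP vectorizer runs,
// it may fuse stores like these into a single 128-bit vector store.
void initState(TeamState &S) {
  S.NumThreads = 1;
  S.Level = 0;
  S.ActiveLevel = 0;
  S.HasThreadState = 0;
}

// A later scalar load of a single field. With scalar stores, forwarding the
// constant to this load is a direct store-to-load match. With a fused vector
// store, the optimizer has to reason about one lane of the stored vector.
uint32_t readLevel(const TeamState &S) { return S.Level; }

// Ideally folds to "return 0" once the constant has been propagated.
uint32_t levelAfterInit() {
  TeamState S;
  initState(S);
  return readLevel(S);
}

// Same pattern for a field initialized to a non-zero constant.
uint32_t numThreadsAfterInit() {
  TeamState S;
  initState(S);
  return S.NumThreads;
}
```

The program behaves the same with or without vectorization. The difference the commit cares about is only in how easily later passes can see the stored constants, which is why the vectorizer is deferred until after the runtime has been linked into the user program.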