[OFFLOAD] Build DeviceRTL with spirv backend #171011
base: main
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
void __spirv_MemoryBarrier(uint32_t memory_scope, uint32_t semantics);

// Returns the number of blocks in the 'x' dimension.
Did you check this file for correctness? I literally used an LLM to generate it just so we could compile stuff and didn't look at it.
// Type aliases to the address spaces used by the SPIR-V backend.
//
// TODO: FIX
We can probably remove the // TODO.
namespace ompx {

#if defined(__SPIRV__)
We could add a comment saying SPIR-V doesn't support varargs for anything besides the printf intrinsic, so that's why we're doing this.
using ParallelRegionFnTy = void *;
#ifdef __SPIRV__
using FnPtrTy = __attribute__((address_space(9))) void *;
In spirvintrin.h we could add constexpr unsigned ProgramAS = 9; and then use ProgramAS here instead of a bare 9.
We could also have that as an empty #define in the other headers so we wouldn't need an #ifdef here.
  SequentiallyConsistent = 0x10
} MemorySemantics_t;
extern "C" uint32_t __spirv_AtomicIAdd(uint32_t *, int, int, uint32_t);
how much of this can we move to spirvintrin.h?
I hope all :) This definitely does look like the wrong place.
  ${CMAKE_CURRENT_SOURCE_DIR}/src/Workshare.cpp
)

list(APPEND compile_options -flto)
I'm working on a patch in Clang so we can link BC directly; maybe we want to hold off on this change until we see whether that's going to be merged or not.
_Pragma("omp end declare target");

#if defined(__NVPTX__)
#if defined(__SPIRV__)
nit: can we make SPIRV the last entry before the elif OpenMP?
#define false 0
#endif

_Pragma("omp begin declare target device_type(nohost)");
Do we still need this omp stuff? I remember some talk about not needing it anymore.
  return ::vprintf(Format, vlist);
}
} // namespace ompx
#endif
nit: need newline at eof
#if !defined(__cplusplus)
_Pragma("pop_macro(\"bool\")");
#endif
#endif // __SPIRVINTRIN_H (no newline at end of file)
nit: need newline at eof
You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions cpp,h -- clang/lib/Headers/spirvintrin.h clang/lib/Headers/gpuintrin.h openmp/device/include/DeviceTypes.h openmp/device/include/LibC.h openmp/device/include/State.h openmp/device/src/Allocator.cpp openmp/device/src/LibC.cpp openmp/device/src/Parallelism.cpp openmp/device/src/Synchronization.cpp --diff_from_common_commit
View the diff from clang-format here.

diff --git a/clang/lib/Headers/spirvintrin.h b/clang/lib/Headers/spirvintrin.h
index 84166a455..f666c155d 100644
--- a/clang/lib/Headers/spirvintrin.h
+++ b/clang/lib/Headers/spirvintrin.h
@@ -23,7 +23,7 @@ _Pragma("omp begin declare variant match(device = {arch(spirv64)})");
// Type aliases to the address spaces used by the SPIR-V backend.
//
// TODO: FIX
-#define __gpu_private
+#define __gpu_private
#define __gpu_constant
#define __gpu_local
#define __gpu_global __attribute__((address_space(1)))
@@ -38,8 +38,7 @@ uint64_t __spirv_BuiltInWorkgroupSize(int i);
uint64_t __spirv_BuiltInLocalInvocationId(int i);
#ifdef __cplusplus
-template <typename... Args>
-int __spirv_ocl_printf(Args...);
+template <typename... Args> int __spirv_ocl_printf(Args...);
#endif
// Subgroup functions
@@ -47,28 +46,31 @@ __SPIRV_VAR_QUALIFIERS uint32_t __spirv_BuiltInSubgroupLocalInvocationId;
__SPIRV_VAR_QUALIFIERS uint32_t __spirv_BuiltInSubgroupSize;
// Group non-uniform operations
-uint64_t __spirv_GroupNonUniformBallot(uint32_t execution_scope, bool predicate);
-uint32_t __spirv_GroupNonUniformBroadcastFirst(uint32_t execution_scope, uint32_t value);
-uint32_t __spirv_GroupNonUniformShuffle(uint32_t execution_scope, uint32_t value, uint32_t id);
+uint64_t __spirv_GroupNonUniformBallot(uint32_t execution_scope,
+ bool predicate);
+uint32_t __spirv_GroupNonUniformBroadcastFirst(uint32_t execution_scope,
+ uint32_t value);
+uint32_t __spirv_GroupNonUniformShuffle(uint32_t execution_scope,
+ uint32_t value, uint32_t id);
// Synchronization
-void __spirv_ControlBarrier(uint32_t execution_scope, uint32_t memory_scope, uint32_t semantics);
+void __spirv_ControlBarrier(uint32_t execution_scope, uint32_t memory_scope,
+ uint32_t semantics);
void __spirv_MemoryBarrier(uint32_t memory_scope, uint32_t semantics);
-
// Returns the number of blocks in the 'x' dimension.
_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_x(void) {
- return __spirv_BuiltInNumWorkgroups(0);
+ return __spirv_BuiltInNumWorkgroups(0);
}
// Returns the number of blocks in the 'y' dimension.
_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_y(void) {
- return __spirv_BuiltInNumWorkgroups(1);
+ return __spirv_BuiltInNumWorkgroups(1);
}
// Returns the number of blocks in the 'z' dimension.
_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_z(void) {
- return __spirv_BuiltInNumWorkgroups(2);
+ return __spirv_BuiltInNumWorkgroups(2);
}
// Returns the 'x' dimension of the current block's id.
@@ -117,14 +119,18 @@ _DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_thread_id_z(void) {
}
// Returns the size of a warp, always 32 on NVIDIA hardware.
-_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_lanes(void) { return __spirv_BuiltInSubgroupSize; }
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_lanes(void) {
+ return __spirv_BuiltInSubgroupSize;
+}
// Returns the id of the thread inside of a warp executing together.
-_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_lane_id(void) { return __spirv_BuiltInSubgroupLocalInvocationId; }
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_lane_id(void) {
+ return __spirv_BuiltInSubgroupLocalInvocationId;
+}
// Returns the bit-mask of active threads in the current warp.
-_DEFAULT_FN_ATTRS static __inline__ uint64_t __gpu_lane_mask(void) {
- return __spirv_GroupNonUniformBallot(3, 1);
+_DEFAULT_FN_ATTRS static __inline__ uint64_t __gpu_lane_mask(void) {
+ return __spirv_GroupNonUniformBallot(3, 1);
}
// Copies the value from the first active thread in the warp to the rest.
_DEFAULT_FN_ATTRS static __inline__ uint32_t
@@ -139,11 +145,13 @@ _DEFAULT_FN_ATTRS static __inline__ uint64_t __gpu_ballot(uint64_t __lane_mask,
}
// Waits for all the threads in the block to converge and issues a fence.
_DEFAULT_FN_ATTRS static __inline__ void __gpu_sync_threads(void) {
- __spirv_ControlBarrier(4, 2, 0x8); // Workgroup scope, acquire/release semantics
+ __spirv_ControlBarrier(4, 2,
+ 0x8); // Workgroup scope, acquire/release semantics
}
// Waits for all threads in the warp to reconverge for independent scheduling.
_DEFAULT_FN_ATTRS static __inline__ void __gpu_sync_lane(uint64_t __lane_mask) {
- __spirv_ControlBarrier(4, 3, 0x8); // Subgroup scope, acquire/release semantics
+ __spirv_ControlBarrier(4, 3,
+ 0x8); // Subgroup scope, acquire/release semantics
}
// Shuffles the the lanes inside the warp according to the given index.
_DEFAULT_FN_ATTRS static __inline__ uint32_t
@@ -171,7 +179,6 @@ __gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
return __gpu_match_all_u32_impl(__lane_mask, __x);
}
-
// Returns the current lane mask if every lane contains __x.
_DEFAULT_FN_ATTRS static __inline__ uint64_t
__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
@@ -181,12 +188,12 @@ __gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
// Returns true if the flat pointer points to 'shared' memory.
_DEFAULT_FN_ATTRS static __inline__ bool __gpu_is_ptr_local(void *ptr) {
return false; // TODO
- //return to_local(ptr) != 0;
+ // return to_local(ptr) != 0;
}
// Returns true if the flat pointer points to 'local' memory.
_DEFAULT_FN_ATTRS static __inline__ bool __gpu_is_ptr_private(void *ptr) {
return false;
- //return to_private(ptr) != 0; // TODO
+ // return to_private(ptr) != 0; // TODO
}
// Terminates execution of the calling thread.
_DEFAULT_FN_ATTRS [[noreturn]] static __inline__ void __gpu_exit(void) {
diff --git a/openmp/device/include/LibC.h b/openmp/device/include/LibC.h
index a67323b58..2b598be44 100644
--- a/openmp/device/include/LibC.h
+++ b/openmp/device/include/LibC.h
@@ -21,7 +21,7 @@ template <size_t N, typename... Args>
int printf(const char (&Format)[N], Args... args) {
return __spirv_ocl_printf(Format, args...);
}
-#else
+#else
int printf(const char *Format, ...);
#endif
diff --git a/openmp/device/include/State.h b/openmp/device/include/State.h
index 338f5a7f8..31dc1540d 100644
--- a/openmp/device/include/State.h
+++ b/openmp/device/include/State.h
@@ -219,7 +219,7 @@ lookup32(ValueKind Kind, bool IsReadonly, IdentTy *Ident, bool ForceTeamState) {
__builtin_unreachable();
}
-[[gnu::always_inline, gnu::flatten]] inline FnPtrTy&
+[[gnu::always_inline, gnu::flatten]] inline FnPtrTy &
lookupPtr(ValueKind Kind, bool IsReadonly, bool ForceTeamState) {
switch (Kind) {
case state::VK_ParallelRegionFn:
diff --git a/openmp/device/src/Parallelism.cpp b/openmp/device/src/Parallelism.cpp
index 1d18bddb8..86cdc8a14 100644
--- a/openmp/device/src/Parallelism.cpp
+++ b/openmp/device/src/Parallelism.cpp
@@ -260,7 +260,7 @@ __kmpc_parallel_51(IdentTy *ident, int32_t, int32_t if_expr,
1u, true, ident,
/*ForceTeamState=*/true);
state::ValueRAII ParallelRegionFnRAII(state::ParallelRegionFn, wrapper_fn,
- (FnPtrTy)nullptr, true, ident,
+ (FnPtrTy) nullptr, true, ident,
/*ForceTeamState=*/true);
state::ValueRAII ActiveLevelRAII(icv::ActiveLevel, 1u, 0u, true, ident,
/*ForceTeamState=*/true);
}

// Returns the size of a warp, always 32 on NVIDIA hardware.
_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_lanes(void) { return __spirv_BuiltInSubgroupSize; }
I don't think this line is following the correct format
// without 'libc' support.
extern "C" {
#if defined(__AMDGPU__) && !defined(OMPTARGET_HAS_LIBC)
#if (defined(__AMDGPU__) || defined(__SPIRV__)) && !defined(OMPTARGET_HAS_LIBC)
I think it's better to have a define like NEEDS_ALLOC in the dependent headers than to have a list of architectures in independent files. Also, update the comment above.
This PR adds configuration to build DeviceRTL with the SPIR-V backend. It is primarily used for the Level Zero plugin for Intel GPUs.