[OFFLOAD] Build DeviceRTL with spirv backend #171011
base: main
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
void __spirv_MemoryBarrier(uint32_t memory_scope, uint32_t semantics);

// Returns the number of blocks in the 'x' dimension.
Did you check this file for correctness? I literally used an LLM to generate it just so we could compile stuff and didn't look at it.
// Type aliases to the address spaces used by the SPIR-V backend.
//
// TODO: FIX
We can probably remove the // TODO.
namespace ompx {

#if defined(__SPIRV__)
We could add a comment saying SPIR-V doesn't support varargs for anything besides the printf intrinsic, so that's why we're doing this.
using ParallelRegionFnTy = void *;
#ifdef __SPIRV__
using FnPtrTy = __attribute__((address_space(9))) void *;
In spirvintrin.h we could add constexpr unsigned ProgramAS = 9; and then use ProgramAS here instead of a bare 9.
We could also have that as an empty #define in the other headers so we wouldn't need an #ifdef here.
  SequentiallyConsistent = 0x10
} MemorySemantics_t;
extern "C" uint32_t __spirv_AtomicIAdd(uint32_t *, int, int, uint32_t);
how much of this can we move to spirvintrin.h?
I hope all :) This definitely does look like the wrong place.
  ${CMAKE_CURRENT_SOURCE_DIR}/src/Workshare.cpp
)

list(APPEND compile_options -flto)
I'm working on a patch in Clang so we can link BC directly; maybe we want to hold off on this change until we see whether that's going to be merged or not.
_Pragma("omp end declare target");

#if defined(__NVPTX__)
#if defined(__SPIRV__)
nit: can we make SPIRV the last entry before the elif OpenMP?
#define false 0
#endif

_Pragma("omp begin declare target device_type(nohost)");
Do we still need this omp stuff? I remember some talk about not needing it anymore.
  return ::vprintf(Format, vlist);
}
} // namespace ompx
#endif
nit: need newline at eof
#if !defined(__cplusplus)
_Pragma("pop_macro(\"bool\")");
#endif
#endif // __SPIRVINTRIN_H (no newline at end of file)
nit: need newline at eof
You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions cpp,h -- clang/lib/Headers/spirvintrin.h clang/lib/Headers/gpuintrin.h openmp/device/include/DeviceTypes.h openmp/device/include/LibC.h openmp/device/include/State.h openmp/device/src/Allocator.cpp openmp/device/src/LibC.cpp openmp/device/src/Parallelism.cpp openmp/device/src/Synchronization.cpp --diff_from_common_commit
View the diff from clang-format here.

diff --git a/clang/lib/Headers/spirvintrin.h b/clang/lib/Headers/spirvintrin.h
index 84166a455..f666c155d 100644
--- a/clang/lib/Headers/spirvintrin.h
+++ b/clang/lib/Headers/spirvintrin.h
@@ -23,7 +23,7 @@ _Pragma("omp begin declare variant match(device = {arch(spirv64)})");
// Type aliases to the address spaces used by the SPIR-V backend.
//
// TODO: FIX
-#define __gpu_private
+#define __gpu_private
#define __gpu_constant
#define __gpu_local
#define __gpu_global __attribute__((address_space(1)))
@@ -38,8 +38,7 @@ uint64_t __spirv_BuiltInWorkgroupSize(int i);
uint64_t __spirv_BuiltInLocalInvocationId(int i);
#ifdef __cplusplus
-template <typename... Args>
-int __spirv_ocl_printf(Args...);
+template <typename... Args> int __spirv_ocl_printf(Args...);
#endif
// Subgroup functions
@@ -47,28 +46,31 @@ __SPIRV_VAR_QUALIFIERS uint32_t __spirv_BuiltInSubgroupLocalInvocationId;
__SPIRV_VAR_QUALIFIERS uint32_t __spirv_BuiltInSubgroupSize;
// Group non-uniform operations
-uint64_t __spirv_GroupNonUniformBallot(uint32_t execution_scope, bool predicate);
-uint32_t __spirv_GroupNonUniformBroadcastFirst(uint32_t execution_scope, uint32_t value);
-uint32_t __spirv_GroupNonUniformShuffle(uint32_t execution_scope, uint32_t value, uint32_t id);
+uint64_t __spirv_GroupNonUniformBallot(uint32_t execution_scope,
+ bool predicate);
+uint32_t __spirv_GroupNonUniformBroadcastFirst(uint32_t execution_scope,
+ uint32_t value);
+uint32_t __spirv_GroupNonUniformShuffle(uint32_t execution_scope,
+ uint32_t value, uint32_t id);
// Synchronization
-void __spirv_ControlBarrier(uint32_t execution_scope, uint32_t memory_scope, uint32_t semantics);
+void __spirv_ControlBarrier(uint32_t execution_scope, uint32_t memory_scope,
+ uint32_t semantics);
void __spirv_MemoryBarrier(uint32_t memory_scope, uint32_t semantics);
-
// Returns the number of blocks in the 'x' dimension.
_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_x(void) {
- return __spirv_BuiltInNumWorkgroups(0);
+ return __spirv_BuiltInNumWorkgroups(0);
}
// Returns the number of blocks in the 'y' dimension.
_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_y(void) {
- return __spirv_BuiltInNumWorkgroups(1);
+ return __spirv_BuiltInNumWorkgroups(1);
}
// Returns the number of blocks in the 'z' dimension.
_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_z(void) {
- return __spirv_BuiltInNumWorkgroups(2);
+ return __spirv_BuiltInNumWorkgroups(2);
}
// Returns the 'x' dimension of the current block's id.
@@ -117,14 +119,18 @@ _DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_thread_id_z(void) {
}
// Returns the size of a warp, always 32 on NVIDIA hardware.
-_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_lanes(void) { return __spirv_BuiltInSubgroupSize; }
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_lanes(void) {
+ return __spirv_BuiltInSubgroupSize;
+}
// Returns the id of the thread inside of a warp executing together.
-_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_lane_id(void) { return __spirv_BuiltInSubgroupLocalInvocationId; }
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_lane_id(void) {
+ return __spirv_BuiltInSubgroupLocalInvocationId;
+}
// Returns the bit-mask of active threads in the current warp.
-_DEFAULT_FN_ATTRS static __inline__ uint64_t __gpu_lane_mask(void) {
- return __spirv_GroupNonUniformBallot(3, 1);
+_DEFAULT_FN_ATTRS static __inline__ uint64_t __gpu_lane_mask(void) {
+ return __spirv_GroupNonUniformBallot(3, 1);
}
// Copies the value from the first active thread in the warp to the rest.
_DEFAULT_FN_ATTRS static __inline__ uint32_t
@@ -139,11 +145,13 @@ _DEFAULT_FN_ATTRS static __inline__ uint64_t __gpu_ballot(uint64_t __lane_mask,
}
// Waits for all the threads in the block to converge and issues a fence.
_DEFAULT_FN_ATTRS static __inline__ void __gpu_sync_threads(void) {
- __spirv_ControlBarrier(4, 2, 0x8); // Workgroup scope, acquire/release semantics
+ __spirv_ControlBarrier(4, 2,
+ 0x8); // Workgroup scope, acquire/release semantics
}
// Waits for all threads in the warp to reconverge for independent scheduling.
_DEFAULT_FN_ATTRS static __inline__ void __gpu_sync_lane(uint64_t __lane_mask) {
- __spirv_ControlBarrier(4, 3, 0x8); // Subgroup scope, acquire/release semantics
+ __spirv_ControlBarrier(4, 3,
+ 0x8); // Subgroup scope, acquire/release semantics
}
// Shuffles the the lanes inside the warp according to the given index.
_DEFAULT_FN_ATTRS static __inline__ uint32_t
@@ -171,7 +179,6 @@ __gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
return __gpu_match_all_u32_impl(__lane_mask, __x);
}
-
// Returns the current lane mask if every lane contains __x.
_DEFAULT_FN_ATTRS static __inline__ uint64_t
__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
@@ -181,12 +188,12 @@ __gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
// Returns true if the flat pointer points to 'shared' memory.
_DEFAULT_FN_ATTRS static __inline__ bool __gpu_is_ptr_local(void *ptr) {
return false; // TODO
- //return to_local(ptr) != 0;
+ // return to_local(ptr) != 0;
}
// Returns true if the flat pointer points to 'local' memory.
_DEFAULT_FN_ATTRS static __inline__ bool __gpu_is_ptr_private(void *ptr) {
return false;
- //return to_private(ptr) != 0; // TODO
+ // return to_private(ptr) != 0; // TODO
}
// Terminates execution of the calling thread.
_DEFAULT_FN_ATTRS [[noreturn]] static __inline__ void __gpu_exit(void) {
diff --git a/openmp/device/include/LibC.h b/openmp/device/include/LibC.h
index a67323b58..2b598be44 100644
--- a/openmp/device/include/LibC.h
+++ b/openmp/device/include/LibC.h
@@ -21,7 +21,7 @@ template <size_t N, typename... Args>
int printf(const char (&Format)[N], Args... args) {
return __spirv_ocl_printf(Format, args...);
}
-#else
+#else
int printf(const char *Format, ...);
#endif
diff --git a/openmp/device/include/State.h b/openmp/device/include/State.h
index 338f5a7f8..31dc1540d 100644
--- a/openmp/device/include/State.h
+++ b/openmp/device/include/State.h
@@ -219,7 +219,7 @@ lookup32(ValueKind Kind, bool IsReadonly, IdentTy *Ident, bool ForceTeamState) {
__builtin_unreachable();
}
-[[gnu::always_inline, gnu::flatten]] inline FnPtrTy&
+[[gnu::always_inline, gnu::flatten]] inline FnPtrTy &
lookupPtr(ValueKind Kind, bool IsReadonly, bool ForceTeamState) {
switch (Kind) {
case state::VK_ParallelRegionFn:
diff --git a/openmp/device/src/Parallelism.cpp b/openmp/device/src/Parallelism.cpp
index 1d18bddb8..86cdc8a14 100644
--- a/openmp/device/src/Parallelism.cpp
+++ b/openmp/device/src/Parallelism.cpp
@@ -260,7 +260,7 @@ __kmpc_parallel_51(IdentTy *ident, int32_t, int32_t if_expr,
1u, true, ident,
/*ForceTeamState=*/true);
state::ValueRAII ParallelRegionFnRAII(state::ParallelRegionFn, wrapper_fn,
- (FnPtrTy)nullptr, true, ident,
+ (FnPtrTy) nullptr, true, ident,
/*ForceTeamState=*/true);
state::ValueRAII ActiveLevelRAII(icv::ActiveLevel, 1u, 0u, true, ident,
/*ForceTeamState=*/true);
}

// Returns the size of a warp, always 32 on NVIDIA hardware.
_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_lanes(void) { return __spirv_BuiltInSubgroupSize; }
I don't think this line is following the correct format
// without 'libc' support.
extern "C" {
#if defined(__AMDGPU__) && !defined(OMPTARGET_HAS_LIBC)
#if (defined(__AMDGPU__) || defined(__SPIRV__)) && !defined(OMPTARGET_HAS_LIBC)
I think it's better to have a define like NEEDS_ALLOC in the dependent headers than to have a list of architectures in independent files. Also, update the comment above.
This PR adds configuration to build DeviceRTL with the SPIR-V backend. It is primarily used for the Level Zero plugin for Intel GPUs.