New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rebase sycl-nliber2 into develop #3231
Conversation
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
There are a couple different paths
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
(based on bjoo's code)
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
For host, now use device.is_host() to determine which SYCL device to use, as this appears to work under both clang++ and icpx
Retest this please. |
Retest this please. |
Retest this please |
Retest this please. |
Retest this please. |
OK. This is finally passing. |
@@ -161,6 +165,8 @@ KOKKOS_INTERNAL_COMPILER_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VE | |||
KOKKOS_INTERNAL_COMPILER_APPLE_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),Apple LLVM) | |||
KOKKOS_INTERNAL_COMPILER_HCC := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),HCC) | |||
KOKKOS_INTERNAL_COMPILER_GCC := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),GCC) | |||
KOKKOS_INTERNAL_COMPILER_COMPUTEPP := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),Codeplay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have anything corresponding in the CMake configuration. Is this required?
compute++ ${SYCLFLAGS} ${FLAGS} ${cpp_file} | ||
${host_compiler} ${FLAGS} -include ${sycl_file} ${cpp_file} | ||
fi | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need/want this file?
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/SYCL/Kokkos_SYCL_Space.cpp | ||
Kokkos_SYCL_Instance.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/SYCL/Kokkos_SYCL_Instance.cpp | ||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/SYCL/Kokkos_SYCL_Instance.cpp | ||
endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far I haven't checked the Makefile system.
core/src/Kokkos_Atomic.hpp
Outdated
@@ -90,6 +90,10 @@ | |||
|
|||
#define KOKKOS_ENABLE_HIP_ATOMICS | |||
|
|||
#elif defined(KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL_GPU) | |||
|
|||
#define KOKKOS_ENABLE_SYCL_ATOMICS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to discuss what we do for the atomics but I don't think this needs to be part of this pull request.
Currently, KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL_GPU
is not defined and the build fails if we use KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL
(which is defined) instead.
class SYCLHostUSMSpace; ///< Memory space on SYCL CPU as device | ||
class SYCLDeviceUSMSpace; ///< Memory space on SYCL GPU as device | ||
class SYCLSharedUSMSpace; ///< Memory space shared USM | ||
class SYCL; ///< Execution space for SYCL GPU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discuss what the purpose of these three memory spaces are and when to use them.
/* only compile this file if SYCL is enabled for Kokkos */ | ||
#ifdef KOKKOS_ENABLE_SYCL | ||
|
||
//#include <SYCL/Kokkos_SYCL_Internal.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this include file?
namespace Experimental { | ||
namespace Impl { | ||
|
||
int SYCLInternal::was_finalized = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this not a bool?
return 8 * 64 * 40; // 20480 kaveri | ||
#else | ||
return 32 * 8 * 40; // 81920 fiji and hawaii | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong...
|
||
void SYCL::impl_initialize(SYCL::SYCLDevice d) { | ||
Impl::SYCLInternal::singleton().initialize(d.get_device()); | ||
#if defined(KOKKOS_ENABLE_PROFILING) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go away...
isTriviallyCopyable<Driver>(); | ||
isTriviallyCopyable<decltype(driver.m_functor)>(); | ||
driver.m_policy.space().impl_internal_space_instance()->m_queue->wait(); | ||
#ifndef SYCL_USE_BIND_LAUNCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this (anymore), do we?
An indirect kernel is one where we have a functor that is not trivially copyable and so is explicitly constructed by the host in USM shared memory before being passed "by pointer" (inside a reference_wrapper) to SYCL parallel_for. This is to address the limitation that SYCL data types can only be implicitly copied to the device if they are trivially copyable.
Added support for non-trivially copyable kernels under SYCL
implementation to prepare for changing the USM memory type from shared to device. Current version is functionally equivalent (but not interface equivalent) to the previous version.
before allocating a new one.
tends to go together with default constructible and might as well do it while the design of the class is in my head), as that is much cleaner than wrapping it in an optional.
Fixed USMObjectMem move constructor so that its implementation is actually noexcept (like its declaration says it is). Note: this assumes the implementation of sycl::queue is just a shared_ptr underneath, which it is in all known implementations.
Use USM device memory for IndirectKernel
Work in progress