Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebase sycl-nliber2 into develop #3231

Closed
wants to merge 87 commits into from

Conversation

nliber
Copy link
Contributor

@nliber nliber commented Jul 28, 2020

Work in progress

crtrott and others added 30 commits July 20, 2020 14:19
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave
like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
There are a couple different paths
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave
like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave
like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave
like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave
like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
Includes a sycl_cp_wrapper to make Codeplays compute++ compiler behave
like a single pass compiler. Kokkos::initialize will create a cl::sycl::queue
For host, now use device.is_host() to determine
which SYCL device to use, as this appears to work
under both clang++ and icpx
@masterleinad
Copy link
Contributor

Retest this please.

@masterleinad
Copy link
Contributor

Retest this please.

@masterleinad
Copy link
Contributor

Retest this please

@masterleinad
Copy link
Contributor

Retest this please.

@masterleinad
Copy link
Contributor

Retest this please.

@masterleinad
Copy link
Contributor

OK. This is finally passing.

@@ -161,6 +165,8 @@ KOKKOS_INTERNAL_COMPILER_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VE
KOKKOS_INTERNAL_COMPILER_APPLE_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),Apple LLVM)
KOKKOS_INTERNAL_COMPILER_HCC := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),HCC)
KOKKOS_INTERNAL_COMPILER_GCC := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),GCC)
KOKKOS_INTERNAL_COMPILER_COMPUTEPP := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),Codeplay)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have anything corresponding in the CMake configuration. Is this required?

compute++ ${SYCLFLAGS} ${FLAGS} ${cpp_file}
${host_compiler} ${FLAGS} -include ${sycl_file} ${cpp_file}
fi
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need/want this file?

$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/SYCL/Kokkos_SYCL_Space.cpp
Kokkos_SYCL_Instance.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/SYCL/Kokkos_SYCL_Instance.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/SYCL/Kokkos_SYCL_Instance.cpp
endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I haven't checked the Makefile system.

@@ -90,6 +90,10 @@

#define KOKKOS_ENABLE_HIP_ATOMICS

#elif defined(KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL_GPU)

#define KOKKOS_ENABLE_SYCL_ATOMICS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to discuss what we do for the atomics but I don't think this needs to be part of this pull request.
Currently, KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL_GPU is not defined and the build fails if we use KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_SYCL (which is defined) instead.

class SYCLHostUSMSpace; ///< Memory space on SYCL CPU as device
class SYCLDeviceUSMSpace; ///< Memory space on SYCL GPU as device
class SYCLSharedUSMSpace; ///< Memory space shared USM
class SYCL; ///< Execution space for SYCL GPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss what the purpose of these three memory spaces are and when to use them.

/* only compile this file if SYCL is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_SYCL

//#include <SYCL/Kokkos_SYCL_Internal.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this include file?

namespace Experimental {
namespace Impl {

int SYCLInternal::was_finalized = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not a bool?

return 8 * 64 * 40; // 20480 kaveri
#else
return 32 * 8 * 40; // 81920 fiji and hawaii
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong...


void SYCL::impl_initialize(SYCL::SYCLDevice d) {
Impl::SYCLInternal::singleton().initialize(d.get_device());
#if defined(KOKKOS_ENABLE_PROFILING)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go away...

isTriviallyCopyable<Driver>();
isTriviallyCopyable<decltype(driver.m_functor)>();
driver.m_policy.space().impl_internal_space_instance()->m_queue->wait();
#ifndef SYCL_USE_BIND_LAUNCH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this (anymore), do we?

An indirect kernel is one where we have a functor
that is not trivially copyable and so is explicitly
constructed by the host in USM shared memory before being passed
"by pointer" (inside a reference_wrapper) to SYCL parallel_for.

This is to address the limitation that SYCL
data types can only be implicitly copied to the device if they
are trivially copyable.
Added support for non-trivially copyable kernels under SYCL
implementation to prepare for changing the USM memory type from
shared to device.  Current version is functionally equivalent
(but not interface equivalent) to the previous version.
tends to go together with default constructible and might as
well do it while the design of the class is in my head), as that
is much cleaner than wrapping it in an optional.
Fixed USMObjectMem move constructor so that its implementation
is actually noexcept (like its declaration says it is).

Note:  this assumes the implementation of sycl::queue is just
a shared_ptr underneath, which it is in all known implementations.
Use USM device memory for IndirectKernel
@crtrott crtrott closed this Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants