Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Omptarget Updates #3169

Merged
merged 13 commits into from
Jul 11, 2020
Merged

Omptarget Updates #3169

merged 13 commits into from
Jul 11, 2020

Conversation

crtrott
Copy link
Member

@crtrott crtrott commented Jul 8, 2020

Significant Update to OpenMPTarget

  • requires C++17
  • works on Intel GPUs
  • Compilation and tests fixed/worked around so that full build/test passes.
  • Fixed some stuff for reference counting.

@@ -63,6 +63,16 @@ IF(KOKKOS_CXX_COMPILER_ID STREQUAL Clang)
IF (INTERNAL_HAVE_CRAY_COMPILER) #not actually Clang
SET(KOKKOS_CLANG_IS_CRAY TRUE)
ENDIF()
# The clang based Intel compiler reports as Clang to most versions of CMake
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going towards all vendor compiler being identified by CMake as Clang...

@@ -75,6 +78,7 @@ SET(ClangOpenMPFlag -fopenmp=libomp)

COMPILER_SPECIFIC_FLAGS(
Clang ${ClangOpenMPFlag} -Wno-openmp-mapping
IntelClang -fiopenmp -Wno-openmp-mapping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you not using ClangOpenMPFlag if you bother setting it earlier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not fopenmp=libomp, its fiopenmp with an "i", because its not really clang ….

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but you define a variable line 48 and never use it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably because it wasn't used there before some recent update, which came into this branch via a rebase

core/perf_test/CMakeLists.txt Show resolved Hide resolved
core/perf_test/CMakeLists.txt Show resolved Hide resolved
core/src/impl/Kokkos_ClockTic.hpp Outdated Show resolved Hide resolved
core/src/impl/Kokkos_SharedAlloc.hpp Outdated Show resolved Hide resolved
core/src/impl/Kokkos_SharedAlloc.hpp Show resolved Hide resolved
@crtrott crtrott added the Blocks Promotion Overview issue for release-blocking bugs label Jul 8, 2020
Copy link

@dhollman dhollman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly looks fine

Kokkos::Experimental::ScatterNonAtomic>
: Sum<ValueType, DeviceType> {
Kokkos::Experimental::ScatterNonAtomic> {
ValueType& value;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean for this to be public? If so, putting public: two lines below makes it look like you didn't intend to do that.

Kokkos::Experimental::ScatterAtomic>
: Sum<ValueType, DeviceType> {
Kokkos::Experimental::ScatterAtomic> {
ValueType& value;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

}
KOKKOS_FORCEINLINE_FUNCTION void operator--() { update(ValueType(-1)); }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably should have a reason for not using {} initialization with numeric literals to avoid accidental narrowing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, what if ValueType is unsigned? It should still be OK, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, what if ValueType is unsigned? It should still be OK, right?

It will do the right thing

Comment on lines 222 to 236
KOKKOS_FORCEINLINE_FUNCTION void operator+=(ValueType const& rhs) {
this->join(this->reference(), rhs);
}
KOKKOS_FORCEINLINE_FUNCTION void operator++() {
this->join(this->reference(), 1);
}
KOKKOS_FORCEINLINE_FUNCTION void operator++(int) {
this->join(this->reference(), 1);
update(rhs);
}
KOKKOS_FORCEINLINE_FUNCTION void operator++() { update(1); }
KOKKOS_FORCEINLINE_FUNCTION void operator++(int) { update(1); }
KOKKOS_FORCEINLINE_FUNCTION void operator-=(ValueType const& rhs) {
this->join(this->reference(), ValueType(-rhs));
}
KOKKOS_FORCEINLINE_FUNCTION void operator--() {
this->join(this->reference(), ValueType(-1));
}
KOKKOS_FORCEINLINE_FUNCTION void operator--(int) {
this->join(this->reference(), ValueType(-1));
update(ValueType(-rhs));
}
KOKKOS_FORCEINLINE_FUNCTION void operator--() { update(ValueType(-1)); }
KOKKOS_FORCEINLINE_FUNCTION void operator--(int) { update(ValueType(-1)); }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these return void?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they did before. And yeah we probably should fix that as general ScatterView overhaul. Also #3162

Comment on lines 75 to 84
template <typename T>
KOKKOS_INLINE_FUNCTION void atomic_mul(volatile T* const dest, const T val) {
(void)atomic_fetch_mul(dest, val);
}

template <typename T>
KOKKOS_INLINE_FUNCTION void atomic_div(volatile T* const dest, const T val) {
(void)atomic_fetch_div(dest, val);
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this have to do with OMPTarget?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing which is why its also in this PR: #3162

#if defined(KOKKOS_ENABLE_OPENMPTARGET) && defined(KOKKOS_ENABLE_CXX17)
#define KOKKOS_IMPL_IF_ON_HOST if constexpr (omp_is_initial_device() == true)
#else
#define KOKKOS_IMPL_IF_ON_HOST if (true)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment explaining why this is sufficient on pre-C++17, maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not, but OpenMPTarget now requires C++17, I can remove the KOKKOS_ENABLE_CXX17 here.

Kokkos::parallel_reduce(nw, functor_type(nw), result1_v, result2,
Kokkos::parallel_reduce("int_combined-reduce_mixed",
Kokkos::RangePolicy<TEST_EXECSPACE>(0, nw),
functor_type(nw), result1_v, result2,
Kokkos::Sum<int64_t, Kokkos::HostSpace>{result3_v});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a driveby change on combined reducers? If so, should it be in a separate pull request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not necessarily a drive-by change. It is a change specifically necessary for OpenMPTarget since OpenMPTarget doesn't have something like UVM or HostPinned memory which was used previously for these tests.

@masterleinad
Copy link
Contributor

The formatting is off and the CUDA-11.0-NVCC-C++17-RDC CI complains:

/var/jenkins/workspace/Kokkos/install/include/Cuda/Kokkos_Cuda_Parallel.hpp(471): error: calling a constexpr __host__ function("operator()") from a __device__ function("exec_range") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
          detected during:
            instantiation of "std::enable_if<std::is_same<TagType, void>::value, void>::type Kokkos::Impl::ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda>::exec_range<TagType>(Kokkos::Impl::ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda>::Member) const [with FunctorType=lambda [](int)->void, Traits=<Kokkos::CudaUVMSpace::execution_space>, TagType=void]" 
(493): here
            instantiation of "void Kokkos::Impl::ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda>::operator()() const [with FunctorType=lambda [](int)->void, Traits=<Kokkos::CudaUVMSpace::execution_space>]" 
/var/jenkins/workspace/Kokkos/install/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp(124): here
            instantiation of "void Kokkos::Impl::cuda_parallel_launch_local_memory(DriverType) [with DriverType=Kokkos::Impl::ParallelFor<lambda [](int)->void, Kokkos::RangePolicy<Kokkos::CudaUVMSpace::execution_space>, Kokkos::CudaUVMSpace::execution_space>]" 
/var/jenkins/workspace/Kokkos/install/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp(478): here
            instantiation of "cudaFuncAttributes Kokkos::Impl::CudaParallelLaunch<DriverType, Kokkos::LaunchBounds<0U, 0U>, Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory>::get_cuda_func_attributes() [with DriverType=Kokkos::Impl::ParallelFor<lambda [](int)->void, Kokkos::RangePolicy<Kokkos::CudaUVMSpace::execution_space>, Kokkos::CudaUVMSpace::execution_space>]" 
(501): here
            instantiation of "void Kokkos::Impl::ParallelFor<FunctorType, Kokkos::RangePolicy<Traits...>, Kokkos::Cuda>::execute() const [with FunctorType=lambda [](int)->void, Traits=<Kokkos::CudaUVMSpace::execution_space>]" 
/var/jenkins/workspace/Kokkos/install/include/Kokkos_Parallel.hpp(176): here
            instantiation of "void Kokkos::parallel_for(const ExecPolicy &, const FunctorType &, const std::__cxx11::string &, std::enable_if<Kokkos::is_execution_policy<ExecPolicy>::value, void>::type *) [with ExecPolicy=Kokkos::RangePolicy<Kokkos::CudaUVMSpace::execution_space>, FunctorType=lambda [](int)->void]" 
/var/jenkins/workspace/Kokkos/install/include/Kokkos_Parallel.hpp(218): here
            instantiation of "void Kokkos::parallel_for(const std::__cxx11::string &, const ExecPolicy &, const FunctorType &) [with ExecPolicy=Kokkos::RangePolicy<Kokkos::CudaUVMSpace::execution_space>, FunctorType=lambda [](int)->void]" 
/var/jenkins/workspace/Kokkos/core/unit_test/TestViewSubview.hpp(973): here
            instantiation of "void TestViewSubview::Impl::test_1d_assign_impl<Space,LayoutSub,Layout,LayoutOrg,MemTraits>() [with Space=Kokkos::CudaUVMSpace, LayoutSub=Kokkos::LayoutLeft, Layout=Kokkos::LayoutLeft, LayoutOrg=Kokkos::LayoutLeft, MemTraits=void]" 
/var/jenkins/workspace/Kokkos/core/unit_test/TestViewSubview.hpp(1734): here
            instantiation of "void TestViewSubview::test_1d_assign<Space,MemTraits>() [with Space=Kokkos::CudaUVMSpace, MemTraits=void]" 
/var/jenkins/workspace/Kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp(51): here

@crtrott
Copy link
Member Author

crtrott commented Jul 11, 2020

I ignored the ScatterView comments since that came in via a different PR, and we agreed to do another overhaul on that.

@crtrott crtrott merged commit eb93176 into kokkos:develop Jul 11, 2020
@crtrott crtrott deleted the omptarget-intel branch July 11, 2020 03:21
@@ -49,11 +49,19 @@ SET(SOURCES
)

IF(Kokkos_ENABLE_HIP)
# FIXME requires TeamPolicy
# FIXME HIP requires TeamPolicy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the FIXME_HIP token

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently you don't because it only said FIXME before ;-)

ENDIF()

IF(Kokkos_ENABLE_OPENMPTARGET)
# FIXME OPENMPTARGET requires TeamPolicy Reductions and Custom Reduction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are mixing FIXME OPENMPTARGET with WORKAROUND OPENMPTARGET. The point was to have unique token that can be searched for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never used WORKAROUND in a cmake file, so I followed what was in the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants