Allow allocate to be called with execution space #4826
fc42ebb
to
9768d51
Compare
core/src/HIP/Kokkos_HIP_Space.cpp
Outdated
@@ -307,6 +307,35 @@ SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
"HostSpace");
}

SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
SharedAllocationRecord(
const Kokkos::Experimental::HIP& exec_space,
Should that be named arg_exec_space for consistency?
29dee26
to
47c66a1
Compare
core/src/impl/Kokkos_ViewArray.hpp
Outdated
static_cast<Kokkos::Impl::ViewCtorProp<void, memory_space> const &>(
arg_prop)
.value,
While you're here, would you mind factoring this expression out to a local mem_space variable? I think it would make the resulting distinction between the two calls easier to follow.
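The suggested refactoring can be sketched roughly as follows. This is a minimal illustration with hypothetical stand-in types (MemSpace, ExecSpace, Prop, AllocProps, Record, make_record); none of them are the actual Kokkos classes, and the runtime has_exec_space flag is only an assumption standing in for whatever check the real code performs. The point is that naming the cast result once leaves the two allocate calls differing only in the extra execution space argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; NOT the Kokkos types.
struct MemSpace { std::string name; };
struct ExecSpace { int instance_id; };

// Simplified property holder: each property is a base class with a `value`.
template <class T>
struct Prop { T value; };

struct AllocProps : Prop<MemSpace>, Prop<ExecSpace> {
  AllocProps(MemSpace m, ExecSpace e)
      : Prop<MemSpace>{std::move(m)}, Prop<ExecSpace>{e} {}
};

struct Record {
  std::string space;
  std::size_t size;
  bool used_exec_instance;
};

// Overload without an execution space instance.
Record allocate(const MemSpace& mem, std::size_t bytes) {
  return Record{mem.name, bytes, false};
}

// Overload that additionally receives an execution space instance.
Record allocate(const ExecSpace&, const MemSpace& mem, std::size_t bytes) {
  return Record{mem.name, bytes, true};
}

Record make_record(const AllocProps& props, bool has_exec_space,
                   std::size_t bytes) {
  assert(bytes > 0);  // zero-sized allocations are not modeled here
  // The cast expression is written once and given a name ...
  const MemSpace& mem_space = static_cast<const Prop<MemSpace>&>(props).value;
  if (has_exec_space) {
    const ExecSpace& exec_space =
        static_cast<const Prop<ExecSpace>&>(props).value;
    // ... so the two calls differ only in the leading argument.
    return allocate(exec_space, mem_space, bytes);
  }
  return allocate(mem_space, bytes);
}
```

With the cast repeated inline in both branches, the only real difference between the calls (the execution space argument) is harder to spot.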
core/src/impl/Kokkos_ViewMapping.hpp
Outdated
record = record_type::allocate(
static_cast<Kokkos::Impl::ViewCtorProp<void, memory_space> const&>(
arg_prop)
.value,
Ditto factoring out a mem_space local variable
17df8ae
to
b184f37
Compare
b184f37
to
f9c4b5b
Compare
core/src/Cuda/Kokkos_CudaSpace.cpp
Outdated
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaDeviceSynchronize());
cudaStream_t stream = exec_space.cuda_stream();
error_code = cudaMallocAsync(&ptr, arg_alloc_size, stream);
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamSynchronize(stream));
We can't just do this: it is a change in behavior that we need to discuss more. Specifically, the base allocate will no longer fence everything. I think this would be more agreeable if the version without an execution space kept the old behavior.
Sure, we should discuss that, but we still fence the stream/execution space provided anyway.
Retest this please.
3d09b2b
to
c2a2788
Compare
deda7df
to
b6b1f58
Compare
Retest this please.
Why was |
The scope of synchronization from |
Just as a note, this PR change broke resilient Kokkos. It was fixed by adding the new constructor to our code, but it did cause an issue.
Based on #4823. This pull request allows allocating using a specific execution space. For Cuda, it replaces a cudaDeviceSynchronize with a cudaStreamSynchronize (which could of course also be done already). As an alternative, we could also use a dedicated stream (or the copy stream) instead of the one contained in the execution space passed.
Similar considerations hold for SYCL, where we are always using a specific execution space instance/sycl::queue for memory allocations.
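To make the difference in synchronization scope concrete, here is a toy model in standard C++ only. It does not call the CUDA or SYCL runtimes: ToyStream, ToyDevice, and both allocate overloads are hypothetical names, and a "stream" is just a queue of deferred tasks. The overload without a stream fences the whole device, which is an assumption that follows the suggestion made in the review rather than a statement about what was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy model only: a "stream" is a queue of deferred tasks, not a CUDA stream.
struct ToyStream {
  std::queue<std::function<void()>> pending;

  void enqueue(std::function<void()> task) { pending.push(std::move(task)); }

  // Analogue of cudaStreamSynchronize: run the work queued on this stream only.
  void synchronize() {
    while (!pending.empty()) {
      pending.front()();
      pending.pop();
    }
  }
};

struct ToyDevice {
  std::vector<ToyStream> streams;

  explicit ToyDevice(std::size_t n) : streams(n) {}

  // Analogue of cudaDeviceSynchronize: run the work queued on every stream.
  void synchronize() {
    for (ToyStream& s : streams) s.synchronize();
  }
};

// Allocation without a stream: fences the whole device (assumed old behavior).
std::vector<char> allocate(ToyDevice& device, std::size_t bytes) {
  assert(bytes > 0);  // zero-sized allocations are not modeled here
  device.synchronize();
  return std::vector<char>(bytes);
}

// Allocation with a stream: fences only that stream.
std::vector<char> allocate(ToyStream& stream, std::size_t bytes) {
  assert(bytes > 0);
  stream.synchronize();
  return std::vector<char>(bytes);
}
```

In this model, work queued on other streams is still pending after a stream-scoped allocation, which is the behavior change raised in the review: code that relied on the allocation acting as a device-wide fence would no longer get one from the new overload.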