Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow allocate to be called with execution space #4826

Merged
merged 26 commits into from
Jul 7, 2022

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Feb 25, 2022

Based on #4823. This pull request allows allocating using a specific memory space. For Cuda it replaces a cudaDeviceSynchronize with a cudaStreamSynchronize (which could of course also be done already). As an alternative we could also use a dedicated stream (or the copy stream) instead of the one contained in the execution space passed.
Similar considerations hold for SYCL where we are always using a specific execution space instance/sycl::queue for memory allocations.

@masterleinad masterleinad added the Question For Kokkos internal and external contributors and users label Feb 25, 2022
@masterleinad masterleinad changed the title Implement desired fence behavior for view initialization Allow allocate to be called with execution space Feb 25, 2022
@masterleinad masterleinad force-pushed the allocate_with_exec_space branch 6 times, most recently from fc42ebb to 9768d51 Compare February 28, 2022 20:58
@@ -307,6 +307,35 @@ SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
"HostSpace");
}

SharedAllocationRecord<Kokkos::Experimental::HIPSpace, void>::
SharedAllocationRecord(
const Kokkos::Experimental::HIP& exec_space,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be named arg_exec_space for consistency?

Comment on lines 386 to 388
static_cast<Kokkos::Impl::ViewCtorProp<void, memory_space> const &>(
arg_prop)
.value,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're here, would you mind factoring this expression out to a local mem_space variable? I think it would make the resulting distinction between the two calls easier to follow.

record = record_type::allocate(
static_cast<Kokkos::Impl::ViewCtorProp<void, memory_space> const&>(
arg_prop)
.value,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto factoring out a mem_space local variable

KOKKOS_IMPL_CUDA_SAFE_CALL(cudaDeviceSynchronize());
cudaStream_t stream = exec_space.cuda_stream();
error_code = cudaMallocAsync(&ptr, arg_alloc_size, stream);
KOKKOS_IMPL_CUDA_SAFE_CALL(cudaStreamSynchronize(stream));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't just do this: this is a change in behavior which we need to discuss more. Specifically the base allocate will now not fence everything. I think this would be more agreeable if the version without executions space stays with the old behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we should discuss that but we still fence the stream/execution space provided anyway.

@masterleinad
Copy link
Contributor Author

Retest this please.

@masterleinad masterleinad requested a review from crtrott May 31, 2022 17:06
@masterleinad masterleinad removed the Question For Kokkos internal and external contributors and users label May 31, 2022
@masterleinad masterleinad added this to the Tentative 3.7 Release milestone Jun 6, 2022
@masterleinad
Copy link
Contributor Author

CUDA-11.6-NVHPC failing in KokkosCore_UnitTest_CudaTimingBased is clearly unrelated,

@masterleinad
Copy link
Contributor Author

Retest this please.

@dalg24 dalg24 merged commit 7ffb61a into kokkos:develop Jul 7, 2022
@dalg24
Copy link
Member

dalg24 commented Jul 12, 2022

Why was deallocate not affected by this change too?

@masterleinad
Copy link
Contributor Author

deallocate is normally not called explicitly and implicitly we don't have an execution space AFAIK. Also, deallocations imply a global fence at the moment so there is not much to gain (unless we try to reduce the scope of that fence).

@PhilMiller
Copy link
Contributor

The scope of synchronization from deallocate is something I'd eventually like to discuss - the C++ memory model only requires that an allocation that returns space intersecting with a preceding deallocation 'synchronizes-with' that deallocation. There's no requirement for global visibility of anything.

@ElisabethGiem
Copy link

Just as a note, this PR change broke resilient Kokkos. It was fixed by adding the new constructor to our code, but it did cause an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants