Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalize deep_copy_space early avoiding printing to std::cerr for Cuda #5151

Merged
merged 3 commits into from
Jul 11, 2022

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Jun 27, 2022

We are currently printing an error message to std::cerr when finalizing Cuda execution spaces as the first commit shows.

The problem is that we are finalizing deep_copy_space since that needs to deallocate data which calls a fence on the default execution space and we check if the execution space has already unset m_dev at this point. Wd didn't see this so far, since we are only printing to std::cerr but not throwing an exception or aborting.
The proposed fix is to finalize deep_copy_space before unsettling m_dev.

While looking at this, I also found that some of the code in finalize is guarded by a check for scratch space which didn't make sense to me.

Throwing in finalize is debatable, of course, but makes the problem obvious (in the CI).

@masterleinad masterleinad marked this pull request as ready for review June 27, 2022 20:52
@masterleinad masterleinad changed the title Throw exception when verify_is_initialized failed Finalize deep_copy_space early avoiding printing to std::cerr for Cuda Jun 27, 2022
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I would say looks good. (I would suggest we call abort() instead of throwing an exception though.)
But, I suppose you are looking into this because of @stanmoore1 report on Slack

Stan Moore [12:14 PM]
I'm still seeing this error in LAMMPS:

Kokkos::Cuda::Cuda instance constructor : ERROR device not initialized

and I don't quite follow how either of Cuda::Cuda() or Cuda::Cuda(cudaStream_t, bool) would get called.

Please elaborate.

@masterleinad
Copy link
Contributor Author

The failing stacktrace looks like

abort@libc.so.6
Kokkos::Impl::CudaInternal::verify_is_initialized@../../lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:336
Kokkos::Cuda::Cuda@../../lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:880
Kokkos::Impl::HostInaccessibleSharedAllocationRecordCommon<Kokkos::CudaSpace>::get_record@../../lib/kokkos/core/src/impl/Kokkos_SharedAlloc_timpl.hpp:265
Kokkos::Impl::CudaInternal::finalize@../../lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:731
Kokkos::Impl::CudaInternal::finalize@../../lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:774
Kokkos::Cuda::impl_finalize@../../lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:875
Kokkos::Impl::CudaSpaceInitializer::finalize@../../lib/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:947
Kokkos::Impl::ExecSpaceManager::finalize_spaces@../../lib/kokkos/core/src/impl/Kokkos_Core.cpp:125
Kokkos::Impl::(anonymous@../../lib/kokkos/core/src/impl/Kokkos_Core.cpp:561
Kokkos::finalize@../../lib/kokkos/core/src/impl/Kokkos_Core.cpp:1033
LAMMPS_NS::KokkosLMP::finalize@../kokkos.cpp:350
main@../main.cpp:108

@dalg24
Copy link
Member

dalg24 commented Jun 28, 2022

Thanks so it is

RecordCuda::decrement(RecordCuda::get_record(m_scratchFlags));

which calls
if (alloc_ptr) {
typename MemorySpace::execution_space exec_space;
Kokkos::Impl::DeepCopy<HostSpace, MemorySpace, decltype(exec_space)>(
exec_space, &head, head_cuda, sizeof(SharedAllocationHeader));
exec_space.fence(
"HostInaccessibleSharedAllocationRecordCommon::get_record(): fence "
"after copying header to HostSpace");
}

and the issue is our default constructing an execution space on L265.
This is a release 3.6 defect (introduced in #4478)

I suppose we should use Cuda::impl_static_fence(...) after the deep copy instead of fence().

@masterleinad
Copy link
Contributor Author

I suppose we should use Cuda::impl_static_fence(...) after the deep copy instead of fence().

I still think we should just fence the stream we are using for the copy.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah lets abort.

@masterleinad
Copy link
Contributor Author

Yeah lets abort.

See fbea785.

@masterleinad masterleinad added this to In progress in Kokkos Release 3.7 -- 2022 Target Date via automation Jul 11, 2022
@masterleinad masterleinad added Blocks Promotion Overview issue for release-blocking bugs labels Jul 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Development

Successfully merging this pull request may close these issues.

None yet

3 participants