
Fix CUDA stream memory leak #3170

Merged · 5 commits · Jul 11, 2020

Conversation

masterleinad
Contributor

Fixes #3167.

dalg24 previously approved these changes Jul 8, 2020
@Rombur
Member

Rombur commented Jul 8, 2020

This doesn't compile with UVM and the stream test fails for the other configurations.

@crtrott added the Blocks Promotion label (overview issue for release-blocking bugs) on Jul 8, 2020

@dhollman left a comment


I'd prefer not to have std::shared_ptr in something this fundamental. One option would be to use a Kokkos::View<Impl::CudaInternal, Kokkos::HostSpace> as a first step, and then conditionally do the finalize in the destructor of Impl::CudaInternal, or something like that.
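A rough sketch of that direction (the flag name m_manage_stream is made up for illustration, it is not from the patch):

// Let a host-space View own the internal state; Kokkos::View is already
// reference counted on the host and its copy constructor is __host__ __device__,
// so device-side copies do not touch the count.
Kokkos::View<Impl::CudaInternal, Kokkos::HostSpace> m_space_instance;

// The conditional cleanup would then live in the element's destructor:
Impl::CudaInternal::~CudaInternal() {
  if (m_manage_stream) finalize();  // hypothetical flag: only stream-constructed instances finalize
}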

}
uint32_t impl_instance_id() const noexcept { return 0; }

private:
Impl::CudaInternal* m_space_instance;
std::shared_ptr<Impl::CudaInternal> m_space_instance;

This adds overhead everywhere in every application, even if they don't use streams; I'm not sure we're comfortable with that. Also, any downstream users who are (probably by accident, but still) copying execution space instances on the device will now get SEGFAULTs in code that previously worked fine.

Member


Why would it segfault? You mean if they try to assign it inside a kernel?

Member


Alternatively, I would just reference count it explicitly, i.e. add an int, atomically increment it in the copy constructor and decrement it in the destructor (though that makes it non-trivially copyable).
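A minimal sketch of that explicit counting (member names follow the rest of this PR; this is not the merged code):

Cuda(const Cuda& other)
    : m_space_instance(other.m_space_instance), m_counter(other.m_counter) {
  if (m_counter) Kokkos::atomic_increment(m_counter);  // one more owner
}

~Cuda() {
  // last owner cleans up; the default/singleton instance carries no counter
  if (m_counter && Kokkos::atomic_fetch_sub(m_counter, 1) == 1) {
    m_space_instance->finalize();
    delete m_space_instance;
    delete m_counter;
  }
}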


no, if they try to copy. std::shared_ptr has a nontrivial copy constructor that CUDA would try to invoke on the device. Since a lot of our users ignore warnings, it would cause a crash.
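For illustration, the kind of downstream code this refers to (the functor here is made up for the example):

struct Functor {
  Kokkos::Cuda exec;  // with the change above this member now contains a std::shared_ptr
  KOKKOS_FUNCTION void operator()(int) const {
    Kokkos::Cuda copy = exec;  // device-side copy invokes shared_ptr's host-only copy constructor
  }
};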


(though that makes it non-trivially copyable)

It's non-trivially copyable anyway. std::shared_ptr isn't trivially copyable.

Member


maybe we should add a desul::scoped_shared_ptr<AtomicScope>

@calewis requested review from calewis and removed the review request for calewis on July 8, 2020 at 18:56
@calewis self-assigned this on Jul 8, 2020
@dalg24 dismissed their stale review on July 8, 2020 at 19:06

CI does not pass

@calewis removed their assignment on Jul 8, 2020

@calewis left a comment


I looked but have nothing to add; I am just trying to get the review process started.

@masterleinad
Contributor Author

The shared_ptr implementation was failing because we seem to copy instances of this class on the device in the tasking framework (which is only enabled if Kokkos is compiled with relocatable device code) and the destructor would invoke a host function.
I couldn't get it to work with Kokkos::View since I was not able to figure out the correct combination of includes and forward declarations.
Hence, I went with @crtrott's suggestion to basically reimplement the relevant parts of shared_ptr in the class (a poor man's version). In particular, it is important that all the special member functions are actually __host__ __device__ functions for the tests to pass. I disabled any reference counting on the device, though, so as not to mess with the pointers; see the sketch below.
If that solution is acceptable, I am happy to encapsulate it a little better.
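Roughly, what that looks like for the copy constructor (a sketch of the approach, not the exact patch; the destructor follows the same pattern and is quoted further down):

KOKKOS_FUNCTION Cuda(const Cuda& other)
    : m_space_instance(other.m_space_instance), m_counter(other.m_counter) {
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
  // counting only happens on the host; device-side copies just copy the raw pointers
  if (m_counter) Kokkos::atomic_add(m_counter, 1);
#endif
}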

Comment on lines 841 to 842
Kokkos::atomic_sub(m_counter, 1);
if (*m_counter <= 0) {
Member


You need an atomic_sub_fetch or something

Member


This needs sub_fetch and then a check whether it's zero.

Member


int count = atomic_sub_fetch(m_counter,1); if(count==0) …
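For reference, sub_fetch returns the new value while fetch_sub returns the previous one, so the same check can be written with the existing Kokkos::atomic_fetch_sub, which is what a later revision of this PR does:

int const count = Kokkos::atomic_fetch_sub(m_counter, 1);  // value before the decrement
if (count == 1) { /* this was the last reference */ }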

@masterleinad
Contributor Author

masterleinad commented Jul 10, 2020

This seems to pass CI finally and I don't see any memory leaks in core/unit_test/KokkosCore_UnitTest_CudaInterOpStreams anymore.

@@ -74,7 +74,7 @@ SET(ClangOpenMPFlag -fopenmp=libomp)
ENDIF()

COMPILER_SPECIFIC_FLAGS(
Clang ${ClangOpenMPFlag}
Clang ${ClangOpenMPFlag} -Wno-openmp-mapping
Contributor Author


This should go away after rebasing.

#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
if (m_counter == nullptr) return;
int const count = Kokkos::atomic_fetch_sub(m_counter, 1);
if (count <= 1) {
Member


we should consider throwing if count is 0, since that would indicate a reference counting failure. We definitely shouldn't delete for anything other than 1.

Contributor Author


I think we can avoid throwing here. If we arrive here and the counter is less than 1, someone else will clean up and there is nothing left to do.

if (count <= 1) {
delete m_counter;
m_counter = nullptr;
if (m_use_stream) {
Member


this check (and the variable m_use_stream) should be unnecessary. Either it was the singleton, and hence m_counter == nullptr and the code returned early, or it was constructed from a stream and needs to call finalize.

Contributor Author


Yes, I agree. I should have revisited this after deciding how to handle the singleton case.
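Put together, the destructor body then reduces to something like the following (a sketch combining the snippets above, not necessarily the exact merged code):

#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
  if (m_counter == nullptr) return;  // the singleton never owns a stream
  int const count = Kokkos::atomic_fetch_sub(m_counter, 1);
  if (count == 1) {  // last reference to a stream-constructed instance
    delete m_counter;
    m_space_instance->finalize();
    delete m_space_instance;
  }
#endif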

@@ -193,7 +193,7 @@ TEST(cuda, raw_cuda_streams) {
CUDA_SAFE_CALL(cudaDeviceSynchronize());
cudaStreamDestroy(stream);

int* h_p = new int[100];
Member


ah does that resolve a leak?

Contributor Author


Yes, we never freed the memory allocated here.
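The hunk above only shows the allocation line; one way to close such a leak is a matching delete[] at the end of the test (a sketch, the actual patch may phrase it differently):

int* h_p = new int[100];
// ... copy device results back into h_p and check them ...
delete[] h_p;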

Member

@crtrott left a comment


Almost good to go. We should throw if the counter ever returns zero (because it indicates a double free somewhere), and we don't need m_use_stream, since m_counter == nullptr implies m_use_stream == false and vice versa.

@masterleinad
Contributor Author

@crtrott I addressed your comments.

Labels: Blocks Promotion (overview issue for release-blocking bugs)
6 participants