
[BUG] cudaErrorMemoryAllocation with KOKKOS on Volta GPU #1473

Closed
danicholson opened this issue May 22, 2019 · 26 comments · Fixed by #1474

@danicholson
Collaborator

Summary

My KOKKOS/CUDA simulation crashes with a cudaErrorMemoryAllocation after ~14 million steps on a Titan V GPU. It is a molecular system without electrostatics.

LAMMPS Version and Platform

LAMMPS: 15 May 2019
OS: Centos 7
GCC: 4.8.5
CUDA: 9.1
GPU: Titan V
CPU: Xeon E5-2630 v4

compiled with:

cmake -D BUILD_MPI=no -D BUILD_OMP=no -D PKG_MOLECULE=yes -D KOKKOS_ARCH="BDW;Volta70" -D PKG_KOKKOS=yes -D KOKKOS_ENABLE_CUDA=yes -D KOKKOS_ENABLE_OPENMP=no -D KOKKOS_ENABLE_DEBUG=yes -D CMAKE_CXX_COMPILER=/home/david/git/lammps-clean/lib/kokkos/bin/nvcc_wrapper -D CMAKE_BUILD_TYPE=Debug ../../cmake/

Expected Behavior

The script runs without issue on CPUs.

Actual Behavior

The simulation crashes with the following error:

warning: Cuda API error detected: cudaCreateTextureObject returned (0x2)

terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaCreateTextureObject( & tex_obj , & resDesc, & texDesc, NULL ) error( cudaErrorMemoryAllocation): out of memory /home/david/git/lammps-clean/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp:290
Traceback functionality not available

Steps to Reproduce

Unfortunately I don't have a small system to reproduce this error quickly. I run the script below with default Kokkos settings "-sf kk -k on g 1".

Further Information, Files, and Links

files:
relax.in.txt
relax_440K.data.txt

Call stack from cuda-gdb:

#0  0x00002aaaacff0207 in raise () from /lib64/libc.so.6
#1  0x00002aaaacff18f8 in abort () from /lib64/libc.so.6
#2  0x00002aaaac7fb7d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00002aaaac7f9746 in std::rethrow_exception(std::__exception_ptr::exception_ptr) () from /lib64/libstdc++.so.6
#4  0x00002aaaac7f9773 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00002aaaac7f9993 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x00000000023a22ea in Kokkos::Impl::throw_runtime_exception (msg=...)
    at /home/david/git/lammps-clean/lib/kokkos/core/src/impl/Kokkos_Error.cpp:72
#7  0x00000000023ab169 in Kokkos::Impl::cuda_internal_error_throw (e=cudaErrorMemoryAllocation, 
    name=0x24ed160 <Kokkos::(anonymous namespace)::AllowPadding+792> "cudaCreateTextureObject( & tex_obj , & resDesc, & texDesc, NULL )", 
    file=0x24ece68 <Kokkos::(anonymous namespace)::AllowPadding+32> "/home/david/git/lammps-clean/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp", 
    line=290) at /home/david/git/lammps-clean/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:129
#8  0x00000000017e12ba in Kokkos::Impl::cuda_internal_safe_call (e=cudaErrorMemoryAllocation, 
    name=0x24ed160 <Kokkos::(anonymous namespace)::AllowPadding+792> "cudaCreateTextureObject( & tex_obj , & resDesc, & texDesc, NULL )", 
    file=0x24ece68 <Kokkos::(anonymous namespace)::AllowPadding+32> "/home/david/git/lammps-clean/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp", 
    line=290) at /home/david/git/lammps-clean/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Error.hpp:58
#9  0x00000000023a9148 in Kokkos::Impl::SharedAllocationRecord<Kokkos::CudaSpace, void>::attach_texture_object (sizeof_alias=4, 
    alloc_ptr=0x2aaadd022a00, alloc_size=60136) at /home/david/git/lammps-clean/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp:290
#10 0x000000000197b1f0 in Kokkos::Impl::SharedAllocationRecord<Kokkos::CudaSpace, void>::attach_texture_object<int> (this=0x23737ea0)
    at /home/david/git/lammps-clean/lib/kokkos/core/src/Kokkos_CudaSpace.hpp:776
#11 0x0000000001978e2e in Kokkos::Impl::CudaTextureFetch<int const, int>::CudaTextureFetch<Kokkos::CudaSpace> (this=0x7fffffffd2e0, 
    arg_ptr=0x2aaadd022a80, record=0x23737ea0) at /home/david/git/lammps-clean/lib/kokkos/core/src/Cuda/Kokkos_Cuda_View.hpp:129
#12 0x000000000197230d in Kokkos::Impl::ViewDataHandle<Kokkos::ViewTraits<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<2u> >, void>::assign (arg_data_ptr=0x2aaadd022a80, arg_tracker=...)
    at /home/david/git/lammps-clean/lib/kokkos/core/src/Cuda/Kokkos_Cuda_View.hpp:301
#13 0x000000000196ce22 in Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<2u> >, Kokkos::ViewTraits<int*, Kokkos::LayoutLeft, Kokkos::Cuda, void>, void>::assign(Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<2u> ><void> >&, Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<int*, Kokkos::LayoutLeft, Kokkos::Cuda, void><void> > const&, Kokkos::Impl::SharedAllocationTracker const&) (dst=..., src=..., 
    src_track=...) at /home/david/git/lammps-clean/lib/kokkos/core/src/impl/Kokkos_ViewMapping.hpp:3009
#14 0x000000000196b6ef in Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<2u> >::operator=<int*, Kokkos::LayoutLeft, Kokkos::Cuda, void>(Kokkos::View<int*<Kokkos::LayoutLeft, Kokkos::Cuda, void> > const&) (this=0x22354df0, rhs=...)
    at /home/david/git/lammps-clean/lib/kokkos/core/src/Kokkos_View.hpp:1985
#15 0x00000000019511cb in LAMMPS_NS::NeighBondKokkos<Kokkos::Cuda>::build_topology_kk (this=0x22354a10)
    at /home/david/git/lammps-clean/src/KOKKOS/neigh_bond_kokkos.cpp:223
#16 0x00000000019351fe in LAMMPS_NS::NeighborKokkos::build_topology (this=0x22353530)
    at /home/david/git/lammps-clean/src/KOKKOS/neighbor_kokkos.cpp:388
#17 0x0000000001938f26 in LAMMPS_NS::NeighborKokkos::build_kokkos<Kokkos::Cuda> (this=0x22353530, topoflag=1)
    at /home/david/git/lammps-clean/src/KOKKOS/neighbor_kokkos.cpp:322
#18 0x0000000001934e24 in LAMMPS_NS::NeighborKokkos::build (this=0x22353530, topoflag=1)
    at /home/david/git/lammps-clean/src/KOKKOS/neighbor_kokkos.cpp:237
#19 0x0000000001d35b0f in LAMMPS_NS::VerletKokkos::run (this=0x223716b0, n=20000000) at /home/david/git/lammps-clean/src/KOKKOS/verlet_kokkos.cpp:397
#20 0x00000000017cf780 in LAMMPS_NS::Run::command (this=0x7fffffffd990, narg=1, arg=0x22361f10) at /home/david/git/lammps-clean/src/run.cpp:183
#21 0x0000000001671353 in LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run> (lmp=0xcfb7710, narg=1, arg=0x22361f10)
    at /home/david/git/lammps-clean/src/input.cpp:873
#22 0x000000000166b1b8 in LAMMPS_NS::Input::execute_command (this=0x22315490) at /home/david/git/lammps-clean/src/input.cpp:856
#23 0x000000000166898c in LAMMPS_NS::Input::file (this=0x22315490) at /home/david/git/lammps-clean/src/input.cpp:243
#24 0x00000000013f00b3 in main (argc=9, argv=0x7fffffffdc48) at /home/david/git/lammps-clean/src/main.cpp:64
@stanmoore1
Contributor

@danicholson you are running out of memory: error( cudaErrorMemoryAllocation): out of memory. The Titan V only has 12 GB of GPU memory, which is probably much less than your CPU has. From your stack trace, it runs out of memory when building an atom map. Currently the Kokkos package only supports the "array" style atom map, which is more memory-intensive than the "hash" style. That said, with only 15000 atoms I wouldn't expect OOM. I'll take a look; the Kokkos library has some nice memory profiling tools.
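
For illustration only (this is not the LAMMPS implementation, and the names below are hypothetical), the memory difference between the two map styles is roughly the following:

// Illustrative sketch only; not the LAMMPS code.
#include <unordered_map>
#include <vector>

// "array" style: direct indexing by global atom tag. Memory scales with the
// largest tag value in the system, even for atoms this rank never touches.
std::vector<int> map_array;              // map_array[tag] = local index, size = max_tag + 1

// "hash" style: memory scales only with owned + ghost atoms on this rank.
std::unordered_map<int, int> map_hash;   // key = global tag, value = local index

int lookup_array(int tag) {
  return map_array[tag];
}

int lookup_hash(int tag) {
  auto it = map_hash.find(tag);
  return (it == map_hash.end()) ? -1 : it->second;
}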

@danicholson
Collaborator Author

@stanmoore1 Thanks for the reply. In a previous test, nvidia-smi did not report high memory usage. I wrote a script to check it every minute or so, and the usage was ~600 MiB at the last check before the error occurred. I also used the KOKKOS tool to check the GPU memory usage for a shorter simulation and it was stationary around a similar value. Could this error be due to memory fragmentation?

@stanmoore1
Contributor

usage was ~600 MiB

Yeah, that seems reasonable for your system size. The error mentions texture memory (i.e. Kokkos randomread memory), which that atom map variable uses; perhaps switching to regular global memory would fix the issue.

@stanmoore1
Contributor

This issue sounds very similar to #542.

@stanmoore1
Contributor

I've checked with Valgrind and the Kokkos profiling tools and I don't see any memory growth over time. Can you describe how you are getting the LAMMPS source code? Are you cloning the Git repo? Just to be absolutely certain, did you use either make pu or make yes-kokkos after you updated your repo?

@stanmoore1
Contributor

I guess you are using cmake, so the comment about make pu wouldn't apply.

@danicholson
Collaborator Author

For this test I cloned the repo fresh and checked out the unstable branch. I built using cmake (the command is in the issue description above) so I did not execute make pu or make yes-kokkos.

@danicholson
Collaborator Author

I can do a test run using global memory rather than texture memory. Would this just require declaring map_array as typename AT::t_int_1d rather than typename AT::t_int_1d_randomread in neigh_bond_kokkos.h?

@stanmoore1
Contributor

Yes, though looking back at #542, it also failed with the exact same error, and the root cause there was a memory leak, not texture memory, so I'm doubtful that will help.
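
For reference, a rough sketch of the two view types being discussed; these are not the actual LAMMPS typedefs, which live in the KOKKOS package headers and may differ in layout details:

#include <Kokkos_Core.hpp>

// Plain device view: reads go through regular global memory.
using t_int_1d = Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::CudaSpace>;

// RandomAccess ("randomread") view: on CUDA this is read through a texture
// object, which is what attach_texture_object() in the stack trace sets up.
using t_int_1d_randomread =
    Kokkos::View<const int*, Kokkos::LayoutLeft, Kokkos::CudaSpace,
                 Kokkos::MemoryTraits<Kokkos::RandomAccess>>;

// The change discussed above amounts to swapping the member declaration in
// neigh_bond_kokkos.h from
//   typename AT::t_int_1d_randomread map_array;
// to
//   typename AT::t_int_1d map_array;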

@stanmoore1
Contributor

I'm running this on a V100 GPU to see if I can reproduce the OOM. It will take several hours to reach 14 million timesteps, and it may need to run even longer since the V100 has 16 GB of memory versus 12 GB for the Titan V.

@stanmoore1
Contributor

Other than a memory leak, something else in the simulation could be blowing up, which then leads to a large memory allocation.

You could try writing a restart file every 1 million timesteps, and then, after it fails at 14 million timesteps, read back in the latest restart file from before the failure and try running it again. If it fails at the same spot as before, then something is probably wrong with the simulation. If it runs for another 14 million timesteps and then fails, it could be a memory leak.

@danicholson
Collaborator Author

Thank you very much for your attention to this issue. I will run the test that you suggested. Based on a CPU-only run with the same input, this system should reach a stable equilibrium in a few nanoseconds, but I agree that it is worth checking.

@stanmoore1
Contributor

Based on a CPU-only run with the same input, this system should reach a stable equilibrium in a few nanoseconds

Yes, I mean there may be a bug, triggered by very rare events, that causes an atom to explode out of the box, or something like that.

@stanmoore1
Contributor

This looks more like a memory leak than that type of bug, though; I just can't find any evidence yet.

@danicholson
Collaborator Author

A few things:

  1. Starting from a restart at 12 million steps, the script runs for 14 million steps and then quits with the same error.

  2. The GPU memory usage and host RSS increase with time, but never approach the host/device limits (attached plots: mem_GPU_randread, RSS_host_randread).

  3. When using device memory for both map_array and sametag, the script runs without error. I did not monitor memory usage for the whole run, only the last few minutes. For this period, the RSS and GPU memory usage were constant at values of 680 MB and 530 MiB respectively. These are close to the starting values of the runs performed with texture memory for these arrays.

Based on the profiling tool, it looks like most of the large allocations during the run are for sametag and map_array, hence the decision to use device memory for both rather than just map_array. To me, it looks like there is some sort of leak related to the texture memory.

@stanmoore1
Contributor

I can reproduce this on V100. It failed just before 14 million timesteps.

@stanmoore1
Contributor

@crtrott any ideas?

@stanmoore1
Contributor

@danicholson I submitted another job on V100 with memory profiling to see if I can reproduce the memory growth you saw.

@danicholson
Collaborator Author

@stanmoore1 Just to clarify, I did not use the Kokkos profiling tools to monitor the memory. I used pmap and nvidia-smi.

@stanmoore1
Contributor

@danicholson understood. The Kokkos tools use getrusage to get total host memory, which should show the same RSS growth you saw. For the GPU memory, the Kokkos tools will tell whether or not the leak is in Kokkos Views.
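
As an aside, a minimal sketch of sampling host memory via getrusage, the mechanism mentioned above; this is not the Kokkos tools code itself:

#include <sys/resource.h>
#include <cstdio>

// Report the peak resident set size of the calling process.
// On Linux, ru_maxrss is in kilobytes.
long peak_rss_kb() {
  struct rusage usage;
  getrusage(RUSAGE_SELF, &usage);
  return usage.ru_maxrss;
}

int main() {
  std::printf("peak RSS so far: %ld kB\n", peak_rss_kb());
  return 0;
}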

@stanmoore1
Contributor

I do see significant host RSS growth over 12 million timesteps. However, the memory in Kokkos Views stays constant, at least according to the Kokkos profiling tool.

@stanmoore1
Contributor

@danicholson I can confirm that the leak goes away if I don't use texture memory for map_array and sametag. This looks like a bug outside LAMMPS, i.e. in Kokkos or CUDA. That said, the code shouldn't be reallocating those arrays every time the neighbor list is built; I'm guessing that change would also fix the problem.
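
A sketch of the reallocate-only-when-needed pattern being described here; the actual change is in #1474, and the helper and names below are hypothetical:

#include <Kokkos_Core.hpp>

using t_int_1d = Kokkos::View<int*, Kokkos::CudaSpace>;

// Only reallocate when the required size exceeds the view's current extent,
// instead of creating a fresh allocation on every neighbor-list build.
void grow_if_needed(t_int_1d &view, size_t needed, const char *label) {
  if (view.extent(0) < needed)
    view = t_int_1d(Kokkos::view_alloc(Kokkos::WithoutInitializing, label), needed);
}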

@stanmoore1
Contributor

@danicholson my test shows #1474 fixes this issue. It could help performance a little too, since the code isn't reallocating as often.

@stanmoore1
Contributor

I created a small reproducer and reported this to the Kokkos library developers: kokkos/kokkos#2155. @danicholson thanks for the bug report.
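
The actual reproducer is the one linked in kokkos/kokkos#2155 and may differ; for context, a sketch of the allocation pattern implicated by the stack trace above, namely repeatedly binding a RandomAccess alias (which attaches a CUDA texture object) to a freshly allocated view:

#include <Kokkos_Core.hpp>

int main(int argc, char **argv) {
  Kokkos::initialize(argc, argv);
  {
    using plain_view = Kokkos::View<int*, Kokkos::CudaSpace>;
    using randomread_view =
        Kokkos::View<const int*, Kokkos::CudaSpace,
                     Kokkos::MemoryTraits<Kokkos::RandomAccess>>;

    // Mimic the per-build reallocation of map_array/sametag: allocate a view,
    // bind a texture-backed alias to it, then drop both each iteration.
    for (int i = 0; i < 100000; ++i) {
      plain_view v("v", 15000);
      randomread_view r = v;   // attach_texture_object() happens here on CUDA
      // v and r fall out of scope; the allocation and its texture object
      // should be released every iteration
    }
  }
  Kokkos::finalize();
  return 0;
}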

@danicholson
Collaborator Author


@stanmoore1 Great, thank you for the investigation and the fix. I'm running it now, and the memory usage looks stable.

@stanmoore1
Contributor

@danicholson sure, let us know if you see other issues.
