set_scratch_size overflows #726

Closed
bathmatt opened this issue Apr 11, 2017 · 4 comments
@bathmatt

First, this is with the current version as of last night.
I've got a parallel_for where I'm setting my level-1 scratch size to:

TeamPolicy policy(n_leaves, Kokkos::AUTO, vector_len);
Kokkos::parallel_for(policy.set_scratch_size(1, Kokkos::PerTeam(mem_size), Kokkos::PerThread(0)),*this);

mem_size is just over a MB:

(cuda-gdb) p mem_size
$14 = 1045456

Later, down in the code:

#15 0x00000000004a4f8a in Kokkos::Impl::ParallelFor<MatrixBuilder, Kokkos::TeamPolicy<Kokkos::Cuda>, Kokkos::Cuda>::ParallelFor (this=0x7fffffff94c0, arg_functor=..., arg_policy=...)
    at /home/mbetten/installs/install-for-drekar/cuda-DEBUG/include/Cuda/Kokkos_Cuda_Parallel.hpp:649
649	      m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
(cuda-gdb) p Cuda::concurrency()
$15 = 131072
(cuda-gdb) p m_team_size
$16 = 32
(cuda-gdb) p m_vector_size
$17 = 1

This overflows the 4B memory limit.

This is because the team_size is too small and there is too much concurrency. When I ask what the recommended team size is, it is 256, but AUTO sets it to 32. This is an example of AUTO not working. Shouldn't the scratch memory sizes be taken into account when choosing the team size?
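
For reference, a minimal standalone sketch of the arithmetic at Kokkos_Cuda_Parallel.hpp:649 using the values reported above (plain C++, not Kokkos code; the names only mirror the backtrace):

#include <cstdint>
#include <cstdio>

int main() {
  const int64_t mem_size    = 1045456;  // PerTeam level-1 scratch request
  const int64_t concurrency = 131072;   // Cuda::concurrency()
  const int64_t team_size   = 32;       // chosen by Kokkos::AUTO
  const int64_t vector_size = 1;

  // Same formula as m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size))
  const int64_t total = mem_size * (concurrency / (team_size * vector_size));

  std::printf("total scratch = %lld bytes (~%.2f GiB)\n",
              (long long)total, total / double(1LL << 30));
  std::printf("UINT32_MAX    = %llu\n", (unsigned long long)UINT32_MAX);
  return 0;
}

With these exact values the product is 4,282,187,776 bytes (~3.99 GiB), just under UINT32_MAX = 4,294,967,295; with the recommended team_size of 256 the total would only be 1045456 * 512 = 535,273,472 bytes.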

@crtrott crtrott self-assigned this Apr 11, 2017
@crtrott crtrott added the Bug label Apr 11, 2017
@crtrott crtrott added this to the 2017-April-end milestone Apr 11, 2017
crtrott (Member) commented Apr 11, 2017

I'll fix the overflow bug. I also believe that AUTO is more likely to be right than the recommended team_size, but there may be an issue where it treats level-1 scratch as if it were shared memory. I'll check that.

bathmatt (Author) commented

Well, AUTO may be righter but recommended works :) so give me wronger.

@crtrott crtrott modified the milestones: 2017-April-end, 2017-June-end Apr 26, 2017
@ibaned ibaned added the Blocks Promotion label May 17, 2017
@ibaned ibaned assigned ibaned and unassigned crtrott May 17, 2017
ibaned (Contributor) commented May 18, 2017

@bathmatt by overflow do you mean the integer overflowed and wrapped around, or do you mean that the amount of memory requested exceeded the limit of cudaMalloc?

ibaned added a commit that referenced this issue May 18, 2017:

This should fix the basic problem in #726, which is that if the total CUDA scratch space needed does not fit in CudaSpace::size_type (unsigned int), it would overflow. Using int64_t, at @crtrott's recommendation, instead of size_t.
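
A rough illustration of the change described in the commit message (the helper name and signature are illustrative, not the actual Kokkos internals): doing the multiplication in int64_t keeps the product from wrapping when it no longer fits in a 32-bit CudaSpace::size_type.

#include <cstdint>

// Hypothetical helper: compute the total CUDA scratch allocation in a
// 64-bit type so the product cannot silently wrap, even when the per-team
// request times the number of concurrent teams exceeds 2^32 - 1.
int64_t total_scratch_bytes(int64_t per_team_bytes, int64_t concurrency,
                            int64_t team_size, int64_t vector_size) {
  return per_team_bytes * (concurrency / (team_size * vector_size));
}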
ibaned (Contributor) commented May 18, 2017

I was initially confused because your exact numbers didn't actually cause an overflow for me; they were barely within the range of uint32_t. But there was still a legitimate potential overflow bug, which will be fixed by #819.

@crtrott crtrott removed the Blocks Promotion label May 19, 2017
@crtrott crtrott self-assigned this May 19, 2017
@crtrott crtrott closed this as completed May 27, 2017