set_scratch_size overflows #726

Closed
bathmatt opened this issue Apr 11, 2017 · 4 comments
@bathmatt

First, this is with the current version as of last night.
I've got a parallel_for where I'm setting my level-1 scratch size to:

TeamPolicy policy(n_leaves, Kokkos::AUTO, vector_len);
Kokkos::parallel_for(policy.set_scratch_size(1, Kokkos::PerTeam(mem_size), Kokkos::PerThread(0)),*this);

mem_size is just over a MB:

(cuda-gdb) p mem_size
$14 = 1045456

Later, down in the code:

#15 0x00000000004a4f8a in Kokkos::Impl::ParallelFor<MatrixBuilder, Kokkos::TeamPolicy<Kokkos::Cuda>, Kokkos::Cuda>::ParallelFor (this=0x7fffffff94c0, arg_functor=..., arg_policy=...)
    at /home/mbetten/installs/install-for-drekar/cuda-DEBUG/include/Cuda/Kokkos_Cuda_Parallel.hpp:649
649	      m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
(cuda-gdb) p Cuda::concurrency()
$15 = 131072
(cuda-gdb) p m_team_size
$16 = 32
(cuda-gdb) p m_vector_size
$17 = 1

This overflows the 4B memory limit.

This is because the team_size is too small and there is too much concurrency. When I ask what the recommended team size is, it is 256, but AUTO sets it to 32. This is an example of AUTO not working. Shouldn't the scratch memory sizes be taken into account when choosing the team size?
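
For reference, a minimal standalone sketch of the arithmetic at Kokkos_Cuda_Parallel.hpp:649 using the values reported above (plain C++, not Kokkos code; the names only mirror the backtrace):

#include <cstdint>
#include <cstdio>

int main() {
  const int64_t mem_size    = 1045456;  // PerTeam level-1 scratch request
  const int64_t concurrency = 131072;   // Cuda::concurrency()
  const int64_t team_size   = 32;       // chosen by Kokkos::AUTO
  const int64_t vector_size = 1;

  // Same formula as m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size))
  const int64_t total = mem_size * (concurrency / (team_size * vector_size));

  std::printf("total scratch = %lld bytes (~%.2f GiB)\n",
              (long long)total, total / double(1LL << 30));
  std::printf("UINT32_MAX    = %llu\n", (unsigned long long)UINT32_MAX);
  return 0;
}

With these exact values the product is 4,282,187,776 bytes (~3.99 GiB), just under UINT32_MAX = 4,294,967,295; with the recommended team_size of 256 the total would only be 1045456 * 512 = 535,273,472 bytes.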

@crtrott crtrott self-assigned this Apr 11, 2017
@crtrott crtrott added the Bug label Apr 11, 2017
@crtrott crtrott added this to the 2017-April-end milestone Apr 11, 2017
crtrott (Member) commented Apr 11, 2017

I'll fix the overflow bug. I also believe that AUTO is more likely to be right than the recommended team_size, but there may be an issue where it treats level-1 scratch as if it were shared memory. I'll check that.

bathmatt (Author) commented

Well, AUTO may be righter but recommended works :) so give me wronger.

@crtrott crtrott modified the milestones: 2017-April-end, 2017-June-end Apr 26, 2017
@ibaned ibaned added the Blocks Promotion label May 17, 2017
@ibaned ibaned assigned ibaned and unassigned crtrott May 17, 2017
ibaned (Contributor) commented May 18, 2017

@bathmatt by overflow do you mean the integer overflowed and wrapped around, or do you mean that the amount of memory requested exceeded the limit of cudaMalloc?

ibaned added a commit that referenced this issue May 18, 2017:

This should fix the basic problem in #726, which is that if the total CUDA scratch space needed does not fit in CudaSpace::size_type (unsigned int), it would overflow. Using int64_t, at @crtrott's recommendation, instead of size_t.
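
A rough illustration of the change described in the commit message (the helper name and signature are illustrative, not the actual Kokkos internals): doing the multiplication in int64_t keeps the product from wrapping when it no longer fits in a 32-bit CudaSpace::size_type.

#include <cstdint>

// Hypothetical helper: compute the total CUDA scratch allocation in a
// 64-bit type so the product cannot silently wrap, even when the per-team
// request times the number of concurrent teams exceeds 2^32 - 1.
int64_t total_scratch_bytes(int64_t per_team_bytes, int64_t concurrency,
                            int64_t team_size, int64_t vector_size) {
  return per_team_bytes * (concurrency / (team_size * vector_size));
}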
ibaned (Contributor) commented May 18, 2017

I was initially confused because your exact numbers didn't actually cause an overflow for me; they were barely within the range of uint32_t. But there was still a legitimate potential overflow bug, which will be fixed by #819.

@crtrott crtrott removed the Blocks Promotion label May 19, 2017
@crtrott crtrott self-assigned this May 19, 2017
@crtrott crtrott closed this as completed May 27, 2017