set_scratch_size overflows #726
Comments
I'll fix the bug with the overflow. I also believe that AUTO is more likely to be right than the recommended team_size, but there might be an issue where it thinks level 1 scratch is actually shared memory. I'll check that.
Well, AUTO may be righter but recommended works :) so give me wronger.
@bathmatt, by overflow do you mean the integer overflowed and wrapped around, or do you mean that the amount of memory requested exceeded the limit of
I was initially confused because your exact numbers actually didn't cause overflow for me, they were barely in the range of
First, this is with the current version as of last night.
I've got a parallel for where I'm setting my level 1 scratch size to
TeamPolicy policy(n_leaves, Kokkos::AUTO, vector_len);
Kokkos::parallel_for(policy.set_scratch_size(1, Kokkos::PerTeam(mem_size), Kokkos::PerThread(0)),*this);
mem_size is (cuda-gdb) p mem_size
$14 = 1045456
or just over a MB. Down in the code later
This overflows the 4B memory limit.
This is because team_size is too small, so there is too much concurrency. When I ask for the recommended team size I get 256, but AUTO sets it to 32. This is an example of AUTO not working. Shouldn't the scratch memory size be taken into account when choosing the team size?
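To make the arithmetic behind this concrete, here is a rough sketch in plain C++. It is not Kokkos code and does not claim to reproduce how Kokkos sizes its scratch pool. It assumes the total level 1 scratch is the per-team size multiplied by the number of teams that can be resident at once, and that this number is a thread count divided by the team size. The helper names and the thread count of 262144 used below are hypothetical, chosen only for illustration; only mem_size = 1045456 and the team sizes 32 and 256 come from this report.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of Kokkos: total scratch in bytes if every
// concurrently resident team gets its own per-team allocation.
// ASSUMPTION: concurrent teams = concurrent_threads / team_size.
std::uint64_t total_scratch_bytes(std::uint64_t per_team_bytes,
                                  std::uint64_t concurrent_threads,
                                  std::uint64_t team_size) {
  assert(team_size > 0);
  const std::uint64_t concurrent_teams = concurrent_threads / team_size;
  // Computed in 64 bits so the product itself cannot wrap for these inputs.
  return per_team_bytes * concurrent_teams;
}

// True if the total no longer fits in a 32-bit unsigned size.
bool overflows_32bit(std::uint64_t total_bytes) {
  return total_bytes > std::numeric_limits<std::uint32_t>::max();
}
```

With the assumed 262144 threads, a team size of 32 gives 8192 teams and a total that no longer fits in 32 bits, while a team size of 256 gives 1024 teams and a total of roughly 1 GB that does fit. That matches the direction of the complaint: the smaller team size multiplies the number of per-team allocations.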