cuda_internal_maximum_warp_count returns 8, but I believe it should return 16 for P100 #1269
Let me check this.
Wow, yeah, there is a comment from early 2012 on this. I think it is outdated, and the function is probably not used anywhere...
This is now fixed. I also changed the normal RangePolicy to use a better heuristic for choosing the block size. The old heuristic did NOT take register utilization into account, so it could fail for very complex kernels. We were just lucky that most really complex kernels already used hierarchical parallelism, which was already using the better heuristics.
…e-1206
* 'issue-1206' of github.com:ndellingwood/kokkos:
  - Issue kokkos#1206 - fix order of args to DynamicView in test_sort
  - Issue kokkos#1206: Fix DynamicView API in test_sort in algorithms
  - DynamicView: Address issue kokkos#1206
  - Attempt to get rid of warning
  - Fix issue in deep_copy changes
  - Fix an issue with the benchmark suite after changes in macros
  - Fix warning with CUDA for OpenMP nthreads unused variable
  - Fix issue kokkos#1269
  - Fix deep_copy between empty views issue kokkos#1369
  - Adding OpenMP InterOp test issue kokkos#1305
  - Fix CUDA interoperability and add unit test
  - Fix issue kokkos#1363 : Deepcopy between rank-1 views with LayoutLeft/Right
  - Adding ChunkSize constructor overload to RangePolicy.
  - Error out when -arch not detected
I believe the maximum number of threads in a thread block on P100 is 32 * 16 = 512, so I expected cuda_internal_maximum_warp_count to return 512 / Impl::CudaTraits::WarpSize = 512 / 32 = 16. It returns 8 instead. Is this a bug, or am I misinterpreting the purpose of the function? Thanks.