parallel_launch_local_memory and cuda 7.5 #125

Closed
bathmatt opened this issue Nov 9, 2015 · 1 comment

bathmatt commented Nov 9, 2015

I'm getting this error with CUDA 7.5:
/home/mbetten/Trilinos/cuda-intrepid-install-opt/include/Cuda/Kokkos_CudaExec.hpp(181):
Error: Formal parameter space overflowed (4096 bytes max) in function _ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI19WeightChargeFunctorNS_10TeamPolicyINS_4CudaEvS5_EEEEEEvT_

Christian said:
OK, I found it. The problem is in the new, more accurate function that figures out the best team size, etc. You can find it in this file:
kokkos/core/src/Cuda/Kokkos_Cuda_Internal.hpp

For now, if you replace all "cuda_parallel_launch_local" with "cuda_parallel_launch_constant" in that file, it should work again.

I need to split the functions and make the "Large" check a template parameter, so that both branches are no longer instantiated for
each functor. Bummer. We also need to add a test with a functor larger than 4 kB to our test suite to catch this next time.

Christian
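
To illustrate why instantiating both branches triggers the error, here is a minimal host-only C++ sketch of the idea Christian describes. It is not the Kokkos implementation: `launch_local`, `launch_constant`, `Launcher`, `SmallFunctor`, and `LargeFunctor` are hypothetical stand-ins, the 4096-byte limit is taken from the error message, and the `static_assert` merely plays the role of the compiler's "Formal parameter space overflowed" diagnostic.

```cpp
#include <cstddef>

// Limit quoted in the error message (bytes of kernel parameter space).
constexpr std::size_t max_param_bytes = 4096;

enum class LaunchPath { LocalMemory, ConstantMemory };

// Hypothetical stand-in for the path that passes the functor by value as a
// kernel argument. Instantiating it for an oversized functor fails to
// compile, mimicking the CUDA error from the report.
template <class Functor>
LaunchPath launch_local(const Functor&) {
  static_assert(sizeof(Functor) <= max_param_bytes,
                "functor too large to pass as a kernel argument");
  return LaunchPath::LocalMemory;
}

// Hypothetical stand-in for the path that stages the functor in constant
// memory, so its size does not count against the parameter space.
template <class Functor>
LaunchPath launch_constant(const Functor&) {
  return LaunchPath::ConstantMemory;
}

// Problematic pattern (left as a comment because it would not compile for
// large functors): a runtime branch instantiates BOTH launch paths.
//   if (sizeof(Functor) > max_param_bytes) return launch_constant(f);
//   else                                   return launch_local(f);

// Making the "Large" check a template parameter: each specialization only
// names one launch path, so the other is never instantiated.
template <class Functor, bool Large>
struct Launcher;

template <class Functor>
struct Launcher<Functor, false> {
  static LaunchPath run(const Functor& f) { return launch_local(f); }
};

template <class Functor>
struct Launcher<Functor, true> {
  static LaunchPath run(const Functor& f) { return launch_constant(f); }
};

template <class Functor>
LaunchPath launch(const Functor& f) {
  return Launcher<Functor, (sizeof(Functor) > max_param_bytes)>::run(f);
}

// A small functor and one deliberately larger than 4 kB, in the spirit of
// the regression test mentioned above.
struct SmallFunctor { double data[8]; };
struct LargeFunctor { char payload[5000]; };
```

Calling `launch(LargeFunctor{})` compiles in this sketch only because the local-memory path is never instantiated for it; uncommenting the runtime-branch version would trip the `static_assert`.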

crtrott added the Bug label (Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)) on Nov 10, 2015
crtrott added a commit that referenced this issue Nov 10, 2015
Fixes an issue with cuda_get_max_block_size and cuda_get_opt_block_size.
This makes the choice of constant vs local memory a template parameter
defaulted by the size of the existing DriverType template parameter.
It also changes the interface by adding a new shmem_extra argument which is
required for lambdas since the functor in those cases doesn't have a
shmem size function.

Both functions are part of the impl namespace and thus not public yet.
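
The interface change described in the commit message can be sketched roughly as follows. This is an assumption-laden illustration, not the actual `cuda_get_max_block_size` code: the name `max_block_size_sketch`, the `BlockChoice` struct, the shared-memory and thread limits, and the halving search are all hypothetical. Only the two ideas come from the commit message: a memory-path template parameter defaulted from the size of `DriverType`, and an extra `shmem_extra` argument for callers such as lambdas that have no shmem size function.

```cpp
#include <cstddef>

// Limit quoted in the error message; the other two values are assumptions
// chosen only to make the sketch concrete.
constexpr std::size_t kernel_arg_limit = 4096;
constexpr std::size_t shmem_per_block  = 48 * 1024;  // assumed
constexpr int         max_threads      = 1024;       // assumed

struct BlockChoice {
  bool constant_memory;  // which launch path the template parameter selected
  int  block_size;       // largest block size that fits the shared memory
};

// Large defaults from sizeof(DriverType) but can be overridden explicitly.
// shmem_extra is supplied by the caller, since a lambda cannot report its
// own shared-memory requirement.
template <class DriverType,
          bool Large = (sizeof(DriverType) > kernel_arg_limit)>
BlockChoice max_block_size_sketch(std::size_t shmem_per_thread,
                                  std::size_t shmem_extra) {
  int block = max_threads;
  // Halve the block size until the per-block shared memory request fits.
  while (block > 1 &&
         shmem_extra + shmem_per_thread * static_cast<std::size_t>(block) >
             shmem_per_block) {
    block /= 2;
  }
  return BlockChoice{Large, block};
}

// Hypothetical driver types for illustration.
struct SmallDriver { double data[4]; };
struct BigDriver   { char payload[8192]; };
```

With these assumed limits, `max_block_size_sketch<BigDriver>(64, 16384)` selects the constant-memory path and halves the block size once, because 16384 + 64 * 1024 bytes exceeds the assumed 48 kB budget.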
hcedwar pushed a commit to hcedwar/kokkos that referenced this issue Nov 12, 2015

crtrott commented Nov 12, 2015

Pushed to master.

crtrott closed this as completed on Nov 12, 2015
crtrott self-assigned this on Sep 20, 2016