Less restrictive TeamPolicy reduction on Cuda #286

crtrott · 2016-05-18T18:17:12Z

Allow arbitrary team sizes and vector_length > 1 for reductions using TeamPolicy.

mndevec · 2016-05-31T23:05:26Z

My shared memory allocation seems to be failing in this case on GPUs for parallel_reduce.

I set the shared memory size to:
size_t team_shmem_size (int team_size) const {
  return 16384;
}

Then I try to allocate:
char *all_shared_memory = (char *) (teamMember.team_shmem().get_shmem(16384));

This works for parallel_for, but with parallel_reduce the pointer returned by get_shmem appears to be null. I am not sure it is really related to vector size, since it also fails when vector_size = 1.
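
Because the symptom is a null pointer coming back from get_shmem, checking the result before touching the memory makes the failure visible. The sketch below is a stand-alone illustration only. MockTeamScratch, checked_request and run_demo are hypothetical names, and the mock is not the Kokkos implementation. It models a fixed 16384-byte budget that returns null when a request does not fit, and makes no claim about why parallel_reduce behaves differently on Cuda.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a per-team scratch arena. NOT Kokkos code: it only
// mimics a fixed budget whose get_shmem returns null when a request does not fit.
class MockTeamScratch {
 public:
  explicit MockTeamScratch(std::size_t capacity) : buffer_(capacity), used_(0) {}

  // Bump-allocate `size` bytes, or return nullptr if they do not fit.
  void* get_shmem(std::size_t size) {
    if (size > buffer_.size() - used_) return nullptr;
    void* p = buffer_.data() + used_;
    used_ += size;
    return p;
  }

 private:
  std::vector<char> buffer_;
  std::size_t used_;
};

// Request `size` bytes and report a null result instead of dereferencing it.
bool checked_request(MockTeamScratch& scratch, std::size_t size) {
  char* all_shared_memory = static_cast<char*>(scratch.get_shmem(size));
  if (all_shared_memory == nullptr) {
    std::printf("get_shmem(%zu) returned null\n", size);
    return false;
  }
  all_shared_memory[0] = 0;  // touched only after the null check
  return true;
}

// Returns the number of failed requests in a small demonstration.
int run_demo() {
  MockTeamScratch scratch(16384);  // same size as team_shmem_size in the report
  int failures = 0;
  if (!checked_request(scratch, 16384)) ++failures;  // fits exactly
  if (!checked_request(scratch, 1)) ++failures;      // budget already used up
  return failures;
}
```

Calling run_demo() from main exercises both paths. In a real kernel the same null check would sit directly after the get_shmem call, with the failure reported in whatever way the backend allows.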

crtrott added the Feature Request label (Create new capability; will potentially require voting) May 18, 2016

crtrott added this to the Spring 2016 milestone May 18, 2016

crtrott self-assigned this May 18, 2016

crtrott added the InDevelop label May 18, 2016

crtrott closed this as completed May 25, 2016
