
SharedMemory Support for Lambdas #81

Closed
crtrott opened this issue Sep 6, 2015 · 10 comments
Assignees
Labels
Feature Request Create new capability; will potentially require voting
Milestone

Comments

@crtrott
Member

crtrott commented Sep 6, 2015

This is a topic we have discussed a couple of times, but I'd like to get feedback from users.
Currently, team shared memory is only supported with functors. The functor must provide a member function that returns how much shared memory it needs based on the team size.

One option to enable shared memory for lambdas would be to add options on the TeamPolicy:

TeamPolicy<> policy(N,M);
policy.set_shared_memory_size(size1);
policy.set_shared_memory_per_team_member(size2);

parallel_for(policy, ...);

This would request size1 + M*size2 bytes of shared memory.

Alternatively, one could also do:

policy.add_shared_memory_size(size1);
policy.add_shared_memory_per_team_member(size2);
policy.add_shared_memory_size(size3);
policy.add_shared_memory_per_team_member(size4);

This would request size1+size3+M*(size2+size4) bytes of shared memory.

Thoughts?

@crtrott crtrott added the Feature Request Create new capability; will potentially require voting label Sep 6, 2015
@mhoemmen
Contributor

mhoemmen commented Sep 7, 2015

What happens if both the policy and the functor have that method, to determine shared memory size? Which one does Kokkos pick?

@crtrott
Member Author

crtrott commented Sep 7, 2015

Interesting question. There are a couple of options:
(i) Throw if they don't agree.
(ii) Take the max of the two.
(iii) Give precedence to the functor, since it knows the work best.
(iv) Give precedence to the policy, since it represents a specific call site for a specific functor.

I know this is an issue, but I still think we need to support it for lambdas. Everybody wants lambdas, including me, and the more I use them, the less I want to write explicit functors.

@mhoemmen
Contributor

mhoemmen commented Sep 7, 2015

If the plan is to favor lambdas over functors, it would make sense to look at the Policy first. On the other hand, unexpected behavior is bad. I would say this: If both the functor and the Policy have the method, and if the functor asks for more shared memory than the Policy, report an error. Otherwise, the Policy determines the size. (The Policy controls the range, and league and team sizes, so it should control other aspects of the hardware as well.)

I think functors and lambdas should both treat shared memory allocations as actions that could fail. They should check the allocation result and reduce over the error code. This should catch any mismatch between the Policy's specification and the functor's / lambda's expectation.

@crtrott
Member Author

crtrott commented Oct 22, 2015

OK, I am implementing a variant right now, and it works. But I am not quite sure about the interface. This is related to another thing we are considering: a more general scratch memory mechanism.

What I implemented right now is:

TeamPolicy<>(league_size, team_size, [vector_length ,] TeamScratchSize(per_team_size, [per_thread_size=0]) );

Eventually I would like to specify scratch sizes for multiple memory spaces for the same functor.
We have now encountered this need in multiple big apps: they require scratch memory in size regimes ranging from kB to GB. This happens in Lulesh, Nalu, and the SM apps.
In those cases they currently allocate std::vectors inside the iterations (or at least resize them). To run in parallel, I need copies of those for every concurrently handled iteration. Some of those allocations are for the innermost levels (TeamThread loops), some are on the outer level. The sizes vary as well: some of the smaller ones could fit in real team shared memory, while others must stay in some larger space.

Let me know what you think of this start. Even if I push it, it will stay in the Experimental namespace for now.

@crtrott
Member Author

crtrott commented Oct 22, 2015

OK, I thought a bit more about it. Maybe it's enough to use that interface but restrict what you can give: effectively, a TeamPolicy would accept a templated TeamScratchSize, but the MemorySpace you can specify is limited to TeamPolicy::execution_space::memory_space and TeamPolicy::execution_space::scratch_memory_space. Something like this:

TeamPolicy<>(league_size, team_size,
  TeamScratchSize(per_team_CS, per_thread_CS),
  TeamScratchSize<Cuda::scratch_memory_space>(per_team_CSS, per_thread_CSS))

@mhoemmen
Contributor

This is relevant for Tpetra too. Consider a parallel loop that iterates over local rows of a Tpetra::RowMatrix. RowMatrix is abstract; it exposes a row's entries by copying them into user-provided space. The number of entries per row can vary a lot in theory, though it might not vary too much in practice. Of course this isn't really the right interface for fine-grained parallelism, but it would make sense for this to work in the common case. Users' loop bodies should check whether the scratch allocation suffices, and fail out for a later retry if it doesn't.

The std::vector reuse code is doing something analogous to a loop over rows of a Tpetra::RowMatrix, so fixing one case should fix the other.

crtrott added a commit that referenced this issue Oct 31, 2015
This adds shared memory support for lambdas according to issue #81
to Cuda, Pthreads and Serial. It also adds a unit test.

This does not add the generic scratch space discussed in issue #81.
@crtrott
Member Author

crtrott commented Oct 31, 2015

This is now available as an experimental feature.

hcedwar pushed a commit to hcedwar/kokkos that referenced this issue Nov 11, 2015
This adds shared memory support for lambdas according to issue kokkos#81
to Cuda, Pthreads and Serial. It also adds a unit test.

This does not add the generic scratch space discussed in issue kokkos#81.
hcedwar pushed a commit to hcedwar/kokkos that referenced this issue Nov 12, 2015
This adds shared memory support for lambdas according to issue kokkos#81
to Cuda, Pthreads and Serial. It also adds a unit test.

This does not add the generic scratch space discussed in issue kokkos#81.
@crtrott crtrott added this to the GTC 2016 milestone Nov 23, 2015
@crtrott
Member Author

crtrott commented Nov 23, 2015

After some discussion, we actually want an interface for scratch levels. More details later.

@crtrott crtrott self-assigned this Jan 14, 2016
@crtrott
Member Author

crtrott commented Jan 14, 2016

In light of the interface decisions we made for chunk size, here is my new proposed interface:

TeamPolicy<>(n,m).set_scratch_size(Level,PerTeam(size),PerThread(size))

Either PerThread or PerTeam can be omitted (you can give just one of the two).

@hcedwar
Contributor

hcedwar commented Mar 30, 2016

Sufficient for the current API. This will be revisited in the Summer comprehensive technical review.
