Mature task-DAG capability #320

ndellingwood · 2016-06-08T18:42:23Z

Mature the LDRD prototype task policy and implement it for all back-ends.

hcedwar · 2016-07-27T18:30:40Z

In develop without intra-team collectives. These are being developed by James, a summer intern.

…inished and tested for Cuda, OpenMP, and Serial. Cuda Task team collectives for ThreadVectorRange reduce and scan are written but not yet tested. Remove redundant setup and teardown in OpenMP unit test files. Work around gcc 4.7.2 and gcc 5.1.0 bug where using a captured variable twice generates an error of redeclaring the captured variable. Progress on issue #320

hcedwar · 2016-08-30T19:00:19Z

Intra-team collectives are partially implemented for the Summer milestone. This feature request will have to carry over to the next milestone to be completed.

hcedwar · 2016-09-30T14:40:23Z

Kokkos technical review 9/8/16 decision: rename TaskPolicy to TaskScheduler.
The rename follows from the discussion that the arguments to spawn functions are an execution policy for a given task; therefore, the top-level object that manages an integrated collection of tasks is a task scheduler, not a task policy.

Introduce a using statement to temporarily preserve TaskPolicy label for backward compatibility.
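A minimal sketch of how such a backward-compatibility alias can look. The `sketch` namespace, the `Serial` stand-in, and the `ready_count()` member are hypothetical names chosen for illustration; this is not the actual Kokkos declaration.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not Kokkos itself

struct Serial {};  // stand-in for an execution space

// New name: the top-level object that manages an integrated collection of tasks.
template <class ExecSpace>
class TaskScheduler {
 public:
  using execution_space = ExecSpace;
  // Illustrative member only; a real scheduler would track queued tasks.
  int ready_count() const { return 0; }
};

// Temporary alias so code written against the old name keeps compiling.
template <class ExecSpace>
using TaskPolicy = TaskScheduler<ExecSpace>;

}  // namespace sketch

// Both names refer to exactly the same type.
static_assert(std::is_same<sketch::TaskPolicy<sketch::Serial>,
                           sketch::TaskScheduler<sketch::Serial>>::value,
              "old and new names must be the same type");
```

Because an alias template introduces no new type, existing overloads and specializations written for the new name apply unchanged to code that still uses the old one.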

hcedwar · 2016-12-07T20:20:23Z

Try to reduce the overhead of handing off a ready task to a ready thread; currently there is a lot of error checking and enqueue / dequeue work.

Updating API to better align with 'pattern( policy , functor )' as per issue kokkos#426.

Removing use of atomic_exchange when a thread is known to have exclusive access.

…_aggregate. Rename 'pop_task' to 'pop_ready_task' since it is only applicable to ready queues. Related to issue kokkos#320

where the thread is guaranteed to have exclusive access. Progress on issue kokkos#320
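The idea in these commit notes, replacing an atomic exchange with an ordinary assignment once a thread is known to have exclusive access, can be sketched roughly as below. `TaskNode`, `take_all_shared`, and `take_all_exclusive` are hypothetical names, and `std::atomic` is used here as a stand-in for the Kokkos atomic functions; the actual queue code may differ.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive list node standing in for a queued task.
struct TaskNode {
  TaskNode* next = nullptr;
};

// Shared case: other threads may push concurrently, so the whole list
// must be claimed in one indivisible read-modify-write.
inline TaskNode* take_all_shared(std::atomic<TaskNode*>& head) {
  return head.exchange(nullptr, std::memory_order_acq_rel);
}

// Exclusive case: the caller guarantees no other thread touches `head`,
// so a plain read followed by a plain write is sufficient and avoids
// the cost of the atomic read-modify-write.
inline TaskNode* take_all_exclusive(std::atomic<TaskNode*>& head) {
  TaskNode* const old = head.load(std::memory_order_relaxed);
  head.store(nullptr, std::memory_order_relaxed);
  return old;
}
```

The exclusive variant is only correct when the exclusivity guarantee really holds; if another thread could push between the load and the store, that push would be lost.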

…kkos#320

and moving threadIdx.x offset and blockDim.x increment into loops. kokkos#637 kokkos#577 kokkos#320

…okkos#320

To be used in MemoryPool for TaskDAG maturing issue kokkos#320 . Designed to help address issue kokkos#452 .

…ler. kokkos#637 kokkos#577 kokkos#320

race conditions by condensing state representation to a single integer and simplifying algorithm. Addresses issues kokkos#320, kokkos#487, kokkos#452. Using power-of-two Kokkos::Impl::concurrent_bitset size to streamline implementation and align with MemoryPool needs. Unit testing over a range of superblocks the following sequence: 1) allocate N of varying size, 2) deallocate N/3 of these, 3) reallocate those deallocated, 4) concurrently deallocate and allocate N/3 of these.

race conditions by condensing state representation to a single integer and simplifying algorithm. Addresses issues kokkos#320, kokkos#487, kokkos#452. Using power-of-two Kokkos::Impl::concurrent_bitset size to streamline implementation and align with MemoryPool needs. Unit testing over a range of superblocks the following sequence: 1) allocate N of varying size, 2) deallocate N/3 of these, 3) reallocate those deallocated, 4) concurrently deallocate and allocate N/3 of these. Add performance test for memory pool. Add performance enhancement note for multiple hints per block size.

Cleaning and consolidating Cuda Team across TeamPolicy and TaskScheduler. Cleaning Cuda Team collective operations. Moving Cuda team vector collectives into CudaTeamMember and moving threadIdx.x offset and blockDim.x increment into loops. Fine tuning Impl::FunctorAnalysis in preparation for CUDA back-end clean up. Unit testing Impl::FunctorAnalysis. Fine tuning Reducer interface and implementations for CUDA back-end clean up. '#if 0' code in CUDA back-end to flesh out design clean up.

race conditions by condensing state representation to a single integer and simplifying algorithm. Addresses issues kokkos#320, kokkos#487, kokkos#452. Creating power-of-two Kokkos::Impl::concurrent_bitset size to streamline implementation and align with MemoryPool needs. Unit testing over a range of superblocks the following sequence: 1) allocate N of varying size, 2) deallocate N/3 of these, 3) reallocate those deallocated, 4) concurrently deallocate and allocate N/3 of these. Add performance test for memory pool. Add performance enhancement note for multiple hints per block size.
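To illustrate why a power-of-two size simplifies a concurrent bitset, here is a small sketch. It is not the `Kokkos::Impl::concurrent_bitset` implementation; the class name, the `acquire`/`release` interface, and the hint parameter are assumptions made for this example, and it uses `std::atomic` rather than Kokkos atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bitset holding 2^lg2_bits bits in 32-bit atomic words.
class ConcurrentBitsetSketch {
 public:
  // Assumes 5 <= lg2_bits <= 31 so that at least one full word is used.
  explicit ConcurrentBitsetSketch(unsigned lg2_bits)
      : mask_((1u << lg2_bits) - 1u), words_(1u << (lg2_bits - 5u)) {
    for (auto& w : words_) w.store(0u, std::memory_order_relaxed);
  }

  // Claim the first clear bit at or after `hint`, wrapping around.
  // Returns the bit index, or -1 if every bit is already set.
  int acquire(unsigned hint) {
    const unsigned nbits = mask_ + 1u;
    for (unsigned n = 0; n < nbits; ++n) {
      // Power-of-two size: wrap-around is a single mask, no modulo.
      const unsigned bit = (hint + n) & mask_;
      const std::uint32_t flag = 1u << (bit & 31u);
      const std::uint32_t prev =
          words_[bit >> 5].fetch_or(flag, std::memory_order_acq_rel);
      if ((prev & flag) == 0u) return static_cast<int>(bit);  // we set it
    }
    return -1;
  }

  // Clear a bit; returns true only if the bit was set beforehand.
  bool release(unsigned bit) {
    bit &= mask_;
    const std::uint32_t flag = 1u << (bit & 31u);
    const std::uint32_t prev =
        words_[bit >> 5].fetch_and(~flag, std::memory_order_acq_rel);
    return (prev & flag) != 0u;
  }

 private:
  unsigned mask_;
  std::vector<std::atomic<std::uint32_t>> words_;
};
```

Each bit is claimed with a single atomic `fetch_or`, so two threads racing for the same bit cannot both succeed: only the one that observes the bit clear in the returned previous value owns it.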

enable stealing of empty superblocks among block sizes. Expand block size superblock hint array to "N" values per block size to provide space for TBD superblock search optimizations. Construct memory pool with min block, max block, and superblock size and introduce performance optimizations related to max vs. min block size. Issues kokkos#487, kokkos#320, kokkos#738, kokkos#215

issue kokkos#320, kokkos#314

ibaned · 2017-06-14T18:26:25Z

The last thing needed here is #577

ibaned · 2017-06-14T18:26:36Z

... and Documentation !!!

hcedwar · 2017-06-21T18:45:17Z

... and example(s)

ndellingwood added the Feature Request label (Create new capability; will potentially require voting) Jun 8, 2016

ndellingwood added this to the Summer 2016 milestone Jun 8, 2016

ndellingwood assigned hcedwar Jun 8, 2016

hcedwar modified the milestones: Fall 2016, Summer 2016, END 2016 Sep 28, 2016

hcedwar added a commit that referenced this issue Sep 28, 2016

Clean out experimental TaskPolicy, issue #372 and #320

aac60ec

hcedwar added a commit that referenced this issue Sep 30, 2016

Rename TaskPolicy to TaskScheduler as per issue #320

34ddf66

Introduce a using statement to temporarily preserve TaskPolicy label for backward compatibility.

hcedwar mentioned this issue Sep 30, 2016

[Feature Request] task data parallel for knl tile mapping. #447

Closed

hcedwar modified the milestones: Backlog, END 2016 Nov 1, 2016

hcedwar modified the milestones: 2017-February, Backlog Dec 6, 2016

hcedwar changed the title ~~Mature task policy~~ Mature task-DAG capability Dec 13, 2016

hcedwar added a commit to hcedwar/kokkos that referenced this issue Dec 13, 2016

Progress on maturing task-DAG capability, issue kokkos#320.

b76d5c0

Updating API to better align with 'pattern( policy , functor )' as per issue kokkos#426.

hcedwar added a commit to hcedwar/kokkos that referenced this issue Dec 13, 2016

Simplifying and optimizing task-DAG implementation (issue kokkos#320)

56819b8

Removing use of atomic_exchange when a thread is known to have exclusive access.

hcedwar added a commit to hcedwar/kokkos that referenced this issue Dec 13, 2016

Split task 'schedule' function into 'schedule_runnable' and 'schedule…

e2f153b

…_aggregate. Rename 'pop_task' to 'pop_ready_task' since it is only applicable to ready queues. Related to issue kokkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Dec 13, 2016

Replacing more atomic exchange with ordinary assignments

1223c0b

where the thread is guaranteed to have exclusive access. Progress on issue kokkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Dec 19, 2016

Progress on maturing task-DAG capability, issue kokkos#320.

84c20d0

Updating API to better align with 'pattern( policy , functor )' as per issue kokkos#426.

hcedwar added a commit to hcedwar/kokkos that referenced this issue Dec 19, 2016

Simplifying and optimizing task-DAG implementation (issue kokkos#320)

015297d

Removing use of atomic_exchange when a thread is known to have exclusive access.

hcedwar added a commit to hcedwar/kokkos that referenced this issue Dec 19, 2016

Split task 'schedule' function into 'schedule_runnable' and 'schedule…

1c7a4f3

…_aggregate. Rename 'pop_task' to 'pop_ready_task' since it is only applicable to ready queues. Related to issue kokkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Dec 19, 2016

Replacing more atomic exchange with ordinary assignments

f8c655d

where the thread is guaranteed to have exclusive access. Progress on issue kokkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Jan 6, 2017

Progress on maturing task-DAG capability, issue kokkos#320.

489169b

Updating API to better align with 'pattern( policy , functor )' as per issue kokkos#426.

hcedwar added a commit to hcedwar/kokkos that referenced this issue Jan 6, 2017

Simplifying and optimizing task-DAG implementation (issue kokkos#320)

456d98b

Removing use of atomic_exchange when a thread is known to have exclusive access.

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 5, 2017

Cleaning Cuda Team collective operations for kokkos#637 kokkos#577 ko…

5dfd061

…kkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 5, 2017

Moving Cuda team vector collectives into CudaTeamMember

6bf2de3

and moving threadIdx.x offset and blockDim.x increment into loops. kokkos#637 kokkos#577 kokkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 5, 2017

Start cleanup of Cuda TeamPolicy global reduce. kokkos#637 kokkos#577 k…

16b09a4

…okkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 18, 2017

Initial concurrent bitset.

30d30cc

To be used in MemoryPool for TaskDAG maturing issue kokkos#320 . Designed to help address issue kokkos#452 .

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 18, 2017

Cleaning and consolidating Cuda Team across TeamPolicy and TaskSchedu…

a8a86dc

…ler. kokkos#637 kokkos#577 kokkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 18, 2017

Cleaning Cuda Team collective operations for kokkos#637 kokkos#577 ko…

b998b60

…kkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 18, 2017

Moving Cuda team vector collectives into CudaTeamMember

8de23ea

and moving threadIdx.x offset and blockDim.x increment into loops. kokkos#637 kokkos#577 kokkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 18, 2017

Start cleanup of Cuda TeamPolicy global reduce. kokkos#637 kokkos#577 k…

1b8e87e

…okkos#320

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 19, 2017

Initial concurrent bitset.

9d77e09

To be used in MemoryPool for TaskDAG maturing issue kokkos#320 . Designed to help address issue kokkos#452 .

hcedwar added this to Backlog in On-node Task DAG Apr 19, 2017

hcedwar added a commit to hcedwar/kokkos that referenced this issue Apr 23, 2017

Convert TaskScheduler to use refactored memory pool.

6689541

issue kokkos#320, kokkos#314

hcedwar moved this from Feature Backlog to In Progress in On-node Task DAG Apr 25, 2017

hcedwar modified the milestones: 2017-August (middle), 2017-June-end Jun 26, 2017

hcedwar added the InDevelop label Sep 28, 2017

hcedwar moved this from In Progress to In Develop in On-node Task DAG Sep 28, 2017

crtrott closed this as completed Oct 28, 2017

crtrott mentioned this issue Oct 28, 2017

Kokkos promotion 2.04.04 -> 2.04.11 trilinos/Trilinos#1916

Merged

hcedwar moved this from In Develop to Done in On-node Task DAG Feb 23, 2018
