Mature task-DAG capability #320
Comments
In develop, without intra-team collectives. These are in progress, being developed by James (summer intern).
…inished and tested for Cuda, OpenMP, and Serial. Cuda Task team collectives for ThreadVectorRange reduce and scan are written but not yet tested. Remove redundant setup and teardown in OpenMP unit test files. Work around a gcc 4.7.2 and gcc 5.1.0 bug where using a captured variable twice generates an error about redeclaring the captured variable. Progress on issue #320
Intra-team collectives are partially implemented for the Summer milestone. This feature request will have to carry over to the next milestone to be completed.
Kokkos technical review 9/8/16 decision: rename TaskPolicy to TaskScheduler.
Introduce a using statement to temporarily preserve the TaskPolicy label for backward compatibility.
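A minimal sketch of how such a transitional alias could look. The namespace `sketch`, the `Serial` tag, and the class body are hypothetical stand-ins for illustration, not the Kokkos declarations:

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for an execution space tag.
struct Serial {};

// Hypothetical stand-in for the renamed scheduler class.
template <class ExecSpace>
class TaskScheduler {
 public:
  using execution_space = ExecSpace;
};

// Alias template: code that still spells the old name keeps compiling
// and refers to exactly the same type as the new name.
template <class ExecSpace>
using TaskPolicy = TaskScheduler<ExecSpace>;

static_assert(std::is_same<TaskPolicy<Serial>, TaskScheduler<Serial>>::value,
              "old and new labels must name the same type");

}  // namespace sketch
```

Because an alias template introduces no new type, overloads and template specializations written against the new name also match code that uses the old one, which is what makes this a low-risk way to stage a rename.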
Updating API to better align with 'pattern( policy , functor )' as per issue kokkos#426.
Removing use of atomic_exchange when a thread is known to have exclusive access.
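To illustrate the idea behind that change, here is a sketch using `std::atomic` rather than the Kokkos atomics. The `Task` struct and both function names are hypothetical; the only point is that an atomic read-modify-write is unnecessary when the caller guarantees no other thread can touch the queue head:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive list node standing in for a queued task.
struct Task {
  Task* next = nullptr;
};

// Other threads may push concurrently, so detaching the list must be a
// single atomic read-modify-write.
inline Task* detach_shared(std::atomic<Task*>& head) {
  return head.exchange(nullptr, std::memory_order_acq_rel);
}

// Caller guarantees exclusive access (assumption): a plain load followed
// by a plain store gives the same result without the exchange.
inline Task* detach_exclusive(std::atomic<Task*>& head) {
  Task* const old = head.load(std::memory_order_relaxed);
  head.store(nullptr, std::memory_order_relaxed);
  return old;
}
```

The exclusive variant is only correct while the exclusivity guarantee holds; if another thread can push between the load and the store, that push is lost.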
…_aggregate. Rename 'pop_task' to 'pop_ready_task' since it is only applicable to ready queues. Related to issue kokkos#320
where the thread is guaranteed to have exclusive access. Progress on issue kokkos#320
and moving threadIdx.x offset and blockDim.x increment into loops. kokkos#637 kokkos#577 kokkos#320
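A host-side C++ sketch of what "moving the offset and increment into loops" can mean. Here `lane` and `width` are hypothetical parameters emulating `threadIdx.x` and `blockDim.x`; this is not the CudaTeamMember code:

```cpp
#include <cassert>
#include <vector>

// The range is described only by [begin, end). The per-lane start offset
// and the stride are applied inside the loop itself rather than being
// baked into the range object beforehand.
template <class Body>
void vector_range_for(int lane, int width, int begin, int end, Body&& body) {
  assert(width > 0 && lane >= 0 && lane < width);
  for (int i = begin + lane; i < end; i += width) {
    body(i);
  }
}

// Emulate every lane in turn and count how often each index is visited.
inline std::vector<int> visit_counts(int width, int begin, int end) {
  assert(end >= begin);
  std::vector<int> counts(static_cast<std::size_t>(end - begin), 0);
  for (int lane = 0; lane < width; ++lane) {
    vector_range_for(lane, width, begin, end,
                     [&](int i) { ++counts[i - begin]; });
  }
  return counts;
}
```

With this loop shape, the lanes together cover each index in the range exactly once, whatever the relation between the range length and the lane count.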
To be used in MemoryPool for TaskDAG maturing issue kokkos#320 . Designed to help address issue kokkos#452 .
race conditions by condensing state representation to a single integer and simplifying algorithm. Addresses issues kokkos#320, kokkos#487, kokkos#452. Using power-of-two Kokkos::Impl::concurrent_bitset size to streamline implementation and align with MemoryPool needs. Unit testing over a range of superblocks the following sequence:
1) allocate N of varying size
2) deallocate N/3 of these
3) reallocate the deallocated ones
4) concurrently deallocate and allocate N/3 of these
Add performance test for memory pool. Add performance enhancement note for multiple hints per block size.
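The first three steps of that test sequence can be sketched against a toy pool whose entire state is one integer. `TinyBitset` and `run_sequence` are hypothetical names, the 32-slot limit is an assumption made for brevity, and this is not `Kokkos::Impl::concurrent_bitset`. The concurrent step 4 is left out so the result stays deterministic:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy pool: bit i set means slot i is in use. All state is one integer,
// so every state change is a single atomic operation.
class TinyBitset {
 public:
  static constexpr int capacity = 32;

  // Claim the lowest free slot; returns its index, or -1 if full.
  int acquire() {
    std::uint32_t cur = bits_.load(std::memory_order_relaxed);
    while (cur != 0xFFFFFFFFu) {
      const std::uint32_t lowest_free = ~cur & (cur + 1u);
      if (bits_.compare_exchange_weak(cur, cur | lowest_free,
                                      std::memory_order_acq_rel,
                                      std::memory_order_relaxed)) {
        int index = 0;
        while ((lowest_free >> index) != 1u) ++index;
        return index;
      }
      // On failure, cur now holds the latest value; retry.
    }
    return -1;
  }

  // Free a slot; returns false if it was not in use or is out of range.
  bool release(int index) {
    if (index < 0 || index >= capacity) return false;
    const std::uint32_t mask = std::uint32_t{1} << index;
    const std::uint32_t prev =
        bits_.fetch_and(~mask, std::memory_order_acq_rel);
    return (prev & mask) != 0u;
  }

  // Number of slots currently in use.
  int count() const {
    std::uint32_t v = bits_.load(std::memory_order_acquire);
    int n = 0;
    for (; v != 0u; v &= v - 1u) ++n;
    return n;
  }

 private:
  std::atomic<std::uint32_t> bits_{0u};
};

// Steps 1 to 3 of the sequence; returns the number of live slots.
inline int run_sequence(int n) {
  assert(n >= 0 && n <= TinyBitset::capacity);
  TinyBitset pool;
  std::vector<int> held;
  for (int i = 0; i < n; ++i) held.push_back(pool.acquire());  // 1) allocate N
  for (int i = 0; i < n / 3; ++i) pool.release(held[i]);       // 2) deallocate N/3
  for (int i = 0; i < n / 3; ++i) held[i] = pool.acquire();    // 3) reallocate
  return pool.count();
}
```

Keeping the state in one word means there is no window in which two separately updated fields can disagree, which is the general reason a single-integer representation helps with race conditions.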
Cleaning and consolidating Cuda Team across TeamPolicy and TaskScheduler. Cleaning Cuda Team collective operations. Moving Cuda team vector collectives into CudaTeamMember and moving threadIdx.x offset and blockDim.x increment into loops. Fine tuning Impl::FunctorAnalysis in preparation for CUDA back-end clean up. Unit testing Impl::FunctorAnalysis. Fine tuning Reducer interface and implementations for CUDA back-end clean up. '#if 0' code in CUDA back-end to flesh out design clean up.
enable stealing of empty superblocks among block sizes. Expand block size superblock hint array to "N" values per block size to provide space for TBD superblock search optimizations. Construct memory pool with min block, max block, and superblock size and introduce performance optimizations related to max vs. min block size. Issues kokkos#487, kokkos#320, kokkos#738, kokkos#215
The last thing needed here is #577
... and Documentation !!!
... and example(s)
Mature the LDRD prototype task policy and implement it for all back-ends.