Merge improved Dense stride handling
Preserves the stride of Dense matrices during copies and conversions.
Additionally adds missing output-parameter versions of Dense functions.
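To make "preserving the stride" concrete, here is a minimal C++ sketch. It uses a hypothetical toy type `strided_matrix`, not Ginkgo's `gko::matrix::Dense`.

- The returning `convert` gives the new matrix the source's padded stride instead of compacting rows to `cols` entries.
- The output-parameter `convert_to` writes into a caller-provided matrix and keeps that matrix's own stride.

All names and semantics here are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy row-major matrix with padded rows; this is NOT
// gko::matrix::Dense, only an illustration of stride bookkeeping.
template <typename ValueType>
struct strided_matrix {
    std::size_t rows;
    std::size_t cols;
    std::size_t stride;             // distance between starts of two rows
    std::vector<ValueType> values;  // rows * stride entries, padding included

    strided_matrix(std::size_t r, std::size_t c, std::size_t s)
        : rows{r}, cols{c}, stride{s}, values(r * s, ValueType{})
    {
        assert(s >= c);  // a row must fit inside its stride
    }

    ValueType &at(std::size_t i, std::size_t j)
    {
        return values[i * stride + j];
    }

    const ValueType &at(std::size_t i, std::size_t j) const
    {
        return values[i * stride + j];
    }

    // Returning variant: the new matrix inherits this matrix's stride.
    template <typename OutType>
    strided_matrix<OutType> convert() const
    {
        strided_matrix<OutType> result(rows, cols, stride);
        convert_to(result);
        return result;
    }

    // Output-parameter variant: fills a caller-provided matrix of the same
    // size and keeps that matrix's own stride; padding is never touched.
    template <typename OutType>
    void convert_to(strided_matrix<OutType> &result) const
    {
        assert(result.rows == rows && result.cols == cols);
        for (std::size_t i = 0; i < rows; ++i) {
            for (std::size_t j = 0; j < cols; ++j) {
                result.at(i, j) = static_cast<OutType>(at(i, j));
            }
        }
    }
};
```

The two variants differ only in who owns the result's layout. The output-parameter form lets a caller reuse an existing, possibly differently padded, allocation.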

Related PR: #774
upsj committed Jun 3, 2021
2 parents da19a97 + 4677856 commit b621017
Showing 23 changed files with 2,962 additions and 623 deletions.
53 changes: 19 additions & 34 deletions .gitlab-ci.yml
@@ -327,40 +327,6 @@ build/cuda92/gcc/all/release/shared:
     BUILD_TYPE: "Release"
     CUDA_ARCH: 35
 
-# Make sure that our jobs run when HWLOC is
-# forcibly switched off
-build/cuda92/intel/cuda/release/static:
-  <<: *default_build
-  extends:
-    - .full_test_condition
-    - .use_gko-cuda92-gnu7-llvm50-intel2017
-  variables:
-    <<: *default_variables
-    C_COMPILER: "icc"
-    CXX_COMPILER: "icpc"
-    BUILD_OMP: "ON"
-    BUILD_CUDA: "ON"
-    BUILD_HWLOC: "OFF"
-    BUILD_TYPE: "Release"
-    BUILD_SHARED_LIBS: "OFF"
-    CUDA_ARCH: 35
-
-# Build CUDA NVIDIA without omp
-build/cuda92/intel/cuda_wo_omp/release/shared:
-  <<: *default_build
-  extends:
-    - .quick_test_condition
-    - .use_gko-cuda92-gnu7-llvm50-intel2017
-  variables:
-    <<: *default_variables
-    C_COMPILER: "icc"
-    CXX_COMPILER: "icpc"
-    BUILD_CUDA: "ON"
-    BUILD_HIP: "ON"
-    BUILD_HWLOC: "OFF"
-    BUILD_TYPE: "Release"
-    CUDA_ARCH: 35
-
 # cuda 10.0 and friends
 # Make sure that our jobs run when using self-installed
 # third-party HWLOC.
@@ -378,6 +344,8 @@ build/cuda100/gcc/all/debug/shared:
     FAST_TESTS: "ON"
     CUDA_ARCH: 35
 
+# Make sure that our jobs run when HWLOC is
+# forcibly switched off
 build/cuda100/clang/all/release/static:
   <<: *default_build
   extends:
@@ -390,6 +358,7 @@ build/cuda100/clang/all/release/static:
     BUILD_OMP: "ON"
     BUILD_CUDA: "ON"
     BUILD_HIP: "ON"
+    BUILD_HWLOC: "OFF"
     BUILD_TYPE: "Release"
     BUILD_SHARED_LIBS: "OFF"
     CUDA_ARCH: 35
@@ -408,6 +377,22 @@ build/cuda100/intel/cuda/release/shared:
     BUILD_TYPE: "Release"
     CUDA_ARCH: 35
 
+# Build CUDA NVIDIA without omp
+build/cuda100/intel/cuda_wo_omp/release/shared:
+  <<: *default_build
+  extends:
+    - .full_test_condition
+    - .use_gko-cuda100-gnu7-llvm60-intel2018
+  variables:
+    <<: *default_variables
+    C_COMPILER: "icc"
+    CXX_COMPILER: "icpc"
+    BUILD_CUDA: "ON"
+    BUILD_HIP: "ON"
+    BUILD_HWLOC: "OFF"
+    BUILD_TYPE: "Release"
+    CUDA_ARCH: 35
+
 # cuda 10.1 and friends
 build/cuda101/gcc/all/debug/shared:
   <<: *default_build
4 changes: 2 additions & 2 deletions README.md
@@ -40,8 +40,8 @@ For Ginkgo core library:
 * C++14 compliant compiler, one of:
     * _gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+_
     * _clang 3.9+_
-    * _Intel compiler 2017+_
-    * _Apple LLVM 8.0+_ (__TODO__: verify)
+    * _Intel compiler 2018+_
+    * _Apple LLVM 8.0+_
 
 The Ginkgo CUDA module has the following __additional__ requirements:
 
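The README change raises the minimum Intel compiler from 2017 to 2018. The sketch below shows one way a project could enforce such a floor at compile time using the compilers' predefined macros. It is illustrative only and is not claimed to exist in Ginkgo's sources. The classic Intel compiler encodes its version in `__INTEL_COMPILER`, for example 1800 for the 2018 release. `compiled_as_cxx14_or_later` is a hypothetical helper.

```cpp
// Compile-time floors taken from the README requirements list; the macros
// are predefined by the compilers themselves.
#if __cplusplus < 201402L
#error "A C++14 compliant compiler is required"
#endif

// The classic Intel compiler reports e.g. 1700 (2017) or 1800 (2018);
// reject anything older than the 2018 release.
#if defined(__INTEL_COMPILER) && __INTEL_COMPILER < 1800
#error "Intel compiler 2018+ is required"
#endif

// Hypothetical helper: true when this translation unit is C++14 or newer.
constexpr bool compiled_as_cxx14_or_later()
{
    return __cplusplus >= 201402L;
}
```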
16 changes: 16 additions & 0 deletions common/matrix/dense_kernels.hpp.inc
@@ -33,6 +33,22 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 namespace kernel {
 
 
+template <typename InValueType, typename OutValueType>
+__global__ __launch_bounds__(default_block_size) void strided_copy(
+    size_type num_rows, size_type num_cols, size_type in_stride,
+    size_type out_stride, const InValueType *__restrict__ input,
+    OutValueType *__restrict__ output)
+{
+    const auto global_id = thread::get_thread_id_flat();
+    const auto row_id = global_id / num_cols;
+    const auto col_id = global_id % num_cols;
+    if (row_id < num_rows) {
+        output[row_id * out_stride + col_id] =
+            static_cast<OutValueType>(input[row_id * in_stride + col_id]);
+    }
+}
+
+
 template <typename ValueType>
 __global__ __launch_bounds__(default_block_size) void strided_fill(
     size_type num_rows, size_type num_cols, size_type stride,
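The new `strided_copy` kernel assigns one thread per matrix entry and recovers the (row, column) pair from the flat thread id. Input and output may therefore use different row strides. The sketch below emulates the same index mapping on the host so it can run without a GPU: each iteration of the `global_id` loop stands in for one thread of the grid. The function name `strided_copy_reference` and the block size of 512 are assumptions. Ginkgo's `default_block_size` may differ, and the result does not depend on it.

```cpp
#include <cassert>
#include <cstddef>

using size_type = std::size_t;

// Assumed block size for the emulated launch; Ginkgo's default_block_size
// may differ. The copied values do not depend on it.
constexpr size_type block_size = 512;

// Host-side emulation of strided_copy: each iteration plays the role of one
// GPU thread with flat id global_id.
template <typename InValueType, typename OutValueType>
void strided_copy_reference(size_type num_rows, size_type num_cols,
                            size_type in_stride, size_type out_stride,
                            const InValueType *input, OutValueType *output)
{
    assert(in_stride >= num_cols && out_stride >= num_cols);
    if (num_rows == 0 || num_cols == 0) {
        return;  // empty matrix: also avoids dividing by num_cols == 0
    }
    // A launch rounds the entry count up to whole blocks, so some threads
    // map to rows past the end and must be masked out.
    const size_type num_entries = num_rows * num_cols;
    const size_type num_blocks = (num_entries + block_size - 1) / block_size;
    for (size_type global_id = 0; global_id < num_blocks * block_size;
         ++global_id) {
        const auto row_id = global_id / num_cols;
        const auto col_id = global_id % num_cols;
        if (row_id < num_rows) {
            // Same addressing as the kernel: rows advance by the stride.
            output[row_id * out_stride + col_id] =
                static_cast<OutValueType>(input[row_id * in_stride + col_id]);
        }
    }
}
```

Each row writes only `num_cols` entries, so any padding between `num_cols` and `out_stride` in the output keeps its previous contents.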
6 changes: 6 additions & 0 deletions core/device_hooks/common_kernels.inc.cpp
@@ -133,6 +133,12 @@ GKO_DECLARE_DENSE_APPLY_KERNEL(ValueType)
 GKO_NOT_COMPILED(GKO_HOOK_MODULE);
 GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE(GKO_DECLARE_DENSE_APPLY_KERNEL);
 
+template <typename InValueType, typename OutValueType>
+GKO_DECLARE_DENSE_COPY_KERNEL(InValueType, OutValueType)
+GKO_NOT_COMPILED(GKO_HOOK_MODULE);
+GKO_INSTANTIATE_FOR_EACH_VALUE_CONVERSION_OR_COPY(
+    GKO_DECLARE_DENSE_COPY_KERNEL);
+
 template <typename ValueType>
 GKO_DECLARE_DENSE_FILL_KERNEL(ValueType)
 GKO_NOT_COMPILED(GKO_HOOK_MODULE);
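The new copy hook follows the same pattern as its neighbours. As we understand it, a kernel belonging to a module that was not built still compiles and links, but its body reports the missing module when called. An instantiation macro then stamps out one definition per pair of value types. The sketch below reproduces that pattern using hypothetical stand-ins:

- `not_compiled` and `dense_copy_stub` stand in for `GKO_NOT_COMPILED` and `GKO_DECLARE_DENSE_COPY_KERNEL`.
- `INSTANTIATE_FOR_EACH_CONVERSION_OR_COPY` stands in for `GKO_INSTANTIATE_FOR_EACH_VALUE_CONVERSION_OR_COPY`.

The real macros also cover complex types and may expand differently.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the exception a "not compiled" hook raises.
struct not_compiled : std::runtime_error {
    explicit not_compiled(const std::string &module)
        : std::runtime_error("not compiled: " + module + " module")
    {}
};

// Hypothetical stub with the shape of a disabled-module kernel: it exists so
// the core library links, but any call reports the missing module.
template <typename InValueType, typename OutValueType>
void dense_copy_stub(const InValueType *, OutValueType *)
{
    throw not_compiled("cuda");
}

// Hypothetical instantiation helper: one explicit instantiation per
// (input, output) pair, covering conversions and same-type copies.
#define INSTANTIATE_FOR_EACH_CONVERSION_OR_COPY(_macro) \
    _macro(float, float);                               \
    _macro(float, double);                              \
    _macro(double, float);                              \
    _macro(double, double)

#define DECLARE_DENSE_COPY_STUB(InType, OutType) \
    template void dense_copy_stub<InType, OutType>(const InType *, OutType *)

INSTANTIATE_FOR_EACH_CONVERSION_OR_COPY(DECLARE_DENSE_COPY_STUB);
```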