Heterogeneous Computing
HIP: C++ Heterogeneous-Compute Interface for Portability
Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Answers to the exercises in Programming Massively Parallel Processors: A Hands-on Approach, 2nd edition
NVIDIA Linux open GPU kernel module source
Main book repository for the Parallel and High Performance Computing book, Manning Publications
A translator from Intel SSE intrinsics to Arm/AArch64 NEON implementation
G-LS3U: GPU-accelerated, real-time, reference-based dynamic phase retrieval
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even with multithreading.
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
An extension library for the WMMA API (Tensor Core API)
Test suite for probing the numerical behavior of NVIDIA tensor cores
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
Tile primitives for speedy kernels
The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm