
hanzz2007/Awesome-GPU

 
 

Awesome-GPU

Resources Management

Papers

  1. ASPLOS'17-Locality-Aware CTA Clustering for Modern GPUs
  2. ASPLOS'17-Dynamic Resource Management for Efficient Utilization of Multitasking GPUs
  3. HPCA'17-Dynamic GPGPU Power Management Using Adaptive Model Predictive Control
  4. ISCA'16-Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

Parallelism

Papers

  1. HPCA'17-Controlled Kernel Launch for Dynamic Parallelism in GPUs
  2. ISCA'16-LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs
  3. ISCA'16-Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit

Slides

  1. GTC'17-COOPERATIVE GROUPS

Cache

Papers

  1. ISCA'16-APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs
  2. SC'15-Adaptive and Transparent Cache Bypassing for GPUs

Algorithm

Papers

  1. HPCA'17-Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures

Slides

  1. GTC'18-CUTLASS: CUDA TEMPLATE LIBRARY FOR DENSE LINEAR ALGEBRA AT ALL LEVELS AND SCALES

Software

  1. CUTLASS

Performance Analysis

Papers

  1. PLDI'18-GPU Code Optimization using Abstract Kernel Emulation and Sensitivity Analysis
  2. CGO'18-CUDAAdvisor: LLVM-based runtime profiling for modern GPUs
  3. CCGRID'18-Exposing Hidden Performance Opportunities in High Performance GPU Applications
  4. Euro-Par'15-Identifying Optimization Opportunities Within Kernel Execution in GPU Codes
  5. ISCA'15-Flexible software profiling of GPU architectures
  6. SC'13-Effective sampling-driven performance tools for GPU-accelerated supercomputers
  7. ISPASS'12-Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures
  8. ICPP'11-Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs
  9. ISPASS'10-Demystifying GPU Microarchitecture through Microbenchmarking
  10. ISPASS'10-Visualizing Complex Dynamics in Many-Core Accelerator Architectures
  11. ISPASS'09-Analyzing CUDA Workloads Using a Detailed GPU Simulator

Books

  1. Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
  2. Monitoring Heterogeneous Applications with the OpenMP Tools Interface

Slides

  1. SASSI

Software

  1. Vampir|Score-P
  2. TAU
  3. PAPI
  4. Allinea MAP
  5. Open|SpeedShop
  6. HPCToolkit
  7. NVIDIA Nsight Systems
  8. NVIDIA Nsight Compute

Compiler

  1. LLVM'17-Implementing implicit OpenMP data sharing on GPUs
  2. CGO'16-gpucc: An Open-Source GPGPU Compiler
  3. LLVM'16-Offloading Support for OpenMP in Clang and LLVM
  4. PMBS'15-Performance Analysis of OpenMP on a GPU using a CORAL Proxy Application
  5. LLVM'15-Integrating GPU Support for OpenMP Offloading Directives into Clang
  6. LLVM'14-Coordinating GPU Threads for OpenMP 4.0 in LLVM

Approximate Computing

  1. ASPLOS'14-Paraprox: Pattern-Based Approximation for Data Parallel Applications

Documentation

White Papers

  1. Turing-NVIDIA TURING GPU ARCHITECTURE
  2. Volta-NVIDIA TESLA V100
  3. Pascal-NVIDIA TESLA P100
  4. Kepler-NVIDIA’s Next Generation CUDA Compute Architecture: Kepler
  5. Fermi-NVIDIA’s Next Generation CUDA Compute Architecture: Fermi

APIs

  1. CUDA Toolkit Documentation

GTC

  1. GTC-GPU Technology Conference

About

Awesome resources for GPUs
