FastGPUConvolution

Part of the UIUC ECE408/CS483 final project.

An optimized CNN convolution layer in CUDA, featuring:

  • GEMM implementation of convolution.
  • Shared memory tiling of matrix multiplication.
  • FP16 vectorized matrix multiplication.
  • GEMM acceleration with Tensor Cores (wmma instructions).
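
To illustrate the first item, here is a minimal host-side C++ sketch of how a convolution can be expressed as a GEMM. It is not the project's kernel code. It assumes a single input image, stride 1, no padding, row-major layouts, and the usual CNN convention of cross-correlation (no kernel flip). The function names `im2col`, `gemm`, and `conv_gemm` are hypothetical, chosen for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unroll the input x (C x H x W, row-major) into a matrix of shape
// (C*K*K) x (H_out*W_out). Each column holds one receptive field.
// Assumption: stride 1, no padding.
std::vector<float> im2col(const std::vector<float>& x,
                          int C, int H, int W, int K) {
    assert(x.size() == static_cast<std::size_t>(C) * H * W);
    const int H_out = H - K + 1;
    const int W_out = W - K + 1;
    const int n_cols = H_out * W_out;
    std::vector<float> cols(static_cast<std::size_t>(C) * K * K * n_cols);
    for (int c = 0; c < C; ++c)
        for (int p = 0; p < K; ++p)
            for (int q = 0; q < K; ++q) {
                const int row = (c * K + p) * K + q;
                for (int h = 0; h < H_out; ++h)
                    for (int w = 0; w < W_out; ++w)
                        cols[row * n_cols + h * W_out + w] =
                            x[(c * H + h + p) * W + (w + q)];
            }
    return cols;
}

// Plain row-major GEMM: out (M x N) = a (M x Kd) * b (Kd x N).
// On the GPU this is the step that tiling, FP16, and Tensor Cores target.
std::vector<float> gemm(const std::vector<float>& a,
                        const std::vector<float>& b,
                        int M, int Kd, int N) {
    assert(a.size() == static_cast<std::size_t>(M) * Kd);
    assert(b.size() == static_cast<std::size_t>(Kd) * N);
    std::vector<float> out(static_cast<std::size_t>(M) * N, 0.0f);
    for (int m = 0; m < M; ++m)
        for (int k = 0; k < Kd; ++k)
            for (int n = 0; n < N; ++n)
                out[m * N + n] += a[m * Kd + k] * b[k * N + n];
    return out;
}

// Convolution as GEMM: the weights (M x C x K x K) are viewed as an
// M x (C*K*K) matrix and multiplied by the unrolled input.
// Returns the output feature maps, M x H_out x W_out, row-major.
std::vector<float> conv_gemm(const std::vector<float>& x,
                             const std::vector<float>& weights,
                             int M, int C, int H, int W, int K) {
    const int H_out = H - K + 1;
    const int W_out = W - K + 1;
    const std::vector<float> cols = im2col(x, C, H, W, K);
    return gemm(weights, cols, M, C * K * K, H_out * W_out);
}
```

For example, calling `conv_gemm` on a 3x3 single-channel input holding 1 through 9 with one 2x2 all-ones filter yields the four window sums 12, 16, 24, 28. A GPU version would typically replace the `gemm` loop nest with a tiled kernel and might avoid materializing the unrolled matrix in global memory.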