ryan42210/FastGPUConvolution

FastGPUConvolution

Part of the UIUC ECE408/CS483 final project.

An optimized CNN convolution layer implemented in CUDA:

  • GEMM implementation of convolution.
  • Shared-memory tiling of the matrix multiplication.
  • FP16 vectorized matrix multiplication.
  • GEMM acceleration on Tensor Cores (wmma instructions).
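
The first item, convolution expressed as GEMM, can be illustrated with a small host-side sketch. This is not the repository's kernel: it is a plain C++ reference under assumed conditions (stride 1, no padding, row-major `float` tensors), and the names `conv_direct`, `unroll_input`, `gemm`, and `conv_gemm` are hypothetical. The input is unrolled into a matrix with one column per output pixel, so the weight tensor, read as an `M x (C*K*K)` matrix, multiplies it to produce every output map at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layouts (hypothetical, not taken from the repository):
//   x: [C][H][W], w: [M][C][K][K], y: [M][H_out][W_out], row-major floats,
//   stride 1, no padding, so H_out = H - K + 1 and W_out = W - K + 1.

// Reference: direct convolution (cross-correlation, as CNN layers compute it).
std::vector<float> conv_direct(const std::vector<float>& x, const std::vector<float>& w,
                               int C, int H, int W, int M, int K) {
    assert(K >= 1 && K <= H && K <= W);
    const int H_out = H - K + 1, W_out = W - K + 1;
    std::vector<float> y(static_cast<std::size_t>(M) * H_out * W_out, 0.0f);
    for (int m = 0; m < M; ++m)
        for (int h = 0; h < H_out; ++h)
            for (int v = 0; v < W_out; ++v) {
                float acc = 0.0f;
                for (int c = 0; c < C; ++c)
                    for (int p = 0; p < K; ++p)
                        for (int q = 0; q < K; ++q)
                            acc += x[(c * H + h + p) * W + v + q] *
                                   w[((m * C + c) * K + p) * K + q];
                y[(m * H_out + h) * W_out + v] = acc;
            }
    return y;
}

// Unroll the input into a (C*K*K) x (H_out*W_out) matrix: each column holds
// the receptive field of one output pixel.
std::vector<float> unroll_input(const std::vector<float>& x, int C, int H, int W, int K) {
    assert(K >= 1 && K <= H && K <= W);
    const int H_out = H - K + 1, W_out = W - K + 1;
    const int cols = H_out * W_out;
    std::vector<float> u(static_cast<std::size_t>(C) * K * K * cols);
    for (int c = 0; c < C; ++c)
        for (int p = 0; p < K; ++p)
            for (int q = 0; q < K; ++q) {
                const int row = (c * K + p) * K + q;
                for (int h = 0; h < H_out; ++h)
                    for (int v = 0; v < W_out; ++v)
                        u[row * cols + h * W_out + v] = x[(c * H + h + p) * W + v + q];
            }
    return u;
}

// Naive row-major GEMM: out (rows x cols) = a (rows x inner) * b (inner x cols).
std::vector<float> gemm(const std::vector<float>& a, const std::vector<float>& b,
                        int rows, int inner, int cols) {
    std::vector<float> out(static_cast<std::size_t>(rows) * cols, 0.0f);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < inner; ++k)
                acc += a[i * inner + k] * b[k * cols + j];
            out[i * cols + j] = acc;
        }
    return out;
}

// Convolution as GEMM: the weights already form an M x (C*K*K) row-major
// matrix, so no reshaping of w is needed.
std::vector<float> conv_gemm(const std::vector<float>& x, const std::vector<float>& w,
                             int C, int H, int W, int M, int K) {
    const int H_out = H - K + 1, W_out = W - K + 1;
    return gemm(w, unroll_input(x, C, H, W, K), M, C * K * K, H_out * W_out);
}
```

Calling `conv_gemm(x, w, C, H, W, M, K)` should agree with `conv_direct` up to floating-point rounding. In a GPU version, the `gemm` step is where shared-memory tiling, FP16 vectorization, and Tensor Cores would apply, since each output element is independent of the others.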

About

An optimized forward convolution pass of a CNN in CUDA
