ryan42210 / FastGPUConvolution

An optimized forward convolution pass for CNNs, written in CUDA.

FastGPUConvolution

Part of the UIUC ECE408/CS483 final project.

An optimized CNN convolution layer in CUDA. Optimizations implemented:

- Convolution computed as a matrix multiplication (GEMM).
- Shared-memory tiling of the matrix multiplication.
- FP16 vectorized matrix multiplication.
- GEMM acceleration on Tensor Cores using the WMMA (`wmma`) API.
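
The GEMM formulation of convolution can be illustrated on the CPU. The C++ sketch below is not the repository's code; it assumes a single input image, stride 1, no padding, and row-major layouts x[C][H][W] and wt[M][C][K][K] (all function and variable names are hypothetical). It unrolls every K×K input patch into one column, so the filters, viewed as an M × (C·K·K) matrix, multiply the unrolled (C·K·K) × (H_out·W_out) matrix to produce the M × (H_out·W_out) output. A direct convolution is included as a reference for checking.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layouts (hypothetical, single image, stride 1, no padding), all row-major:
//   x: C x H x W, wt: M x C x K x K, y: M x Hout x Wout.

// Reference: direct convolution with six nested loops.
std::vector<float> conv_direct(const std::vector<float>& x, const std::vector<float>& wt,
                               int C, int H, int W, int M, int K) {
    const int Hout = H - K + 1, Wout = W - K + 1;
    std::vector<float> y(static_cast<std::size_t>(M) * Hout * Wout, 0.0f);
    for (int m = 0; m < M; ++m)
        for (int h = 0; h < Hout; ++h)
            for (int v = 0; v < Wout; ++v) {
                float acc = 0.0f;
                for (int c = 0; c < C; ++c)
                    for (int p = 0; p < K; ++p)
                        for (int q = 0; q < K; ++q)
                            acc += x[(c * H + h + p) * W + v + q] *
                                   wt[((m * C + c) * K + p) * K + q];
                y[(m * Hout + h) * Wout + v] = acc;
            }
    return y;
}

// Unroll the input: row (c*K + p)*K + q, column h*Wout + v holds x[c][h+p][v+q].
// Result is a (C*K*K) x (Hout*Wout) row-major matrix.
std::vector<float> unroll_input(const std::vector<float>& x, int C, int H, int W, int K) {
    const int Hout = H - K + 1, Wout = W - K + 1;
    const int rows = C * K * K, cols = Hout * Wout;
    std::vector<float> xu(static_cast<std::size_t>(rows) * cols);
    for (int c = 0; c < C; ++c)
        for (int p = 0; p < K; ++p)
            for (int q = 0; q < K; ++q) {
                const int row = (c * K + p) * K + q;
                for (int h = 0; h < Hout; ++h)
                    for (int v = 0; v < Wout; ++v)
                        xu[static_cast<std::size_t>(row) * cols + h * Wout + v] =
                            x[(c * H + h + p) * W + v + q];
            }
    return xu;
}

// Plain GEMM: (Mr x Kc) * (Kc x N) -> (Mr x N), all row-major.
std::vector<float> gemm(const std::vector<float>& a, const std::vector<float>& b,
                        int Mr, int Kc, int N) {
    std::vector<float> c(static_cast<std::size_t>(Mr) * N, 0.0f);
    for (int i = 0; i < Mr; ++i)
        for (int k = 0; k < Kc; ++k)
            for (int j = 0; j < N; ++j)
                c[i * N + j] += a[i * Kc + k] * b[k * N + j];
    return c;
}

// Convolution as GEMM. wt is already an M x (C*K*K) row-major matrix whose column
// (c*K + p)*K + q lines up with row (c*K + p)*K + q of the unrolled input.
std::vector<float> conv_gemm(const std::vector<float>& x, const std::vector<float>& wt,
                             int C, int H, int W, int M, int K) {
    assert(x.size() == static_cast<std::size_t>(C) * H * W);
    assert(wt.size() == static_cast<std::size_t>(M) * C * K * K);
    const int Hout = H - K + 1, Wout = W - K + 1;
    const std::vector<float> xu = unroll_input(x, C, H, W, K);
    // Output row m, column h*Wout + v is exactly y[m][h][v] in the direct layout.
    return gemm(wt, xu, M, C * K * K, Hout * Wout);
}
```

The unrolled matrix stores each input element up to K·K times, trading extra memory for one large, regular GEMM, which is the operation that shared-memory tiling and Tensor Cores then accelerate.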
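
Shared-memory tiling can be sketched as a serial CPU emulation, assuming a hypothetical TILE width of 4 and row-major matrices. The local arrays As and Bs stand in for one thread block's `__shared__` tiles, the (ty, tx) loops stand in for the block's threads, and comments mark where a CUDA kernel would call `__syncthreads()`. This only mirrors the data-reuse pattern; it is not the repository's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile width; CUDA kernels commonly use 16 or 32 (one thread per output).
constexpr int TILE = 4;

// CPU emulation of a shared-memory tiled GEMM: C (M x N) = A (M x K) * B (K x N), row-major.
// Each (by, bx) iteration plays one thread block; each (ty, tx) pair plays one thread.
std::vector<float> tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                              int M, int K, int N) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
    const int phases = (K + TILE - 1) / TILE;
    for (int by = 0; by < (M + TILE - 1) / TILE; ++by)
        for (int bx = 0; bx < (N + TILE - 1) / TILE; ++bx) {
            float acc[TILE][TILE] = {};  // per-thread accumulators (registers on the GPU)
            for (int ph = 0; ph < phases; ++ph) {
                float As[TILE][TILE], Bs[TILE][TILE];  // stand-ins for __shared__ tiles
                // Load step: every thread copies one element of each tile, padding with 0
                // where the tile extends past the matrix edge.
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx) {
                        const int row = by * TILE + ty, col = bx * TILE + tx;
                        const int aCol = ph * TILE + tx, bRow = ph * TILE + ty;
                        As[ty][tx] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
                        Bs[ty][tx] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
                    }
                // A CUDA kernel would call __syncthreads() here, before reading the tiles.
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx)
                        for (int k = 0; k < TILE; ++k)
                            acc[ty][tx] += As[ty][k] * Bs[k][tx];
                // ...and again here, before the next phase overwrites the tiles.
            }
            // Write back only the outputs that fall inside C.
            for (int ty = 0; ty < TILE; ++ty)
                for (int tx = 0; tx < TILE; ++tx) {
                    const int row = by * TILE + ty, col = bx * TILE + tx;
                    if (row < M && col < N) C[row * N + col] = acc[ty][tx];
                }
        }
    return C;
}
```

Each element loaded into a tile is read TILE times from the on-chip copy, which on the GPU reduces global-memory traffic for A and B by roughly a factor of TILE compared with an untiled kernel.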
