jin-yc10/sparse_gemm

A simple GEMM kernel for sparse convolution: mainly an implicit-GEMM CUDA kernel that uses Tensor Cores in a naive way. The following tricks are used to improve overall runtime:

Tensor Cores
float16 arithmetic and half2 intrinsics (hfma2, hmul2, hadd2)
software pipelining
combined 128-bit memory access (ldg128, stg128). Note that on A100, ldgsts (the Ampere asynchronous global-to-shared copy, exposed as cp.async in PTX) should be considered instead.
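
The implicit-GEMM formulation can be made concrete with a CPU reference sketch. It shows one common way to view sparse convolution as a GEMM. This is an assumption: the README does not describe the data layout used in tc128.cu. Each output point is a GEMM row, each output channel is a GEMM column, and the reduction runs over (kernel offset × input channel). The "A" matrix is never materialized. A neighbour map (the hypothetical nbr array, with -1 marking a missing neighbour) says which input row to read for each (output, offset) pair. The function name and all layouts are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layouts (not taken from tc128.cu):
//   in  : [num_in  x cin]         row-major input features of the active points
//   w   : [kvol x cin x cout]     one weight matrix per kernel offset
//   nbr : [num_out x kvol]        input row for each (output, offset), -1 if absent
//   out : [num_out x cout]
// GEMM shape: M = num_out, N = cout, K = kvol * cin. Element (o, k*cin + ci) of
// the implicit A matrix is read on the fly as in[nbr[o][k]][ci].
std::vector<float> implicit_gemm_sparse_conv(const std::vector<float>& in,
                                             const std::vector<float>& w,
                                             const std::vector<int>& nbr,
                                             int num_out, int kvol, int cin, int cout) {
    assert(nbr.size() == static_cast<std::size_t>(num_out) * kvol);
    assert(w.size() == static_cast<std::size_t>(kvol) * cin * cout);
    std::vector<float> out(static_cast<std::size_t>(num_out) * cout, 0.0f);
    for (int o = 0; o < num_out; ++o) {              // GEMM row (M)
        for (int co = 0; co < cout; ++co) {          // GEMM column (N)
            float acc = 0.0f;
            for (int k = 0; k < kvol; ++k) {         // reduction over K = kvol * cin
                const int src = nbr[o * kvol + k];
                if (src < 0) continue;               // missing neighbour acts as a zero row
                for (int ci = 0; ci < cin; ++ci)
                    acc += in[src * cin + ci] * w[(k * cin + ci) * cout + co];
            }
            out[o * cout + co] = acc;
        }
    }
    return out;
}
```

Take three input points with two channels each, in = {1,2, 3,4, 5,6}, with w = {1,1, 10,10} and nbr = {0,1, 2,-1}. Calling it with num_out = 2, kvol = 2, cin = 2, cout = 1 returns {73, 11}. In a GPU kernel the three inner loops would be tiled and fed to Tensor Core MMA operations, but the gather-through-an-index-map addressing stays the same.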

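Software pipelining overlaps the loads for the next tile of the reduction (K) dimension with the math on the current tile, usually through two staging buffers. The C++ sketch below only models that schedule on the CPU, where nothing actually runs concurrently. A prologue fetches the first tile, and each iteration issues the next fetch before computing. The tile size kTile = 4 and the function names are hypothetical, not taken from tc128.cu. In a CUDA kernel the buffers would typically live in registers or shared memory, with a __syncthreads() between stages.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Tile size along the reduction dimension (hypothetical value).
constexpr int kTile = 4;

// Stage tile t of src into a buffer. On a GPU this is the global-memory load.
static void load_tile(const std::vector<float>& src, int t, std::array<float, kTile>& buf) {
    for (int i = 0; i < kTile; ++i) buf[i] = src[t * kTile + i];
}

// Dot product with a two-stage (double-buffered) software pipeline:
// while tile t is consumed from one buffer, tile t+1 is fetched into the other.
float pipelined_dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size() && a.size() % kTile == 0);
    const int num_tiles = static_cast<int>(a.size()) / kTile;
    if (num_tiles == 0) return 0.0f;
    std::array<float, kTile> abuf[2], bbuf[2];
    load_tile(a, 0, abuf[0]);                        // prologue: prefetch the first tile
    load_tile(b, 0, bbuf[0]);
    float acc = 0.0f;
    for (int t = 0; t < num_tiles; ++t) {
        const int cur = t & 1, nxt = cur ^ 1;
        if (t + 1 < num_tiles) {                     // issue the next load before computing
            load_tile(a, t + 1, abuf[nxt]);
            load_tile(b, t + 1, bbuf[nxt]);
        }
        for (int i = 0; i < kTile; ++i)              // compute on the current tile
            acc += abuf[cur][i] * bbuf[cur][i];
    }
    return acc;
}
```

For a = {1, ..., 8} and b = eight ones, pipelined_dot returns 36, the same as a plain loop. The pipeline changes only when data is fetched, not what is computed.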
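
The combined 128-bit accesses work because 8 float16 values occupy exactly 16 bytes. A single LDG.128/STG.128 can therefore move a whole group of 8 elements, provided the address is 16-byte aligned. The host-side C++ sketch below models that rule. fp16 values are kept as raw uint16_t bits and copied 16 bytes at a time when both pointers are aligned. A scalar fallback handles the tail and misaligned buffers. Chunk128 and copy_halves_vec128 are hypothetical names. In CUDA the same effect is commonly obtained by reinterpreting an aligned pointer as int4 or float4. On A100, cp.async can instead move such a chunk straight into shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One 128-bit chunk = 8 fp16 values, stored here as raw 16-bit patterns.
struct alignas(16) Chunk128 { std::uint16_t h[8]; };
static_assert(sizeof(Chunk128) == 16, "a chunk must be exactly 128 bits");

// Copies n halves from src to dst and returns how many 128-bit chunk copies
// were issued. The remainder (n % 8) and misaligned buffers use scalar copies.
std::size_t copy_halves_vec128(const std::uint16_t* src, std::uint16_t* dst, std::size_t n) {
    auto aligned16 = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
    };
    std::size_t vec_copies = 0;
    std::size_t i = 0;
    if (aligned16(src) && aligned16(dst)) {
        for (; i + 8 <= n; i += 8, ++vec_copies) {
            Chunk128 c;
            std::memcpy(&c, src + i, sizeof c);      // one 16-byte load
            std::memcpy(dst + i, &c, sizeof c);      // one 16-byte store
        }
    }
    for (; i < n; ++i) dst[i] = src[i];              // scalar tail / fallback
    return vec_copies;
}
```

Copying 20 halves between 16-byte-aligned buffers issues 2 chunk copies plus 4 scalar copies. Offsetting both pointers by one element breaks alignment and forces the scalar path.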