l1351868270/LD_mma

LD_mma

Depends on PyTorch and CUDA.

Compile

git clone --recursive https://github.com/l1351868270/LD_mma.git
or
git clone https://github.com/l1351868270/LD_mma.git
git submodule update --init --recursive
cd mma
python setup.py install

Optimization Approach

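Profiling on the GPU

Profiling on the GPU falls into two categories:

  1. Profile the whole system (CPU and GPU) to determine whether the performance bottleneck is on the CPU or on the GPU, taking into account the synchronization overhead between the two
  2. Profile individual GPU kernels to find potential optimization opportunities inside each kernel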
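The synchronization point in the first category can be illustrated with a small timing sketch. CUDA kernel launches are asynchronous, so a wall-clock timer that does not synchronize measures roughly the CPU-side launch cost rather than the GPU execution time. The helper names `timed` and `demo_cuda_matmul` and the 4096 x 4096 matrix size are illustrative assumptions, not part of this repository.

```python
import time
from typing import Callable, Optional


def timed(fn: Callable[[], object],
          sync: Optional[Callable[[], None]] = None,
          iters: int = 10) -> float:
    """Return the average wall-clock seconds per call of fn.

    If sync is given, it is called once before the timer starts and once
    after the loop, so asynchronously launched GPU work is included.
    """
    if sync is not None:
        sync()  # drain any work queued before the measurement
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for the work launched inside the loop
    return (time.perf_counter() - t0) / iters


def demo_cuda_matmul(n: int = 4096) -> None:
    """Hypothetical demo: compare launch-only and end-to-end matmul time."""
    import torch  # imported lazily; requires PyTorch with a CUDA device

    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")

    def matmul() -> None:
        torch.matmul(a, b)

    matmul()  # warm-up so one-time initialization is not timed
    torch.cuda.synchronize()

    launch_only = timed(matmul)                          # no synchronization
    end_to_end = timed(matmul, torch.cuda.synchronize)   # includes GPU execution
    print(f"launch only: {launch_only * 1e3:.3f} ms per call")
    print(f"end to end:  {end_to_end * 1e3:.3f} ms per call")
```

Calling `demo_cuda_matmul()` on a machine with a CUDA device should typically show a much smaller launch-only figure than the end-to-end figure. A large gap suggests the GPU is doing most of the work, while similar figures suggest the CPU side is a significant part of the cost. This is only a rough first check; Nsight Systems gives the full timeline.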

Profiling Tools

CPU Profiler

gprof

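g++ -pg
-pg: instrument the program for profiling with gprof
-O2 and -O3 can inline functions, so inlined functions may not appear as separate entries in the gprof report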

gprof 

Intel VTune

AMD uProf

GPU Profiler

NVIDIA Nsight Systems

nsys profile python ./ld_mma/tests/cublas_matmul_test.py

NVIDIA Nsight Compute

ncu -f --set full -o lsl1 python ./ld_mma/tests/cublas_matmul_test.py

Materials

Video

GPU Programming

CUDA: From Correctness to Performance
