vectorSparse

Grant access to GPU performance counters

We obtain kernel durations with NVIDIA Nsight Compute. Profiling with Nsight Compute requires access to the performance counters on the GPU (see Permission issue with Performance Counters). This must be configured on the host machine, outside of the container.

To check whether the performance counters are accessible, run

cat /proc/driver/nvidia/params | grep RmProfilingAdminOnly

You should see

RmProfilingAdminOnly: 0

Otherwise, access can be granted with the following steps (see the shell sketch after this list):

  1. Create a .conf file (e.g. profile.conf) in the folder /etc/modprobe.d
  2. Open /etc/modprobe.d/profile.conf in any editor
  3. Add the following line to profile.conf
    options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
    
  4. Close /etc/modprobe.d/profile.conf
  5. Restart your machine
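
For reference, a rough shell equivalent of the steps above, assuming a Debian/Ubuntu-style system with root access (the update-initramfs step is distribution-specific and may be unnecessary elsewhere):

echo 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' | sudo tee /etc/modprobe.d/profile.conf
# Rebuild the initramfs so the module option takes effect at boot (Debian/Ubuntu)
sudo update-initramfs -u
sudo reboot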

For more information, please see nvprof-warning-the-user-does-not-have-permission-to-profile-on-the-target-device and Permission issue with Performance Counters.

Using Docker

Step 1: Get the source code

git clone https://github.com/apuaaChen/vectorSparse.git && cd vectorSparse

Step 2: We provide a Dockerfile that sets up the required environment with all dependencies. Note that nvidia-docker must be installed to run on the GPU. To build the image, run the following command:

docker build -t vectorsparse .

Step 3: Get the dataset. We use the Deep Learning Matrix Collection (DLMC). Please download the dataset and put it into the directory <host_dataset_dir>, so that the layout looks like <host_dataset_dir>/dlmc/rn50/...
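
If you do not already have a local copy, the archive can be fetched and unpacked along these lines. The URL below is the one published with the Sparse GPU Kernels paper's artifacts; treat it as an assumption and verify against that paper if the link has moved:

mkdir -p <host_dataset_dir> && cd <host_dataset_dir>
wget https://storage.googleapis.com/sgk-sc2020/dlmc.tar.gz
tar -xzf dlmc.tar.gz   # unpacks into dlmc/rn50/... among other subdirectories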

Step 4: Launch the container with

docker run -it --gpus all --name <your_container_name> -v <host_dataset_dir>:/raid/datasets -v <host_dir>/vectorSparse:/projects/vectorSparse vectorsparse

The sparse matrices will then be available inside the container at /raid/datasets/dlmc/rn50/...
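
As an optional sanity check, you can confirm the mount from inside the container:

ls /raid/datasets/dlmc/rn50 | head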

Step 5: Compile the source code with

cd vectorSparse
bash setup.sh

Step 6.1: To obtain the results in Figure 17, run

python3 launch.py --exp spmm

This script launches all the experiments sequentially. For each experiment, the profiling result is stored in a .csv file under the ./csv directory; an example csv file is provided in ./example. Once all the experiments are done, another python script is launched to fetch the kernel durations from the csv files and summarize the results as a figure in spmm_speedup_rn50_combo.pdf.
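
To peek at an individual result before the summary figure is produced, an awk one-liner like the one below can help. That the kernel duration is the last csv column is an assumption on my part; check the header of the example file in ./example first, and note that <result_file> is a placeholder:

# Average the last column (assumed kernel duration) over all rows of one result file
awk -F, 'NR > 1 { sum += $NF; n++ } END { if (n) print sum / n }' csv/<result_file>.csv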

Step 6.2: To obtain the results in Figure 18, run

python3 launch.py --exp sddmm

The results will be summarized as a figure in sddmm_speedup_rn50_combo.pdf.


The DLMC dataset and the Sputnik library are from the following paper:

@inproceedings{sgk_sc2020,
  author    = {Trevor Gale and Matei Zaharia and Cliff Young and Erich Elsen},
  title     = {Sparse {GPU} Kernels for Deep Learning},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, {SC} 2020},
  year      = {2020},
}

We demonstrate how to use our kernels in the Sparse Transformer with a fixed mask here.
