- Run measure_fp16.ipynb
- On a V100, you should see results similar to those reported in this thread:
  https://discuss.pytorch.org/t/solved-titan-v-on-pytorch-0-3-0-cuda-9-0-cudnn-7-0-is-much-slower-than-1080-ti/11320/10
pip install nvprof
/usr/local/cuda/bin/nvprof --log-file nvprof_output.txt python profile_fp16.py
grep fp16_s884 nvprof_output.txt
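You should see some calls to kernels with 884 in their names (the lines matching fp16_s884), which indicates that the Tensor Cores are being used.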
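
If you would rather check the log from Python than with grep, the search can be sketched as below. This is a minimal sketch, not part of the original scripts: the helper name find_884_kernels is hypothetical, and it assumes the log was written to nvprof_output.txt by the nvprof command above and that the kernels of interest contain fp16_s884 in their names.

```python
from pathlib import Path


def find_884_kernels(log_text, pattern="fp16_s884"):
    """Return the stripped lines of an nvprof log that contain `pattern`.

    `pattern` defaults to the same substring the grep command searches for.
    """
    return [line.strip() for line in log_text.splitlines() if pattern in line]


if __name__ == "__main__":
    # Assumed log location: the --log-file path used in the nvprof command.
    log_path = Path("nvprof_output.txt")
    if log_path.exists():
        matches = find_884_kernels(log_path.read_text())
        print(f"{len(matches)} line(s) mention fp16_s884 kernels")
        for line in matches:
            print(line)
    else:
        print(f"{log_path} not found; run the nvprof command first")
```

An empty result suggests that no matching kernels were launched, or that they use a different naming scheme; in that case, try a shorter pattern such as s884.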