Tests to see if half precision (fp16) is working

Measure performance in notebook

Run measure_fp16.ipynb.
On a V100 you should see results similar to these:
https://discuss.pytorch.org/t/solved-titan-v-on-pytorch-0-3-0-cuda-9-0-cudnn-7-0-is-much-slower-than-1080-ti/11320/10
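
If the notebook is not at hand, a rough stand-alone check along the same lines could look like the sketch below. This is not the contents of measure_fp16.ipynb. The function names (`speedup`, `time_matmul`, `main`), the matrix size, and the iteration counts are all assumptions chosen for illustration, and it assumes a recent PyTorch build with a CUDA device.

```python
import time


def speedup(fp32_seconds, fp16_seconds):
    """Return how many times faster the fp16 run was than the fp32 run."""
    return fp32_seconds / fp16_seconds


def time_matmul(dtype, size=4096, iters=50):
    """Average seconds per (size x size) matmul on the GPU for the given dtype."""
    import torch  # imported lazily so speedup() is usable without PyTorch

    a = torch.randn(size, size, device="cuda", dtype=dtype)
    b = torch.randn(size, size, device="cuda", dtype=dtype)

    # Warm-up so one-time setup cost is not included in the timing.
    for _ in range(5):
        a @ b
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    # CUDA kernels launch asynchronously; wait for them before stopping the clock.
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


def main():
    import torch

    if not torch.cuda.is_available():
        print("No CUDA device available; skipping benchmark")
        return
    t32 = time_matmul(torch.float32)
    t16 = time_matmul(torch.float16)
    print(f"fp32: {t32 * 1e3:.2f} ms  fp16: {t16 * 1e3:.2f} ms  "
          f"speedup: {speedup(t32, t16):.1f}x")
```

Call `main()` on the GPU machine. If fp16 is no faster than fp32 on a V100, the half-precision path is probably not being used, and the profiling step is the next thing to check.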

Profile CUDA to make sure it is utilizing half precision

pip install nvprof
/usr/local/cuda/bin/nvprof --log-file nvprof_output.txt python profile_fp16.py
cat nvprof_output.txt | grep fp16_s884

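You should see some kernel calls with fp16_s884 in their names. The 884 kernels are the ones that run on the Tensor Cores, so seeing them means half precision is actually being used.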
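
The grep step above can also be done in Python if you want a count per kernel rather than raw lines. This is a minimal sketch: the function name `count_tensor_core_kernels` is hypothetical, and it makes no assumption about the nvprof log layout beyond kernel names appearing as identifier-like tokens on each line.

```python
import re
from collections import Counter


def count_tensor_core_kernels(lines, marker="fp16_s884"):
    """Count how many lines mention each identifier-like token containing marker."""
    counts = Counter()
    for line in lines:
        # Split the line into identifier-like tokens (letters, digits, underscores).
        for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line):
            if marker in token:
                counts[token] += 1
    return counts
```

To use it, pass the open log file, for example `count_tensor_core_kernels(open("nvprof_output.txt"))`. An empty result corresponds to grep printing nothing.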

Awesome NVIDIA resource on converting a model to mixed precision and why we need to keep a copy of the model parameters (the fp32 master weights):

http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#multigpu
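
To get a feel for why the parameter copy matters, here is a small illustration using only the standard library. It is a toy sketch and not code from the NVIDIA guide: `to_half` and `apply_updates` are hypothetical helpers, and a plain Python float stands in for the fp32 master copy. The point is that an update much smaller than the fp16 spacing around the weight is rounded away if the weight itself is stored in fp16.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest fp16 value and return it as a float."""
    return struct.unpack("e", struct.pack("e", x))[0]


def apply_updates(weight, update, steps, keep_master_copy):
    """Apply the same small update repeatedly and return the final fp16 weight."""
    master = weight                # higher-precision copy (stand-in for fp32)
    half_weight = to_half(weight)  # the weight as the fp16 model sees it
    for _ in range(steps):
        if keep_master_copy:
            # Accumulate in high precision, then refresh the fp16 weight from it.
            master += update
            half_weight = to_half(master)
        else:
            # Accumulate directly in fp16; tiny updates are lost to rounding.
            half_weight = to_half(half_weight + update)
    return half_weight


print(apply_updates(1.0, 1e-4, 10, keep_master_copy=False))  # 1.0
print(apply_updates(1.0, 1e-4, 10, keep_master_copy=True))   # 1.0009765625
```

Without the master copy the weight never moves, because each 1e-4 step is below half the fp16 spacing near 1.0 (about 0.00098). With the master copy the ten steps add up and eventually show in the fp16 weight.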