```
conda create -n tf-gpu tensorflow-gpu
conda activate tf-gpu
conda install anaconda
```

TensorFlow does not make it easy to switch between CPU and GPU within a running session. Switching requires a notebook restart, hence the separate notebooks for CPU and GPU.
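
As a rough sketch of how each notebook could pin its device, the snippet below sets `CUDA_VISIBLE_DEVICES` before TensorFlow is imported, using the same `'-1'` value that appears commented out in the GPU cell further down. The helper name `configure_device` and its parameters are hypothetical, introduced here only for illustration. The sketch assumes the variable is set before TensorFlow initializes its devices, and the `try`/`except` only keeps it runnable where TensorFlow is not installed.

```python
import os


def configure_device(use_gpu: bool, gpu_index: str = "0") -> str:
    """Set CUDA_VISIBLE_DEVICES (hypothetical helper, not part of TensorFlow).

    Call this before TensorFlow is imported so the setting is seen
    when the GPU devices are first initialized.
    """
    # "-1" hides every CUDA device; an index such as "0" exposes that GPU only.
    value = gpu_index if use_gpu else "-1"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# CPU-only notebook: hide all GPUs, then import TensorFlow.
configure_device(use_gpu=False)

try:
    import tensorflow as tf

    # With the GPUs hidden, this list is expected to be empty.
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed in this environment")
```

In the GPU notebook the call would be `configure_device(use_gpu=True)` instead. Changing the value later in the same kernel is not expected to take effect once the devices are initialized, which is consistent with the restart requirement noted above.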

In [1]:
!which python

/work/bootcamp/gpu_training/tf_env/bin/python


In [2]:
import tensorflow as tf
import os

### GPU

In [3]:
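# Uncomment to hide all GPUs; set this before TensorFlow initializes its devices.
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
# .get() avoids a KeyError when the variable is not set in the environment.
print(f"CUDA Device Number found: {os.environ.get('CUDA_VISIBLE_DEVICES', 'not set')}")
if tf.test.gpu_device_name():
    print('GPU found')
else:
    print("No GPU found")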

CUDA Device Number found: 0
GPU found


In [4]:
print (f"Validate TF is installed with CUDA support : {tf.test.is_built_with_cuda()}")
print (f"Validate GPU is available for compute: {tf.config.list_physical_devices('GPU')}")

Validate TF is installed with CUDA support : True
Validate GPU is available for compute: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [5]:
tf.random.set_seed(42)
print (f"TF Version : {tf.__version__}") 

TF Version : 2.4.1


In [6]:
tensor1 = tf.random.normal([5000,10000])
tensor2 = tf.random.normal([10000,5000])
result = tf.matmul(tensor1, tensor2)
print (result.shape)

(5000, 5000)


In [10]:
%%timeit -r 7 -n 1000
tf.matmul(tensor1, tensor2)

59.1 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [8]:
!nvidia-smi

Wed Jun  9 09:28:02 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA Tesla P1...  Off  | 00000000:03:00.0 Off |                    0 |
| N/A   66C    P0   220W / 250W |  11525MiB / 12198MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

GPU runtime is ~36 ms with a high std. dev. of 5 ms (the run captured above reports 59.1 ms ± 11.6 ms, so timings vary between runs). The mean runtime is close to the PyTorch GPU execution.

> If you see a warning, the result may not be a realistic comparison with PyTorch. Even so, the code used the GPU and ran much faster than TF's CPU-only run. TensorFlow has no straightforward equivalent of the CUDA synchronize call we used in PyTorch to prevent caching. One possible approach is an explicit sess.run(); see https://discuss.pytorch.org/t/is-there-any-code-torch-backends-cudnn-benchmark-torch-cuda-synchronize-similar-in-tensorflow/51484
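
One way to approximate synchronized timing in TF 2.x eager mode, offered as a sketch and not as an equivalent of PyTorch's CUDA synchronize, is to fetch a small result back to the host inside the timed call. Calling `.numpy()` has to wait until the value has been computed, so the matmul cannot still be pending when the timer stops. The helper `time_blocking` is a hypothetical name, the matrix shapes are smaller than in the notebook to keep the example quick, and the measurement includes the extra reduction and a scalar copy to the host.

```python
import statistics
import time


def time_blocking(fn, repeats=7, warmup=1):
    """Return (mean_ms, std_ms) for fn(); fn must block until its result is ready.

    Hypothetical helper for illustration, not a TensorFlow API.
    """
    for _ in range(warmup):
        fn()  # warm-up call so one-time setup cost is not measured
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples), statistics.stdev(samples)


try:
    import tensorflow as tf

    a = tf.random.normal([1000, 2000])
    b = tf.random.normal([2000, 1000])

    def matmul_and_wait():
        # Reducing to a scalar keeps the device-to-host copy small;
        # .numpy() waits for the matmul and the reduction to finish.
        return tf.reduce_sum(tf.matmul(a, b)).numpy()

    mean_ms, std_ms = time_blocking(matmul_and_wait)
    print(f"{mean_ms:.2f} ms ± {std_ms:.2f} ms per call")
except ImportError:
    print("TensorFlow is not installed in this environment")
```

Numbers from this sketch are not directly comparable to the `%%timeit` cell above, since each call here also pays for the reduction and the host copy.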