# Using GPU with TensorFlow

# TensorFlow CUDA package

* [Install TensorFlow with pip](https://www.tensorflow.org/install/gpu)

> Software requirements
> * Python 3.9–3.11
> * pip version 19.0 or higher for Linux (requires manylinux2014 support) and Windows. pip version 20.3 or higher for macOS.
> * Windows Native Requires Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019
> 
> The following NVIDIA® software are only required for GPU support.
> 
> * NVIDIA® GPU drivers version 450.80.02 or higher.
> * CUDA® Toolkit 11.8.    <-----
> * cuDNN SDK 8.6.0.       <-----
> ```
> python3 -m pip install tensorflow[and-cuda]
> # Verify the installation:
> python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
> ```
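
The version constraints quoted above can be checked before installing anything. The following is a minimal sketch, assuming the 3.9–3.11 Python range and the 450.80.02 driver minimum apply to the TensorFlow release being installed (they change between releases). The helper names are hypothetical, and the driver string passed in is the one reported by the `nvidia-smi` output later in this notebook.

```python
import sys

# Minimum driver version quoted in the requirements above.
MIN_DRIVER = "450.80.02"


def version_tuple(version: str) -> tuple:
    """Turn a dotted version string such as '450.80.02' into (450, 80, 2)."""
    return tuple(int(part) for part in version.split("."))


def python_supported(major: int, minor: int) -> bool:
    """True when the interpreter falls inside the quoted 3.9-3.11 range."""
    return (3, 9) <= (major, minor) <= (3, 11)


def driver_supported(driver_version: str) -> bool:
    """True when the driver version is at least MIN_DRIVER."""
    return version_tuple(driver_version) >= version_tuple(MIN_DRIVER)


print("Python OK:", python_supported(sys.version_info.major, sys.version_info.minor))
print("Driver OK:", driver_supported("535.104.05"))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexicographically, where `"450.9"` would sort after `"450.80"`.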

## CUDA driver

Make sure the NVIDIA driver for the installed GPU is present; ```nvidia-smi``` reports the driver version.

## CUDA Toolkit

Install the CUDA Toolkit version supported by the TensorFlow release in use.
* [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive)


In [None]:
import os
print(os.environ.get('CUDA_HOME'))  # None if the variable is not set
!nvcc -V
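
Outside a notebook, the same check can be done without the `!` shell escape. This is a rough sketch, assuming `nvcc -V` prints a line containing `release X.Y` (for example `Cuda compilation tools, release 11.8, V11.8.89`). The function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output: str):
    """Extract (major, minor) from the 'release X.Y' token in `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Return the toolkit release as (major, minor), or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


print("CUDA toolkit release:", installed_cuda_release())
```

A result of `None` only means `nvcc` is not on `PATH`. A toolkit installed through the `tensorflow[and-cuda]` pip extra does not necessarily provide `nvcc`.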

In [1]:
import tensorflow as tf

2023-11-18 21:10:54.494256: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-18 21:10:54.553926: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-18 21:10:54.553972: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-18 21:10:54.554002: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-18 21:10:54.566119: I tensorflow/core/platform/cpu_feature_g

# GPU Devices

* [GPU support](https://www.tensorflow.org/install/gpu)

> TensorFlow GPU support requires an assortment of drivers and libraries. To simplify installation and avoid library conflicts, we recommend using a TensorFlow Docker image with GPU support (Linux only). This setup only requires the NVIDIA® GPU drivers.

* [Use a GPU](https://www.tensorflow.org/guide/gpu)

> * "/device:CPU:0": The CPU of your machine.
> * "/GPU:0": Short-hand notation for the first GPU of your machine that is visible to TensorFlow.
> * "/job:localhost/replica:0/task:0/device:GPU:1": Fully qualified name of the second GPU of your machine that is visible to TensorFlow.
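
To make the three naming forms above concrete, here is a small pure-Python sketch that reduces each of them to a device type and index. The regular expression and the function name are assumptions made for illustration; TensorFlow has its own parser, `tf.DeviceSpec.from_string`, which should be preferred in real code.

```python
import re

# Matches the trailing '<TYPE>:<index>' part, with or without a 'device:' prefix.
_DEVICE_RE = re.compile(r"(?:^|/)(?:device:)?(CPU|GPU):(\d+)$", re.IGNORECASE)


def parse_device_name(name: str):
    """Return (device_type, index) for a short or fully qualified device name."""
    match = _DEVICE_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    return match.group(1).upper(), int(match.group(2))


for name in (
    "/device:CPU:0",
    "/GPU:0",
    "/job:localhost/replica:0/task:0/device:GPU:1",
):
    print(name, "->", parse_device_name(name))
```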


## Confirm GPU

On a GPU instance (here, a Google Colab GPU runtime):

```
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
tf.config.list_physical_devices('GPU')
---
Num GPUs Available:  1
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
```

In [2]:
print("Is there a GPU available: ")
print(tf.config.list_physical_devices("GPU"))

Is there a GPU available: 
[]


2023-11-18 21:10:59.714001: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-18 21:10:59.910942: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2211] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
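
When the `Cannot dlopen some GPU libraries` warning appears, one way to narrow it down is to ask the system loader which CUDA libraries it can find. The sketch below uses only the standard library. The list of library names is an assumption (the log excerpt above does not show which ones were missing), and `find_library` only searches system locations, so libraries installed inside a virtual environment by pip may be reported as not found even though TensorFlow can load them.

```python
from ctypes.util import find_library

# Assumed set of libraries to probe; adjust to the names in the actual warning.
CUDA_LIBRARIES = ("cudart", "cublas", "cufft", "cudnn")


def locate_cuda_libraries(names=CUDA_LIBRARIES):
    """Map each library name to what the system loader finds, or None."""
    return {name: find_library(name) for name in names}


for name, path in locate_cuda_libraries().items():
    print(f"{name:8s} -> {path or 'not found'}")
```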


### Number of available GPUs

In [3]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  0


---

# TF usage of GPU

Whether an op runs on the CPU or the GPU is transparent to TensorFlow code.

* [Use a GPU](https://www.tensorflow.org/guide/gpu)

> TensorFlow code, and tf.keras models will transparently run on a single GPU with no code changes required.

> tf.matmul has both CPU and GPU kernels. On a system with devices CPU:0 and GPU:0, the GPU:0 device will be selected to run tf.matmul unless you explicitly request running it on another device.

## Confirm if TF is using GPU

In [7]:
tf.config.list_physical_devices('GPU')

[]

## Confirm a tensor is allocated on a GPU device

In [5]:
x = tf.random.uniform([3, 3])

print("Is the Tensor on GPU #0:  ")
print(x.device.endswith('GPU:0'))

Is the Tensor on GPU #0:  
False


## Explicit device assignment

In [6]:
tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)

In [None]:
tf.debugging.set_log_device_placement(True)
with tf.device('/GPU:0'):
    # Place tensors on the GPU
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    
    # Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
    c = tf.matmul(a, b)
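
On a machine without a GPU, like the one that produced the empty device list above, hard-coding `/GPU:0` is fragile. A minimal sketch of choosing the device at run time follows. `pick_device` and `matmul_on_best_device` are hypothetical helpers; TensorFlow is imported inside the second function so the first can be used without it. An alternative worth considering is `tf.config.set_soft_device_placement(True)`, which lets TensorFlow fall back to an available device by itself.

```python
def pick_device(gpu_devices) -> str:
    """Return '/GPU:0' when at least one GPU is listed, else '/CPU:0'."""
    return "/GPU:0" if gpu_devices else "/CPU:0"


def matmul_on_best_device():
    """Run the matmul from the cells above on the GPU if present, else on the CPU."""
    import tensorflow as tf  # lazy import: pick_device() works without TensorFlow

    device = pick_device(tf.config.list_physical_devices("GPU"))
    with tf.device(device):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        return device, tf.matmul(a, b)


print(pick_device([]))
```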

# Control TF GPU Usage

* [Limiting GPU memory growth](https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth)

> to only allocate a subset of the available memory, or to **only grow the memory usage** as is needed by the process. TensorFlow provides two methods to control this.
>   
> The first option is to turn on memory growth by calling ```tf.config.experimental.set_memory_growth```, which **attempts to allocate only as much GPU memory as needed** for the runtime allocations: it starts out allocating very little memory, and as the program gets run and more GPU memory is needed, the GPU memory region is extended for the TensorFlow process. Memory is not released since it can lead to memory fragmentation. To turn on memory growth for a specific GPU, use the following code prior to allocating any tensors or executing any ops. Another way to enable this option is to set the environmental variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific.
> ```
> gpus = tf.config.list_physical_devices('GPU')
> if gpus:
>   try:
>     # Currently, memory growth needs to be the same across GPUs
>     for gpu in gpus:
>       tf.config.experimental.set_memory_growth(gpu, True)
>     logical_gpus = tf.config.list_logical_devices('GPU')
>     print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
>   except RuntimeError as e:
>     # Memory growth must be set before GPUs have been initialized
>     print(e)
> ```
>
> The second method is to configure a virtual GPU device with tf.config.set_logical_device_configuration and set a hard limit on the total memory to allocate on the GPU. This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process. This is common practice for local development when the GPU is shared with other applications such as a workstation GUI.
> ```
> gpus = tf.config.list_physical_devices('GPU')
> if gpus:
>   # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
>   try:
>     tf.config.set_logical_device_configuration(
>         gpus[0],
>         [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
>     logical_gpus = tf.config.list_logical_devices('GPU')
>     print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
>   except RuntimeError as e:
>     # Virtual devices must be set before GPUs have been initialized
>     print(e)
> ```
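
The quoted text also mentions the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable as another way to enable memory growth. A minimal sketch, assuming the variable is set in the same process before TensorFlow is imported, is shown below. The helper name is hypothetical, and `setdefault` is used so that a value already exported by the shell is left untouched.

```python
import os


def enable_gpu_memory_growth_via_env() -> str:
    """Set TF_FORCE_GPU_ALLOW_GROWTH unless it is already defined; return its value."""
    os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]


print("TF_FORCE_GPU_ALLOW_GROWTH =", enable_gpu_memory_growth_via_env())
# import tensorflow as tf  # import only after the variable has been set
```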

# Confirm GPU usage

Run ```nvidia-smi``` while training or inference is in progress to confirm that the GPU is being used.

```
$ nvidia-smi
Sat Nov 18 22:05:29 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4050 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8               3W /  35W |   4398MiB /  6141MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    119310      C   /home/eml/venv/ml/bin/python3              4392MiB |
+---------------------------------------------------------------------------------------+
```
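
For monitoring from a script rather than reading the table by eye, `nvidia-smi` can emit machine-readable CSV through its `--query-gpu` and `--format` options. The following is a sketch under the assumption that the three queried fields come back as plain integers; fields the driver cannot report are mapped to `None`. The function names and dictionary keys are hypothetical.

```python
import shutil
import subprocess

QUERY_ARGS = [
    "--query-gpu=memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def _to_int(field: str):
    """Convert a CSV field to int, or None for values such as '[N/A]'."""
    try:
        return int(field.strip())
    except ValueError:
        return None


def parse_gpu_row(row: str) -> dict:
    """Parse one CSV row of the form 'used, total, utilization'."""
    used, total, util = (_to_int(field) for field in row.split(","))
    return {"memory_used_mib": used, "memory_total_mib": total, "utilization_pct": util}


def gpu_usage() -> list:
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    result = subprocess.run([exe, *QUERY_ARGS], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return [parse_gpu_row(line) for line in result.stdout.splitlines() if line.strip()]


print(gpu_usage())
```

Calling `gpu_usage()` periodically during training gives a simple utilization trace. A GPU utilization that stays at 0% while memory is allocated, as in the table above, suggests the process holds memory but is not currently computing.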

In [8]:
!nvidia-smi

Sat Nov 18 22:06:22 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 4050 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8               3W /  35W |   4398MiB /  6141MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                         

# OutOfMemory

Make sure a batch fits in GPU memory; otherwise allocation fails with an OOM error such as:

```
OOM when allocating tensor with shape[32,64,224,224] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node pascal_voc_cnn/conv01/Conv2D}}]]
```
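
The size of the tensor named in the error can be estimated from its shape and element type. The sketch below assumes that `type float` in the message means 32-bit floats (4 bytes per element). `tensor_mib` is a hypothetical helper, and the estimate covers this one tensor only, not weights, gradients, or other activations held at the same time.

```python
from math import prod

BYTES_PER_ELEMENT = {"float16": 2, "float32": 4, "float64": 8}


def tensor_mib(shape, dtype: str = "float32") -> float:
    """Size in MiB of a dense tensor with the given shape and element type."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype] / 2**20


print(tensor_mib([32, 64, 224, 224]))  # 392.0
print(tensor_mib([16, 64, 224, 224]))  # 196.0
```

A single activation of 392 MiB is a large share of the 6141 MiB card shown earlier, especially with about 4.4 GiB already in use. Halving the batch size halves this tensor.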

---
# Using Multiple GPUs

* [Using multiple GPUs](https://www.tensorflow.org/guide/gpu#using_multiple_gpus)

> The best practice for using multiple GPUs is to use tf.distribute.Strategy. This program will run a copy of your model on each GPU, splitting the input data between them, also known as "data parallelism".
> ```
> tf.debugging.set_log_device_placement(True)
> gpus = tf.config.list_logical_devices('GPU')
> strategy = tf.distribute.MirroredStrategy(gpus)
> with strategy.scope():
>   inputs = tf.keras.layers.Input(shape=(1,))
>   predictions = tf.keras.layers.Dense(1)(inputs)
>   model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
>   model.compile(loss='mse',
>                 optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))
> ```
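
Because the input data is split between the replicas, the batch size given to the input pipeline is the global one, and each GPU sees only a share of it. A minimal arithmetic sketch follows. The helper is hypothetical, and it insists on an even split purely as a simplifying assumption; in a real program the replica count would come from `strategy.num_replicas_in_sync`.

```python
def per_replica_batch_size(global_batch_size: int, num_replicas: int) -> int:
    """Share of a global batch that each replica processes, assuming an even split."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    if global_batch_size % num_replicas:
        raise ValueError("global batch size is not divisible by the number of replicas")
    return global_batch_size // num_replicas


print(per_replica_batch_size(64, 2))
```

This matters for the out-of-memory estimate as well: the per-replica share, not the global batch, is what has to fit on each GPU.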

* [Distributed training with TensorFlow](https://www.tensorflow.org/guide/distributed_training)

> tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes.



---
# Disable TF GPU Usage

* [CUDA Pro Tip: Control GPU Visibility with CUDA_VISIBLE_DEVICES](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/)

> easy solution for testing is to use the environment variable CUDA_VISIBLE_DEVICES to restrict the devices that your CUDA application sees.  
>   
> 
> To use it, set CUDA_VISIBLE_DEVICES to a comma-separated list of device IDs to make only those devices visible to the application.  Note that you can use this technique both to mask out devices or to change the visibility order of devices so that the CUDA runtime enumerates them in a specific order.


In [None]:
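# Disable GPU. `!export` only changes a temporary subshell, so set the
# variable in the kernel process itself, before importing TensorFlow.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'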

In [1]:
# From Python. Run this at the start, before any TF operation.
import tensorflow as tf

try:
    # Disable all GPUs
    tf.config.set_visible_devices([], 'GPU')
    visible_devices = tf.config.get_visible_devices()
    for device in visible_devices:
        assert device.device_type != 'GPU'
except (ValueError, RuntimeError):
    # Invalid device or cannot modify virtual devices once initialized.
    pass
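
The NVIDIA post quoted above also covers exposing a subset of devices or changing their order. A small sketch of building the variable's value is given below. Both helpers are hypothetical, and mapping an empty list to `'-1'` follows the convention used earlier in this notebook for hiding every GPU.

```python
import os


def visible_devices_value(device_ids) -> str:
    """Build a CUDA_VISIBLE_DEVICES value; an empty list yields '-1' (no GPUs)."""
    ids = list(device_ids)
    return ",".join(str(device_id) for device_id in ids) if ids else "-1"


def set_visible_devices(device_ids) -> None:
    """Apply the value to this process; call before any CUDA library is loaded."""
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices_value(device_ids)


print(visible_devices_value([1, 0]))  # second GPU first, then the first GPU
print(visible_devices_value([]))      # hide all GPUs
```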