# GPU Environment TensorFlow

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [malaya-speech/example/gpu-environment-tensorflow](https://github.com/huseinzol05/malaya-speech/tree/master/example/gpu-environment-tensorflow).
    
</div>

In [1]:
import os

os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
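Both variables must be set before TensorFlow initialises its GPUs, and setting them before TensorFlow is first imported is the safest option. `TF_FORCE_GPU_ALLOW_GROWTH='true'` makes TensorFlow allocate GPU memory on demand instead of reserving almost all of it up front. `CUDA_VISIBLE_DEVICES='1'` exposes only physical GPU 1, which TensorFlow then renumbers as `/device:GPU:0`, as the device list further down shows. The helper below is a hypothetical convenience wrapper (`set_gpu_environment` is not part of malaya-speech) that sets both variables from a list of GPU indices.

```python
import os
from typing import Dict, Iterable


def set_gpu_environment(gpu_ids: Iterable[int], allow_growth: bool = True) -> Dict[str, str]:
    """Hypothetical helper: restrict TensorFlow to `gpu_ids` and toggle memory growth.

    Must run before TensorFlow initialises its GPUs (safest: before importing it).
    """
    ids = [int(i) for i in gpu_ids]
    # Only these physical GPUs are visible; TensorFlow renumbers them from GPU:0.
    # An empty list yields an empty string, which hides every GPU.
    os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(str(i) for i in ids)
    # 'true' = allocate GPU memory on demand instead of reserving it all up front.
    os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true' if allow_growth else 'false'
    return {
        'CUDA_VISIBLE_DEVICES': os.environ['CUDA_VISIBLE_DEVICES'],
        'TF_FORCE_GPU_ALLOW_GROWTH': os.environ['TF_FORCE_GPU_ALLOW_GROWTH'],
    }


# Same effect as the cell above: expose physical GPU 1 only, with memory growth.
env = set_gpu_environment([1])
print(env)
```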

In [6]:
%%time

import malaya_speech
import logging
logging.basicConfig(level = logging.INFO)



CPU times: user 184 ms, sys: 17.5 ms, total: 201 ms
Wall time: 93.9 ms


### List available GPU

**You must install the GPU version of TensorFlow first to enable GPU hardware acceleration**.

In [3]:
from tensorflow.python.client import device_lib

device_lib.list_local_devices()

2023-02-09 15:53:12.310410: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-09 15:53:12.325848: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:53:12.328364: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:53:12.329202: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zer

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 11493416253575721766,
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 23385997312
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 16395544932201862459
 physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6"]
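`device_lib` lives under `tensorflow.python`, TensorFlow's private namespace; in TensorFlow 2.x the public `tf.config.list_physical_devices('GPU')` also lists GPUs, though without this descriptive string. The `physical_device_desc` field above is a comma-separated list of `key: value` pairs, and parsing it is a quick way to confirm which physical card TensorFlow's `GPU:0` maps to. Here the PCI bus id `0000:07:00.0` matches the card that `nvidia-smi` lists as GPU 1 (`00000000:07:00.0`), which confirms that `CUDA_VISIBLE_DEVICES='1'` took effect. The sketch below is a hypothetical helper that assumes the field keeps exactly this layout.

```python
from typing import Dict


def parse_device_desc(desc: str) -> Dict[str, str]:
    """Split a TensorFlow `physical_device_desc` string into a dict.

    Assumes the 'key: value, key: value' layout shown in the notebook output
    and that no value itself contains ', '.
    """
    fields = {}
    for part in desc.split(', '):
        # Split on the first ': ' only, so 'pci bus id: 0000:07:00.0' keeps its colons.
        key, _, value = part.partition(': ')
        fields[key.strip()] = value.strip()
    return fields


desc = ('device: 0, name: NVIDIA GeForce RTX 3090 Ti, '
        'pci bus id: 0000:07:00.0, compute capability: 8.6')
info = parse_device_desc(desc)
print(info['name'], info['pci bus id'], info['compute capability'])
```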

### Run model inside GPU

The steps below follow the TensorFlow GPU guide at https://www.tensorflow.org/guide/gpu, starting with logging which device each op runs on.

In [4]:
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

In [7]:
malaya_speech.stt.ctc.available_transformer()

INFO:malaya_speech.stt:for `malay-fleur102` language, tested on FLEURS102 `ms_my` test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
INFO:malaya_speech.stt:for `malay-malaya` language, tested on malaya-speech test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
INFO:malaya_speech.stt:for `singlish` language, tested on IMDA malaya-speech test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt


| Model | Size (MB) | Quantized Size (MB) | malay-malaya | Language |
|---|---|---|---|---|
| hubert-conformer-tiny | 36.6 | 10.3 | {'WER': 0.238714008166, 'CER': 0.060899814, 'W... | [malay] |
| hubert-conformer | 115.0 | 31.1 | {'WER': 0.2387140081, 'CER': 0.06089981404, 'W... | [malay] |
| hubert-conformer-large | 392.0 | 100.0 | {'WER': 0.2203140421, 'CER': 0.0549270416, 'WE... | [malay] |


### Malaya frozen graph interfaces

#### load graph

Every Malaya TensorFlow model interface forwards its keyword arguments to `malaya_boilerplate.frozen_graph.load_graph`:

```python
def load_graph(package, frozen_graph_filename, **kwargs):
    """
    Load frozen graph from a checkpoint.

    Parameters
    ----------
    frozen_graph_filename: str
        frozen graph file to load.
    precision_mode: str, optional (default='FP32')
        precision of the frozen graph; must be one of ['BFLOAT16', 'FP16', 'FP32', 'FP64'].
    device: str, optional (default='CPU:0')
        device to place the model on, e.g. 'CPU:0' or 'GPU:0', read more at https://www.tensorflow.org/guide/gpu

    Returns
    -------
    result : tensorflow.Graph
    """
```
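Because `device` and `precision_mode` travel through `**kwargs`, a typo is easy to miss. The sketch below is a hypothetical pre-flight check (`check_graph_kwargs` is not part of `malaya_boilerplate`) that applies the defaults and allowed values listed in the `load_graph` docstring. The accepted device format, `CPU` or `GPU` followed by a colon and an index, is our assumption based on the `'CPU:0'` and `'GPU:0'` strings used in this notebook.

```python
import re
from typing import Any, Dict

# Values taken from the `load_graph` docstring.
ALLOWED_PRECISION = ('BFLOAT16', 'FP16', 'FP32', 'FP64')
DEFAULTS = {'precision_mode': 'FP32', 'device': 'CPU:0'}
# Assumed device format, based on the 'CPU:0' / 'GPU:0' strings in this notebook.
DEVICE_PATTERN = re.compile(r'^(CPU|GPU):\d+$')


def check_graph_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: fill in defaults and validate load_graph keyword arguments."""
    merged = {**DEFAULTS, **kwargs}
    if merged['precision_mode'] not in ALLOWED_PRECISION:
        raise ValueError(
            f"precision_mode must be one of {ALLOWED_PRECISION}, got {merged['precision_mode']!r}")
    if not DEVICE_PATTERN.match(merged['device']):
        raise ValueError(f"device must look like 'CPU:0' or 'GPU:0', got {merged['device']!r}")
    return merged


print(check_graph_kwargs(device='GPU:0'))
```

Failing early with a clear message is cheaper than discovering a misplaced model from the device-placement log.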

#### generate session

After the graph is loaded, it is passed to `malaya_boilerplate.frozen_graph.generate_session`, which creates a TensorFlow session for it:

```python
def generate_session(graph, **kwargs):
    """
    Create a session for a TensorFlow graph.

    Parameters
    ----------
    graph: tensorflow.Graph
    gpu_limit: float, optional (default = 0.999)
        fraction of GPU memory the session is allowed to use.

    Returns
    -------
    result : tensorflow.Session
    """
```
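`gpu_limit` is a fraction of the card's total memory rather than an absolute size. As a worked example, the default of 0.999 on one of the 24256 MiB cards reported by `nvidia-smi` later in this notebook caps the session at roughly 24232 MiB. In TensorFlow 1.x-style sessions such a fraction is normally passed as `per_process_gpu_memory_fraction` in `tf.compat.v1.GPUOptions`; we assume, without having checked the `malaya_boilerplate` source, that `gpu_limit` ends up there. The helper name below is hypothetical.

```python
def gpu_memory_cap_mib(total_mib: float, gpu_limit: float = 0.999) -> float:
    """Hypothetical helper: memory a session may use for a given `gpu_limit` fraction."""
    if not 0.0 < gpu_limit <= 1.0:
        raise ValueError(f'gpu_limit must be in (0, 1], got {gpu_limit}')
    return total_mib * gpu_limit


# 24256 MiB is the per-card total reported by nvidia-smi in this notebook.
cap = gpu_memory_cap_mib(24256)
print(f'{cap:.0f} MiB')  # 24232 MiB
```

With `TF_FORCE_GPU_ALLOW_GROWTH='true'` this cap is likely an upper bound rather than an up-front reservation, since memory is then allocated on demand.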

In [8]:
tiny = malaya_speech.stt.ctc.transformer(model = 'hubert-conformer-tiny', device = 'GPU:0')


2023-02-09 15:54:01.508535: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:54:01.509432: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:54:01.510104: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:54:01.530884: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:54:01.531653: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from S

In [9]:
y, _ = malaya_speech.load('speech/example-speaker/husein-zolkepli.wav')

In [11]:
tiny.predict([y])

2023-02-09 15:55:01.695095: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8302
2023-02-09 15:55:02.590549: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-02-09 15:55:02.680280: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op CTCBeamSearchDecoder in device /job:localhost/replica:0/task:0/device:CPU:0
Instructions for updating:
Use `tf.cast` instead.


2023-02-09 15:55:02.788906: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:55:02.789677: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:55:02.790159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:55:02.790921: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-09 15:55:02.791811: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from S

Executing op Cast in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op SparseToDense in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op StridedSlice in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op StridedSlice in device /job:localhost/

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op StridedSlice in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op StridedSlice in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op StridedSlice in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /j

['testing nama saya busin bian zokeple']
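With `tf.debugging.set_log_device_placement(True)`, TensorFlow prints one `Executing op <Op> in device <device>` line per op, as in the log above. Most ops ran on `GPU:0`, but `CTCBeamSearchDecoder` ran on `CPU:0`. When the log gets long, tallying it helps spot ops that fall back to the CPU. The sketch below is a hypothetical helper that assumes this exact line format and skips truncated or unrelated lines.

```python
import re
from collections import Counter
from typing import Iterable

# Matches complete placement lines such as
# 'Executing op Cast in device /job:localhost/replica:0/task:0/device:GPU:0'.
PLACEMENT = re.compile(r'^Executing op (\S+) in device \S*/device:([A-Z]+:\d+)$')


def summarise_placement(lines: Iterable[str]) -> Counter:
    """Count (op, device) pairs in a device-placement log, ignoring other lines."""
    counts = Counter()
    for line in lines:
        match = PLACEMENT.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines copied from the log above; the last one is truncated and is skipped.
log = """Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op CTCBeamSearchDecoder in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Cast in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op StridedSlice in device /job:localhost/"""
counts = summarise_placement(log.splitlines())
cpu_ops = sorted(op for (op, device) in counts if device.startswith('CPU'))
print(cpu_ops)  # ['CTCBeamSearchDecoder']
```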

In [12]:
!nvidia-smi

Thu Feb  9 15:55:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
| 47%   61C    P2   345W / 350W |  22423MiB / 24256MiB |     97%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:07:00.0 Off |                  Off |
|  0%   47C    P2   106W / 350W |   4146MiB / 24256MiB |      0%      Defaul
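The table view of `nvidia-smi` is awkward to parse. `nvidia-smi` also has a query mode (`--query-gpu=... --format=csv,noheader,nounits`) that prints one comma-separated line per GPU. The sketch below parses that output into per-GPU memory usage; the helper names are hypothetical, and the sample text reuses the figures from the table above (GPU 0: 22423 of 24256 MiB, GPU 1: 4146 of 24256 MiB). GPU 1, on PCI bus 07:00.0, is the only card this notebook exposed to TensorFlow, so the load on GPU 0 presumably comes from other work, and GPU 1's modest usage is consistent with `TF_FORCE_GPU_ALLOW_GROWTH` allocating memory on demand.

```python
import subprocess
from typing import Dict, List

QUERY = ['nvidia-smi', '--query-gpu=index,pci.bus_id,memory.used,memory.total',
         '--format=csv,noheader,nounits']


def parse_gpu_memory(csv_text: str) -> List[Dict[str, object]]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output (MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, bus_id, used, total = (field.strip() for field in line.split(','))
        gpus.append({'index': int(index), 'bus_id': bus_id,
                     'used_mib': int(used), 'total_mib': int(total),
                     'used_pct': 100.0 * int(used) / int(total)})
    return gpus


def query_gpu_memory() -> List[Dict[str, object]]:
    """Run the query on this machine (requires an NVIDIA driver and nvidia-smi on PATH)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Sample text built from the nvidia-smi table above.
sample = '0, 00000000:01:00.0, 22423, 24256\n1, 00000000:07:00.0, 4146, 24256'
for gpu in parse_gpu_memory(sample):
    print(gpu['index'], gpu['bus_id'], f"{gpu['used_pct']:.1f}%")
```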