# Simple TensorFlow test

This notebook can be used to benchmark/test your TF installation. It is based on a [blog post by Purnendu Shukla (2021)](https://www.analyticsvidhya.com/blog/2021/11/benchmarking-cpu-and-gpu-performance-with-tensorflow/).

Passing the test data as validation data during the fit process was suggested in a [blog post by Jason Brownlee (2022)](https://machinelearningmastery.com/evaluate-performance-deep-learning-models-keras/).

In [1]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

2022-12-10 13:32:15.291007: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1


## Test TF installation

In [2]:
# Initial tests
print("Step 1: Initial GPU presence tests...")
print(tf.config.experimental.list_physical_devices())
print(tf.config.list_physical_devices('GPU'))
print(tf.test.is_built_with_cuda())

Step 1: Initial GPU presence tests...
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
True


2022-12-10 13:33:13.099141: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-12-10 13:33:13.180145: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-12-10 13:33:13.495229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:41:00.0 name: NVIDIA A100-PCIE-40GB computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.41GiB deviceMemoryBandwidth: 1.41TiB/s
2022-12-10 13:33:13.495276: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-12-10 13:33:14.364286: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2022-12-10 13:33:14.364344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2

## Define a neural network model

In [3]:
# Download/preprocess data
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train_scaled = X_train/255
X_test_scaled = X_test/255
y_train_encoded = keras.utils.to_categorical(y_train, num_classes = 10, dtype = 'float32')

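# Bare expressions in the middle of a cell produce no output, so print the shapes explicitly
print('Data shapes:')
print(X_train_scaled.shape)
print(y_train_encoded.shape)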

# Define the model
def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(32,32,3)),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(1000, activation='relu'),
        keras.layers.Dense(10, activation='sigmoid')
    ])
    model.compile(optimizer='SGD',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Define a small model
def get_small_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(32,32,3)),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(1000, activation='relu'),
        keras.layers.Dense(10, activation='sigmoid')
    ])
    model.compile(optimizer='SGD',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

Data shapes:
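
The call to `keras.utils.to_categorical` above turns the integer class labels into one-hot rows. As a rough illustration of what that encoding produces, here is a NumPy-only sketch. The helper name `one_hot` and the toy labels are made up for this example and are not part of the notebook.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a float32 one-hot matrix of shape (n_samples, num_classes)."""
    # CIFAR-10 labels arrive with shape (n, 1); flatten them to (n,)
    labels = np.asarray(labels).reshape(-1)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

# Hypothetical toy labels in the same (n, 1) layout as y_train
toy_labels = np.array([[3], [0], [9]])
toy_encoded = one_hot(toy_labels)
print(toy_encoded.shape)
print(toy_encoded[0])
```

The same encoding would have to be applied to `y_test` before the test split could be passed to `fit` as validation data.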


## Train model on CPU

This should be slow: about 1 minute per epoch for the small model from `get_small_model()`. With the additional hidden layers of the full model from `get_model()`, it takes closer to 30 minutes per epoch!
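
Part of the reason the deeper model is so much slower is its sheer size. The following is a back-of-the-envelope parameter count, assuming the layer widths listed in `get_model()` and `get_small_model()` above. The helper `dense_param_count` is introduced here purely for illustration.

```python
def dense_param_count(input_size, layer_widths):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    fan_in = input_size
    for width in layer_widths:
        total += fan_in * width + width  # weight matrix plus bias vector
        fan_in = width
    return total

input_size = 32 * 32 * 3  # size of one flattened CIFAR-10 image

small_widths = [3000, 1000, 10]          # Dense layers in get_small_model()
large_widths = [3000] * 24 + [1000, 10]  # Dense layers in get_model()

small_params = dense_param_count(input_size, small_widths)
large_params = dense_param_count(input_size, large_widths)

print(f"small model: {small_params:,} parameters")
print(f"large model: {large_params:,} parameters")
print(f"ratio: {large_params / small_params:.1f}x")
```

Parameter count is only a rough proxy for training time, but it gives a sense of how much more arithmetic each step of the larger model involves.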

In [4]:
with tf.device('/CPU:0'):
    model_cpu = get_small_model()
    model_cpu.fit(X_train_scaled, y_train_encoded, epochs = 10)
    # Optional: add validation_data=(X_test_scaled, y_test_encoded) to fit(),
    # where y_test_encoded is y_test one-hot encoded like y_train_encoded.

2022-12-10 13:33:20.065598: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-10 13:33:20.079146: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-12-10 13:33:20.081518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:41:00.0 name: NVIDIA A100-PCIE-40GB computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.41GiB deviceMemoryBandwidth: 1.41TiB/s
2022-12-10 13:33:20.081548: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-12-10 13:33:20.081566: I tensorflow/stream_executor/platform/defaul

Epoch 1/10
Epoch 2/10
   1/1563 [..............................] - ETA: 1:09 - loss: 1.7471 - accuracy: 0.3750
Epoch 3/10
   1/1563 [..............................] - ETA: 1:09 - loss: 1.3751 - accuracy: 0.5938
Epoch 4/10
   1/1563 [..............................] - ETA: 1:08 - loss: 1.5935 - accuracy: 0.4375
Epoch 5/10
Epoch 6/10
   1/1563 [..............................] - ETA: 1:10 - loss: 1.2388 - accuracy: 0.6562
Epoch 7/10
   1/1563 [..............................] - ETA: 1:05 - loss: 1.1437 - accuracy: 0.5938
Epoch 8/10
   1/1563 [..............................] - ETA: 1:11 - loss: 1.2209 - accuracy: 0.5938
Epoch 9/10
Epoch 10/10
   1/1563 [..............................] - ETA: 1:08 - loss: 1.2261 - accuracy: 0.6875


## Train the same model on GPU

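This should be much faster, at a few seconds per epoch. The first epoch is the slowest because it includes one-time setup, such as loading the cuBLAS library (visible in the log output of the cell).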
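
To check claims like these on your own machine, per-epoch wall-clock time can be recorded during training. The sketch below is one possible approach, not something the original notebook does: `EpochTimer` and `as_keras_callback` are hypothetical helpers, and the wrapper relies on Keras' `LambdaCallback`, which accepts `on_epoch_begin` and `on_epoch_end` functions taking `(epoch, logs)`.

```python
import time

class EpochTimer:
    """Collect the wall-clock duration of each epoch, in seconds."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = None
        self.durations = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = self._clock()

    def on_epoch_end(self, epoch, logs=None):
        self.durations.append(self._clock() - self._start)

def as_keras_callback(timer):
    """Wrap an EpochTimer so it can be passed to model.fit(callbacks=[...])."""
    from tensorflow import keras  # imported lazily so the timer itself needs no TF
    return keras.callbacks.LambdaCallback(
        on_epoch_begin=timer.on_epoch_begin,
        on_epoch_end=timer.on_epoch_end,
    )

# Stand-alone demonstration with a fake "epoch" that just sleeps
timer = EpochTimer()
for epoch in range(3):
    timer.on_epoch_begin(epoch)
    time.sleep(0.01)
    timer.on_epoch_end(epoch)

print([f"{d:.3f}s" for d in timer.durations])
```

In the notebook, the timer could be attached with something like `model_gpu.fit(X_train_scaled, y_train_encoded, epochs=10, callbacks=[as_keras_callback(timer)])`, after which `timer.durations[0]` can be compared with the later epochs.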

In [5]:
with tf.device('/GPU:0'):
    model_gpu = get_small_model()
    model_gpu.fit(X_train_scaled, y_train_encoded, epochs = 10)
    # Optional: add validation_data=(X_test_scaled, y_test_encoded) to fit(),
    # where y_test_encoded is y_test one-hot encoded like y_train_encoded.

Epoch 1/10

2022-12-10 13:50:23.266438: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10

Epoch 2/10
Epoch 3/10
Epoch 4/10
   1/1563 [..............................] - ETA: 3s - loss: 1.5647 - accuracy: 0.4062
Epoch 5/10
   1/1563 [..............................] - ETA: 3s - loss: 1.2332 - accuracy: 0.5938
Epoch 6/10
   1/1563 [..............................] - ETA: 3s - loss: 1.5692 - accuracy: 0.3438
Epoch 7/10
Epoch 8/10
Epoch 9/10
   1/1563 [..............................] - ETA: 3s - loss: 1.6309 - accuracy: 0.4375
Epoch 10/10
   1/1563 [..............................] - ETA: 3s - loss: 1.4310 - accuracy: 0.4062
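
The ETA values in the CPU and GPU logs give a rough idea of the speedup. The sketch below converts the ETA strings that survive in the output above into seconds and compares the averages. Note the assumptions: these are Keras' estimates after the first batch of an epoch, not measured epoch times, and the helper `eta_to_seconds` is introduced here only for illustration.

```python
def eta_to_seconds(eta):
    """Convert a progress-bar ETA such as '3s', '1:09' or '1:02:03' to seconds."""
    eta = eta.strip()
    if eta.endswith("s"):
        return int(eta[:-1])
    seconds = 0
    for part in eta.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# ETA strings copied from the CPU and GPU training logs above
cpu_etas = ["1:09", "1:09", "1:08", "1:10", "1:05", "1:11", "1:08"]
gpu_etas = ["3s", "3s", "3s", "3s", "3s"]

cpu_mean = sum(eta_to_seconds(e) for e in cpu_etas) / len(cpu_etas)
gpu_mean = sum(eta_to_seconds(e) for e in gpu_etas) / len(gpu_etas)

print(f"CPU: about {cpu_mean:.0f} s per epoch (estimated)")
print(f"GPU: about {gpu_mean:.0f} s per epoch (estimated)")
print(f"Estimated speedup: about {cpu_mean / gpu_mean:.0f}x")
```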
