A quick summary of the environment: Windows 11, i7-10 with 16 GB RAM, RTX 2060 GPU with 6 GB of VRAM, VS Code, TensorFlow
1) TensorFlow + CUDA set up per https://www.tensorflow.org/install/pip; GPU support on Windows requires WSL2
2) VS Code set up to work with WSL per https://code.visualstudio.com/docs/remote/wsl

Quick summary of results for training a simple model with three dense layers (10 epochs, batch size 64, wall-clock training time):
1) WSL + GPU: 32 seconds
2) WSL + CPU: 22 seconds
3) Windows + CPU: 20 seconds
Task Manager also shows WSL using a substantial amount of memory.
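
To put the three timings on a common scale, the short sketch below divides each one by the fastest run. The numbers are copied from the summary above; the dictionary and the helper name `slowdown_vs_fastest` are illustrative and not part of the original benchmark.

```python
# Wall-clock training times in seconds, copied from the summary above.
timings = {
    "WSL + GPU": 32.0,
    "WSL + CPU": 22.0,
    "Windows + CPU": 20.0,
}


def slowdown_vs_fastest(times):
    """Return each configuration's time divided by the fastest time."""
    fastest = min(times.values())
    return {name: seconds / fastest for name, seconds in times.items()}


ratios = slowdown_vs_fastest(timings)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the fastest run")
```

With these numbers the GPU run takes 1.60x as long as the Windows CPU run and about 1.45x as long as the WSL CPU run.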

In [4]:
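# Description: This file is used to test the performance of WSL + GPU vs WSL + CPU vs Windows + CPU
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"  # Suppress warnings about missing NUMA and TensorRT on WSL2
import tensorflow as tf
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"  # Reset the variable to its default once TensorFlow is imported

# True if TensorFlow can see at least one GPU
HAS_GPU = len(tf.config.list_physical_devices("GPU")) > 0
# Manual override: skip the GPU run and time the CPU only. Remove this line to benchmark the GPU.
HAS_GPU = False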
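
The hard-coded `HAS_GPU = False` has to be edited by hand to switch between the GPU and CPU runs. One possible alternative, sketched here under assumptions, is to read the override from an environment variable. The variable name `FORCE_CPU` and the helper `use_gpu` are hypothetical and do not come from TensorFlow or the original notebook.

```python
import os


def use_gpu(gpu_detected, env=os.environ):
    """Return True only if a GPU was detected and FORCE_CPU is not set to "1".

    FORCE_CPU is a hypothetical variable name chosen for this sketch.
    """
    return bool(gpu_detected) and env.get("FORCE_CPU", "0") != "1"


# Plain dictionaries stand in for os.environ here.
print(use_gpu(True, {}))                  # GPU detected, no override
print(use_gpu(True, {"FORCE_CPU": "1"}))  # GPU detected, override set
print(use_gpu(False, {}))                 # no GPU detected
```

In the notebook this would be called as `HAS_GPU = use_gpu(len(tf.config.list_physical_devices("GPU")) > 0)`, so the same cell can produce both timings without being edited.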

In [5]:
RANDOM_SEED = 12    # 12th man - go Seahawks!
tf.random.set_seed(RANDOM_SEED)

BATCH_SIZE = 64
EPOCHS = 10

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

def get_model():
    """Build and compile a small dense classifier for 28x28 Fashion-MNIST images."""
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10)  # Raw logits, one per class
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model

# GPU run (default device placement); skipped when HAS_GPU is False
if HAS_GPU:
    model = get_model()
    current_time = tf.timestamp()
    model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE)
    elapsed_time = tf.timestamp() - current_time
    print(f"GPU Training time: {elapsed_time:.2f} seconds")
    # get_memory_info returns the current and peak bytes allocated by TensorFlow on the device
    print(f"GPU:0 memory usage (bytes): {tf.config.experimental.get_memory_info('GPU:0')}")

# CPU run: pin all ops to the CPU
with tf.device('/cpu:0'):
    model = get_model()
    current_time = tf.timestamp()
    model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE)
    elapsed_time = tf.timestamp() - current_time
    print(f"CPU Training time: {elapsed_time:.2f} seconds")


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU Training time: 21.96 seconds
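
The timings above are taken with `tf.timestamp()` around each `model.fit` call. As a rough sketch of the same measurement pattern using only the standard library, the context manager below records elapsed wall-clock seconds with `time.perf_counter()`. The name `timed` and the `results` dictionary are assumptions made for illustration, and `time.sleep` stands in for the training call so the snippet runs without TensorFlow.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Time the enclosed block and store the elapsed seconds in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}
with timed("CPU", results):
    time.sleep(0.05)  # stand-in for model.fit(...)

print(f"CPU Training time: {results['CPU']:.2f} seconds")
```

Collecting the timings in one dictionary makes it easy to print the GPU and CPU results side by side after both runs finish.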
