This notebook is for perf-testing PyTorch vs TensorFlow with and without GPU on a simple training set so I can figure out the best environment for training models. Here's the setup I used
* Windows 11, i7-10 16GB RAM, RTX 2060 GPU w 6 GB RAM, VS Code
* Fashion mnista data set with simple model with 3 dense layers
* Tensorflow w CUDA via WSL set up per https://www.tensorflow.org/install/pip
* Tensorflow w CUDA via direct ML set up per https://learn.microsoft.com/en-us/windows/ai/directml/gpu-tensorflow-plugin

Here are the results so far:
1) Tensorflow w CPU: 17 seconds
2) Tensorflow w CUDA via WSL: 32 seconds. Plus WSL is horrible on so many levels (see below)
3) Tensorflow w CUDA via direct ML: 40 seconds. Even worse, but not horrible like WSL (see below) and while the perf on this small model was worse than cpu, it significantly sped up the training of a deeper unet model I tested it on; from 5.5 hours to 25 minutes!
4) PyTorch with or without without CUDA: 128 seconds. By far the worst; I must be doing something wrong. Hopefully the internet can help me.

My WSL experience:
* Setting it up was a complete PITA
* After following all the instructions, you still get spurrious warnings about tensorRT and NUMA
* WSL eats up a ton of disk space, and worse it eats up a ton or RAM when running, and 1GB or RAM even when it is not running(!) due to virualization of the operation systems
* And for all this, it has negative benefit, so I've uninstalled for now and will wait for the tech to mature

In [1]:
# Description: This file is used to test the performance of WSL + GPU vs WSL + CPU, vs Windows + CPU
import tensorflow as tf

HAS_GPU = len (tf.config.list_physical_devices("GPU")) > 0
if HAS_GPU:
    print("Available GPU devices:", tf.config.list_physical_devices("GPU"))

Available GPU devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [2]:
RANDOM_SEED = 12    # 12th man - go Seahawks!
tf.random.set_seed(RANDOM_SEED)

BATCH_SIZE = 64
EPOCHS = 10

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train/255.0
x_test  = x_test/255.0

def get_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
    return model

with tf.device('/cpu:0'):
    model = get_model()
    current_time = tf.timestamp()
    model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE)
    elapsed_time = tf.timestamp() - current_time
    print (f"CPU Training time: {elapsed_time:.2f} seconds")

if HAS_GPU:
    model = get_model()
    current_time = tf.timestamp()
    model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE)
    elapsed_time = tf.timestamp() - current_time
    print (f"GPU Training time: {elapsed_time:.2f} seconds")
    print(f"GPU:0 physical memory: {tf.config.experimental.get_memory_info('GPU:0')}")

# Print accuracy and loss on the test set
test_loss, test_acc = model.evaluate(x_test,  y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}, loss: {test_loss:.4f}")



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU Training time: 15.70 seconds
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
GPU Training time: 39.75 seconds
GPU:0 physical memory: {'current': 295487232, 'peak': 296705792}
Test accuracy: 0.8824, loss: 0.3320


Now let's try it with PyTorch

In [3]:
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
device = "cuda" if torch.cuda.is_available() else "cpu"
print (f"Using device: {device}")

Using device: cuda


In [4]:
DIR = "data/fashionmnist"
RANDOM_SEED = 12    # 12th man - go Seahawks!
BATCH_SIZE = 64
EPOCHS = 10

# Fix random seeds for reproducibility
torch.manual_seed(RANDOM_SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(RANDOM_SEED)
    torch.backends.cudnn.deterministic=True

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28*28, 256),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
    ).to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# Get the fashion mnist training data and normalize it
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train = datasets.FashionMNIST(DIR, download=True, train=True, transform=transform)
train_loader = DataLoader(train, batch_size=BATCH_SIZE, shuffle=True, pin_memory=True)

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    avg_loss = 0
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        avg_loss += loss.item()
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return avg_loss / size

current_time = time.time()
for t in range(EPOCHS):
    print(f"Epoch {t+1}...", end="")
    avg_loss = train_loop(train_loader, model, loss_fn, optimizer)
    print(f"Avg loss: {avg_loss:.4f}")
elapsed_time = time.time() - current_time
print (f"{device} Training time: {elapsed_time:.2f} seconds")

def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= size
    correct /= size
    print(f"Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
test = datasets.FashionMNIST(DIR, download=True, train=False, transform=transform)
test_loader = DataLoader(test, batch_size=BATCH_SIZE, shuffle=False)
test_loop(test_loader, model, loss_fn)

print("Done!")




Epoch 1...Avg loss: 0.0084
Epoch 2...Avg loss: 0.0065
Epoch 3...Avg loss: 0.0060
Epoch 4...Avg loss: 0.0056
Epoch 5...Avg loss: 0.0053
Epoch 6...Avg loss: 0.0051
Epoch 7...Avg loss: 0.0050
Epoch 8...Avg loss: 0.0048
Epoch 9...Avg loss: 0.0047
Epoch 10...Avg loss: 0.0046
cuda Training time: 125.60 seconds
Accuracy: 87.1%, Avg loss: 0.005769 

Done!
