# EuroSAT training example

![EuroSAT dataset example images](images/EuroSAT-fig-4.png)

This is a single-GPU version of a workflow for training a simple convolutional network to classify EuroSAT data; information about the dataset and the paper documenting it (from which the image above comes) can be found [at this GitHub repository](https://github.com/phelber/eurosat).

Below we'll walk through the single-GPU training loop, and we'll then use this example to explore PyTorch's key distributed training frameworks: [Distributed Data Parallel](https://docs.pytorch.org/tutorials/intermediate/ddp_tutorial.html), [Pipeline Parallel](https://docs.pytorch.org/docs/stable/distributed.pipelining.html), and [Fully Sharded Data Parallel (FSDP2)](https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html).

Let's get started!

First, let's import the libraries we need:

In [None]:
import time

import torch
import torch.nn as nn
import torch.optim as optim

import torchvision
import torchvision.transforms.v2 as transforms

import matplotlib.pyplot as plt
import numpy as np

The dataset is in the torchvision library, so we can import it directly from there.  For setting up the data loaders, we'll want to set up some simple transforms.  Let's also, for convenience, set up a list of the category names we'll be using for the classification task.

In [None]:
transform = transforms.Compose(
    [
        transforms.ToImage(),
        transforms.ToDtype(torch.float32, scale=True),
        transforms.RandomVerticalFlip(),
        transforms.RandomHorizontalFlip(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

batch_size = 8

dataset = torchvision.datasets.EuroSAT(
    root="./data", download=True, transform=transform
)
total_count = len(dataset)
train_count = int(0.6 * total_count)
valid_count = int(0.2 * total_count)
test_count = total_count - train_count - valid_count
train_dataset, valid_dataset, test_dataset = torch.utils.data.random_split(
    dataset, (train_count, valid_count, test_count)
)


trainloader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, drop_last=True
)
# evaluation does not need shuffling, and keeping the last batch counts every test image
testloader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=False, num_workers=4, drop_last=False
)


classes = [
    "AnnualCrop",
    "Forest",
    "HerbaceousVegetation",
    "Highway",
    "Industrial",
    "Pasture",
    "PermanentCrop",
    "Residential",
    "River",
    "SeaLake",
]
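
The 60/20/20 split above is random, so each run produces different subsets. Here's a minimal sketch of how the split sizes work out and how a seeded generator makes the split repeatable. It uses a stand-in `TensorDataset` of 27,000 dummy items (an assumed size, standing in for the real images), and the seed value 42 is arbitrary:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset: 27,000 dummy items (assumed size; the real images are not needed here).
dataset = TensorDataset(torch.arange(27_000))

total_count = len(dataset)
train_count = int(0.6 * total_count)
valid_count = int(0.2 * total_count)
test_count = total_count - train_count - valid_count  # remainder absorbs any rounding


def split(seed):
    # A dedicated generator keeps the split independent of the global RNG state.
    generator = torch.Generator().manual_seed(seed)
    return random_split(
        dataset, (train_count, valid_count, test_count), generator=generator
    )


train_a, valid_a, test_a = split(seed=42)
train_b, valid_b, test_b = split(seed=42)

print(len(train_a), len(valid_a), len(test_a))
print(train_a.indices == train_b.indices)  # same seed, same split
```

Passing `generator=` to `random_split` is optional; without it the split follows the global random state.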

Alright, great.  Let's take a look at a few samples, so we know what we're working with.

In [None]:
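def imshow(img):
    # undo Normalize per channel, using the same mean/std as the transform above
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    img = img * std + mean
    npimg = np.clip(img.numpy(), 0, 1)
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()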


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(" ".join(f"{classes[labels[j]]:5s}" for j in range(batch_size)))

Great!  Ok, so now we can start building our model.  We'll use a simple convolutional neural network, which we'll define in the next cells.  We'll use a single GPU.

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
class Net(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        self.conv_block1 = nn.Sequential(
            nn.Conv2d(
                in_channels=3, out_channels=32, kernel_size=3, padding=1, stride=1
            ),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(
                in_channels=32, out_channels=32, kernel_size=3, padding=1, stride=1
            ),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 64x64 -> 32x32
        )

        self.conv_block2 = nn.Sequential(
            nn.Conv2d(
                in_channels=32, out_channels=64, kernel_size=3, padding=1, stride=1
            ),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(
                in_channels=64, out_channels=64, kernel_size=3, padding=1, stride=1
            ),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16
        )

        self.conv_block3 = nn.Sequential(
            nn.Conv2d(
                in_channels=64, out_channels=128, kernel_size=3, padding=1, stride=1
            ),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(
                in_channels=128, out_channels=128, kernel_size=3, padding=1, stride=1
            ),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8
        )

        # Global Average Pooling and Fully Connected Layers
        self.global_avg_pool = nn.AdaptiveAvgPool2d(
            (1, 1)
        )  # Reduces each 128-channel map to 1x1

        self.classifier = nn.Sequential(
            nn.Flatten(),  # Input will be (batch_size, 128)
            nn.Linear(in_features=128, out_features=64),
            nn.ReLU(),
            nn.Dropout(0.5),  # Standard dropout for FC layers
            nn.Linear(in_features=64, out_features=num_classes),
        )

    def forward(self, x):
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = self.conv_block3(x)

        x = self.global_avg_pool(x)
        x = self.classifier(x)

        return x
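
To check the shape comments in the model, we can push a dummy batch through the same sequence of layers and record the tensor shape after each stage. This is only a sketch: `conv_block` is a hypothetical helper that mirrors the structure of the blocks above, not part of the notebook's `Net` class.

```python
import torch
import torch.nn as nn


def conv_block(in_channels, out_channels):
    # Hypothetical helper: two 3x3 convs (padding keeps H and W), then 2x2 max pooling.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),  # halves H and W
    )


blocks = [conv_block(3, 32), conv_block(32, 64), conv_block(64, 128)]
pool = nn.AdaptiveAvgPool2d((1, 1))

x = torch.randn(2, 3, 64, 64)  # dummy batch of two 64x64 RGB images
shapes = [tuple(x.shape)]
with torch.no_grad():
    for block in blocks:
        x = block(x)
        shapes.append(tuple(x.shape))
    x = pool(x)
    shapes.append(tuple(x.shape))
    features = torch.flatten(x, 1)  # what nn.Flatten() does in the classifier
    shapes.append(tuple(features.shape))

for shape in shapes:
    print(shape)
```

The padded 3x3 convolutions leave the spatial size unchanged, so only the pooling layers shrink the image, and the global average pool leaves 128 features per image for the classifier.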

Let's instantiate the model on the device.

In [None]:
net = Net(len(classes)).to(device)

Ok, next up is to set up the loss function and the optimizer.  We'll use cross-entropy loss, which is standard for classification tasks, and SGD with a learning rate of 0.01 and a momentum of 0.9.

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
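
It helps to know what `nn.CrossEntropyLoss` expects: raw, unnormalized scores (logits) and integer class indices, which is why the model above has no softmax layer at the end. A small sketch with made-up logits and targets, comparing the built-in loss against the same quantity computed by hand:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)  # made-up raw scores: 4 samples, 10 classes
targets = torch.tensor([0, 3, 9, 3])  # made-up integer class indices

loss = nn.CrossEntropyLoss()(logits, targets)

# Same quantity by hand: mean negative log-probability of the true class.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), targets].mean()

print(loss.item(), manual.item())
```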

Here's the routine to evaluate how well we're doing on held-out data.  We'll use this to monitor our progress during training.  Note that the training loop below passes it `testloader`, since no loader is built for `valid_dataset`.

In [None]:
def test(model, test_loader, loss_fn, device):
    total_labels = 0
    correct_labels = 0
    loss_total = 0
    model.eval()
    with torch.no_grad():
        for images, labels in test_loader:
            # Transferring images and labels to the GPU if available
            labels = labels.to(device)
            images = images.to(device)

            # Forward pass
            outputs = model(images)
            loss = loss_fn(outputs, labels)

            # Extract the predicted labels and accumulate the loss and accuracy counts
            # (.item() converts to Python numbers, so the totals are not GPU tensors)
            predictions = torch.max(outputs, 1)[1]
            total_labels += len(labels)
            correct_labels += (predictions == labels).sum().item()
            loss_total += loss.item()

    v_accuracy = correct_labels / total_labels
    v_loss = loss_total / len(test_loader)

    return v_accuracy, v_loss
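
The `model.eval()` call in that routine matters because `Dropout` and `BatchNorm2d` behave differently in training and evaluation mode. A minimal sketch using a standalone `nn.Dropout` layer (not the notebook's model) to show the difference:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)  # entries are zeroed at random; survivors are scaled by 1 / (1 - 0.5)

layer.eval()
eval_out = layer(x)  # dropout is a no-op in eval mode

print(train_out.unique().tolist())
print(torch.equal(eval_out, x))
```

A module stays in eval mode until `.train()` is called on it again, so training code that evaluates between epochs needs to switch back.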

Alright, so now let's set up and run the training loop!

In [None]:
total_time = 0
for epoch in range(5):
    net.train()  # test() below leaves the model in eval mode, so switch back each epoch
    running_loss = 0.0
    t0 = time.time()
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)

        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 500 == 499:  # print every 500 minibatches
            print(f"[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 500:.3f}")
            running_loss = 0.0

    # timing
    epoch_time = time.time() - t0
    total_time += epoch_time

    # output metrics at the end of each epoch
    images_per_sec = len(trainloader) * batch_size / epoch_time
    v_accuracy, v_loss = test(net, testloader, criterion, device)
    print(
        f"Epoch = {epoch + 1:2d}: Cumulative Time = {total_time:5.3f}, Epoch Time = {epoch_time:5.3f}, Images/sec = {images_per_sec:5.3f}, Validation Loss = {v_loss:5.3f}, Validation Accuracy = {v_accuracy:5.3f}"
    )

print("Finished Training")
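
A note on the Images/sec figure: CUDA kernels are launched asynchronously, so wall-clock timings are only meaningful once the GPU has finished its queued work. The loop above calls `loss.item()` on every step, which already forces that wait. For code that doesn't, here's a sketch of a throughput measurement with explicit synchronization. The `images_per_second` helper and the stand-in model are hypothetical, for illustration only:

```python
import time

import torch
import torch.nn as nn


def images_per_second(step_fn, num_batches, batch_size, device):
    """Hypothetical helper: time `num_batches` calls of `step_fn` and return throughput."""
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # finish pending GPU work before starting the clock
    t0 = time.perf_counter()
    for _ in range(num_batches):
        step_fn()
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait for queued kernels before stopping the clock
    return num_batches * batch_size / (time.perf_counter() - t0)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)  # stand-in model
batch = torch.randn(8, 3, 64, 64, device=device)


def forward_step():
    with torch.no_grad():
        model(batch)


rate = images_per_second(forward_step, num_batches=10, batch_size=8, device=device)
print(f"Images/sec = {rate:.1f}")
```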

Let's save the model after training, so we can use it later for inference or further training.


In [None]:
PATH = "./eurosat_net.pth"
torch.save(net.state_dict(), PATH)
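
Saving only the model's `state_dict` is enough for inference. To resume training, the optimizer state (here, the SGD momentum buffers) is usually saved as well. A minimal sketch of that round trip, assuming a small stand-in `nn.Linear` model and a hypothetical file name rather than the notebook's `net` and `PATH`:

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # stand-in for the trained network
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Take one step so the optimizer has momentum buffers worth saving.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "checkpoint.pth")  # hypothetical file name
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict()}, path
    )
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)

# Rebuild the objects, then restore their saved state.
restored = nn.Linear(4, 2)
restored_optimizer = optim.SGD(restored.parameters(), lr=0.01, momentum=0.9)
restored.load_state_dict(checkpoint["model"])
restored_optimizer.load_state_dict(checkpoint["optimizer"])

print(torch.equal(restored.weight, model.weight))
```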

Here are some sample images from the test set:

In [None]:
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print("GroundTruth: ", " ".join(f"{classes[labels[j]]:5s}" for j in range(batch_size)))

Now let's see how the model performs on those samples after training.

In [None]:
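net = Net(len(classes))
net.to(device)
net.load_state_dict(torch.load(PATH, map_location=device, weights_only=True))
net.eval()  # use running BatchNorm statistics and disable dropout for inference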

In [None]:
images = images.to(device)
with torch.no_grad():  # no gradients needed for inference
    outputs = net(images)

In [None]:
_, predicted = torch.max(outputs, 1)
print(predicted)

print("Predicted: ", " ".join(f"{classes[predicted[j]]:5s}" for j in range(batch_size)))

Not bad!  Now let's look at how the model performs on the whole test set:

In [None]:
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Accuracy of the network on the {total} test images: {100 * correct // total} %")

And now by category:

In [None]:
# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f"Accuracy for class: {classname:5s} is {accuracy:.1f} %")
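
The per-class counts above can also be read off a confusion matrix, which additionally shows which classes get mistaken for which. A sketch using `torch.bincount` on made-up labels and predictions for three classes (not real model output):

```python
import torch

num_classes = 3
# Made-up labels and predictions, for illustration only.
labels = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2, 2])
predictions = torch.tensor([0, 1, 1, 1, 2, 2, 2, 2, 0])

# Encode each (label, prediction) pair as a single index, count, and reshape:
# rows are true classes, columns are predicted classes.
flat = labels * num_classes + predictions
confusion = torch.bincount(flat, minlength=num_classes**2).reshape(
    num_classes, num_classes
)

# Diagonal entries are correct predictions; row sums are samples per true class.
per_class_accuracy = confusion.diagonal().float() / confusion.sum(dim=1).float()

print(confusion)
print(per_class_accuracy)
```

Because the counting is vectorized, this avoids the Python-level loop over individual samples.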

Nicely done.  There are experiments we could run to improve the model, such as trying a more complex architecture, adding more data augmentation, or tuning the hyperparameters.  But this is a good start!

Next up is applying distributed training techniques to this model.  But first, let's have a little introduction to the command we'll use to launch distributed training, torchrun, [in the next notebook](2_Torchrun_and_distributed).