# Deep Neural Networks with the iris ML dataset

We show how to use several techniques that are standard for DNNs:
- multi-class output
- loss functions
- batch training
- 1cycle learning rate adjustment
- input normalization

Admittedly, this is overkill for the iris dataset, which has only four input features and 150 samples, but we will have fun!

This also gives some idea of the "standard" PyTorch approach for setting up DNNs and training on large datasets.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
print(iris.feature_names)
print(iris.target_names)

X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,random_state=42)

There is no reason to expect the raw input features to be of order 1, roughly in the range (-1,1), but this is where our activation functions are most effective.

Scaling the features also helps ensure that different input features are treated on the same footing, whether their numerical values are large or small.
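As a minimal sketch of what this scaling does, the example below standardizes a small made-up array by hand and compares the result with scikit-learn's `StandardScaler`. The array `X_demo` and its values are hypothetical and chosen only to show two features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 500.0]])

# Manual standardization: subtract the column mean, divide by the column std
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

# StandardScaler performs the same computation (population std, ddof=0)
scaled = StandardScaler().fit_transform(X_demo)

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```

Note that standardized values have zero mean and unit variance but are not strictly confined to (-1,1); individual values can fall outside that interval.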

The most challenging part of the setup is pushing all of the data into PyTorch tensors for training and testing. We use some standard `DataLoader` code to accomplish that.

For the first time, we introduce (mini-)batches that are a fraction of the total training dataset. This allows us to load an entire training batch into the GPU memory at once (even if we're not yet using GPUs). The noise from the mini-batches also helps with the generalization to testing data. A typical batch size is 16-128, depending on the dataset and model.

In [None]:
import torch
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, TensorDataset

# standardize the input features to zero mean and unit variance
# (fit on the training set only, then apply the same transform to the test set)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# convert to PyTorch tensors (float32 features, int64 class labels)
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.long)

# create TensorDataset
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

# create PyTorch DataLoader with batches for training
train_loader = DataLoader(
    train_dataset,
    batch_size=16,      # Adjust based on your needs
    shuffle=True        # Shuffle training data in batches
)

test_loader = DataLoader(
    test_dataset,
    batch_size=16,
    shuffle=False       # Don't shuffle test data
)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Batches per epoch: {len(train_loader)}")


With 105 training samples and a batch size of 16, each epoch includes 7 batches (six of 16 samples and a final one of 9), so the model weights are updated 7 times per epoch.
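The batch count can be checked with a short sketch. It assumes placeholder tensors with the same shapes as the training data (105 samples, 4 features); the names `dummy` and `loader` are illustrative only.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

n_samples, batch_size = 105, 16   # 70% of the 150 iris samples

# Placeholder tensors with the same shapes as the training data
dummy = TensorDataset(torch.zeros(n_samples, 4),
                      torch.zeros(n_samples, dtype=torch.long))
loader = DataLoader(dummy, batch_size=batch_size, shuffle=True)

# Collect the size of every batch in one pass over the data
batch_sizes = [len(xb) for xb, _ in loader]

print(len(loader), math.ceil(n_samples / batch_size))  # both give the batch count
print(batch_sizes)                                     # the last batch is smaller
```

Because `drop_last` defaults to `False`, the final, smaller batch is kept rather than discarded.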

Now let's create the DNN model itself. We use the ReLU activation function. We also introduce a new loss function and a learning rate scheduler suited to DNN multi-class classification.
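The loss function in question is `nn.CrossEntropyLoss`, which takes raw network outputs (logits) and integer class labels. The sketch below, using made-up logits and labels, illustrates that it gives the same value as applying log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)                 # hypothetical raw outputs: 5 samples, 3 classes
targets = torch.tensor([0, 2, 1, 1, 0])    # hypothetical integer class labels

# CrossEntropyLoss expects raw logits and integer class indices
ce = nn.CrossEntropyLoss()(logits, targets)

# The same value, built from log-softmax followed by negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(ce.item(), manual.item())
```

This is why the model's last layer is a plain `nn.Linear` with no softmax: the loss applies the softmax internally.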

In [None]:
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR

# define DNN model
model = nn.Sequential(
    nn.Linear(4, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 3)
)

# CrossEntropyLoss is designed for multi-class classification.
# It combines LogSoftmax and NLLLoss internally, so the model outputs raw logits.
criterion = nn.CrossEntropyLoss()

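# SGD with momentum. OneCycleLR below overrides lr at every step and, by default
# (cycle_momentum=True), also cycles the momentum between 0.85 and 0.95.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)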

# The 1cycle policy is implemented in PyTorch as OneCycleLR
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,                      # Peak learning rate
    epochs=100,                      # Total epochs
    steps_per_epoch=len(train_loader),  # Batches per epoch
    pct_start=0.3,                   # 30% warmup, 70% annealing
    anneal_strategy='cos',           # Cosine annealing
    div_factor=25.0,                 # Initial LR = max_lr/25
    final_div_factor=1e4             # Final LR = initial LR/10000 = max_lr/250000
)

# training loop
for epoch in range(100):
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        # CRITICAL for 1cycle: step scheduler after each batch, not each epoch!
        scheduler.step()

The training is very fast! Why do you think that is the case?
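To see which learning rates a 1cycle schedule with these settings produces, one can record the rate at every step. The sketch below uses a stand-in `nn.Linear` model and no real data (the names `demo_model`, `demo_opt`, and `demo_sched` are illustrative); only the schedule itself is of interest.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR

# Stand-in model and optimizer; only the schedule matters here
demo_model = nn.Linear(4, 3)
demo_opt = optim.SGD(demo_model.parameters(), lr=0.1, momentum=0.9)

epochs, steps_per_epoch = 100, 7
demo_sched = OneCycleLR(demo_opt, max_lr=0.1, epochs=epochs,
                        steps_per_epoch=steps_per_epoch,
                        pct_start=0.3, anneal_strategy='cos',
                        div_factor=25.0, final_div_factor=1e4)

lrs = []
for _ in range(epochs * steps_per_epoch):
    lrs.append(demo_opt.param_groups[0]['lr'])  # LR used for this step
    demo_opt.step()      # no gradients were computed, so the weights stay unchanged
    demo_sched.step()

print(f"initial LR: {lrs[0]:.2e}")   # max_lr / div_factor
print(f"peak LR:    {max(lrs):.2e}") # max_lr, reached about 30% of the way through
print(f"final LR:   {lrs[-1]:.2e}")  # initial LR / final_div_factor
```

The final rate is defined relative to the initial rate, not to `max_lr`, so it ends up much smaller than `max_lr/10000`.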

Now let's evaluate the performance of the model. We can turn off the gradient tracking for the evaluation.

Instead of simply asking whether the model got the classification correct, I want to know the accuracy for each one of the target classes individually.

In [None]:
model.eval() # Set the model to evaluation mode

# Initialize variables for overall accuracy
correct = 0
total = 0

# Initialize variables for per-class accuracy
num_classes = len(iris.target_names)
correct_pred = [0] * num_classes   # correctly classified samples per class
total_pred = [0] * num_classes     # true samples per class

with torch.no_grad(): # Disable gradient calculation during inference
    for data, target in test_loader:
        outputs = model(data)
        predicted = outputs.argmax(dim=1)  # index of the largest logit = predicted class

        # Overall accuracy
        total += target.size(0)
        correct += (predicted == target).sum().item()

        # Per-class accuracy
        for label in range(num_classes):
            label_mask = (target == label)
            total_pred[label] += label_mask.sum().item()
            correct_pred[label] += ((predicted == label) & label_mask).sum().item()

overall_accuracy = 100 * correct / total
print(f'Overall Accuracy of the model on the test data: {overall_accuracy:.2f}%')

print('\nAccuracy for each class:')
for i, class_name in enumerate(iris.target_names):
    if total_pred[i] > 0:
        class_accuracy = 100 * correct_pred[i] / total_pred[i]
        print(f'  {class_name}: {class_accuracy:.2f}%')
    else:
        print(f'  {class_name}: No samples in test set')

Does this result look reasonable to you? Are the results what you expected from the DNN?
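An alternative way to get the same per-class numbers, and to see which classes are confused with each other, is a confusion matrix. This is a sketch using scikit-learn's `confusion_matrix`, not the method used in the cell above; the labels and predictions are hypothetical and are not output from the model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions, for illustration only (not model output)
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 2, 1, 2, 2, 1])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# Per-class accuracy (recall): correct predictions divided by true samples of that class
per_class = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(per_class)
```

With real results, `y_true` and `y_pred` would be the test targets and the model's predicted classes collected over all test batches.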

In [None]:
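# Or try this model with ELU.
# After redefining the model, re-create the optimizer and scheduler before
# retraining, since the existing ones still reference the old model's parameters.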
model = nn.Sequential(
    nn.Linear(4, 10),
    nn.ELU(alpha=1.0),  # alpha is optional, default is 1.0
    nn.Linear(10, 10),
    nn.ELU(alpha=1.0),
    nn.Linear(10, 10),
    nn.ELU(alpha=1.0),
    nn.Linear(10, 3)
)