# Minimal PyTorch Example



This notebook shows a minimal example of how to use PyTorch to train a neural network on the Iris data set.

Note: This notebook is inspired by https://jamesmccaffrey.wordpress.com/2020/05/22/a-minimal-pytorch-complete-example/

### 0. Preamble

In [1]:
import numpy as np
import torch
import torch.nn.functional as F
import torch.nn as nn
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

torch.manual_seed(1)
np.random.seed(1)

The following lines check for GPU availability on the machine and set the GPU as the processing device (if available).
If you are on Colab, you can enable GPU support in the menu via "Runtime > Change runtime type" and select "GPU" as the hardware accelerator.

In [2]:
if torch.cuda.is_available():
    processing_chip = "cuda:0"
    print(f"{torch.cuda.get_device_name(0)} available")
else:
    processing_chip = "cpu"
    print("No GPU available")

device = torch.device(processing_chip)
device

No GPU available


device(type='cpu')

### 1. Data Preparation

For this small example we use the [Iris flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set). The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on these four features, we want to train a model that can predict the species.

In the first step, we load the data into a pandas DataFrame.

In [3]:
url = 'data/iris.csv'
dataset = pd.read_csv(url)
dataset.head(5)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


To be able to train a model, we first need to transform the *species* column into numeric class labels:

In [4]:
dataset.loc[dataset.species=='Iris-setosa', 'species'] = 0
dataset.loc[dataset.species=='Iris-versicolor', 'species'] = 1
dataset.loc[dataset.species=='Iris-virginica', 'species'] = 2
dataset.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
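The same encoding can also be written as a single dictionary lookup. The following is a minimal sketch on a hand-made DataFrame; the names `toy` and `species_to_id` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Iris data
toy = pd.DataFrame({"species": ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]})

# Mapping from species name to class index (same encoding as above)
species_to_id = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

# map() replaces every name by its index; names missing from the dict would become NaN
toy["species"] = toy["species"].map(species_to_id)
print(toy["species"].tolist())  # [0, 2, 1]
```

Keeping the mapping in one dictionary also makes it easy to translate a predicted class index back into a species name later.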


Next, we specify which columns we want to use as features and which one as the label:

In [5]:
X = dataset[dataset.columns[0:4]].values
y = dataset.species.values.astype(int)

We then split our data into training and test data.

In [6]:
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2)
print(train_X.shape, test_X.shape)

(120, 4) (30, 4)


To be able to use the data in PyTorch, we need to convert it into PyTorch tensors. A tensor can be thought of as an efficient way to represent lists and matrices (similar to NumPy arrays), with the additional benefits that it can be moved to the GPU (the `.to(device)` part in the code below) and that it supports automatic differentiation, which is what makes backpropagation possible (more on this later):

In [7]:
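# features as 32-bit floats, labels as 64-bit integers (class indices)
train_x = torch.tensor(train_X, dtype=torch.float32).to(device)
test_x = torch.tensor(test_X, dtype=torch.float32).to(device)
train_y = torch.tensor(train_y, dtype=torch.long).to(device)
test_y = torch.tensor(test_y, dtype=torch.long).to(device)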
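To make the two tensor properties mentioned above more concrete, here is a small sketch with made-up values. The names `features_np`, `features`, `w` and `out` are illustrative only. It converts a NumPy array into a tensor and then lets PyTorch compute a gradient automatically.

```python
import numpy as np
import torch

# A small NumPy array standing in for a feature matrix (values are made up)
features_np = np.array([[5.1, 3.5, 1.4, 0.2],
                        [4.9, 3.0, 1.4, 0.2]])

# torch.tensor copies the data; the dtype is set explicitly to 32-bit floats
features = torch.tensor(features_np, dtype=torch.float32)
print(features.shape, features.dtype)

# requires_grad=True tells PyTorch to record the operations applied to w
w = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
out = (features @ w).sum()  # a scalar that depends on w
out.backward()              # computes d(out)/d(w)

# For this function, the gradient is the column-wise sum of the features
print(w.grad)
print(features.sum(dim=0))
```

The training loop later relies on exactly this mechanism: `loss.backward()` fills the `.grad` attribute of every weight in the network.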

### 2. Model Definition
We now define the structure of our neural network. For this we create a class that is a subclass of PyTorch's `nn.Module`.
By convention, the `__init__` method defines the layers we want to use in the network, and the `forward` method defines how data flows through them.

Our network has 4 input features, two hidden layers with 7 and 5 nodes, and 3 output neurons (the second hidden layer is the change for Task 2). The hidden layers use a ReLU activation function. Note that the output layer does not have a softmax activation (unlike what we have seen in the lecture). Instead, it outputs a raw score for each class (more on this later).


In [16]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden1 = nn.Linear(4, 7) 
        self.hidden2 = nn.Linear(7, 5) # change for Task 2
        self.output = nn.Linear(5, 3) # change for Task 2

    def forward(self, x):
        z1 = self.hidden1(x)
        z2 = F.relu(z1)
        z3 = self.hidden2(z2) # change for Task 2
        z4 = F.relu(z3)       # change for Task 2
        z5 = self.output(z4)  # change for Task 2; no softmax. see CrossEntropyLoss() 
        return z5
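A quick way to sanity-check such an architecture is to pass a random batch through it and count the trainable parameters. The sketch below uses a stand-in built with `nn.Sequential` that has the same layer sizes as `Net`; `sketch_net`, `batch` and `n_params` are illustrative names.

```python
import torch
import torch.nn as nn

# Stand-in with the same layer sizes as Net above (4 -> 7 -> 5 -> 3)
sketch_net = nn.Sequential(
    nn.Linear(4, 7), nn.ReLU(),
    nn.Linear(7, 5), nn.ReLU(),
    nn.Linear(5, 3),  # raw scores, no softmax
)

# Ten random samples with four features each
batch = torch.rand(10, 4)
scores = sketch_net(batch)
print(scores.shape)  # torch.Size([10, 3]): one score per class and sample

# Weights plus biases: (4*7 + 7) + (7*5 + 5) + (5*3 + 3) = 93
n_params = sum(p.numel() for p in sketch_net.parameters())
print(n_params)  # 93
```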

### 3. Model Training
We can now start training our network. We run several epochs; in each one, we first predict on the training data with our network and then backpropagate the loss. For this we use PyTorch's built-in SGD optimizer, which runs gradient descent on the weights of the network. Each epoch thus takes one gradient step that is intended to reduce the loss on the training data, although the loss can temporarily increase between epochs (as around epoch 160 in the output below).
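What one step of gradient descent does can be sketched on a tiny example. The code below assumes plain SGD without momentum or weight decay (the default settings of `torch.optim.SGD`) and uses a made-up loss; `w`, `lr` and `expected` are illustrative names.

```python
import torch

lr = 0.1
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr)

loss = (w ** 2).sum()  # toy loss; its gradient is 2 * w
optimizer.zero_grad()
loss.backward()        # w.grad is now [2.0, -4.0]

# Manual update rule of plain gradient descent: w_new = w - lr * gradient
expected = w.detach() - lr * w.grad

optimizer.step()       # performs the same update in place
print(w.detach(), expected)  # both are approximately [0.8, -1.6]
```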

As the loss function we use cross entropy, which consumes the raw scores from the prediction and internally applies a softmax (that is why we do not need a softmax as the last layer in the network).
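That the loss applies a softmax internally can be checked on a few made-up scores. The sketch below compares `nn.CrossEntropyLoss` with two manual computations; the values in `logits` and `targets` are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Raw scores for two samples and three classes, plus the true class indices
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

# Built-in loss, fed with raw scores
ce = nn.CrossEntropyLoss()(logits, targets)

# Same value as log-softmax followed by the negative log-likelihood loss
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Same value by hand: mean of -log(probability of the true class)
probs = F.softmax(logits, dim=1)
by_hand = -torch.log(probs[torch.arange(2), targets]).mean()

print(ce.item(), manual.item(), by_hand.item())  # three (nearly) identical numbers
```

Adding a softmax layer to the network on top of this loss would therefore apply the softmax twice.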

Note that all training data is passed at once to our network (line `net(train_x)`), since PyTorch will predict on all data points in parallel. 

In [17]:
# create network, move it to device (either CPU or GPU), and set it to training mode
net = Net().to(device)
net.train()

Net(
  (hidden1): Linear(in_features=4, out_features=7, bias=True)
  (hidden2): Linear(in_features=7, out_features=5, bias=True)
  (output): Linear(in_features=5, out_features=3, bias=True)
)

In [21]:
# define the parameters for training
no_epochs = 400 # change for Task 2
learning_rate = 0.04
loss_func = nn.CrossEntropyLoss()  # applies softmax() internally
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate)

print("\nStarting training ")

train_losses = []
for epoch in range(0, no_epochs):

    optimizer.zero_grad()  # set gradients to zero 
    predictions = net(train_x)  # predict on the training data, this calls net.forward() 

    loss = loss_func(predictions, train_y)  # compute loss between prediction and true labels
    loss.backward() # calculate the gradients for every weight
    optimizer.step() # do one step of gradient descent

    train_losses.append(loss.item())

    if epoch % 10 == 0:
        print(f"Loss in epoch {epoch} is {loss.item()}")

print("Done training ")


Starting training 
Loss in epoch 0 is 0.9038964509963989
Loss in epoch 10 is 0.8249509930610657
Loss in epoch 20 is 0.7307508587837219
Loss in epoch 30 is 0.6387642025947571
Loss in epoch 40 is 0.5645408034324646
Loss in epoch 50 is 0.5088369846343994
Loss in epoch 60 is 0.4654156267642975
Loss in epoch 70 is 0.4292605221271515
Loss in epoch 80 is 0.39734676480293274
Loss in epoch 90 is 0.3678705394268036
Loss in epoch 100 is 0.3398509621620178
Loss in epoch 110 is 0.31288284063339233
Loss in epoch 120 is 0.28715163469314575
Loss in epoch 130 is 0.2630310654640198
Loss in epoch 140 is 0.24091897904872894
Loss in epoch 150 is 0.22426123917102814
Loss in epoch 160 is 0.24908393621444702
Loss in epoch 170 is 0.3230430781841278
Loss in epoch 180 is 0.29318034648895264
Loss in epoch 190 is 0.2754005193710327
Loss in epoch 200 is 0.2671451270580292
Loss in epoch 210 is 0.26012489199638367
Loss in epoch 220 is 0.25110501050949097
Loss in epoch 230 is 0.23860317468643188
Loss in epoch 240 is 

fig = plt.figure()
plt.plot(range(0, no_epochs), train_losses, color='blue')
plt.legend(['Train Loss'], loc='upper right')
plt.xlabel('number of epochs')
plt.ylabel('loss')

### 4. Model Evaluation
Finally, we check the model accuracy on the test data. For this we predict on the test data, identify the class with the highest score and compare it to the true label.

In [23]:
net.eval() # set the network to evaluation mode
with torch.no_grad(): # no gradients are needed for evaluation
    predictions = net(test_x)
predicted = torch.argmax(predictions, dim=1) # get the class with the highest score
correct = (predicted == test_y).sum().item() # compare predicted class with real class
print(f"Accuracy is {100. * correct / len(test_x)}%")

Accuracy is 90.0%
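The accuracy computation can be followed step by step on a toy example. The scores and labels below are made up and do not come from the trained network.

```python
import torch

# Made-up raw scores for four samples and three classes
scores = torch.tensor([[2.0, 0.1, -1.0],
                       [0.3, 1.5, 0.2],
                       [0.0, 0.4, 0.9],
                       [1.2, 0.8, 0.1]])
labels = torch.tensor([0, 1, 1, 0])

# Index of the highest score in every row
predicted = torch.argmax(scores, dim=1)
print(predicted)  # tensor([0, 1, 2, 0])

# Element-wise comparison gives a boolean tensor; sum() counts the matches
correct = (predicted == labels).sum().item()
accuracy = 100.0 * correct / len(labels)
print(accuracy)  # 75.0
```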


### Solution Task 2
Already included in the code above (see the lines marked `# change for Task 2`).

### Solution Task 3
First, transform the features into a torch tensor:

In [24]:
x = [4.9, 3.0, 1.4, 0.2]
torch_x = torch.Tensor(x).float().to(device)
torch_x.shape

torch.Size([4])

Then, get the prediction (raw scores, since we did not use a softmax inside the network):

In [25]:
y_pred = net(torch_x)
y_pred

tensor([  8.0056,   2.8565, -12.7938], grad_fn=<AddBackward0>)

Finally, apply softmax to get class probabilities:

In [26]:
softmax = nn.Softmax(dim=0)
softmax(y_pred)

tensor([9.9423e-01, 5.7711e-03, 9.2139e-10], grad_fn=<SoftmaxBackward0>)
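The probabilities above can be reproduced by hand from the raw scores. The sketch below copies the three printed scores (rounded to four decimals, so the result matches only approximately) and also shows the batched variant, where the softmax runs over `dim=1`.

```python
import torch

# Raw scores copied from the printed output above (rounded)
logits = torch.tensor([8.0056, 2.8565, -12.7938])

# Softmax by hand: exponentiate and normalise.
# Subtracting the maximum first does not change the result but avoids overflow.
shifted = logits - logits.max()
probs = torch.exp(shifted) / torch.exp(shifted).sum()
print(probs)

# With a batch dimension of size 1, the classes live in dim=1 instead of dim=0
batch_probs = torch.softmax(logits.unsqueeze(0), dim=1)
print(batch_probs.shape)  # torch.Size([1, 3])

# The most probable class is index 0, which was encoded as Iris-setosa
predicted_class = int(torch.argmax(probs))
print(predicted_class)  # 0
```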