In [None]:
# Comment the following lines if you're not in colab:
from google.colab import drive
drive.mount('/content/drive')
# If you're in colab, cd to your own working directory here:
%cd ..//..//content//drive//MyDrive//Colab-Notebooks//HY-673-Tutorials//Tutorial-2

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/Colab-Notebooks/HY-673-Tutorials/Tutorial-2


# <u>Preliminaries</u>

In [None]:
import os
import pickle
import numpy as np
import torch as tc
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

We can start by importing the pickled data that we made previously:

In [None]:
pickledatapath = os.path.join('data', 'iris_data.pkl')

with open(pickledatapath, 'rb') as f:
    x_train, x_test, y_train, y_test = pickle.load(f)

Verify that all the data have the correct dimensionality:

In [None]:
print(f"x_train.shape = {x_train.shape}, y_train.shape = {y_train.shape}")
print(f"x_test.shape  = {x_test.shape},  y_test.shape  = {y_test.shape}")

x_train.shape = (100, 4), y_train.shape = (100,)
x_test.shape  = (50, 4),  y_test.shape  = (50,)


By default, PyTorch creates floating-point tensors as 32-bit floats, but we can change this behavior so that it defaults to 64 bits instead. This is a tradeoff: higher numerical precision in exchange for increased memory usage and computational cost. In our case, it won't be necessary. The standard way to do so, if needed, is:

In [None]:
# tc.set_default_dtype(tc.float64)
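To make the tradeoff concrete, here is a small sketch that creates a tensor under each default and compares the storage per element. The variable names (`original_dtype`, `x32`, `x64`) are only illustrative, and the default is restored at the end so the rest of the notebook is unaffected:

```python
import torch as tc

original_dtype = tc.get_default_dtype()   # torch.float32 unless it was changed earlier
x32 = tc.zeros(3)                         # created under the current default

tc.set_default_dtype(tc.float64)          # switch the default to 64-bit floats
x64 = tc.zeros(3)                         # same call, now yields float64

tc.set_default_dtype(original_dtype)      # restore the previous default

# element_size() is the number of bytes used per element
print(x32.dtype, x32.element_size())
print(x64.dtype, x64.element_size())      # torch.float64 8
```

Each 64-bit element takes 8 bytes, i.e., twice the memory of a 32-bit one.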

Let's set the PRNG seeds for reproducibility:

In [None]:
seed = 42
tc.manual_seed(seed)
np.random.seed(seed)
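As a quick sanity check of what seeding buys us, the sketch below re-seeds both generators and draws a few random numbers twice. The helper `draw` is a hypothetical name used only for this illustration:

```python
import numpy as np
import torch as tc

def draw(seed):
    """Seed both PRNGs, then draw three numbers from each."""
    tc.manual_seed(seed)
    np.random.seed(seed)
    return tc.rand(3), np.random.rand(3)

t1, n1 = draw(42)
t2, n2 = draw(42)   # same seed, so the same numbers come out

print(tc.equal(t1, t2), np.array_equal(n1, n2))  # True True
```

Note that calling `draw` resets the global generators, so in a real notebook you would re-run the seeding cell afterwards.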

Our dataset class from the previous notebook was:

In [None]:
class MyIrisDataset(Dataset):

    def __init__(self, x_data, y_data):
        super().__init__()
        assert len(x_data) == len(y_data)
        self.x_data = tc.tensor(x_data, dtype=tc.float32)
        self.y_data = tc.tensor(y_data, dtype=tc.long)

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

Let's check again that it works as intended:

In [None]:
batch_size = 16

train_dataset = MyIrisDataset(x_data=x_train, y_data=y_train)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

x_batch, y_batch = next(iter(train_loader))
print(f"x_batch.shape = {x_batch.shape}, y_batch.shape = {y_batch.shape}")

x_batch.shape = torch.Size([16, 4]), y_batch.shape = torch.Size([16])
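With 100 training samples and a batch size of 16, the loader yields six full batches and one final batch of 4 samples, since PyTorch keeps the incomplete last batch unless `drop_last=True` is passed. Below is a minimal sketch with random stand-in data; `x_fake`, `y_fake`, and `fake_loader` are made-up names and not the Iris data:

```python
import math
import numpy as np
import torch as tc
from torch.utils.data import TensorDataset, DataLoader

n_samples, n_feats, batch_sz = 100, 4, 16

# Random stand-in data with the same shapes as the training set
rng = np.random.default_rng(0)
x_fake = tc.tensor(rng.normal(size=(n_samples, n_feats)), dtype=tc.float32)
y_fake = tc.tensor(rng.integers(0, 3, size=n_samples), dtype=tc.long)

fake_loader = DataLoader(TensorDataset(x_fake, y_fake), batch_size=batch_sz, shuffle=True)

# Collect the size of every batch in one pass over the data (one epoch)
batch_sizes = [len(xb) for xb, _ in fake_loader]
print(batch_sizes)                                            # [16, 16, 16, 16, 16, 16, 4]
print(len(fake_loader) == math.ceil(n_samples / batch_sz))    # True
```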


## <u>A Very Simple Neural Network in PyTorch</u>

A neural network model in PyTorch is a subclass of `torch.nn.Module`. To define it, we need to specify at least these two methods:

- `__init__`: The constructor method. Here, we store variables, e.g., the number of features, classes, and whatever parameters are necessary for us to create a model object. Most importantly, we also create the building blocks of our model, e.g., layers, activations, etc., and define how they connect with one another. Instead of defining the entire architecture here, there is also the option to connect the layers in the next method (`forward`) if you want. Here, we will use a `Sequential` block, so all layers will be automatically connected in series.
- `forward`: Here, we declare what the model does during the forward pass given an input batch. In the case of a simple sequential model, the input just passes through the architecture defined in the constructor, layer by layer (with a single call).



In [None]:
class MyMultiLayerPerceptron(nn.Module):

    def __init__(self, n_features, n_classes):

        super().__init__() # initialize the attributes of the parent class

        self.n_features = n_features
        self.n_classes = n_classes

        self.model = nn.Sequential(
            nn.Linear(in_features=self.n_features, out_features=5),
            nn.ReLU(),
            nn.Linear(in_features=5, out_features=self.n_classes),
        )

    def forward(self, x):
        return self.model(x)

To initialize a model object, we just need to create an instance of the above class. It is a helpful test to pass a batch through the network to check that we get no errors and that the output has the expected shape. Here, we have 3 classes, so we expect three scores for each sample in the input batch:

In [None]:
model = MyMultiLayerPerceptron(n_features=4, n_classes=3)

pred_batch = model(x_batch)
print(f"Example of this model's output before training:\n{pred_batch}\n{pred_batch.shape}")

Example of this model's output before training:
tensor([[0.7879, 1.3486, 0.5005],
        [0.5511, 0.9441, 0.4546],
        [0.4981, 0.9760, 0.4541],
        [0.8637, 1.3947, 0.5281],
        [0.5613, 1.0368, 0.4821],
        [0.9687, 1.6414, 0.5729],
        [1.0010, 1.6669, 0.5744],
        [0.8306, 1.3054, 0.5114],
        [1.0798, 1.8091, 0.6078],
        [0.8793, 1.4977, 0.5331],
        [0.9335, 1.5683, 0.5631],
        [0.9781, 1.6748, 0.5800],
        [0.4586, 0.8746, 0.4169],
        [0.5203, 0.9360, 0.4530],
        [1.0170, 1.7536, 0.5776],
        [0.7678, 1.2238, 0.4867]], grad_fn=<AddmmBackward0>)
torch.Size([16, 3])
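These raw outputs are unnormalized scores, not probabilities. If probabilities are wanted, a softmax over the class dimension converts each row into non-negative values that sum to 1, without changing which class has the highest score. A small sketch using the first two rows of the output above (the variable names are illustrative):

```python
import torch as tc

# First two rows of the untrained model's output shown above
logits = tc.tensor([[0.7879, 1.3486, 0.5005],
                    [0.5511, 0.9441, 0.4546]])

probs = tc.softmax(logits, dim=1)      # normalize across the 3 classes
pred_classes = probs.argmax(dim=1)     # same result as logits.argmax(dim=1)

print(probs.sum(dim=1))                # each row sums to 1 (up to rounding)
print(pred_classes)                    # tensor([1, 1])
```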


We can get a basic summary of this model's architecture simply by printing the model object:

In [None]:
print(model)

MyMultiLayerPerceptron(
  (model): Sequential(
    (0): Linear(in_features=4, out_features=5, bias=True)
    (1): ReLU()
    (2): Linear(in_features=5, out_features=3, bias=True)
  )
)


Unfortunately, this is a very simple summary, so it will not be very useful for more complicated models. A better alternative, for instance, is to use the function `summary()` from the `torchsummary` or `torchinfo` library https://pypi.org/project/torch-summary/ (needs installation in your local environment):

In [None]:
from torchsummary import summary
summary(model=model, input_size=x_batch.shape)
# Warning: Calling this function requires the default tensor type to be float32.
# If you find a way to use it with default type float64, tell me as well.

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                [-1, 16, 5]              25
              ReLU-2                [-1, 16, 5]               0
            Linear-3                [-1, 16, 3]              18
Total params: 43
Trainable params: 43
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


Let's also show how one can get a model's trainable parameters in a more manual way without additional libraries. Firstly, we have the `model.parameters()` method:

In [None]:
model.parameters()

<generator object Module.parameters at 0x7d32be440510>

This is a generator object, meaning that we can pull out one set of parameters at a time by calling `next()` on it (a generator is already an iterator, so the `iter()` call below is optional but harmless):

In [None]:
param_iter = iter(model.parameters())

some_params = next(param_iter)
print(f"{some_params}\n{some_params.shape}\n")

some_more_params = next(param_iter)
print(f"{some_more_params}\n{some_more_params.shape}")

Parameter containing:
tensor([[-0.1096,  0.1009, -0.2434,  0.2936],
        [ 0.4408, -0.3668,  0.4346,  0.0936],
        [ 0.3694,  0.0677,  0.2411, -0.0706],
        [ 0.3854,  0.0739, -0.2334,  0.1274],
        [-0.2304, -0.0586, -0.2031,  0.3317]], requires_grad=True)
torch.Size([5, 4])

Parameter containing:
tensor([-0.3947, -0.2305, -0.1412, -0.3006,  0.0472], requires_grad=True)
torch.Size([5])


The first extracted tensor has shape (5, 4), i.e., it holds 5 * 4 = 20 parameters, and the second one holds 5. Together they belong to a single layer: the first linear layer has 20 weights and 5 biases, for a total of 20 + 5 = 25 parameters (confirmed by `torchsummary`).

In order to get the total number of parameters of the entire model, we can take the product of the dimensions of each parameter tensor's shape and sum the results:

In [None]:
n_params = np.sum([np.prod(param.shape) for param in model.parameters()])
print(f"Total number of parameters in our NN: {n_params}")

Total number of parameters in our NN: 43


This number matches what `torchsummary` gave us.
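An equivalent way to count, without NumPy, is the tensor method `numel()`, which returns the number of elements directly. The sketch below rebuilds the same architecture as a bare `nn.Sequential` (named `net` here only for illustration) so that it runs on its own:

```python
import torch.nn as nn

# Same layer sizes as the model above: 4 -> 5 -> 3
net = nn.Sequential(nn.Linear(4, 5), nn.ReLU(), nn.Linear(5, 3))

n_total = sum(p.numel() for p in net.parameters())
n_trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)

print(n_total, n_trainable)   # 43 43
```

Filtering on `requires_grad` separates trainable from frozen parameters, which matters once some layers are frozen.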

To also get the names, we can use the `model.named_parameters()` method and collect the results as key-value pairs in a dictionary. Before that, however, we `detach()` each tensor from the computational graph of the network, so that we work with plain values that no longer track gradients:

In [None]:
param_dict = {name: params.detach() for name, params in model.named_parameters()}

# If you want to print the whole dictionary:
# with np.printoptions(precision=2, suppress=True):
#    print(param_dict)

for name, params in param_dict.items():
    _, layer, weight_type = name.split(".")
    print(f"Layer# {layer} ({weight_type}):\t{np.prod(params.shape):>3} params")

Layer# 0 (weight):	 20 params
Layer# 0 (bias):	  5 params
Layer# 2 (weight):	 15 params
Layer# 2 (bias):	  3 params


As you can see, our ReLU activation has been omitted since it does not introduce any parameters.
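As mentioned in the description of `forward`, the layers can also be wired together there instead of inside a `Sequential` block. Here is a minimal sketch of the same architecture written that way; the class name `MyExplicitMLP`, the attribute names `fc1`, `act`, `fc2`, and the `n_hidden` argument are all hypothetical choices for this illustration:

```python
import torch as tc
import torch.nn as nn

class MyExplicitMLP(nn.Module):  # hypothetical name, not part of the tutorial

    def __init__(self, n_features, n_classes, n_hidden=5):
        super().__init__()
        # Only create the building blocks here ...
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        # ... and connect them explicitly in the forward pass
        h = self.act(self.fc1(x))
        return self.fc2(h)

explicit_model = MyExplicitMLP(n_features=4, n_classes=3)
dummy_batch = tc.randn(16, 4)                  # random stand-in for a real batch
print(explicit_model(dummy_batch).shape)       # torch.Size([16, 3])

for name, p in explicit_model.named_parameters():
    print(name, tuple(p.shape))
```

With this layout the parameter names follow the attribute names (`fc1.weight`, `fc1.bias`, ...) instead of the positional `model.0.weight` style, so the `name.split(".")` unpacking used above would need to be adapted.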

### <u>Additional Things Before Training</u>


We need to define our *loss function*. Here, we'll use cross-entropy: <br>(check theory and docs to see how the cross-entropy loss works) <br> https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html:

In [None]:
loss_fn = nn.CrossEntropyLoss()
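To see what this loss computes, note that `nn.CrossEntropyLoss` takes raw scores (logits) and integer class labels, applies a log-softmax internally, and by default averages the negative log-probability of the true class over the batch. The sketch below checks this by hand on a made-up pair of scores and labels (all values and names here are illustrative):

```python
import torch as tc
import torch.nn as nn

# Two made-up samples with 3 class scores each, and their true labels
logits = tc.tensor([[2.0, 0.5, -1.0],
                    [0.1, 0.2, 3.0]])
targets = tc.tensor([0, 2])

builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the true class per row, negate, average
log_probs = tc.log_softmax(logits, dim=1)
manual = -log_probs[tc.arange(len(targets)), targets].mean()

print(builtin.item(), manual.item())   # the two values agree
```

This is also why the model itself has no softmax layer at the end: the loss already includes it.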

Next, we need to define our *learning rate* and *optimizer*. Let us use regular (mini-batch) SGD for this simple case:

In [None]:
lr = 1e-2
optimizer = tc.optim.SGD(params=model.parameters(), lr=lr)

Finally, we need a function to evaluate our model's success. In classification problems, accuracy is a common metric; it is the fraction of samples that are correctly predicted by our model:

\begin{equation}
\text{Accuracy} = \frac{\text{correct predictions}}{\text{total predictions}}=\frac{1}{N} \sum_{i=1}^{N} 𝟙 \{\hat{y}_i = y_i \},
\end{equation}

where $\hat{y}$ are the predicted labels, $y$ are the true labels, and 𝟙 is the indicator function. <br>
Side note: This is a separate issue, but this metric alone does not give the entire picture of how good a classifier is. That is why other metrics, e.g., recall, precision, and F1 score, are also necessary, but we'll stick with just accuracy for now for simplicity.

The above equation can be easily computed in Python by using `np.mean(predicted_labels == true_labels)`:

In [None]:
def model_accuracy(x, y):
    x = tc.tensor(x, dtype=tc.float32)  # cast to tensor
    out_tensor = model(x)               # get model's output scores
    out = out_tensor.detach().numpy()   # cast to numpy
    pred = np.argmax(out, axis=1)
    return np.mean(pred == y)
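A tiny worked example of the two steps inside this function (argmax over the scores, then the mean of the matches), using four made-up score rows and labels:

```python
import numpy as np

# Made-up scores for 4 samples and 3 classes
scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.3, 0.2, 0.9],
                   [1.1, 0.4, 0.2]])
true_labels = np.array([0, 1, 1, 0])

predicted_labels = np.argmax(scores, axis=1)           # highest score per row
accuracy = np.mean(predicted_labels == true_labels)    # True counts as 1, False as 0

print(predicted_labels, accuracy)   # [0 1 2 0] 0.75
```

Three of the four predictions match the true labels, hence 3/4 = 0.75.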

If we print the training and test accuracies before training our model, we will see that its behavior is essentially equivalent to random guessing, i.e., they will both be approximately $1/3$ in our case of $3$ classes:

In [None]:
train_acc = model_accuracy(x_train, y_train)
print(f"Train accuracy before training: {train_acc}")

test_acc  = model_accuracy(x_test, y_test)
print(f"Test accuracy before training:  {test_acc}")

Train accuracy before training: 0.33
Test accuracy before training:  0.34
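As a possible variation, the evaluation can be wrapped in `tc.no_grad()`, so that no computational graph is built and no `detach()` is needed. The sketch below is one way to write it, not a replacement for the function above; `accuracy_no_grad`, `toy_net`, `x_toy`, and `y_toy` are hypothetical names, and the data are random stand-ins:

```python
import numpy as np
import torch as tc
import torch.nn as nn

def accuracy_no_grad(net, x, y):
    """Fraction of rows in x whose highest-scoring class equals the label in y."""
    x = tc.as_tensor(x, dtype=tc.float32)
    y = tc.as_tensor(y, dtype=tc.long)
    with tc.no_grad():                      # no graph is recorded inside this block
        pred = net(x).argmax(dim=1)
    return (pred == y).float().mean().item()

tc.manual_seed(0)
toy_net = nn.Sequential(nn.Linear(4, 5), nn.ReLU(), nn.Linear(5, 3))
x_toy = np.random.default_rng(0).normal(size=(30, 4))      # random features
y_toy = np.random.default_rng(1).integers(0, 3, size=30)   # random labels

acc = accuracy_no_grad(toy_net, x_toy, y_toy)
print(acc)
```

Passing the model as an argument, instead of reading a global `model`, also makes the function reusable for other networks.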


### <u>Training</u>

Finally, time to train our model. We do that by iterating the `train_loader` for multiple *epochs*. An epoch usually means iterating through the entire training dataset once. Here's a brief summary of the steps we will follow **for each batch**:

1) Reset the gradients with `optimizer.zero_grad()`. By default, PyTorch accumulates (sums) the gradients in each parameter's `.grad` attribute across backward passes, so at each new iteration you normally want to clear them.

2) Feed the batch to the model, and obtain the model's output scores, a.k.a. logits, which should be a tensor of size `(batch_size, n_classes)`. The higher a score is, the higher the model's belief is that this is the correct class.

3) Calculate the cross-entropy loss between the logits and the true labels.

4) Call `loss.backward()` to perform backpropagation and compute the gradients, which are stored in each parameter's `.grad` attribute, where the optimizer can read them.

5) Use `optimizer.step()` to update the parameters based on the current gradient. In the standard SGD, this is simply:
\begin{equation}
w \gets w - \eta \cdot \nabla_w \mathcal{L},
\end{equation}
where $\nabla_w \mathcal{L}$ is the gradient of the loss w.r.t. the weights $w$, and $\eta$ is the learning rate. In case one uses the *Adam* optimizer, for instance, there are some additional adaptive moment estimates (hence the name) that are taken into account, but this is a separate topic.
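The update rule in step 5 can be checked by hand on a single linear layer: compute the expected new parameters from the formula, call `optimizer.step()`, and compare. This is only a sketch with random stand-in data; it assumes plain SGD with its default settings (no momentum, no weight decay), and the names `layer`, `eta`, and `opt` are illustrative:

```python
import torch as tc
import torch.nn as nn

tc.manual_seed(0)
layer = nn.Linear(4, 3)
eta = 1e-2
opt = tc.optim.SGD(layer.parameters(), lr=eta)

x = tc.randn(8, 4)                 # random stand-in batch
y = tc.randint(0, 3, (8,))         # random stand-in labels

opt.zero_grad()
loss = nn.CrossEntropyLoss()(layer(x), y)
loss.backward()                    # fills layer.weight.grad and layer.bias.grad

# Expected parameters according to w <- w - eta * grad
expected_w = layer.weight.detach().clone() - eta * layer.weight.grad
expected_b = layer.bias.detach().clone() - eta * layer.bias.grad

opt.step()                         # the optimizer applies the update in place

print(tc.allclose(layer.weight, expected_w), tc.allclose(layer.bias, expected_b))  # True True
```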

In [None]:
n_epochs = 50

for _ in range(n_epochs):

    for x, y_true in train_loader:

        optimizer.zero_grad()
        scores = model(x)
        loss = loss_fn(scores, y_true)
        loss.backward()
        optimizer.step()
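The loop above starts every iteration with `optimizer.zero_grad()`. The small sketch below shows why: without a reset, a second backward pass adds its gradient to the one already stored. Here the reset is done by hand on a single tensor `w` (an illustrative name); `optimizer.zero_grad()` plays the same role for all parameters of the model:

```python
import torch as tc

w = tc.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()          # gradient of sum(w^2) is 2w
first = w.grad.clone()             # tensor([2., 4.])

(w ** 2).sum().backward()          # second pass without a reset
accumulated = w.grad.clone()       # tensor([4., 8.]): the gradients were summed

w.grad.zero_()                     # manual reset
(w ** 2).sum().backward()
after_reset = w.grad.clone()       # tensor([2., 4.]) again

print(first, accumulated, after_reset)
```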

If everything went smoothly, our model should now have a much better accuracy. We are mainly interested in the *test accuracy*, but we should compute both to see the gap between them. That can help to determine if the model is <i>overfitting</i> the training data:

In [None]:
train_acc = model_accuracy(x_train, y_train)
print(f"Train accuracy after training: {train_acc}")

test_acc  = model_accuracy(x_test, y_test)
print(f"Test accuracy after training:  {test_acc}")

Train accuracy after training: 0.93
Test accuracy after training:  0.88


## <u>Homework (Optional)</u>

Neural networks have various "settings", or <i>hyperparameters</i>, that you can change, e.g., the learning rate, batch size, etc. You should get acquainted with them and get an idea of what role they play in the training process. Start by toying with the hyperparameters of this model and see what happens when you change:
- Batch size
- Learning rate
- Number of epochs
- Initialization (e.g., change the random seed to see how sensitive the model is to initialization)
- Architecture (more/less layers, bigger/smaller layers, on/off biases, other types of layers, activations, etc.)
- Many more (weight initialization, optimizer, regularization, etc.)

<u>**You may want to see the pdf file of this tutorial for more details regarding everything we have said.**</u>