<a href="https://colab.research.google.com/github/marinarhianna/python-tutorials/blob/main/AI_Part2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Teaching our AI how to think 👾


---
In this notebook, we will see how to train and test our network.



*   Read through the descriptions carefully!
*   This is advanced stuff, so ask Marina if anything is confusing 😸
*   Run each cell, and at the end there is a challenge for you to try some things out to improve our AI's performance.
*   Happy coding!


### Code from previous notebook
This cell contains all the code from the previous notebook in one cell. We need to run this before doing anything else!

I have made the training and testing sets a bit larger here, so the network has enough data to learn from. I have also normalized the datasets, which is very common in machine learning because it helps the network learn patterns.
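As a rough illustration of what normalization does here, the sketch below divides every value by the largest value in the data, so that everything lands between 0 and 1. The `samples` list and the other variable names are made up for this example; the real cell does the same thing by dividing by 400.

```python
# two made-up groups of numbers: one small, one large
samples = [[1, 2, 3, 4], [100, 200, 300, 400]]

# find the largest value anywhere in the data
largest = max(max(group) for group in samples)

# divide every value by the largest one, so all values fall between 0 and 1
normalized = [[value / largest for value in group] for group in samples]

print(normalized)
```

After this step, the small group sits very close to 0 and the large group reaches up to 1, but both are on the same scale.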

In [1]:
# import libraries
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

# create training and testing sets
X_train = [
    [1, 2, 3, 4], [4, 3, 2, 1], [2, 2, 3, 3], [1, 1, 2, 2],  # small numbers
    [100, 200, 300, 400], [400, 300, 200, 100], [150, 150, 200, 200], [120, 130, 140, 150]  # large numbers
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]   # labels

X_test = [
    [2, 3, 4, 5], [3, 2, 1, 0], [180, 190, 200, 210], [220, 230, 240, 250]
]
y_test = [0, 0, 1, 1]

# normalize all values to be between 0 and 1
X_train = np.array(X_train) / 400.0
X_test = np.array(X_test) / 400.0

# convert to tensor form
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).unsqueeze(1)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

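# define the network class (our blueprint)
class MyNetwork(nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        # convolution layer: 1 input channel, 4 output channels, window of 2 numbers
        self.conv1 = nn.Conv1d(1, 4, 2)
        # linear layer: 4 channels x 3 positions in, 2 label scores out
        self.fc1 = nn.Linear(4 * 3, 2)

    def forward(self, x):
        x = self.conv1(x)          # slide the window over each group of 4 numbers
        x = torch.relu(x)          # set negative values to zero
        x = x.view(x.size(0), -1)  # flatten each item into one row of 12 numbers
        x = self.fc1(x)            # produce one score per label
        return x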

### Step 1 - Create a model using our blueprint 🗺

---



In [None]:
# create a model by calling our class
model = MyNetwork()

# check out the architecture of our network
print(model)

The above print statement allows us to quickly see the structure of our neural network.

### Step 2 - Define network parameters 🖊


---


There are a few important **parameters** of our network that we need to set before we can train it.

Some important things to remember:


*   A neural network is trained by assigning **weights** to pieces of information, to tell the network which **features** are important.
*   These weights start off **random**.
*   As information passes **forward** through the network, it makes a random **guess**, and then checks the **real** label.
*   Depending on how close the guess is to the **truth**, it feeds the information **backwards** and **updates** the weights accordingly.
*   This process is repeated until the **predictions match the truth**.

So, let's take a look at the network parameters that help us do this:


*   **Loss function**: this is how we measure the difference between the **predictions** our network makes and the **real** labels. It tells us **how far away we are from the truth**. A perfect prediction would have a loss of zero, and a bad prediction would have a much higher loss.
*   **Optimizer**: this is how we **update the weights** to improve our network's performance. The optimizer uses the **gradient** of the loss function to find out which direction to move the weights in to make the loss smaller.
*   **Learning rate**: this sets how much we are allowed to change the weights by. Once the gradient of the loss has been calculated, the learning rate tells the optimizer how big a step it is allowed to take.
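To get a feel for how the gradient and the learning rate work together, here is a minimal sketch that is separate from our network. It uses a made-up loss, `(w - 3) ** 2`, which is smallest when the weight `w` is 3, and a hypothetical helper called `descend`. Note that this is plain gradient descent, which is simpler than the optimizer we use later.

```python
def descend(lr, steps=20, w=0.0):
    """Take `steps` plain gradient-descent steps on the loss (w - 3) ** 2."""
    for _ in range(steps):
        gradient = 2 * (w - 3)   # slope of the loss at the current weight
        w = w - lr * gradient    # step downhill; lr sets the size of the step
    return w

# compare a small, a moderate, and a too-large learning rate
for lr in [0.01, 0.1, 1.1]:
    w_final = descend(lr)
    print(f"lr={lr}: w={w_final:.3f}, loss={(w_final - 3) ** 2:.3f}")
```

With a small learning rate the weight only creeps towards 3, with a moderate one it gets close, and with one that is too large it overshoots further on every step, so the loss grows instead of shrinking.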

Now, how do we actually set these parameters in code form?


*   We name the loss function `criterion`, and call a function from our `nn` package called `CrossEntropyLoss`.
    * Cross-entropy is just an equation that outputs a value close to zero when the network is confident in the correct label, and a larger value the more wrong the prediction is. If you're curious to see what this equation actually looks like, ask Marina!
*   Recall that earlier, we imported `torch.optim` as `optim` (the same way that we imported `torch.nn` as `nn`).
    * To set up our `optimizer` parameter, we call a function named `Adam` from our `optim` package.
    * `Adam` is short for "Adaptive Moment Estimation" and it works by cleverly tracking the magnitude of the current gradient, as well as previous gradients. The maths behind `Adam` is quite nasty, but luckily PyTorch has it all built in already so we can just call it as a function.
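Going back to the loss function for a moment: here is a rough sketch of the cross-entropy idea for a single item, written with only Python's `math` module. The helper name `cross_entropy` and the example scores are made up for illustration. `nn.CrossEntropyLoss` does the same kind of calculation, and by default averages it over all the items.

```python
import math

def cross_entropy(scores, true_label):
    """Cross-entropy loss for one item, given a raw score for each label."""
    # turn the scores into probabilities that add up to 1
    exps = [math.exp(s) for s in scores]
    probabilities = [e / sum(exps) for e in exps]
    # the loss is small when the true label gets a high probability
    return -math.log(probabilities[true_label])

print(cross_entropy([0.4, 3.9], true_label=1))  # confident and correct: small loss
print(cross_entropy([0.4, 3.9], true_label=0))  # confident but wrong: large loss
print(cross_entropy([1.0, 1.0], true_label=1))  # 50/50 guess: somewhere in between
```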

Within the `optim.Adam()` function, we need to tell it:


1.   **Where to find our neural network:** earlier, we created a shortcut to our network by naming it `model = MyNetwork()`. Remember that `MyNetwork` is what we named our `class` at the very beginning. So, we pass in `model.parameters()`, which hands the optimizer all of the weights in `model` so that it knows what to update.
2.   **Learning rate**: learning rate is abbreviated to `lr=`, where common numbers to choose are 0.1, 0.01, 0.001, 0.0001, etc. It is often a case of trial and error to see which learning rate produces the best performing network.
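To see what `model.parameters()` actually hands to the optimizer, the sketch below builds a small stand-in with the same two layers as `MyNetwork` and lists the shape of each group of weights. The name `layers` is made up for this example, and the stand-in is only used to list weights, not to make predictions.

```python
import torch.nn as nn

# a stand-in with the same two layers as MyNetwork
layers = nn.Sequential(nn.Conv1d(1, 4, 2), nn.Linear(4 * 3, 2))

# each entry is one group of weights that the optimizer is allowed to update
for name, parameter in layers.named_parameters():
    print(name, tuple(parameter.shape))

# count every individual number the optimizer can change
total = sum(parameter.numel() for parameter in layers.parameters())
print("Total trainable numbers:", total)
```

The total comes to 38: the convolution layer has 8 weights and 4 biases, and the linear layer has 24 weights and 2 biases.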


In [20]:
# set our loss function
criterion = nn.CrossEntropyLoss()

# set our optimizer and learning rate
optimizer = optim.Adam(model.parameters(), lr=0.01)

### Step 3 - Train our AI 🏋


---

Before we start training, let's go over some terminology.

Remember that information gets passed **forwards** and **backwards** through the network.
* In the **forward** pass, the network makes a **prediction** and calculates the **loss** function.
* In the **backward** pass, the **gradient** of the loss is calculated, and the optimizer uses it to take a step in a particular direction to make the loss smaller.

To train a neural network, we pass the training dataset through the network multiple times.
An **epoch** is one complete pass of the training dataset through the network, so the number of epochs is how many times we do this.

In the backward pass, we need to make sure that the gradient of the loss is calculated **from scratch** each time we loop through the training set. Otherwise, the gradients layer on top of each other and the network gets muddled about how wrong the predictions are. To do this, we call `.zero_grad()` on the `optimizer`, to make sure we are starting from zero at each loop.
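Here is a tiny sketch of what "layering on top of each other" looks like, using a single made-up weight `w` instead of a whole network. It resets the gradient by hand with `w.grad.zero_()`; in our training loop, `optimizer.zero_grad()` plays the same role for every weight in the model.

```python
import torch

# one made-up weight that PyTorch will track gradients for
w = torch.tensor(2.0, requires_grad=True)

# first backward pass: the gradient of w * w at w = 2 is 4
(w * w).backward()
first = w.grad.item()

# second backward pass WITHOUT resetting: the new gradient is added on top
(w * w).backward()
stacked = w.grad.item()

# reset to zero, then run the backward pass again
w.grad.zero_()
(w * w).backward()
fresh = w.grad.item()

print(first, stacked, fresh)
```

This prints `4.0 8.0 4.0`. The middle number is doubled because the old gradient was still there when the new one was calculated.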

Here is a chance to introduce you to a couple of ways to make a print statement more helpful and neat.

In our training loop, we want to print out at each epoch what the loss is, so we can see whether the loss decreases as the network runs through the training dataset. Remember, a loss of zero is a perfect guess, so we want the loss to decrease as much as possible.

To retrieve the loss from the code, we need to say `loss.item()` to find the actual number.

From the previous notebooks, one way to print this information would be:

`print("Epoch:", epoch, "Loss:", loss.item())`

However, the loss might be a very long number. It is helpful to tell Python to round this to a certain number of decimal places.

Say we wanted to round a variable to 3 decimal places. We would add `:.3f` to the end of it (the dot matters: `.3` means 3 digits after the decimal point, and `f` means a decimal number).

Also, in the print statement above, there are a lot of quotation marks and commas. This can be tedious to type out.

There is a handy shortcut for this called an f-string. By putting an `f` before the quotation marks, we only need one set of quotation marks, and no commas at all. To let Python know where we want to insert a variable, instead of separating them with commas, we can just put the variable inside curly brackets.

So, the above print statement can be written as:

`print(f"Epoch: {epoch} Loss: {loss.item():.3f}")`
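You can try the two styles side by side without training anything. In this small sketch, `loss_value` is a made-up number standing in for `loss.item()`.

```python
epoch = 3
loss_value = 0.123456789  # a made-up loss, standing in for loss.item()

# the old way: commas between every piece
print("Epoch:", epoch, "Loss:", loss_value)

# the f-string way, rounded to 3 decimal places
message = f"Epoch: {epoch} Loss: {loss_value:.3f}"
print(message)  # Epoch: 3 Loss: 0.123
```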


Putting all of this together, below is an example of a training loop for our neural network.

In [None]:
# set a number of epochs
epochs = 5

# create a variable to store losses
losses = []

# create a loop for training
for epoch in range(epochs):
    # call torch's training function on our model
    model.train()

    ### FORWARD PASS:
    # make a prediction
    y_pred = model(X_train_tensor)
    # calculate the loss function
    loss = criterion(y_pred, y_train_tensor)

    ### BACKWARD PASS:
    # make sure the gradients are calculated from scratch
    optimizer.zero_grad()
    # find out how to improve the loss
    loss.backward()
    # update the weights
    optimizer.step()

    # add the losses to the variable
    losses.append(loss.item())

    # print progress
    print(f"Epoch {epoch} Loss: {loss.item():.4f}")

### Step 4 - Testing our AI 🧪

---

Now, let's see how this initial model performs on data it has never seen before.

Some things to note before we test our AI:

* We need to tell our model not to calculate any gradients this time, because we are no longer updating the weights. To do this, we put `with torch.no_grad():` before we do our testing.
* The predictions that our network makes are stored in a variable `y_pred_test`. These predictions are structured as a score for each label, with the higher score being the more confident prediction.
    * For example, we have labels *small* and *big*, stored as `[0, 1]` where 0 is small and 1 is big. So, if the model was predicting that this item was labelled as big (or 1), an example of the prediction would be `[0.4, 3.9]`, where the 0 is scored at 0.4 and the 1 is scored at 3.9. Since 3.9 is higher than 0.4, this would mean the model is more confident that the item is labelled as 1, meaning it is big.
    * To retrieve the prediction from this list of scores, we need to tell Python to find which position in the list holds the highest score. To do this, we call a function `torch.argmax()`, which returns the *argument* (aka the position, or index) of the maximum value. That position is the predicted label.
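What `torch.argmax()` does for a single item can be sketched in plain Python. The scores below are the made-up ones from the example above, and `label_names` is a hypothetical list added just to make the result readable.

```python
scores = [0.4, 3.9]              # made-up scores for the labels 0 (small) and 1 (big)
label_names = ["small", "big"]

# find the POSITION of the highest score, not the score itself
predicted_label = max(range(len(scores)), key=lambda i: scores[i])

print(predicted_label)               # 1
print(label_names[predicted_label])  # big
```

The answer is the position `1`, not the score `3.9`, and that position is exactly the label we want.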

In [22]:
# start the evaluation process
model.eval()

# do not calculate any gradients
with torch.no_grad():
    # make a prediction
    y_pred_test = model(X_test_tensor)
    # get the highest scoring prediction
    predicted_labels = torch.argmax(y_pred_test, dim=1)

### Step 5 - Assess our AI's performance 💃

---



Now, let's break down the network's predictions. We can use `.tolist()` to convert the prediction tensor into a list form. Remember, there were 4 groups in our test set, so the below code shows us what the predictions VS real labels were for each of the 4 groups.

In [None]:
# show predictions
print("Predicted:", predicted_labels.tolist())

# show actual labels
print("Actual:", y_test)

Now, we can use a special function called `softmax` which converts the confidence scores from earlier into actual probabilities, for us to see how confident the predictions were.

`softmax` presents this information to us in tensor form, meaning it will show us a 4x2 **matrix** (4 rows and 2 columns). A matrix is just a vector with more dimensions!

In other words, it will show us each of our 4 test groups, with 2 probabilities in each group. These 2 probabilities are for each label. For example, if it shows us:



```
tensor([[0.5, 0.5],
        [1, 0],
        [0.3, 0.7],
        [0, 1]])
```

This means for the first group, it wasn't more confident in either label. In other words, it took a 50/50 guess.

For the second group, it was 100% confident that it was the first label.

For the third group, it was more confident that it was the second label.

For the last group, it was 100% confident that it was the second label.

This is just an example!!!
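If you are curious how scores become probabilities, softmax for one group can be sketched by hand with Python's `math` module. The function below is a simplified stand-in for the built-in version, and the scores are made up.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that add up to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([0.4, 3.9])   # made-up scores for one test group
print([round(p, 3) for p in probabilities])   # [0.029, 0.971]
```

So scores of 0.4 and 3.9 mean the network is about 97% confident in the second label.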

Run the cell below to see what the actual prediction probabilities were.


In [None]:
# print prediction probabilities
softmax = torch.nn.Softmax(dim=1)
probabilities = softmax(y_pred_test)
print("Prediction Probabilities:")
print(probabilities)

Have a chat about what these probabilities mean for the performance of the network so far. How well is it really learning?

Next, let's calculate and print the overall accuracy of our network.

Accuracy (%) = (Correct / Total) $\times$ 100

To find the **Correct** number of predictions:

* Check which predictions are correct by comparing the predicted labels to the true labels. In Python, if we say `x == y`, it will output a `True` or `False` depending on whether the statement holds. So, if we compare `predicted_labels == y_test_tensor`, it will output something like `tensor([True, True, False, True])`.
* We then count how many were `True` by calling `.sum()`.
* We then convert the result from a tensor into a plain Python number by calling `.item()`. Now we have a number for the correct predictions.

To find the **Total**:
* We just need the length of the group containing the labels: `len(y_test_tensor)`.

Then, we divide Correct by Total and multiply by 100 to get the percentage.
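The same steps can be tried out on plain Python lists first. The predictions and labels here are made up for the example.

```python
predicted = [0, 0, 1, 0]   # made-up predictions
actual = [0, 0, 1, 1]      # made-up true labels

# compare each prediction with its true label
matches = [p == a for p, a in zip(predicted, actual)]
print(matches)             # [True, True, True, False]

# True counts as 1 and False as 0, so adding them up counts the correct ones
correct = sum(matches)
total = len(actual)

accuracy_percent = correct / total * 100
print(f"Accuracy: {accuracy_percent:.2f}%")   # Accuracy: 75.00%
```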


In [None]:
# calculate correct predictions
correct = (predicted_labels == y_test_tensor).sum().item()

# get total
total = len(y_test_tensor)

# calculate accuracy
accuracy = correct / total

# print accuracy
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Let's have a deeper look at how our AI is really learning.

It is useful for us to visualise how the predictions are improving with each pass over the training dataset.

In [None]:
# plot Loss against Epoch
plt.plot(losses)
plt.title("Training Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()

### Fun Plot: Confusion Matrix 🧐

A confusion matrix is another way to visualise the network's performance.

*   It compares the prediction (x-axis) with the real label (y-axis).
*   Run the cells and have a look at the plot.
*   Ask Marina if you feel confused!! (no pun intended...)
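Behind the scenes, a confusion matrix is just a grid of counts. Here is a minimal sketch of how one could be built by hand, with made-up labels and predictions; the cells below use a ready-made function instead.

```python
actual = [0, 0, 1, 1]      # made-up true labels (0 = small, 1 = large)
predicted = [0, 1, 1, 1]   # made-up predictions

# start with a 2x2 grid of zeros: rows are actual labels, columns are predictions
matrix = [[0, 0], [0, 0]]

# add one to the matching box for every item
for a, p in zip(actual, predicted):
    matrix[a][p] += 1

print(matrix)   # [[1, 1], [0, 2]]
```

The top row says that one small group was predicted correctly and one was mistaken for large. The bottom row says that both large groups were predicted correctly.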



In [10]:
# import some extras
import seaborn as sns
from sklearn.metrics import confusion_matrix

In [None]:
# create confusion matrix -- don't worry too much about the details!
cm = confusion_matrix(y_test, predicted_labels.numpy(), labels=[0, 1])
cm_percent = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] * 100
labels = ["Small", "Large"]
sns.heatmap(cm_percent, annot=True, cmap='Purples', fmt='.1f', xticklabels=labels, yticklabels=labels)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix (%)")
plt.show()

### 🎮 CHALLENGE: Minimize the loss as much as you can! ⬇


IMPORTANT NOTE: You MUST re-run all of the cells starting from Step 1 (the one beginning with `model = MyNetwork()`) before you re-run the training loop after making any changes.

Order of things to do:


1.   Change a parameter listed below.
2.   Do not run the training cell. Scroll up to Step 1, and run that one.
3.   Then, run the Step 2 cell.
4.   Then, run the Step 3 cell.

This is important because otherwise the model still remembers what it learned in the previous training run, and we won't be able to tell which parameter choices give the best results.


Some things for you to try changing:
* Number of epochs
* Learning rate

Make a note of which combinations you try, and what the lowest loss is at the end! Remember, we are aiming to get as close to zero as possible.
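If you would rather not scroll up each time, one option is to wrap Steps 1 to 3 in a function, so that every experiment starts from a brand-new model. The sketch below is only one way to do it: `train_fresh` is a hypothetical helper, and the data is a shortened copy of the training set, so the losses will not match your own runs exactly.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# a shortened, normalized copy of the training data from earlier in this notebook
X = torch.tensor([[1, 2, 3, 4], [4, 3, 2, 1], [100, 200, 300, 400], [400, 300, 200, 100]],
                 dtype=torch.float32).unsqueeze(1) / 400.0
y = torch.tensor([0, 0, 1, 1], dtype=torch.long)

class MyNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 4, 2)
        self.fc1 = nn.Linear(4 * 3, 2)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        return self.fc1(x.view(x.size(0), -1))

def train_fresh(lr, epochs):
    """Hypothetical helper: build a brand-new model, train it, return the last loss."""
    model = MyNetwork()                                  # Step 1: fresh random weights
    criterion = nn.CrossEntropyLoss()                    # Step 2: loss function
    optimizer = optim.Adam(model.parameters(), lr=lr)    # Step 2: optimizer
    for _ in range(epochs):                              # Step 3: training loop
        model.train()
        loss = criterion(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()

# try a few combinations; results vary between runs because the weights start random
for lr in [0.1, 0.01]:
    for epochs in [5, 50]:
        print(f"lr={lr} epochs={epochs} final loss={train_fresh(lr, epochs):.4f}")
```

Because the model is created inside the function, no experiment can borrow memory from the one before it.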

Once you have a low loss, test the network and see what your accuracy is.

Can you get an accuracy of 100%?

See how the other plots change with your adapted parameters!