<a href="https://colab.research.google.com/github/silvalugo/deep_learning/blob/main/02_1_code_a_perceptron.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![Practicum AI Logo image](images/practicum_ai_logo.png) <img src='images/practicumai_deep_learning.png' alt='Practicum AI: Deep Learning Foundations icon' align='right' width=50>


***
# *Practicum AI:* Deep Learning - Perceptron


> This exercise is adapted from the [W3 Schools Perceptrons](https://www.w3schools.com/ai/ai_perceptrons.asp) article and from Baig et al. (2020) The Deep Learning Workshop from [Packt Publishers](https://www.packtpub.com/product/the-deep-learning-workshop/9781839219856) (Exercise 2.01, page 55).

<img alt="A cartoon of Dr. Amelia, a nutrition researcher, sitting at a computer thinking about food items which appear in a thought bubble." src="images/DrAmelia.jpg" align="right" width=250>Amelia is back! This time, she needs your help to analyze some of her survey data. As part of Amelia's dietary study, participants are also asked to follow a special nutrition plan, the Dr. Amelia Recommended Nutrition Plan (the DARN Plan). We'll use a simple [perceptron](https://developers.google.com/machine-learning/glossary#perceptron) to predict if participants follow the DARN Plan.

**Note:** Dr. Amelia's cartoon was generated with AI's assistance.

As a note, this exercise lies somewhere between coding everything from scratch and relying on the pre-coded APIs (Application Programming Interfaces) that underlie the power of TensorFlow, Keras, and PyTorch. **You will not need to create weight tensors beyond this exercise**. Still, hopefully, by doing it this time, you will have a better understanding (*and appreciation*) of the details often lost in an API call to `model.fit()`, for example.

The table below shows some data Amelia has gathered from participant surveys about their nutrition. She is looking at how different factors predict if participants follow her DARN Plan ($y$, the output or [labels](https://developers.google.com/machine-learning/glossary#label) in our example) based on three input variables: if participants submit photos of three meals a day ($x_1$), if participants report being satisfied with their food choices ($x_2$), and if participants report being generally happy ($x_3$). We will combine $x_1$, $x_2$, and $x_3$ into our input tensor $X$. Here, we are simplifying the question of the likelihood of following the DARN Plan to a Yes/No.

Case # | Photos of 3 meals submitted? ($x_1$) | Satisfied with food choices? ($x_2$) | Generally happy? ($x_3$) | Following the DARN Plan? ($y$)
--|--------------------------|---------------------|-----------------------|----------------
1 | 1 (Yes) | 1 (Yes) | 1 (Yes) | Yes (1)
2 | 0 (No) | 1 (Yes) | 1 (Yes) | Yes (1)
3 | 1 (Yes) | 0 (No) | 1 (Yes) | Yes (1)
4 | 0 (No) | 0 (No) | 1 (Yes) | Yes (1)
5 | 1 (Yes) | 1 (Yes) | 0 (No) | Yes (1)
6 | 0 (No) | 1 (Yes) | 0 (No) | No (0)
7 | 1 (Yes) | 0 (No) | 0 (No) | No (0)
8 | 0 (No) | 0 (No) | 0 (No) | No (0)


## 1. Import libraries

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'> Note

> * We'll probably stop reminding you after this, but... remember not all red output is bad!
> * Also, remember to check that the correct kernel is selected.

In [None]:
import tensorflow as tf
import pandas as pd
from tensorflow.keras import activations

from matplotlib import pyplot as plt

## 2. Create an input data matrix

Create an 8 x 3 matrix for our input data: eight rows (one per case) and three columns. Remember that we have three input variables (we'll call them $x_1$, $x_2$, and $x_3$ for now). These variables are the columns in our input data.

The matrix below has the three input columns of our data table, using just the 0/1 values corresponding to the no/yes entries in the table. The comments help match rows of the table with entries in our `X` variable. (Remember, we are using the capital letter `X` as our variable name here to remind us that this is a matrix with our input data).

In [None]:
X = tf.Variable([[1.,1.,1.], # Case 1
                 [0.,1.,1.], # Case 2
                 [1.,0.,1.], # Case 3
                 [0.,0.,1.], # Case 4
                 [1.,1.,0.], # Case 5
                 [0.,1.,0.], # Case 6
                 [1.,0.,0.], # Case 7
                 [0.,0.,0.]], # Case 8
                 dtype = tf.float32)  # 8x3 (8 cases, 3 features), input data table
print(X)


<tf.Variable 'Variable:0' shape=(8, 3) dtype=float32, numpy=
array([[1., 1., 1.],
       [0., 1., 1.],
       [1., 0., 1.],
       [0., 0., 1.],
       [1., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 0.]], dtype=float32)>


## 3. Create a label tensor

* Create a tensor of labels to hold our 'ground truth'. This indicates, for each set of input, whether or not the participant is following the DARN Plan.

* The `y` variable starts as a 1D tensor (vector) with 8 elements. Reshaping it to `[8, 1]` converts it to a 2D tensor (matrix) with 8 rows and 1 column.

* Reshaping is necessary because `y` will be used in matrix operations, and the dimensions of the matrices need to align. The perceptron's output has the shape `[num_samples, 1]`, so reshaping `y` to `[8, 1]` puts the labels in the same format for later operations such as the loss calculation.

```python
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8 - one for each case in the table         
y = tf.Variable([1, 1, 1, 1, 1, 0, 0, 0], dtype = tf.float32)

# Reshape to be 8 rows of 1 column  
y = tf.reshape(y, [8,1])
print(y)
```

In [None]:
y = tf.Variable([1, 1, 1, 1, 1, 0, 0, 0], dtype = tf.float32)
y = tf.reshape(y, [8,1])
print(y)


tf.Tensor(
[[1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [0.]
 [0.]
 [0.]], shape=(8, 1), dtype=float32)


## 4. Define some constants to set the shape of the weight matrix

Define two constants to be used in the next step when we define the connections weight matrix.

We can use the number of columns in the X table to determine the number of features or how many $x_i$ we have and, therefore, how many weights we need to store (one for each feature). We only need one output value since we are looking for a binary decision about plan adherence (Yes/No).

*   `shape` is an attribute that returns the dimensions of a tensor.
*   num_features = X.shape[1]: This gets the number of features (or columns) in the input data matrix X.
*   If X is a matrix with dimensions (m, n), where m is the number of rows (samples) and n is the number of columns (features), then:

    *   X.shape[0] gives the number of rows (m).
    *   X.shape[1] gives the number of columns (n), which is the total number of features.
*   output_size = 1: This indicates that the model is expected to produce a single output.
*   In a machine learning context, output_size = 1 often means the model is performing regression or binary classification, where only one output value is needed per input. In regression, the model predicts a single continuous value; in binary classification, it predicts a probability or a binary class label, which is our case here.



```python
num_features = X.shape[1]
output_size = 1
```

In [None]:
num_features = X.shape[1]
output_size = 1

***

## 5. Define connections weight matrix

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, W2, w3) and the bias term. The perceptron body multiplies the inputs by the weights and sums them and the bias, resulting in the output--whether or not the participant is following the DARN Plan. The three weights are highlighted here.](images/02_perceptron_section5.png)

We need one weight for each feature $x_i$ in our feature matrix $X$ (three photos submitted, satisfied with food choices, etc.). These weights are our $w_i$. We don't know what values they should take, so we will initialize them with random, positive numbers; this is one reason different runs of model training may give different answers. Another common option is to initialize the weights to 0, though that can cause issues in training.

* tf.Variable: Wraps the tensor in a tf.Variable, making it a trainable variable. This means it can be updated during training (e.g., weights in a neural network).
* tf.random.uniform generates values spread evenly across a range (by default, from 0 up to but not including 1), so larger values are as likely to occur as small ones.
* This may not be ideal in some cases, as starting with larger values can slow down the learning process or lead to exploding gradients.
* tf.random.uniform([num_features, output_size]):
  * This generates a tensor of random values with a uniform distribution. The shape of the tensor is [num_features, output_size].
  * num_features: The number of rows in the tensor, which corresponds to the number of input features (independent variables). We must use num_features because we need a weight for each variable.
  * output_size: The number of columns in the tensor, which corresponds to the size of the model’s output (number of predictions or output neurons).
  * Since we are performing a matrix multiplication later, the dimensions need to match the number of input features and outputs.

* minval=0:
  * Specifies that the minimum value for the uniform distribution is 0. Since no maxval is specified, the upper bound defaults to 1, so the generated values will lie between 0 and 1. Note that minval is an argument of tf.random.uniform, not of tf.Variable.

* dtype=tf.float32:
  * Ensures that the tensor's elements are stored as 32-bit floating-point numbers.


```python
W = tf.Variable(tf.random.uniform([num_features, output_size], minval=0, dtype = tf.float32))
print(W)
```

In [None]:
W = tf.Variable(tf.random.uniform([num_features, output_size], minval=0, dtype = tf.float32))
print(W)


<tf.Variable 'Variable:0' shape=(3, 1) dtype=float32, numpy=
array([[0.8897692 ],
       [0.24160457],
       [0.22348213]], dtype=float32)>


***

## 6. Define bias variable

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, W2, w3) and the bias term. This is similar to the above image, but is highlighting the bias term](images/02_perceptron_section6.png)

Since we only have one neuron, we only need one bias value. Again, we'll initialize it to a random number - 0 would be another option here. We can write each bias term as $b_i$ and the matrix of all biases as $B$.

* tf.random.normal generates values centered around a mean (often 0) with a certain standard deviation, leading to more values clustered near the mean and fewer extreme values.
* This is often suitable for initializing weights and biases in deep learning models, where values close to 0 can help the model start with small, stable updates during training.
* In this case, the biases B are initialized with a normal distribution because many models perform better when weights and biases are initialized with small values near 0.
* tf.random.normal([output_size, 1]):
  * This generates a tensor with random values drawn from a normal distribution.
  * The shape of the tensor is [output_size, 1], meaning it has output_size rows and 1 column. Since output_size = 1, the tensor has a single row.
  * We must use output_size because we only need one bias in this example.
* dtype=tf.float32: Ensures that the elements of the tensor are 32-bit floating-point numbers.



```python
B = tf.Variable(tf.random.normal([output_size, 1]), dtype = tf.float32)
print(B)
```

In [None]:
B = tf.Variable(tf.random.normal([output_size, 1]), dtype = tf.float32)
print(B)


<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[0.10403337]], dtype=float32)>


***

## 7. Define a perceptron function

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, W2, w3) and the bias term. This is similar to the above image, but is highlighting the perceptron body.](images/02_perceptron_section7.png)


In the following code block, we define a perceptron function with one input argument, $X$, containing our three input data features.

The function's first line implements a net input function.  It multiplies the input data matrix ($X$) by the weights ($W$) using the matrix multiplication function (matmul).  It then adds the bias ($B$) value to that product.

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'>Note
> This is the essential function of a neuron: gather the inputs, multiply each input by the weight for that input, add the products up and add in the bias.

The function's second line implements an activation function. The activation function determines how the neuron's output (calculated above) is changed before passing it on. Here, we use the `tanh` activation function.  However, there are other TensorFlow options.  For example, you could use the `tf.sigmoid` function.  Or, select a function from the Keras activation (`activations`) library.  Search the [Keras documentation](https://keras.io/api/layers/activations/) for a complete list of available functions.

Try out these other options, retrain the network, and see what happens.

```python
output = tf.sigmoid(z)
output = activations.relu(z)
output = activations.linear(z)
```
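To get a feel for how these choices differ, the sketch below evaluates each activation on a few sample net-input values using only Python's `math` module. The helper names (`sigmoid`, `relu`, `linear`) and the sample values are illustrative stand-ins, not the TensorFlow or Keras implementations.

```python
import math

def sigmoid(z):
    """Squashes any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Passes positive values through and replaces negatives with 0."""
    return max(0.0, z)

def linear(z):
    """Leaves the net input unchanged."""
    return z

sample_z = [-2.0, -0.5, 0.0, 0.5, 2.0]  # hypothetical net inputs
for z in sample_z:
    print(f"z={z:+.1f}  tanh={math.tanh(z):+.3f}  sigmoid={sigmoid(z):.3f}  "
          f"relu={relu(z):.1f}  linear={linear(z):+.1f}")
```

Comparing the columns shows that `tanh` keeps outputs between -1 and 1, `sigmoid` keeps them between 0 and 1, ReLU zeroes out negative inputs, and the linear activation changes nothing.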

In [None]:
def perceptron(X):
    z = tf.add(tf.matmul(X, W), B)  # Net input function
    output = tf.tanh(z)             # Activation function
    return output

Execute the perceptron function to see its initial predictions before any training. Because the weights and the bias were initialized with random values, these predictions are essentially arbitrary and will differ from run to run; they do not yet reflect the labels.

In [None]:
# Execute the perceptron to see its initial predictions before training.
print(perceptron(X))

tf.Tensor(
[[0.8974366 ]
 [0.51471275]
 [0.8388514 ]
 [0.31628656]
 [0.84414065]
 [0.33250135]
 [0.7589791 ]
 [0.10365966]], shape=(8, 1), dtype=float32)


## 8. Training the Perceptron

* Now that we have the elements of a simple, single-node perceptron in place, let's train the network using an algorithm called "stochastic gradient descent" (SGD). The purpose of SGD is to iteratively adjust the weights and bias parameters of the single neuron in our model and eventually, we hope, find values that make our neuron's predictions as good as possible. TensorFlow/Keras implements this algorithm for us, so we don't need to code it ourselves.

* The [learning rate](https://developers.google.com/machine-learning/glossary#learning-rate) determines the size of the steps taken toward a minimum of the loss. Here, we use the Stochastic Gradient Descent (SGD) optimizer.

* The learning rate is a floating-point number that tells the gradient descent algorithm how strongly to adjust weights and biases on each iteration. For example, a learning rate of 0.3 would adjust weights and biases three times more powerfully than a learning rate of 0.1.

* Learning rate is a key hyperparameter. If you set the learning rate too low, training will take too long. If you set the learning rate too high, gradient descent often has trouble reaching convergence.

* optimizer = tf.optimizers.SGD(learning_rate)
  * tf: This is the TensorFlow library, which provides tools for building and training machine learning models.

  * optimizers: This is a module within TensorFlow that contains various optimization algorithms used for training machine learning models. It helps in adjusting the model parameters (like weights) to minimize the loss function.

  * SGD: This is a class (or function-like object) within the tf.optimizers module that implements the Stochastic Gradient Descent (SGD) optimization algorithm. It's used for updating model parameters based on the gradients computed during backpropagation.
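
The effect of the learning rate described above can be sketched with a single update in plain Python. This is a minimal illustration of the basic SGD rule (new value = old value minus learning rate times gradient); the function name `sgd_step` and the weight and gradient values are hypothetical.

```python
import math

def sgd_step(param, gradient, learning_rate):
    """One plain SGD update: move against the gradient, scaled by the learning rate."""
    return param - learning_rate * gradient

weight = 0.5     # hypothetical current weight
gradient = 2.0   # hypothetical gradient of the loss with respect to that weight

# How far the weight moves in one update for two different learning rates
small_step = weight - sgd_step(weight, gradient, learning_rate=0.1)
large_step = weight - sgd_step(weight, gradient, learning_rate=0.3)

print(f"step with 0.1: {small_step:.2f}")
print(f"step with 0.3: {large_step:.2f}")
```

For the same gradient, the step taken with a learning rate of 0.3 is three times the step taken with 0.1.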

In [None]:
learning_rate = 0.01
optimizer = tf.optimizers.SGD(learning_rate)

## 9. Train the perceptron for 1000 epochs

An [epoch](https://developers.google.com/machine-learning/glossary#epoch) is a complete training pass over the entire dataset. Our loss or error function is defined as a lambda function (a single-line, inline function) in the first line of code in the loop block.  We use the `sigmoid_cross_entropy_with_logits` function, an appropriate choice for this application, to calculate how far our predicted results are from the known results. We will not get into the technical details here as that is outside the scope of this learning experience. Our SGD optimizer seeks to minimize the model's total error in the second line.

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'>Note
> The code below uses a `for` loop. This common programming construct allows you to loop, or iterate, through
> a list of items (the numbers 0 to 999 in our case). *Implicitly*, training will use `for` loops - for each epoch do
> this thing. *Explicitly*, however, after this notebook, we will use the API that automatically does this for us.
> Thus we dropped coverage of `for` loops and other "flow control" methods from the *Python for AI* course. It's
> helpful to know about them, but they are rarely used explicitly in AI research.
> [Click here for more details](https://wiki.python.org/moin/ForLoop).
>
> The block also uses a special Python function called a `lambda` function. These are functions that can be
> written as a single line of code. [Click here for more
> details](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions).

Explanation of the script:
* Loop: The model is trained over multiple epochs:
  * for n in range(no_of_epochs):
  * This is a loop that runs for a specified number of epochs (no_of_epochs).
  * An epoch is a full pass through the training dataset.
  * This is the total number of times the loss will be calculated.
  * Training typically involves multiple epochs to allow the model to learn better by iteratively updating its parameters.
* Loss Calculation: For each epoch, a lambda function calculates the loss using sigmoid cross-entropy between the true labels and the model's predictions.
  * loss: It calculates the loss value for the current state of the model.
  * lambda: The function calculates the loss when called, and is necessary here because optimizer.minimize expects a function as its first argument.
  * abs(...): The abs() function takes the absolute value of the computed loss. In typical scenarios, this might be unnecessary because the cross-entropy loss is non-negative, but it ensures that the loss value is always non-negative.
  * tf.reduce_mean(...): This calculates the mean of the cross-entropy losses across all the samples in the batch, giving you a single scalar value representing the average loss.
  * tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=perceptron(X)): This computes the sigmoid cross-entropy loss between the true labels y and the values passed as logits. This loss function is commonly used for binary classification problems.
  * perceptron(X): This is the forward pass of a neural network (or a perceptron model) applied to the input data X. The loss function treats its logits argument as raw, unscaled predictions and applies a sigmoid internally. Note that our perceptron function already applies tanh, so the values passed here lie between -1 and 1 rather than being the raw net inputs.

* Optimization: The optimizer updates the model parameters (W and B) to minimize the calculated loss by the lambda function, improving the model's performance over time.
  * optimizer: This is an instance of an optimizer, such as Stochastic Gradient Descent (SGD), which adjusts the model's weights and biases to minimize the loss.
  * minimize(loss, [W, B]): This method computes the gradients of the loss with respect to the variables [W, B] (which are the model's weights and biases) and then updates these variables in the direction that reduces the loss.
  * loss: The function passed to minimize, which calculates the loss value for the current state of the model.

This loop continues until the specified number of epochs is reached, iteratively improving the model by minimizing the loss function at each step.
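
To make the loss calculation less of a black box, here is a plain-Python sketch of the per-example sigmoid cross-entropy. It uses the numerically stable form given in the TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits`, `max(x, 0) - x * z + log(1 + exp(-abs(x)))`, and compares it with the textbook form. The function names and the sample logits are hypothetical.

```python
import math

def sigmoid_cross_entropy(label, logit):
    """Numerically stable per-example sigmoid cross-entropy."""
    return max(logit, 0.0) - logit * label + math.log(1.0 + math.exp(-abs(logit)))

def naive_cross_entropy(label, logit):
    """Textbook form: apply sigmoid, then binary cross-entropy."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

labels = [1.0, 1.0, 0.0, 0.0]
logits = [0.9, -0.2, 0.3, -0.7]   # hypothetical model outputs

losses = [sigmoid_cross_entropy(y_i, x_i) for y_i, x_i in zip(labels, logits)]
mean_loss = sum(losses) / len(losses)   # the role tf.reduce_mean plays in the notebook
print(f"mean loss = {mean_loss:.4f}")
```

Both forms give the same value for moderate logits; the stable form simply avoids overflow when a logit is very large or very small. Every per-example loss is positive, which is why the `abs(...)` wrapper in the notebook has no effect.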

In [None]:
no_of_epochs = 1000

for n in range(no_of_epochs):
    loss = lambda:abs(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels = y, logits = perceptron(X))))
    optimizer.minimize(loss, [W, B])

AttributeError: 'SGD' object has no attribute 'minimize'

The error message `'SGD' object has no attribute 'minimize'` indicates that the SGD optimizer in the installed TensorFlow version does not have a method named `minimize`. The way a loss is minimized has changed in recent versions of TensorFlow. Here is how to fix the issue.

Instead of calling `minimize`, we use `tf.GradientTape` to compute the gradients and then apply the updates with the optimizer. The revised code is presented below, preceded by an explanation:

Explanation of the Modified Code:

* tf.GradientTape():
  * This context manager records the operations for automatic differentiation. When you calculate the loss, it tracks the gradients of the variables involved in the computation.

* tape.gradient(loss_value, [W, B]):
  * This computes the gradients of the loss_value with respect to the variables [W, B].

* optimizer.apply_gradients(zip(gradients, [W, B])):
  * This applies the computed gradients to update the weights and biases. The zip function pairs each gradient with its corresponding variable.

In [None]:
no_of_epochs = 1000

for n in range(no_of_epochs):
    with tf.GradientTape() as tape:
        logits = perceptron(X)  # Compute the logits
        loss_value = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))  # Calculate loss

    # Calculate gradients
    gradients = tape.gradient(loss_value, [W, B])

    # Apply gradients using the optimizer
    optimizer.apply_gradients(zip(gradients, [W, B]))
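
To show what `tape.gradient` and `optimizer.apply_gradients` are doing for us, the sketch below repeats the same training loop in plain Python with the gradients worked out by hand. It is an illustration under stated assumptions, not the notebook's method: the starting values are copied from the `W` and `B` printed in sections 5 and 6, the chain-rule derivation in the comments is added here, and the helper names (`sigmoid`, `forward`, `mean_loss`) are hypothetical.

```python
import math

X = [[1, 1, 1], [0, 1, 1], [1, 0, 1], [0, 0, 1],
     [1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
y = [1, 1, 1, 1, 1, 0, 0, 0]

# Starting values copied from the printed W and B in sections 5 and 6
W = [0.8897692, 0.24160457, 0.22348213]
B = 0.10403337
learning_rate = 0.01

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(row):
    """tanh of the net input, mirroring perceptron(X) for one case."""
    return math.tanh(sum(x_i * w_i for x_i, w_i in zip(row, W)) + B)

def mean_loss():
    """Mean sigmoid cross-entropy, treating the tanh outputs as logits."""
    total = 0.0
    for row, label in zip(X, y):
        a = forward(row)
        total += max(a, 0.0) - a * label + math.log(1.0 + math.exp(-abs(a)))
    return total / len(X)

initial_loss = mean_loss()

for epoch in range(1000):
    grad_W = [0.0, 0.0, 0.0]
    grad_B = 0.0
    for row, label in zip(X, y):
        a = forward(row)
        # Chain rule: d(loss)/d(a) = sigmoid(a) - label, and d(a)/d(z) = 1 - a^2
        delta = (sigmoid(a) - label) * (1.0 - a * a) / len(X)
        for j, x_j in enumerate(row):
            grad_W[j] += delta * x_j
        grad_B += delta
    # Plain SGD update, the role played by optimizer.apply_gradients
    W = [w_j - learning_rate * g_j for w_j, g_j in zip(W, grad_W)]
    B -= learning_rate * grad_B

final_loss = mean_loss()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The loss after training should be lower than before. `tf.GradientTape` saves us from deriving and coding the gradient expressions ourselves, which quickly becomes impractical for larger networks.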

## 10. Print the weights
<img alt="AI Generated cartoon of happy people eating healthy food." src="images/happy_people.jpg" align="right" width="300">

Notice that the model has learned that the general happiness of a participant is the best predictor of whether or not they are following the DARN Plan! Of the weights, the 3rd one has the largest value.

Given that the input from each feature will be a 0 or a 1, multiplying by a larger weight will increase the contribution of that feature in the summation of all input-by-weight products ($x_i * w_i$) in determining the output of the neuron.

The perceptron has learned how to take the three input variables and weigh them to predict the output.

**Note:** The image was generated with AI's assistance.

```python
print(W)
```

In [30]:
print(W)


<tf.Variable 'Variable:0' shape=(3, 1) dtype=float32, numpy=
array([[0.7009806],
       [0.4296667],
       [1.6375327]], dtype=float32)>


## 11. Print the bias

```python
print(B)
```

In [31]:
print(B)


<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[-0.8423005]], dtype=float32)>
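
To connect the learned weights and bias to individual predictions, the sketch below recomputes the net input for two cases in plain Python. It assumes the `W` and `B` values printed above for this run (yours will differ), and the short feature labels and the `net_input` helper are hypothetical.

```python
import math

# Learned values copied from the printed W and B above (your run will differ)
W = [0.7009806, 0.4296667, 1.6375327]
B = -0.8423005
features = ["photos", "satisfied", "happy"]

def net_input(x):
    """Sum of input-by-weight products plus the bias."""
    return sum(x_i * w_i for x_i, w_i in zip(x, W)) + B

cases = {"Case 4": [0, 0, 1], "Case 5": [1, 1, 0]}

for name, x in cases.items():
    # Contribution of each feature to the sum, before the bias is added
    contributions = {f: round(x_i * w_i, 4) for f, x_i, w_i in zip(features, x, W)}
    z = net_input(x)
    print(name, contributions, f"z={z:.4f}", f"tanh(z)={math.tanh(z):.4f}")
```

Case 4 has only the "generally happy" input set, yet its single large weight produces a stronger positive output than the two smaller weights combined in Case 5. The two results line up with rows 4 and 5 of the predictions printed in section 12.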


## 12. Test the perceptron

The numbers in the output tensor reflect the perceptron's predictions for each input case. These are not probabilities but the **model's estimate of the output value**. We could set a threshold value and conclude the participant is following the DARN Plan when the value exceeds some number.

```python
print(perceptron(X))
```

In [33]:
print(perceptron(X))

tf.Tensor(
[[ 0.958399  ]
 [ 0.8410935 ]
 [ 0.90446156]
 [ 0.6613629 ]
 [ 0.2806125 ]
 [-0.3907067 ]
 [-0.14038654]
 [-0.6870257 ]], shape=(8, 1), dtype=float32)
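
As a sketch of the thresholding idea, the code below converts the predictions printed above into Yes/No labels in plain Python. The cut-off of 0 is an assumption chosen because `tanh` outputs range from -1 to 1; other thresholds are possible.

```python
# Predictions copied from the output above (your run will differ)
predictions = [0.958399, 0.8410935, 0.90446156, 0.6613629,
               0.2806125, -0.3907067, -0.14038654, -0.6870257]
labels = [1, 1, 1, 1, 1, 0, 0, 0]

threshold = 0.0  # assumed cut-off; tanh outputs range from -1 to 1

# Values above the threshold count as "following the DARN Plan"
predicted_labels = [1 if p > threshold else 0 for p in predictions]
accuracy = sum(p == t for p, t in zip(predicted_labels, labels)) / len(labels)

print(predicted_labels)
print(f"accuracy = {accuracy:.2f}")
```

With this threshold, every case in the table is classified correctly for the values shown here.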


### Print things more clearly

Let's bring the `X`, `y` and predictions together to make it easier to read. Remember that `Yes=1` and `No=0` in the table.

In [34]:
X_df = pd.DataFrame(X.numpy(), columns=['Photos of 3 meals submitted?', 'Satisfied with food choices?', 'Generally happy?'])
y_df = pd.DataFrame(y.numpy(), columns=['Following the DARN Plan?'])
pred_df = pd.DataFrame(perceptron(X).numpy(), columns=['Predictions'])
df = pd.concat([X_df, y_df, pred_df], axis=1)
df

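Index | Photos of 3 meals submitted? | Satisfied with food choices? | Generally happy? | Following the DARN Plan? | Predictions
--|--------------------------|---------------------|-----------------------|----------------|------------
0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.958399
1 | 0.0 | 1.0 | 1.0 | 1.0 | 0.841093
2 | 1.0 | 0.0 | 1.0 | 1.0 | 0.904462
3 | 0.0 | 0.0 | 1.0 | 1.0 | 0.661363
4 | 1.0 | 1.0 | 0.0 | 1.0 | 0.280612
5 | 0.0 | 1.0 | 0.0 | 0.0 | -0.390707
6 | 1.0 | 0.0 | 0.0 | 0.0 | -0.140387
7 | 0.0 | 0.0 | 0.0 | 0.0 | -0.687026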


## 13. Let's see how different choices would change the results

### Did we run enough epochs?

You may want to increase the number of epochs used.

### Change the outcomes

Usually, we don't change the data we are working with, but in this example, we do this so that you can see the link between the input data and the weights learned. Let's change the participant outcomes and see what happens to the learned weights and predictions.

#### Change 1: participants are more likely to follow the DARN Plan when they like the food choices:

`y = tf.Variable([1, 1, 0, 0, 1, 0, 0, 0], dtype = tf.float32)`

#### Change 2: participants are more likely to follow the DARN Plan when they regularly submit three photos a day:

`y = tf.Variable([1, 0, 1, 0, 1, 0, 1, 0], dtype = tf.float32)`

Feel free to play with other parts of the model; everything but the X inputs is replicated below to put it all in one place for easy reference. Comments point out hyperparameters that you might want to change.

In [35]:
## From step 3
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8 - one for each case in the table
y = tf.Variable([1, 1, 0, 0, 1, 0, 0, 0], dtype = tf.float32)  # Change 1 has been made, you'll need to make change 2
y = tf.reshape(y, [8,1])  # convert to 8x1

## From step 4
num_features = X.shape[1]
output_size = 1

## From step 5
W = tf.Variable(tf.zeros([num_features, output_size]), dtype = tf.float32)

## From step 6
B = tf.Variable(tf.zeros([output_size, 1]), dtype = tf.float32)

## From step 7
def perceptron(X):
    z = tf.add(tf.matmul(X, W), B)
    output = tf.tanh(z)                  # Activation function is a good hyperparameter to change
    return output

## From step 8
learning_rate = 0.01  # Learning rate is a good hyperparameter to change
optimizer = tf.optimizers.SGD(learning_rate)

## From step 9
no_of_epochs = 1000  # Number of epochs is a good hyperparameter to change

for n in range(no_of_epochs):
    # optimizer.minimize is not available here (see step 9), so use tf.GradientTape
    with tf.GradientTape() as tape:
        loss_value = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels = y, logits = perceptron(X)))
    gradients = tape.gradient(loss_value, [W, B])       # Calculate gradients
    optimizer.apply_gradients(zip(gradients, [W, B]))   # Update weights and bias

## From steps 10 on, printing the output
print(f'Weights: {W}')
print(f'Bias: {B}')

X_df = pd.DataFrame(X.numpy(), columns=['Photos of 3 meals submitted?', 'Satisfied with food choices?', 'Generally happy?'])
y_df = pd.DataFrame(y.numpy(), columns=['Following the DARN Plan?'])
pred_df = pd.DataFrame(perceptron(X).numpy(), columns=['Predictions'])
df = pd.concat([X_df, y_df, pred_df], axis=1)
df

## Before continuing
###  <img src='images/alert_icon.svg' alt="Alert icon" width=40 align=center> Alert!
> Before continuing to another notebook within the same Jupyter session,
> use the **"Running Terminals and Kernels" tab** (below the File Browser tab) to **shut down this kernel**.
> This will free up this notebook's GPU memory, making it available for
> your next notebook.
>
> Every time you run multiple notebooks within a Jupyter session with
> a GPU, this should be done.

----
## Push changes to GitHub <img src="images/push_to_github.png" alt="Push to GitHub icon" align="right" width=150>

 Remember to **add**, **commit**, and **push** the changes you have made to this notebook to GitHub to keep your repository in sync.

In Jupyter, those are done in the git tab on the left. In Google Colab, use File > Save a copy in GitHub.
