# Neural Network Architecture and Hyperparameters

Hyperparameters are parameters, often chosen by the user, that control model training. The type of activation function, the number of layers in the model, and the learning rate are all hyperparameters of neural network training. Together, we will discover the most critical hyperparameters of a neural network and how to modify them.


In [1]:
import torch.nn as nn
import torch
import numpy as np

## Discovering activation functions between layers


### Implementing ReLU

The rectified linear unit (or ReLU) function is one of the most common activation functions in deep learning.

It overcomes some of the training problems linked with the sigmoid function you learned about earlier, such as the **vanishing gradients problem**.

In this exercise, you'll begin with a ReLU implementation in PyTorch. Next, you'll calculate the gradients of the function.

The `nn` module has already been imported for you.


Instructions:

- Create a ReLU function in PyTorch.
- Calculate the gradient of the ReLU function for `x` by applying the `relu_pytorch()` function you defined and then running a backward pass.
- Find the gradient at x.


In [2]:
# Create a ReLU function with PyTorch
relu_pytorch = nn.ReLU()

# Apply your ReLU function on x, and calculate gradients
x = torch.tensor(-1.0, requires_grad=True)
y = relu_pytorch(x)
print(y)
y.backward()

# Print the gradient of the ReLU function for x
gradient = x.grad
print(gradient)

tensor(0., grad_fn=<ReluBackward0>)
tensor(0.)
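
To make the contrast with sigmoid concrete, here is a minimal sketch that compares the gradients of both functions at a large positive input. The input value `10.0` and the variable names `x_relu` and `x_sigmoid` are illustrative assumptions, not part of the exercise.

```python
import torch
import torch.nn as nn

# Illustrative input (an assumption, not from the exercise): a large positive value
x_relu = torch.tensor(10.0, requires_grad=True)
x_sigmoid = torch.tensor(10.0, requires_grad=True)

# ReLU passes positive inputs through unchanged, so its gradient there is 1
nn.ReLU()(x_relu).backward()

# Sigmoid saturates for large inputs, so its gradient shrinks toward 0
nn.Sigmoid()(x_sigmoid).backward()

print(x_relu.grad)     # gradient of ReLU at 10.0
print(x_sigmoid.grad)  # gradient of sigmoid at 10.0, a very small positive number
```

The sigmoid gradient is several orders of magnitude smaller than the ReLU gradient here, which is the saturation behavior behind vanishing gradients.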


### Implementing leaky ReLU

You've learned that ReLU is one of the most used activation functions in deep learning, and you will find it in modern architectures. However, it has the drawback of outputting zero for negative inputs and therefore having zero gradients there. A unit whose input stays negative receives no gradient and can stop learning for the rest of training. Leaky ReLU overcomes this challenge by multiplying negative inputs by a small factor instead of setting them to zero.

In this exercise, you will implement the leaky ReLU function in PyTorch and practice using it.


Instructions:

- Create a leaky ReLU function in PyTorch with a negative slope of 0.05.
- Call the function on the tensor x, which has already been defined for you.


In [3]:
# Create a leaky relu function in PyTorch
leaky_relu_pytorch = nn.LeakyReLU(negative_slope=0.05)

x = torch.tensor(-2.0)
# Call the above function on the tensor x
output = leaky_relu_pytorch(x)
print(output)

x = torch.tensor(-3.0)
output = leaky_relu_pytorch(x)
print(output)

tensor(-0.1000)
tensor(-0.1500)
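
For comparison, the same function can be sketched in NumPy, together with a check that the PyTorch version keeps a non-zero gradient for negative inputs. The helper name `leaky_relu_numpy`, the sample inputs, and the variable `x_neg` are hypothetical and chosen only for illustration.

```python
import numpy as np
import torch
import torch.nn as nn


def leaky_relu_numpy(x, negative_slope=0.05):
    """Return x where x is positive, and negative_slope * x elsewhere."""
    return np.where(x > 0, x, negative_slope * x)


# Illustrative inputs (assumed, not from the exercise)
inputs = np.array([-2.0, -3.0, 1.5])
outputs = leaky_relu_numpy(inputs)
print(outputs)

# For a negative input, the gradient of leaky ReLU is the negative slope, not zero
x_neg = torch.tensor(-2.0, requires_grad=True)
nn.LeakyReLU(negative_slope=0.05)(x_neg).backward()
print(x_neg.grad)
```

Because the gradient for negative inputs equals the negative slope, a unit that receives negative inputs can still be updated during training.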


A good rule of thumb is to use ReLU as the default activation function in your models (except for the last layer).
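
As a minimal sketch of this rule of thumb, the model below places ReLU after the hidden layer and leaves the last layer without an activation. The layer sizes and the names `model_with_relu` and `sample` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical layer sizes, chosen only for illustration
model_with_relu = nn.Sequential(
    nn.Linear(8, 4),
    nn.ReLU(),        # default activation after the hidden layer
    nn.Linear(4, 2),  # last layer: no ReLU applied to its output
)

sample = torch.randn(1, 8)
result = model_with_relu(sample)
print(result.shape)
```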

## A deeper dive into neural network architecture

### Counting the number of parameters

Deep learning models are famous for having a lot of parameters. Recent language models have billions of parameters. With more parameters comes more computational complexity and longer training times, and a deep learning practitioner must know how many parameters their model has.

In this exercise, you will calculate the number of parameters in your model, first using PyTorch then manually.

Instructions:
- Iterate through the model's parameters to update the total variable with the total number of parameters in the model.


In [4]:
model = nn.Sequential(nn.Linear(16, 4), 
                      nn.Linear(4, 2), 
                      nn.Linear(2, 1))

total = 0

# Calculate the number of parameters in the model
for parameter in model.parameters():
    total += parameter.numel()

print(total)

81
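
The same total can be checked by hand. Each `nn.Linear(n_in, n_out)` layer with its default bias holds `n_in * n_out` weights plus `n_out` biases. The sketch below applies that formula to the three layers above; the names `layer_shapes` and `manual_total` are illustrative.

```python
# (in_features, out_features) for each nn.Linear layer in the model above
layer_shapes = [(16, 4), (4, 2), (2, 1)]

manual_total = 0
for n_in, n_out in layer_shapes:
    # weights: n_in * n_out, biases: n_out
    manual_total += n_in * n_out + n_out

print(manual_total)  # 68 + 10 + 3 = 81
```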


### Manipulating the capacity of a network

In this exercise, you will practice creating neural networks with different capacities. The capacity of a network reflects the number of parameters in said network. To help you, a `calculate_capacity()` function has been implemented, as follows:

In [5]:
def calculate_capacity(model):
    total = 0
    for p in model.parameters():
        total += p.numel()
    return total

This function returns the number of parameters in your model.

Instructions:
- Create a neural network with exactly three linear layers and fewer than 120 parameters, which takes `n_features` as inputs and outputs `n_classes`.
- Create a neural network with exactly four linear layers and more than 120 parameters, which takes `n_features` as inputs and outputs `n_classes`.

In [6]:
n_features = 8
n_classes = 2

# Build a float input, since nn.Linear expects floating-point values
input_tensor = torch.tensor([[3, 4, 6, 2, 3, 6, 8, 9]], dtype=torch.float32)

# Create a neural network with fewer than 120 parameters
model = nn.Sequential(
    nn.Linear(n_features, 4), nn.Linear(4, 2), nn.Linear(2, n_classes)
)
output = model(input_tensor)

print(calculate_capacity(model))

52


In [7]:
n_features = 8
n_classes = 2

# Build a float input, since nn.Linear expects floating-point values
input_tensor = torch.tensor([[3, 4, 6, 2, 3, 6, 8, 9]], dtype=torch.float32)

# Create a neural network with more than 120 parameters
model = nn.Sequential(
    nn.Linear(n_features, 4), nn.Linear(4, 8), nn.Linear(8, 4), nn.Linear(4, n_classes)
)

output = model(input_tensor)

print(calculate_capacity(model))

122
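
To see where the capacity comes from, it can help to break the count down per weight and bias tensor. The sketch below rebuilds the four-layer model and uses `named_parameters()`; the names `four_layer_model` and `per_tensor_counts` are illustrative assumptions.

```python
import torch.nn as nn

n_features, n_classes = 8, 2

# Same four-layer architecture as in the cell above
four_layer_model = nn.Sequential(
    nn.Linear(n_features, 4), nn.Linear(4, 8), nn.Linear(8, 4), nn.Linear(4, n_classes)
)

# Map each parameter tensor's name (e.g. "0.weight") to its number of elements
per_tensor_counts = {
    name: p.numel() for name, p in four_layer_model.named_parameters()
}

for name, count in per_tensor_counts.items():
    print(name, count)

print(sum(per_tensor_counts.values()))  # 36 + 40 + 36 + 10 = 122
```

Widening the middle layers raises the count quickly, because each linear layer contributes the product of its input and output sizes.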


### Experimenting with learning rate

### Experimenting with momentum

## Layer initialization and transfer learning


### Fine-tuning process


### Freeze layers of a model