# **Counting the number of parameters**

Deep learning models are known for having many parameters; recent language models have billions of them. More parameters mean more computation and longer training times, so a deep learning practitioner should know how many parameters their model has.

In this exercise, you will calculate the number of parameters in your model, first using PyTorch, then manually.

Iterate through the model's parameters and add each one's number of elements to the `total` variable, so that `total` ends up holding the number of parameters in the whole model.

In [None]:
import numpy as np
import torch
import torch.nn as nn

In [None]:
model = nn.Sequential(nn.Linear(16, 4),
                      nn.Linear(4, 2),
                      nn.Linear(2, 1))

total = 0

# Calculate the number of parameters in the model
for parameter in model.parameters():
    # numel() returns the number of elements in a weight or bias tensor
    total += parameter.numel()

print(total)  # 81
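The introduction also mentions counting by hand. A minimal sketch: an `nn.Linear(in_features, out_features)` layer holds an `out_features × in_features` weight matrix plus `out_features` biases, so the total for the model above can be computed from the layer sizes alone and checked against PyTorch's count. The `layer_sizes` list is written out by hand for illustration.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 4),
                      nn.Linear(4, 2),
                      nn.Linear(2, 1))

# Count by hand: each nn.Linear holds in*out weights plus out biases
layer_sizes = [(16, 4), (4, 2), (2, 1)]
manual_total = sum(n_in * n_out + n_out for n_in, n_out in layer_sizes)

# Count with PyTorch for comparison
pytorch_total = sum(p.numel() for p in model.parameters())

print(manual_total, pytorch_total)  # 81 81
```

The first layer alone contributes 16 × 4 + 4 = 68 of the 81 parameters, because it has the widest input.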



# **Manipulating the capacity of a network**

In this exercise, you will practice creating neural networks with different capacities. Here, the capacity of a network is measured by its number of parameters. To help you, a `calculate_capacity()` function that counts a model's parameters has been implemented. Your tasks:

* Create a neural network with exactly three linear layers and fewer than 120 parameters, which takes `n_features` as inputs and outputs `n_classes`.

* Create a neural network with exactly four linear layers and more than 120 parameters, which takes `n_features` as inputs and outputs `n_classes`.
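When a network is a plain stack of linear layers, as in this exercise, its capacity follows directly from the layer widths. Writing $d_0$ for the number of input features and $d_1, \dots, d_L$ for the output sizes of the $L$ layers (notation introduced here for illustration):

$$P = \sum_{i=1}^{L} \left( d_{i-1}\, d_i + d_i \right)$$

For example, a single `nn.Linear(8, 8)` contributes $8 \cdot 8 + 8 = 72$ parameters, so with `n_features = 8` a wide first layer quickly uses up most of a 120-parameter budget.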

In [None]:
def calculate_capacity(model):
    # Sum the element counts of all weight and bias tensors
    total = 0
    for p in model.parameters():
        total += p.numel()
    return total  # return after the loop, not inside it
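The same count is often written as a one-line sum. The sketch below shows that form, plus a hypothetical `count_trainable()` helper that skips frozen parameters (those with `requires_grad=False`); the two counts differ once part of a model is frozen.

```python
import torch.nn as nn

def calculate_capacity(model):
    # Equivalent to summing p.numel() over model.parameters() in a loop
    return sum(p.numel() for p in model.parameters())

def count_trainable(model):
    # Hypothetical helper: only count parameters that will receive gradients
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))
model[0].requires_grad_(False)  # freeze the first layer for illustration

print(calculate_capacity(model), count_trainable(model))  # 46 10
```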

In [None]:
n_features = 8
n_classes = 2

input_tensor = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])

# Create a neural network with fewer than 120 parameters (72 + 36 + 10 = 118)
model = nn.Sequential(nn.Linear(n_features, 8),
                      nn.Linear(8, 4),
                      nn.Linear(4, n_classes))
output = model(input_tensor)

print(calculate_capacity(model))

In [None]:
n_features = 8
n_classes = 2

input_tensor = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])

# Create a neural network with more than 120 parameters (144 + 136 + 36 + 10 = 326)
model = nn.Sequential(nn.Linear(n_features, 16),
                      nn.Linear(16, 8),
                      nn.Linear(8, 4),
                      nn.Linear(4, n_classes))

output = model(input_tensor)

print(calculate_capacity(model))

Changing the number of layers and the number of neurons per layer is a great way to quickly iterate on your model and experiment.
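To iterate on capacity quickly, it can help to describe a network by its list of layer widths. In this sketch, `build_mlp()` and `capacity_from_widths()` are hypothetical helper names: one builds an `nn.Sequential` of linear layers from the widths, the other computes the parameter count in closed form, and the loop checks them against each other for the two architectures used above.

```python
import torch.nn as nn

def build_mlp(widths):
    # widths = [n_features, hidden_1, ..., n_classes]; one nn.Linear per consecutive pair
    layers = [nn.Linear(n_in, n_out) for n_in, n_out in zip(widths[:-1], widths[1:])]
    return nn.Sequential(*layers)

def capacity_from_widths(widths):
    # Closed-form count: in*out weights + out biases for each layer
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))

for widths in ([8, 8, 4, 2], [8, 16, 8, 4, 2]):
    model = build_mlp(widths)
    n_params = sum(p.numel() for p in model.parameters())
    # Prints: number of linear layers, counted parameters, closed-form parameters
    print(len(model), n_params, capacity_from_widths(widths))
```

For these widths, the three-layer network has 118 parameters and the four-layer one has 326.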

# **Experimenting with learning rate**

In this exercise, your goal is to find the optimal learning rate such that the optimizer can find the global minimum of a non-convex function in ten steps.

You will experiment with three different learning rate values. For this problem, try learning rate values between 0.001 and 0.1.

You are provided with the `optimize_and_plot()` function, which takes the learning rate as its first argument. It runs 10 steps of the SGD optimizer and plots the results.

* Try a small learning rate value such that the optimizer isn't able to get past the first minimum on the right.
* Try a large learning rate value such that the optimizer skips past the global minimum at -2.
* Based on the previous results, try a better learning rate value.
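`optimize_and_plot()` is provided by the exercise and its source is not shown here. The sketch below is a hypothetical, plot-free stand-in, `run_sgd()`, that runs the same kind of experiment with `torch.optim.SGD`. The objective `f(x) = x**4 + x**3 - 5 * x**2` and the starting point `x0 = 2.0` are assumptions, chosen because this function has a local minimum at x = 1.25 and its global minimum at x = -2, which matches the description above.

```python
import torch

def f(x):
    # Assumed objective: local minimum at x = 1.25, global minimum at x = -2
    return x**4 + x**3 - 5 * x**2

def run_sgd(lr, momentum=0.0, steps=10, x0=2.0):
    """Hypothetical helper: run `steps` SGD updates on f and return every x visited."""
    x = torch.tensor(x0, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=lr, momentum=momentum)
    trajectory = [x.item()]
    for _ in range(steps):
        optimizer.zero_grad()
        loss = f(x)
        loss.backward()
        optimizer.step()
        trajectory.append(x.item())
    return trajectory

# Final position after ten steps for each learning rate tried in this exercise
for lr in (0.001, 0.1, 0.085):
    print(lr, round(run_sgd(lr)[-1], 3))
```

Returning the whole trajectory instead of drawing it keeps the sketch runnable without a plotting library; the list can be passed to any plotting tool afterwards.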

In [None]:
# Try a first learning rate value
lr0 = 0.001
optimize_and_plot(lr=lr0)


![image.png](attachment:71f65ca3-7f2d-4502-9e8e-88c992b02d58.png)

In [None]:
# Try a second learning rate value
lr1 = 0.1
optimize_and_plot(lr=lr1)

![image.png](attachment:e30b5a96-8e61-4d58-a039-4a9e91605bbb.png)

In [None]:
# Try a third learning rate value
lr2 = 0.085
optimize_and_plot(lr=lr2)

![image.png](attachment:1947e15c-1da2-4f0b-8967-19c6768d5823.png)

A learning rate around 0.09 gets you closest to the global minimum.

# **Experimenting with momentum**

In this exercise, your goal is to find the optimal momentum such that the optimizer can find the global minimum of a non-convex function in 20 steps. You will experiment with two different momentum values. For this problem, the learning rate is fixed at 0.01.

You are provided with the `optimize_and_plot()` function, which here takes the momentum as its argument. It runs 20 steps of the SGD optimizer and plots the results.

* Try a first value for the momentum such that the optimizer gets stuck in the first minimum.
* Try a second value for the momentum such that the optimizer finds the global minimum.
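As documented for `torch.optim.SGD` with its default `dampening=0` and `nesterov=False`, momentum keeps a velocity buffer $v$ that accumulates gradients: $v \leftarrow \mu v + g$, then $x \leftarrow x - \text{lr} \cdot v$. The sketch below implements that update by hand and compares it with PyTorch's optimizer. The objective `f(x) = x**4 + x**3 - 5 * x**2`, the starting point `x0 = 2.0`, and the helper names are assumptions for illustration, not the exercise's confirmed setup; only the learning rate of 0.01 and the 20 steps come from the text.

```python
import torch

def grad_f(x):
    # Derivative of the assumed objective f(x) = x**4 + x**3 - 5 * x**2
    return 4 * x**3 + 3 * x**2 - 10 * x

def manual_momentum(lr, momentum, steps, x0):
    # Heavy-ball update: velocity accumulates past gradients, x moves along it
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v + grad_f(x)
        x = x - lr * v
    return x

def torch_momentum(lr, momentum, steps, x0):
    # Same experiment with PyTorch's SGD optimizer (float64 for a tight comparison)
    x = torch.tensor(x0, dtype=torch.float64, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=lr, momentum=momentum)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = x**4 + x**3 - 5 * x**2
        loss.backward()
        optimizer.step()
    return x.item()

a = manual_momentum(0.01, 0.95, 20, 2.0)
b = torch_momentum(0.01, 0.95, 20, 2.0)
print(a, b)
```

A large $\mu$ lets the accumulated velocity carry $x$ through shallow local minima, which is why a value such as 0.95 can escape the first minimum while 0.2 may not.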

In [None]:
# Try a first value for momentum
mom0 = 0.2
optimize_and_plot(momentum=mom0)

![image.png](attachment:17e79bc6-27f8-40a7-bca2-1965422f5cc4.png)

In [None]:
# Try a second value for momentum
mom1 = 0.95
optimize_and_plot(momentum=mom1)

![image.png](attachment:7ce7ef58-e50a-4677-becd-285b4a534901.png)

Momentum and learning rate are critical to the training of your neural network. A good rule of thumb is to start with a learning rate of 0.001 and a momentum of 0.95.
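A minimal sketch of applying that starting point with `torch.optim.SGD`. The model, loss, and dummy batch are placeholders chosen for illustration; only the `lr=0.001` and `momentum=0.95` values come from the text above.

```python
import torch
import torch.nn as nn

# Placeholder model and loss for illustration
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))
criterion = nn.CrossEntropyLoss()

# Rule-of-thumb starting point; tune from here
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.95)

# One training step on a dummy batch of 4 samples with 8 features each
inputs = torch.randn(4, 8)
targets = torch.tensor([0, 1, 1, 0])

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
print(loss.item())
```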