# Freeze layers of a model
You are about to fine-tune a model on a new task after loading pre-trained weights. The model contains three linear layers. However, because your dataset is small, you only want to train the last linear layer of this model and freeze the first two linear layers.

You will use the named_parameters method of the model to list its parameters. Each parameter is identified by a name, which is a string of the form x.name, where x is the index of the layer inside the model and name is the parameter's name within that layer (for example, 0.weight).

Remember that a linear layer has two parameters: the weight and the bias.
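To make the naming convention concrete, here is a minimal sketch that lists the parameter names and shapes of a small model. The model `demo` and the dictionary `param_shapes` are hypothetical names used only for illustration; they are not part of the exercise.

```python
import torch.nn as nn

# Hypothetical model, used only to illustrate parameter names.
demo = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))

# Each name has the form "<layer index>.<parameter name>".
# ReLU has no parameters, so index 1 never appears.
param_shapes = {name: tuple(p.shape) for name, p in demo.named_parameters()}

for name, shape in param_shapes.items():
    print(name, shape)
# 0.weight (20, 10)
# 0.bias (20,)
# 2.weight (5, 20)
# 2.bias (5,)
```

Note that the weight of a linear layer is stored with shape (out_features, in_features).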

* Use an if statement to determine if the parameter should be frozen or not based on its name.
* Freeze the parameters of the first two layers of this model.

In [1]:
import torch
import torch.nn as nn

In [6]:
model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 20),
    nn.ReLU(),
    nn.Linear(20, 5)
)

print(model)

Sequential(
  (0): Linear(in_features=10, out_features=20, bias=True)
  (1): ReLU()
  (2): Linear(in_features=20, out_features=20, bias=True)
  (3): ReLU()
  (4): Linear(in_features=20, out_features=5, bias=True)
)


In [11]:
for name, param in model.named_parameters():

    # Check if the parameters belong to the first layer
    if name == '0.weight' or name == '0.bias':

        # Freeze the parameters
        param.requires_grad = False

    # Check if the parameters belong to the second layer
    if name == '2.weight' or name == '2.bias':

        # Freeze the parameters
        param.requires_grad = False

In [12]:
# Verify that the parameters of the first two layers are frozen
for name, param in model.named_parameters():
    print(f"{name}: requires_grad={param.requires_grad}")

0.weight: requires_grad=False
0.bias: requires_grad=False
2.weight: requires_grad=False
2.bias: requires_grad=False
4.weight: requires_grad=True
4.bias: requires_grad=True


Choosing which layers to freeze is an empirical process, but a good rule of thumb is to start by freezing the first layers and then freeze progressively deeper ones as needed.
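Since the number of frozen layers is something you will likely vary between experiments, it can help to make it a parameter. The following is a minimal sketch under stated assumptions: `freeze_first_linear_layers` is a hypothetical helper (not a PyTorch function), `sketch_model` mirrors the architecture above, and the choice of SGD with a learning rate of 0.01 is arbitrary.

```python
import torch
import torch.nn as nn

def freeze_first_linear_layers(model, n_frozen):
    """Hypothetical helper: freeze the first n_frozen nn.Linear modules."""
    linear_layers = [m for m in model if isinstance(m, nn.Linear)]
    for layer in linear_layers[:n_frozen]:
        for param in layer.parameters():
            param.requires_grad = False

# Same architecture as the exercise, under a different name.
sketch_model = nn.Sequential(
    nn.Linear(10, 20), nn.ReLU(),
    nn.Linear(20, 20), nn.ReLU(),
    nn.Linear(20, 5),
)
freeze_first_linear_layers(sketch_model, n_frozen=2)

# Only the last layer remains trainable: 20 * 5 weights + 5 biases = 105.
trainable = [p for p in sketch_model.parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in trainable)
print(n_trainable)  # 105

# Assumption: pass only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(trainable, lr=0.01)
```

Changing `n_frozen` then lets you compare freezing one, two, or no layers without rewriting the loop.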

# Layer initialization

The initialization of the weights of a neural network has been the focus of researchers for many years. When training a network, the method used to initialize the weights has a direct impact on the final performance of the network.

As a machine learning practitioner, you should be able to experiment with different initialization strategies. In this exercise, you create a small neural network made of two layers and initialize each layer's weights with the uniform method.

* For each layer (layer0 and layer1), use the uniform initialization method to initialize the weights.

In [16]:
layer0 = nn.Linear(4, 8)
layer1 = nn.Linear(8, 16)

# Use uniform initialization for layer0 and layer1 weights
nn.init.uniform_(layer0.weight)
nn.init.uniform_(layer1.weight)

model = nn.Sequential(layer0, layer1)

# Print model layers' weights
print("Layer 0 weights:", model[0].weight)
print("Layer 1 weights:", model[1].weight)

Layer 0 weights: Parameter containing:
tensor([[0.6928, 0.7653, 0.7003, 0.1282],
        [0.5271, 0.3602, 0.0314, 0.2038],
        [0.6965, 0.0276, 0.9807, 0.2103],
        [0.4551, 0.4438, 0.1325, 0.0393],
        [0.7576, 0.5620, 0.4499, 0.6103],
        [0.8542, 0.3954, 0.0239, 0.0258],
        [0.6765, 0.3739, 0.5181, 0.8950],
        [0.2078, 0.7522, 0.4847, 0.9370]], requires_grad=True)
Layer 1 weights: Parameter containing:
tensor([[0.0470, 0.8998, 0.2830, 0.8920, 0.9998, 0.9316, 0.3186, 0.6546],
        [0.4531, 0.8117, 0.2813, 0.3909, 0.5903, 0.8229, 0.4271, 0.2445],
        [0.7003, 0.8792, 0.9795, 0.6034, 0.4508, 0.6155, 0.8383, 0.4566],
        [0.4285, 0.1114, 0.5640, 0.2554, 0.4231, 0.2409, 0.5597, 0.4028],
        [0.6415, 0.7968, 0.7504, 0.6868, 0.6673, 0.6035, 0.8084, 0.3392],
        [0.6228, 0.6406, 0.1622, 0.0280, 0.6484, 0.2807, 0.9071, 0.9459],
        [0.0753, 0.3577, 0.0779, 0.9153, 0.7900, 0.3867, 0.2820, 0.7087],
        [0.0323, 0.7941, 0.5514, 0.2151, 0.3968

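Uniform initialization is only one of many initialization strategies, but most of them initialize the weights with small values. With its default arguments, nn.init.uniform_ draws each weight uniformly from the interval between 0 and 1, which matches the output above.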
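As a rough illustration of how the range of the initial weights can be controlled, the sketch below calls `nn.init.uniform_` with explicit bounds and then applies `nn.init.xavier_uniform_` as one alternative strategy. The bound `1 / sqrt(in_features)` is an illustrative assumption, not a recommendation from this exercise, and the variable names are hypothetical.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the sketch repeatable
layer = nn.Linear(4, 8)

# Uniform initialization with explicit bounds a and b.
# Assumption: the bound 1 / sqrt(in_features) = 0.5 is chosen for illustration.
bound = 1 / math.sqrt(layer.in_features)
nn.init.uniform_(layer.weight, a=-bound, b=bound)
uniform_max_abs = layer.weight.abs().max().item()

# Xavier (Glorot) uniform initialization, one alternative strategy.
# With the default gain of 1, it samples from [-x, x] where
# x = sqrt(6 / (fan_in + fan_out)) = sqrt(6 / 12), roughly 0.707.
nn.init.xavier_uniform_(layer.weight)
xavier_bound = math.sqrt(6 / (layer.in_features + layer.out_features))
xavier_max_abs = layer.weight.abs().max().item()

print(uniform_max_abs <= bound)        # True
print(xavier_max_abs <= xavier_bound)  # True
```

Both calls modify the weight tensor in place, as the trailing underscore in their names indicates.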