### Set separate bias and weight learning parameters in PyTorch

This matches the settings of the SegNetBasic `caffe` implementation, which sets:

```
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
```  

for each convolutional layer. This means the weights learn with a learning rate multiplier of 1, and are influenced by weight decay, while biases learn with a learning rate multiplier of 2 and are not influenced by weight decay.  
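The multipliers above scale the solver-wide base values. A minimal sketch of that arithmetic, using the `base_lr` and `weight_decay` values that appear later in this notebook (the helper name `effective_hyperparams` is made up for illustration):

```python
# Values taken from the optimiser cell further down in this notebook.
base_lr = 0.01
weight_decay = 0.0005


def effective_hyperparams(base_lr, weight_decay, lr_mult, decay_mult):
    """Hypothetical helper: scale the base values by Caffe-style multipliers."""
    return {"lr": base_lr * lr_mult, "weight_decay": weight_decay * decay_mult}


# First `param` block (weights): lr_mult 1, decay_mult 1
weight_group = effective_hyperparams(base_lr, weight_decay, lr_mult=1, decay_mult=1)
# Second `param` block (biases): lr_mult 2, decay_mult 0
bias_group = effective_hyperparams(base_lr, weight_decay, lr_mult=2, decay_mult=0)

print(weight_group)
print(bias_group)
```

These two dictionaries correspond to the per-group `lr` and `weight_decay` options that a PyTorch optimiser accepts.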

In [1]:
```python
import torch
import torchvision

from segnet import SegNetBasic
```

Initialise `SegNetBasic` with 4 input channels (RGB+NIR, for vegetation segmentation) and three output classes for semantic segmentation.

In [2]:
```python
net = SegNetBasic(in_channels=4, num_classes=3)
```

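Select the network's weights and biases separately, keeping the parameters of the final `conv_classifier` layer apart. These lists replace the single `net.parameters()` that would otherwise be passed to the optimiser.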

In [20]:
```python
# Split the parameters by name: biases vs. everything else. The conv_classifier
# layer is excluded from both lists so that it can get its own parameter groups.
net_weights = [p for name, p in net.named_parameters()
               if 'bias' not in name and 'conv_classifier' not in name]
net_biases = [p for name, p in net.named_parameters()
              if 'bias' in name and 'conv_classifier' not in name]

classifier_weight = net.conv_classifier.weight
classifier_bias = net.conv_classifier.bias
```
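To see what the name-based selection does, here is a small sketch on a stand-in model. `TinyNet`, its layer sizes, and the helper `group_of` are hypothetical; only the layer name `conv_classifier` is taken from this notebook. It checks that every parameter ends up in exactly one of the four groups.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for SegNetBasic, small enough to inspect by eye."""

    def __init__(self, in_channels=4, num_classes=3):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.conv_classifier = nn.Conv2d(8, num_classes, kernel_size=1)


def group_of(name):
    """Apply the same substring tests as the selection above to one parameter name."""
    is_bias = 'bias' in name
    if 'conv_classifier' in name:
        return 'classifier_bias' if is_bias else 'classifier_weight'
    return 'net_biases' if is_bias else 'net_weights'


tiny_net = TinyNet()
groups = {}
for name, _ in tiny_net.named_parameters():
    groups.setdefault(group_of(name), []).append(name)

for key, names in groups.items():
    print(key, names)

# Every parameter should be assigned to exactly one group.
assert sum(len(names) for names in groups.values()) == len(list(tiny_net.parameters()))
```

Because the test is a substring match on the name, any parameter whose name contains `bias` lands in a bias group, whatever layer type it belongs to.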

Initialise the SGD optimiser for `SegNetBasic`. **For training from scratch**, set the learning rate multiplier to 2 for biases and apply weight decay only to the weights. **For fine-tuning**, additionally increase the learning rates of the `conv_classifier` layer 10 times.

In [21]:
```python
base_lr = 0.01
weight_decay = 0.0005
momentum = 0.9

# For training from scratch, the classifier uses the same learning rates as the rest of the network.
# Defining its parameter groups separately anyway allows a single save_model / load_model
# implementation (check utils/utils.py).
optimizer_FS = torch.optim.SGD(
    [
        {'params': net_weights, 'lr': base_lr, 'weight_decay': weight_decay},
        {'params': net_biases, 'lr': base_lr * 2},  # no 'weight_decay' key: SGD's default of 0 applies
        {'params': classifier_weight, 'lr': base_lr * 1, 'weight_decay': weight_decay},
        {'params': classifier_bias, 'lr': base_lr * 2},
    ],
    # PyTorch's momentum update differs from Caffe's, so this value may need changing; see the note in
    # https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD
    momentum=momentum,
    lr=base_lr,  # fallback for groups without 'lr'; every group above sets its own, so it has no effect
)

# For fine-tuning from a pre-trained model, the classifier uses 10x the learning rates of the rest of the network.
optimizer_FT = torch.optim.SGD(
    [
        {'params': net_weights, 'lr': base_lr, 'weight_decay': weight_decay},
        {'params': net_biases, 'lr': base_lr * 2},
        {'params': classifier_weight, 'lr': base_lr * 10, 'weight_decay': weight_decay},
        {'params': classifier_bias, 'lr': base_lr * 20},
    ],
    momentum=momentum,  # same caveat as above
    lr=base_lr,  # fallback only; unused because every group sets 'lr'
)
```
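The caveat about `momentum` refers to how the velocity is accumulated. As the PyTorch SGD documentation describes it, the learning rate multiplies the whole velocity, whereas the Caffe-style update folds the learning rate into the velocity itself. The following pure-Python sketch compares the two on a single scalar parameter. The function names, gradient values, and learning rate schedules are made up for illustration, and weight decay is left out.

```python
def sgd_pytorch_style(grads, lrs, momentum, p=0.0):
    """v = momentum * v + g;  p = p - lr * v"""
    v = 0.0
    for g, lr in zip(grads, lrs):
        v = momentum * v + g
        p -= lr * v
    return p


def sgd_caffe_style(grads, lrs, momentum, p=0.0):
    """v = momentum * v + lr * g;  p = p - v"""
    v = 0.0
    for g, lr in zip(grads, lrs):
        v = momentum * v + lr * g
        p -= v
    return p


grads = [1.0, 1.0, 1.0]            # made-up gradients
constant_lr = [0.01, 0.01, 0.01]   # learning rate never changes
stepped_lr = [0.01, 0.01, 0.001]   # learning rate drops before the third step

gap_constant = abs(sgd_pytorch_style(grads, constant_lr, 0.9)
                   - sgd_caffe_style(grads, constant_lr, 0.9))
gap_stepped = abs(sgd_pytorch_style(grads, stepped_lr, 0.9)
                  - sgd_caffe_style(grads, stepped_lr, 0.9))

print("gap with constant lr:", gap_constant)
print("gap with stepped lr: ", gap_stepped)
```

With a constant learning rate the two forms give the same parameter up to floating-point rounding. They diverge once the learning rate changes during training, which is when matching a Caffe schedule exactly would need extra care.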