An implementation of the paper *All You Need is a Good Init*.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
#export
from exp.nb_07 import *

## Layerwise Sequential Unit variance (LSUV)

Getting the MNIST data and a CNN

In [3]:
x_train, y_train, x_valid, y_valid = get_data()

x_train, x_valid = normalize_to(x_train, x_valid)
train_ds, valid_ds = Dataset(x_train, y_train), Dataset(x_valid, y_valid)

nh, bs = 50, 512
c = y_train.max().item()+1
loss_func = F.cross_entropy

data = DataBunch(*get_dls(train_ds, valid_ds, bs), c)

In [None]:
mnist_view = view_tfm(1,28,28)
cbfs = [Recorder,
       partial(AvgStatsCallback, accuracy),
       CudaCallback,
       partial(BatchTransformXCallback, mnist_view)]


In [None]:
nfs = [8,16,32,64,64]

In [5]:
class ConvLayer(nn.Module):
    def __init__(self, ni, nf, ks=3, stride=2, sub=0., **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(ni, nf, ks, padding=ks//2, stride=stride, bias=True)
        self.relu = GeneralRelu(sub=sub, **kwargs)
        
    def forward(self, x): return self.relu(self.conv(x))
    
    @property
    def bias(self): return -self.relu.sub
    @bias.setter
    def bias(self, v): self.relu.sub = -v
    @property
    def weight(self): return self.conv.weight

In [None]:
learn, run = get_learn_run(nfs, data, 0.6, ConvLayer, cbs=cbfs)

Now we're going to look at the paper [All You Need is a Good Init](https://arxiv.org/pdf/1511.06422.pdf), which introduces *Layer-wise Sequential Unit-Variance* (*LSUV*). We initialize our neural net with the usual technique, then pass a batch through the model and check the outputs of the linear and convolutional layers. For each layer, we rescale the weights by the standard deviation we actually observe on its activations, and subtract the observed mean from the initial bias. That way the activations start out normalized, with a mean close to 0 and a standard deviation close to 1.

We repeat this process until we are satisfied with the mean/variance we observe.
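To make the loop concrete, here is a minimal, framework-free sketch of the idea on a single toy layer. It is not the notebook's implementation: the names `forward` and `lsuv_layer`, the tolerance, and the layer itself (a ReLU followed by a learnable shift `sub`, loosely mirroring how `ConvLayer` exposes its bias) are assumptions made for illustration, and it uses only the Python standard library instead of PyTorch.

```python
import random
import statistics

def forward(weights, sub, batch):
    """Toy layer: ReLU(W x) - sub, returning every activation in one flat list."""
    acts = []
    for x in batch:
        for row in weights:
            pre = sum(w * xi for w, xi in zip(row, x))
            acts.append(max(pre, 0.0) - sub)
    return acts

def lsuv_layer(weights, sub, batch, tol=1e-3, max_iter=10):
    """Adjust weights and shift until the activations have mean ~0 and std ~1."""
    for _ in range(max_iter):
        acts = forward(weights, sub, batch)
        mean = statistics.fmean(acts)
        std = statistics.pstdev(acts)
        if abs(mean) < tol and abs(std - 1.0) < tol:
            break  # satisfied with the observed statistics
        # Rescale the weights by the observed standard deviation.
        weights = [[w / std for w in row] for row in weights]
        # Subtract the observed mean from the bias (the bias here is -sub).
        sub = sub + mean
    return weights, sub

# Hypothetical data: unit-normal weights and inputs, fixed seed for repeatability.
random.seed(0)
n_in, n_out, n_samples = 16, 8, 64
weights = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
batch = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_samples)]

weights, sub = lsuv_layer(weights, 0.0, batch)
acts = forward(weights, sub, batch)
print(f"mean={statistics.fmean(acts):.4f} std={statistics.pstdev(acts):.4f}")
```

Rescaling the weights also changes the mean of the activations, so a single pass is generally not enough. This is why the statistics are measured again and the adjustment repeated until both are within tolerance.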

Let's start by looking at a baseline:

In [None]:
run.fit(2, learn)

train: [1.73625, tensor(0.3975, device='cuda:0')]
valid: [1.68747265625, tensor(0.5652, device='cuda:0')]
train: [0.356792578125, tensor(0.8880, device='cuda:0')]
valid: [0.13243565673828125, tensor(0.9588, device='cuda:0')]


Now we recreate our model and we'll try again with LSUV. Hopefully, we'll get better results!

In [None]:
learn,run = get_learn_run(nfs, data, 0.6, ConvLayer, cbs=cbfs)

Helper function to get one batch of a given dataloader, with the callbacks called to preprocess it.