The paper says you may need to run the resulting net over the training set to recompute the batch-norm statistics as well. In Leela Zero this did not seem strictly necessary, though.
Since SWA was successful for Leela Zero in producing stronger network weights (see leela-zero/leela-zero#814, leela-zero/leela-zero#1030), I want to record this as a possible improvement here.
What is Stochastic Weight Averaging?
Izmailov et al. (2018) observed that SGD explores regions of the weight space where high-performing networks lie, but tends not to converge to the central points of those regions. By tracking a running average of the weights traversed by SGD, they found better-performing weights than those found by SGD alone.
They also demonstrate that SWA leads to solutions in wider optima, a property conjectured to be important for generalization.
(Figure from the paper: comparison of SWA and SGD with a ResNet-110 on CIFAR-100.)
Implementation
The implementation is straightforward: in addition to the current weight vector, we only need to maintain a running average of the weights.
Since we use batch normalization, we also need to recompute the running means and variances of the batch-norm layers for the averaged network; these statistics are not weights, so they cannot simply be averaged along with the rest.
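As a rough sketch of what recomputing those statistics involves (the per-channel activations and the momentum value are illustrative assumptions, not this repo's actual code):

```python
def recompute_bn_stats(batches, momentum=0.1):
    """Rebuild running mean/variance for one batch-norm channel from scratch.

    batches: iterable of lists of floats, the channel's activations per batch.
    After averaging the weights, the old running statistics no longer match
    the network, so we reset them and re-estimate over the training data
    using the usual exponential-moving-average update.
    """
    running_mean, running_var = 0.0, 1.0  # common reset values
    for batch in batches:
        m = sum(batch) / len(batch)
        v = sum((x - m) ** 2 for x in batch) / len(batch)
        running_mean = (1 - momentum) * running_mean + momentum * m
        running_var = (1 - momentum) * running_var + momentum * v
    return running_mean, running_var
```

PyTorch, for example, ships a helper for exactly this step, `torch.optim.swa_utils.update_bn`, which performs a single forward pass over a data loader to refresh the statistics of the averaged model.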
The algorithm is given as pseudocode in the paper.
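The averaging step itself can be sketched in a few lines of plain Python (the weight vectors here are bare lists of floats standing in for real network parameters):

```python
def swa_update(swa_weights, current_weights, n_models):
    """Fold the current weight vector into the running SWA average.

    swa_weights: running average over the n_models snapshots seen so far.
    current_weights: the weights just produced by SGD.
    Returns the average over n_models + 1 snapshots.
    """
    return [
        (swa * n_models + w) / (n_models + 1)
        for swa, w in zip(swa_weights, current_weights)
    ]

# Example: folding in snapshots one at a time yields their plain mean.
snapshots = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
avg = snapshots[0]
for n, w in enumerate(snapshots[1:], start=1):
    avg = swa_update(avg, w, n)
print(avg)  # → [2.0, 2.0]
```

In practice the snapshots are taken at the end of each learning-rate cycle (or every epoch with a constant rate), rather than every SGD step.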
The authors recommend starting from a pretrained model before beginning to average the weights. We get this for free, since we always initialize from the last best network.