In [1]:
import json
import holoviews as hv
%load_ext holoviews.ipython

In [11]:
# matplotlib emits a lot of warnings when used through holoviews, so silence them
import warnings
warnings.filterwarnings("ignore")

In this notebook I'm comparing the results of many experiments that train networks on CIFAR-10 with the same setup. Every network was trained for 128 epochs with a cosine annealing schedule starting at 0.2, and every one used the architecture from the [cifar10-fast][cf] repository, with the convolutions replaced by a generic low-rank alternative.

[cf]: https://github.com/davidcpage/cifar10-fast
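
For reference, a cosine annealing schedule can be sketched as below. This is only an illustration, not the training code: it assumes the 0.2 is the initial learning rate, that the rate decays to zero by the final epoch, and that it is evaluated per epoch. The name `cosine_lr` and its parameters are hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=128, base_lr=0.2, min_lr=0.0):
    """Cosine-annealed rate at a given (possibly fractional) epoch.

    Assumption: decays from base_lr at epoch 0 to min_lr at total_epochs.
    """
    # Fraction of training completed, clamped to [0, 1]
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    # Half a cosine period: 1 at the start, 0 at the end
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate is 0.2 at the start, half of that midway through training, and zero at epoch 128.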

In [2]:
with open("results.json", "r") as f:
    results = json.load(f)

Check all experiments ran until the 128th epoch:

In [4]:
# r[0] holds the experiment settings, r[1] the stats logged at the final epoch
for r in results:
    assert r[1]['epoch'] == 128, r[0]

We are looking at the difference between using a standard weight decay of `5e-4` on all parameters and what we're going to call **appropriate weight decay**, where the weight decay is scaled by the compression factor achieved by the low-rank approximation. The intuition is that if you have fewer parameters, you should be more concerned about regularising them to zero.
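
To make the scaling concrete, here is a minimal sketch for a single convolution. It is not the code used in these experiments. It assumes the low-rank layer factorises the `c_out x (c_in*k*k)` weight matrix into two factors of rank `rank`, and that the compression factor is the ratio of low-rank to full parameter counts, so a layer with fewer parameters gets a smaller weight decay. All function names here are hypothetical.

```python
def full_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (no bias)."""
    return c_out * c_in * k * k


def low_rank_params(c_in, c_out, k, rank):
    """Parameter count of an assumed two-factor low-rank replacement."""
    return rank * (c_out + c_in * k * k)


def appropriate_weight_decay(c_in, c_out, k, rank, base_wd=5e-4):
    """Scale the base weight decay by the assumed compression factor."""
    compression = low_rank_params(c_in, c_out, k, rank) / full_params(c_in, c_out, k)
    return base_wd * compression
```

For example, under these assumptions a 3x3 convolution from 64 to 128 channels at rank 16 keeps 11264 of 73728 parameters, so `appropriate_weight_decay(64, 128, 3, 16)` is roughly `7.6e-5`.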

So, for both types of weight decay, we're going to look at the relationship between test accuracy (the `test acc` entry in the results) and `rankscale`, an arbitrary parameter that divides the rank of the low-rank approximations, so the rank used scales with `1/rankscale`.
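
As a rough illustration of what "divides the rank" means, the sketch below derives a layer's rank from `rankscale`. The choice of full rank (the smaller dimension of the flattened weight matrix), the rounding down, and the floor of 1 are all assumptions made for this example, and `scaled_rank` is a hypothetical name.

```python
def scaled_rank(c_in, c_out, k, rankscale):
    """Rank of the low-rank layer, proportional to 1/rankscale."""
    # Assumed full rank of the c_out x (c_in*k*k) weight matrix
    full_rank = min(c_out, c_in * k * k)
    # Divide by rankscale, round down, and never go below rank 1
    return max(1, int(full_rank / rankscale))
```

With these assumptions, a 3x3 convolution from 64 to 128 channels has rank 128 at `rankscale=1`, 64 at `rankscale=2` and 32 at `rankscale=4`.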

In [16]:
hm = hv.HoloMap(kdims=['Weight Decay'])
# owd: original weight decay runs, awd: appropriate weight decay runs
owd, awd = [], []
for r in results:
    point = (r[0]['rankscale'], r[1]['test acc'])
    # the 'd' setting is treated here as the flag for appropriate weight decay
    if r[0]['d']:
        awd.append(point)
    else:
        owd.append(point)
# sort by rank scale so each curve is drawn left to right
owd = sorted(owd, key=lambda x: x[0])
awd = sorted(awd, key=lambda x: x[0])
hm['Original'] = hv.Curve(owd, kdims=['Rank Scale'], vdims=['Test Accuracy (%)'])
hm['Appropriate'] = hv.Curve(awd, kdims=['Rank Scale'], vdims=['Test Accuracy (%)'])

In [19]:
%output size=250
%opts Curve [aspect=2., show_grid=True]
hm.overlay('Weight Decay')

So, appropriate weight decay does a little better than the original weight decay.