signSGD: compressed optimisation for non-convex problems
Here I house mxnet code for the signSGD paper. Some links:
- arxiv version of the paper.
- more information about the paper on my personal website.
- my coauthors: Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar.
- Signum is implemented as an official optimiser in mxnet!
- to use Signum in this codebase, pass 'signum' as the optimiser command line argument.
- if you do not use our suggested hyperparameters, be careful to tune them yourself.
- Signum hyperparameters are typically similar to Adam's hyperparameters, not SGD's!
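As a rough sketch (not this repo's implementation), the Signum update keeps an exponential moving average of the gradient and steps in the direction of its sign. The function name, learning rate, and momentum values below are illustrative only; use the paper's suggested hyperparameters in practice:

```python
import numpy as np

def signum_step(w, grad, momentum, lr=1e-4, beta=0.9):
    """One Signum update: refresh the momentum buffer, then step by its sign.

    lr and beta are placeholder values, not the paper's recommendations.
    """
    momentum = beta * momentum + (1 - beta) * grad
    w = w - lr * np.sign(momentum)
    return w, momentum

# Example: one step from a zero-initialised momentum buffer.
w = np.array([1.0, -2.0, 0.5])
m = np.zeros_like(w)
g = np.array([0.3, -0.1, 0.0])
w, m = signum_step(w, g, m)
```

Note that (with beta = 0 and no momentum buffer) this reduces to plain signSGD: w ← w − lr · sign(grad).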
There are four folders:
- cifar/ -- code to train resnet-20 on Cifar-10.
- gradient_expts/ -- code to compute gradient statistics as in Figures 1 and 2. Includes the Welford algorithm.
- imagenet/ -- code to train resnet-50 on Imagenet. Implementation inspired by that of Wei Wu.
- toy_problem/ -- simple example where signSGD is more robust than SGD.
More info to be found within each folder.
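For reference, the Welford algorithm used in gradient_expts/ computes a running mean and variance in a single pass, which is handy for gradient statistics that don't fit in memory. A minimal sketch (variable names are mine, not the repo's):

```python
def welford_update(count, mean, M2, x):
    """One step of Welford's online algorithm for mean and variance."""
    count += 1
    delta = x - mean
    mean += delta / count
    M2 += delta * (x - mean)  # second delta uses the updated mean
    return count, mean, M2

# Example: running statistics over a small stream of values.
count, mean, M2 = 0, 0.0, 0.0
for x in [2.0, 4.0, 6.0]:
    count, mean, M2 = welford_update(count, mean, M2, x)
variance = M2 / (count - 1)  # sample variance
```

For [2.0, 4.0, 6.0] this yields mean 4.0 and sample variance 4.0, matching the batch formulas, but without storing the stream.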
Any questions / comments? Don't hesitate to get in touch: email@example.com.