signSGD: compressed optimisation for non-convex problems

Here I house the MXNet code for the original signSGD paper (ICML 2018). The code is provided to facilitate reproducing the results in the paper; it is not intended for development purposes. In particular, this implementation does not gain any speedup from compression.

[Update, Jan 2021] As noted in this issue, this codebase uses an implementation of the sign function that maps sign(0) --> 0. A test in this notebook suggests there may be little difference from an implementation that maps sign(0) --> ±1 at random. In the codebase for the ICLR 2019 paper, we used an implementation that maps sign(0) --> +1 deterministically.
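The three conventions for sign(0) mentioned above can be sketched in NumPy (a hedged illustration; the function names here are my own, not the repo's):

```python
import numpy as np

def sign_zero_to_zero(g):
    # NumPy's convention: sign(0) == 0 (as used in this codebase)
    return np.sign(g)

def sign_zero_to_random(g, rng):
    # map sign(0) --> +1 or -1 uniformly at random
    s = np.sign(g.astype(float))
    zeros = (s == 0)
    s[zeros] = rng.choice([-1.0, 1.0], size=int(zeros.sum()))
    return s

def sign_zero_to_plus_one(g):
    # map sign(0) --> +1 deterministically (as in the ICLR 2019 codebase)
    s = np.sign(g.astype(float))
    s[s == 0] = 1.0
    return s
```

Since exact zeros are rare in stochastic gradients, the choice mostly matters for sparse or heavily quantised gradients.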


General instructions:

  • Signum is implemented as an official optimiser in MXNet, so to use Signum in this codebase we pass the string 'signum' as a command-line argument.
  • If you do not use our suggested hyperparameters, be careful to tune them yourself.
  • Signum hyperparameters are typically similar to Adam hyperparameters, not SGD's.
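For intuition, one Signum step (signSGD with momentum) can be sketched in plain NumPy. This is a hedged illustration of the update rule from the paper, not the repo's MXNet implementation; the parameter names and defaults (`lr`, `beta`, `wd`) are my own:

```python
import numpy as np

def signum_step(param, grad, momentum, lr=1e-4, beta=0.9, wd=0.0):
    """One Signum update: accumulate momentum, then step by its sign.

    Illustrative sketch -- argument names and defaults are assumptions,
    not the command-line interface of this codebase.
    """
    grad = grad + wd * param                      # optional weight decay
    momentum = beta * momentum + (1 - beta) * grad
    param = param - lr * np.sign(momentum)        # fixed-magnitude step
    return param, momentum
```

Because every coordinate moves by exactly `lr` per step regardless of gradient scale, good learning rates tend to be small and Adam-like rather than SGD-like, which matches the hyperparameter advice above.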

There are four folders:

  1. cifar/ -- code to train resnet-20 on Cifar-10.
  2. gradient_expts/ -- code to compute gradient statistics as in Figures 1 and 2. Includes Welford's algorithm for online mean and variance.
  3. imagenet/ -- code to train resnet-50 on Imagenet. Implementation inspired by that of Wei Wu.
  4. toy_problem/ -- simple example where signSGD is more robust than SGD.
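The Welford algorithm used in gradient_expts/ computes a running mean and variance in a single pass without storing samples. A minimal sketch (the function name is mine, not the repo's):

```python
def welford_update(count, mean, m2, x):
    """One step of Welford's online algorithm.

    Maintains the running count, mean, and sum of squared deviations
    (m2); sample variance is m2 / (count - 1). Numerically stabler
    than accumulating sum and sum-of-squares separately.
    """
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)   # uses the *updated* mean
    return count, mean, m2
```

This lets gradient statistics be accumulated across minibatches without ever materialising the full set of per-sample gradients.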

More info to be found within each folder.


Any questions / comments? Don't hesitate to get in touch: bernstein@caltech.edu.
