An approximate implementation of the OpenAI paper *An Empirical Model of Large-Batch Training* for MNIST. The implementation is approximate because we do not have a multi-GPU setup, so instead of computing per-device gradients in parallel we accumulate the sequential gradients of consecutive steps to estimate the gradient noise scale.
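The sequential-accumulation idea can be illustrated with a toy simulation (synthetic gradients, not the actual MNIST model; all names and numbers here are illustrative): averaging k consecutive small-batch gradients behaves like a single gradient from a batch k times larger, because the noise variance shrinks by a factor of k.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k = 1000, 64
true_grad = np.ones(dim)  # hypothetical "true" gradient G

# Per-step small-batch gradients: G plus zero-mean noise.
small_grads = true_grad + rng.normal(0.0, 1.0, size=(k, dim))

# Sequentially accumulating (averaging) k small-batch gradients
# approximates the gradient of a batch k times larger.
big_grad = small_grads.mean(axis=0)
```

The squared error of `big_grad` relative to `true_grad` is roughly 1/k of that of any single small-batch gradient, which is the property the noise-scale estimate relies on.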
To calculate the simple noise scale:
python mnist_train.py --noise-scale --batch-size 128 --epochs 1 --lr 0.01
This reports the average simple noise scale measured over the run.
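As a sketch of how the simple noise scale can be estimated (following the unbiased estimators in the paper's appendix; the function name and the example numbers are illustrative, not the script's actual code), one compares squared gradient norms measured at two different batch sizes:

```python
def simple_noise_scale(g_small_sq, g_big_sq, b_small, b_big):
    """Estimate B_simple = tr(Sigma) / |G|^2 from squared gradient
    norms measured at two batch sizes b_small < b_big."""
    # Unbiased estimate of the true squared gradient norm |G|^2.
    g2 = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    # Unbiased estimate of the per-example gradient noise, tr(Sigma).
    s = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)
    return s / g2
```

For example, since E|g_B|^2 = |G|^2 + tr(Sigma)/B, taking |G|^2 = 4 and tr(Sigma) = 8 gives measurements 4.25 at B = 32 and 4.0078125 at B = 1024, from which the estimator recovers B_simple = 8/4 = 2.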
For model training:
python mnist_train.py --batch-size 512 --epochs 25 --lr 0.01
A nice, detailed reference blog post is also available.