An approximate implementation of the OpenAI paper *An Empirical Model of Large-Batch Training* for MNIST. The implementation is approximate because we do not have a multi-GPU setup, so instead of computing per-device gradients in parallel we accumulate the sequential gradients of consecutive steps to estimate the gradient noise scale.
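The sequential-accumulation idea can be illustrated with a toy simulation (synthetic gradients, not the actual MNIST model; all names and numbers here are illustrative): averaging k consecutive small-batch gradients behaves like a single gradient from a batch k times larger, because the noise variance shrinks by a factor of k.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k = 1000, 64
true_grad = np.ones(dim)  # hypothetical "true" gradient G

# Per-step small-batch gradients: G plus zero-mean noise.
small_grads = true_grad + rng.normal(0.0, 1.0, size=(k, dim))

# Sequentially accumulating (averaging) k small-batch gradients
# approximates the gradient of a batch k times larger.
big_grad = small_grads.mean(axis=0)
```

The squared error of `big_grad` relative to `true_grad` is roughly 1/k of that of any single small-batch gradient, which is the property the noise-scale estimate relies on.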
To calculate the simple noise scale:
python mnist_train.py --noise-scale --batch-size 128 --epochs 1 --lr 0.01
This reports the average simple noise scale measured over the run.
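As a sketch of how the simple noise scale can be estimated (following the unbiased estimators in the paper's appendix; the function name and the example numbers are illustrative, not the script's actual code), one compares squared gradient norms measured at two different batch sizes:

```python
def simple_noise_scale(g_small_sq, g_big_sq, b_small, b_big):
    """Estimate B_simple = tr(Sigma) / |G|^2 from squared gradient
    norms measured at two batch sizes b_small < b_big."""
    # Unbiased estimate of the true squared gradient norm |G|^2.
    g2 = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    # Unbiased estimate of the per-example gradient noise, tr(Sigma).
    s = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)
    return s / g2
```

For example, since E|g_B|^2 = |G|^2 + tr(Sigma)/B, taking |G|^2 = 4 and tr(Sigma) = 8 gives measurements 4.25 at B = 32 and 4.0078125 at B = 1024, from which the estimator recovers B_simple = 8/4 = 2.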
For model training:
python mnist_train.py --batch-size 512 --epochs 25 --lr 0.01
A nice, detailed reference blog post is also available.