salesforce/comparison_SGD_ADAM

Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning

This is a PyTorch implementation accompanying the paper "Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning":

@inproceedings{zhou2020generalizationdeep,
  title={Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning},
  author={Pan Zhou and Jiashi Feng and Chao Ma and Caiming Xiong and Steven Hoi and Weinan E},
  booktitle={Neural Information Processing Systems},
  year={2020}
}

Prepare

Our environment uses PyTorch 1.2 and torchvision 0.4.
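Assuming the standard PyPI distributions, the pinned versions can be installed with pip (this command is an illustration, not taken from the repository):

```shell
# Install the versions matching the paper's environment
pip install torch==1.2.0 torchvision==0.4.0
```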

Generalization Experiments

Please find the test code in the "main.py" file. You can run it directly, selecting either the SGD or the ADAM optimizer. This code mainly builds on the implementation of the paper "A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks", which analyzes the convergence behavior of SGD on one-dimensional problems. The main difference is that we theoretically compare the generalization performance of SGD and ADAM on high-dimensional deep learning problems.
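Selecting between the two optimizers in PyTorch follows the usual pattern; the sketch below is a minimal illustration of switching SGD and ADAM on a toy model (the function name, hyperparameters, and model here are assumptions, not the repository's actual code in "main.py"):

```python
import torch
import torch.nn as nn

def make_optimizer(name, params, lr=0.1):
    # Hypothetical helper: pick the optimizer by name, as "main.py"
    # lets you choose between SGD and ADAM.
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    elif name == "adam":
        return torch.optim.Adam(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")

# Toy model and data, just to show both optimizers driving a training step.
model = nn.Linear(10, 1)
opt = make_optimizer("sgd", model.parameters())
x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```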

To obtain the convergence trajectory data of SGD and ADAM, run "main.py" directly.

To plot the convergence trajectories of SGD and ADAM in terms of gradient noise and training/test accuracy, run "plot_CIFAR10.py" directly.

License

This project is under the MIT License. See LICENSE for details.
