shivram1987/diffGrad

The PyTorch implementation of diffGrad is available in torch-optimizer and can be used as follows.

How to use

pip install torch-optimizer

import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
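For completeness, a typical training iteration with DiffGrad looks like the minimal sketch below; the model, loss function, and dummy batch are placeholders introduced only for illustration and are not part of this repository.

import torch
import torch.nn as nn
import torch_optimizer as optim

# Placeholder model and batch, used only to make the sketch runnable.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.DiffGrad(model.parameters(), lr=1e-3)

inputs = torch.randn(8, 10)
targets = torch.randint(0, 2, (8,))

optimizer.zero_grad()                     # clear gradients from the previous step
loss = criterion(model(inputs), targets)  # forward pass and loss
loss.backward()                           # backpropagate to populate gradients
optimizer.step()                          # diffGrad parameter update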

Issues

It is recommended to use diffGrad_v2.py, which fixes an issue present in diffGrad.py.

It is also recommended to refer to the arXiv version for the updated results.

Abstract

Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic SGD is that it updates all parameters with equal-sized steps, irrespective of gradient behavior. Hence, an efficient way of optimizing deep networks is to use an adaptive step size for each parameter. Recently, several attempts have been made to improve gradient descent methods, such as AdaGrad, AdaDelta, RMSProp, and Adam. These methods rely on the square roots of exponential moving averages of squared past gradients and thus do not take advantage of local changes in gradients. In this paper, a novel optimizer is proposed based on the difference between the present and the immediate past gradient (i.e., diffGrad). In the proposed diffGrad optimization technique, the step size is adjusted for each parameter so that parameters with faster-changing gradients take larger steps and parameters with slower-changing gradients take smaller steps. The convergence analysis is carried out using the regret bound approach of the online learning framework. A rigorous analysis is presented over three synthetic, complex non-convex functions. Image categorization experiments are also conducted on the CIFAR10 and CIFAR100 datasets to compare diffGrad with state-of-the-art optimizers such as SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam. A residual-unit (ResNet) based Convolutional Neural Network (CNN) architecture is used in the experiments. The experiments show that diffGrad outperforms the other optimizers. We also show that diffGrad performs uniformly well on networks using different activation functions.
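To make the idea concrete, the following is a minimal sketch of a diffGrad-style update for a single parameter tensor, assuming Adam-style moment estimates with bias correction; the function name and variables are illustrative and do not come from this repository's code.

import numpy as np

def diffgrad_step(theta, grad, prev_grad, m, v, t,
                  lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam-style exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    # diffGrad friction coefficient: a sigmoid of the absolute change in gradient,
    # so parameters whose gradients change more take larger effective steps.
    xi = 1.0 / (1.0 + np.exp(-np.abs(prev_grad - grad)))
    theta = theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v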

Citation

If you use this code in your research, please cite as:

@article{dubey2019diffgrad,
  title={diffGrad: An Optimization Method for Convolutional Neural Networks},
  author={Dubey, Shiv Ram and Chakraborty, Soumendu and Roy, Swalpa Kumar and Mukherjee, Snehasis and Singh, Satish Kumar and Chaudhuri, Bidyut Baran},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  volume={31},
  number={11},
  pages={4500--4511},
  year={2020},
  publisher={IEEE}
}

Acknowledgement

All experiments are performed using the following framework: https://github.com/kuangliu/pytorch-cifar

License

Copyright (©2019): Shiv Ram Dubey, Indian Institute of Information Technology, Sri City, Chittoor, A.P., India. Released under the MIT License. See LICENSE for details.
