The PyTorch implementation of the DSTAdam algorithm from 'A decreasing scaling transition scheme from Adam to SGD':
https://arxiv.org/abs/2106.06749.
The implementation is largely based on the AdaBound, Adam, and pytorch-cifar projects; thanks for their excellent work.
Tested environment: PyTorch 1.7.0, Python 3.7.10, Linux/CentOS 8.3.
To use, download the dstadam folder, place it in your project, and then:
from dstadam import DSTAdam
...
optimizer = DSTAdam(model.parameters(), iters=required)
...
# iters (int, required): total number of optimizer iterations
# iters = math.ceil(train_size / batch_size) * epochs
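The iteration count above can be computed before constructing the optimizer. A minimal sketch, assuming CIFAR-10-style dataset sizes (train_size, batch_size, and epochs here are illustrative values, not prescribed by the repo):

```python
import math

# Illustrative values: 50,000 training samples (CIFAR-10), batch size 128, 200 epochs.
train_size = 50000
batch_size = 128
epochs = 200

# Total number of optimizer steps over the whole training run,
# matching the formula given above.
iters = math.ceil(train_size / batch_size) * epochs
print(iters)  # 391 steps per epoch * 200 epochs = 78200

# The result is then passed to the optimizer, e.g.:
# optimizer = DSTAdam(model.parameters(), iters=iters, lr=0.001)
```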
We provide the code for training on CIFAR-10/100 with DSTAdam as used in the paper.
For CIFAR-10 without weight_decay
python main.py --cifar="cifar10" --optimizer="DSTAdam" --model="resnet18" --lr=0.001 --coeff=1e-8 --up_lr=5 --low_lr=0.005 --weight_decay=0 --step_lr=False
For CIFAR-100 without weight_decay
python main.py --cifar="cifar100" --optimizer="DSTAdam" --model="resnet18" --lr=0.001 --coeff=1e-8 --up_lr=5 --low_lr=0.005 --weight_decay=0 --step_lr=False
For CIFAR-10 with weight_decay
python main.py --cifar="cifar10" --optimizer="DSTAdam" --model="resnet18" --lr=0.001 --coeff=1e-8 --up_lr=1 --low_lr=0.005 --weight_decay=5e-4 --step_lr=True
For CIFAR-100 with weight_decay
python main.py --cifar="cifar100" --optimizer="DSTAdam" --model="resnet18" --lr=0.001 --coeff=1e-8 --up_lr=1 --low_lr=0.005 --weight_decay=5e-4 --step_lr=True
The Rosenbrock experiment is from the project: https://github.com/jettify/pytorch-optimizer
The Penn Treebank experiment is from the project: https://github.com/juntang-zhuang/Adabelief-Optimizer/tree/update_0.2.0/PyTorch_Experiments/LSTM
Thanks again for this excellent work.