This repository contains the code to reproduce the experiments reported in the paper:
Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad
In this work, we introduce a novel optimization algorithm called KATE, a scale-invariant adaptation of AdaGrad. Here we provide a screenshot of KATE's pseudocode from the paper.
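As a textual companion to the pseudocode, here is a minimal NumPy sketch of the per-coordinate update as we read it from the paper. The accumulator b² collects squared gradients, m² collects η·g² + g²/b², and the step is γ·m/b²·g, so b² enters without a square root (unlike AdaGrad's γ·g/b). The function name `kate`, its parameters, and the zero-gradient guard are our own illustrative choices; please check the details against the algorithm in the paper rather than treating this as the repository's implementation.

```python
import numpy as np


def kate(grad_fn, x0, lr=1e-2, eta=0.0, num_steps=100):
    """KATE-style loop (sketch based on our reading of the paper's pseudocode).

    grad_fn: callable returning the (stochastic) gradient at x.
    lr: step size (gamma); eta: nonnegative scalar or per-coordinate array.
    """
    x = np.asarray(x0, dtype=float).copy()
    b2 = np.zeros_like(x)  # running sum of squared gradients, b^2
    m2 = np.zeros_like(x)  # running sum of eta * g^2 + g^2 / b^2, i.e. m^2
    for _ in range(num_steps):
        g = grad_fn(x)
        b2 += g ** 2
        # Coordinates that have only seen zero gradients get b^2 := 1 here;
        # their update is zero anyway because g = 0 for them.
        safe_b2 = np.where(b2 > 0, b2, 1.0)
        m2 += eta * g ** 2 + g ** 2 / safe_b2
        # No square root on b^2: step is lr * m / b^2 * g.
        x -= lr * np.sqrt(m2) / safe_b2 * g
    return x


# One step on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
print(kate(lambda x: x, [2.0, -4.0], lr=1.0, eta=0.0, num_steps=1))
```

For this one-step example, b² = (4, 16) and m² = (1, 1), so the iterate moves from (2, -4) to (1.5, -3.75).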
In this repository, we compare the performance of KATE with well-known algorithms such as AdaGrad and ADAM on logistic regression, image classification and text classification problems. If you use this code in your research, please cite the paper as follows:
@article{choudhury2024remove,
title={Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad},
author={Choudhury, Sayantan and Tupitsa, Nazarii and Loizou, Nicolas and Horvath, Samuel and Takac, Martin and Gorbunov, Eduard},
journal={arXiv preprint arXiv:2403.02648},
year={2024}
}
The Anaconda environment can be created with the following command:
conda env create -f environment.yml
In Figure 1 of our paper, we compare the performance of KATE on scaled and unscaled data and empirically demonstrate its scale-invariance property. Please run the code in KATEscaleinvariance.py to reproduce the plots of Figure 1.
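The scale-invariance property can also be checked numerically. The sketch below runs a KATE-style loop (with η = 0, under our reading of the update) on a small synthetic logistic-regression problem twice: once on the raw features A, and once on column-scaled features A·diag(v) started from x0/v. If the method is scale-invariant, the scaled run's final iterate should equal the unscaled one divided by v. The synthetic data, the helper names `logistic_grad` and `kate_run`, and the step size are hypothetical and are not taken from KATEscaleinvariance.py.

```python
import numpy as np


def logistic_grad(A, y, x):
    """Gradient of f(x) = mean(log(1 + exp(-y * (A @ x)))), labels y in {-1, +1}."""
    margins = y * (A @ x)
    # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z))
    coeff = -y / (1.0 + np.exp(margins))
    return A.T @ coeff / A.shape[0]


def kate_run(A, y, x0, lr=0.05, eta=0.0, num_steps=100):
    """Hypothetical helper: KATE-style loop on logistic regression (sketch)."""
    x = np.asarray(x0, dtype=float).copy()
    b2 = np.zeros_like(x)
    m2 = np.zeros_like(x)
    for _ in range(num_steps):
        g = logistic_grad(A, y, x)
        b2 += g ** 2
        safe_b2 = np.where(b2 > 0, b2, 1.0)
        m2 += eta * g ** 2 + g ** 2 / safe_b2
        x -= lr * np.sqrt(m2) / safe_b2 * g
    return x


rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
y = np.where(rng.normal(size=50) > 0, 1.0, -1.0)
v = np.array([1.0, 10.0, 0.1, 100.0, 3.0])  # per-feature scaling factors
x0 = rng.normal(size=5)

x_plain = kate_run(A, y, x0)
# Scaled data A @ diag(v), started from the matching point x0 / v.
x_scaled = kate_run(A * v, y, x0 / v)
# Scale invariance: x_scaled should equal x_plain / v (up to rounding).
print(np.allclose(x_scaled * v, x_plain))
```

With η = 0, the scaled gradient is v·g and the scaled b² is v²·b², so m² is unchanged and every step shrinks by exactly 1/v. A positive η adds an η·v²·g² term to m², which breaks this exact correspondence.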
In Figure 2 of our paper, we compare the performance of KATE with AdaGrad, AdaGradNorm, SGD-Decay and SGD-constant to examine the robustness of KATE. Please run the code in RobustKATE.py to reproduce the plots of Figure 2.
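For reference, the baseline step rules in their common textbook form can be sketched as follows: AdaGrad with a per-coordinate accumulator, AdaGradNorm with a single scalar accumulator of squared gradient norms, SGD with a decaying step size, and SGD with a constant step size. The function `run`, the 1/sqrt(k+1) decay schedule, and the `eps` value are assumptions for illustration. The exact variants, initializations and step sizes used in RobustKATE.py may differ.

```python
import numpy as np


def run(method, grad_fn, x0, lr=0.1, num_steps=100, eps=1e-12):
    """Hypothetical driver for the baseline methods (illustrative sketch)."""
    x = np.asarray(x0, dtype=float).copy()
    b2_vec = np.zeros_like(x)  # AdaGrad: per-coordinate sum of g^2
    b2_norm = 0.0              # AdaGradNorm: scalar sum of ||g||^2
    for k in range(num_steps):
        g = grad_fn(x)
        if method == "adagrad":
            b2_vec += g ** 2
            x -= lr * g / (np.sqrt(b2_vec) + eps)
        elif method == "adagradnorm":
            b2_norm += float(np.dot(g, g))
            x -= lr * g / (np.sqrt(b2_norm) + eps)
        elif method == "sgd_decay":
            # Assumed schedule: lr / sqrt(k + 1).
            x -= lr / np.sqrt(k + 1) * g
        elif method == "sgd_const":
            x -= lr * g
        else:
            raise ValueError(f"unknown method: {method}")
    return x


# One step of each method on f(x) = 0.5 * ||x||^2 (gradient = x).
for name in ["adagrad", "adagradnorm", "sgd_decay", "sgd_const"]:
    print(name, run(name, lambda x: x, [3.0, -4.0], lr=0.5, num_steps=1))
```

Note the square roots in both AdaGrad variants: the effective step is γ/b, which is exactly the part KATE replaces with m/b².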
In Figure 3 of our paper, we compare the performance of KATE with AdaGrad, AdaGradNorm, SGD-Decay and SGD-constant on real data. Please run the code in KATEheart.py, KATEaustralian.py and KATEsplice.py to reproduce the performance of KATE on the heart, australian and splice datasets, respectively.
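Datasets named heart, australian and splice are distributed in the LIBSVM binary-classification collection in the svmlight text format (`label index:value ...`, with 1-based feature indices). Assuming the scripts consume data in that format, the sketch below shows a minimal dependency-free parser that turns such lines into a dense feature matrix and a label vector. The function name `load_svmlight_text` is hypothetical, and how KATEheart.py and the other scripts actually load their data is not stated here.

```python
import numpy as np


def load_svmlight_text(lines, n_features=None):
    """Parse svmlight/LIBSVM lines into a dense matrix A and labels y (sketch)."""
    labels, rows = [], []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split()
        labels.append(float(parts[0]))
        # Each remaining token looks like "index:value" with a 1-based index.
        rows.append({int(i): float(v) for i, v in (p.split(":", 1) for p in parts[1:])})
    d = n_features or max((max(r) for r in rows if r), default=0)
    A = np.zeros((len(rows), d))
    for r, feats in enumerate(rows):
        for i, val in feats.items():
            A[r, i - 1] = val
    return A, np.array(labels)


sample = "+1 1:0.5 3:-1\n-1 2:2.0  # a comment\n"
A, y = load_svmlight_text(sample.splitlines())
print(A)
print(y)
```

For real files, `sklearn.datasets.load_svmlight_file` performs the same parsing and returns a sparse matrix. If a dataset's labels are not already in {-1, +1}, map them before fitting a logistic-regression model.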
In Figure 4 of our paper, we compare the performance of KATE with AdaGrad and ADAM on two tasks:
- Image Classification: training ResNet18 on the CIFAR10 dataset.
- Text Classification: fine-tuning BERT on the emotions dataset from the Hugging Face Hub.
Please run the code in train.ipynb to reproduce the plots for these two tasks.
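To use a KATE-style update with models such as ResNet18 or BERT, it can be packaged as a PyTorch optimizer. The class below is a hypothetical sketch built on `torch.optim.Optimizer`, following the same per-coordinate rule as we read it from the paper (b² ← b² + g², m² ← m² + η·g² + g²/b², θ ← θ − γ·m/b²·g). It is not the optimizer defined in train.ipynb, and the default `lr` is only a placeholder.

```python
import torch


class KATE(torch.optim.Optimizer):
    """Hypothetical PyTorch sketch of a KATE-style optimizer (not the repo's code)."""

    def __init__(self, params, lr=1e-3, eta=0.0):
        if lr <= 0:
            raise ValueError("lr must be positive")
        if eta < 0:
            raise ValueError("eta must be nonnegative")
        super().__init__(params, dict(lr=lr, eta=eta))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr, eta = group["lr"], group["eta"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if not state:  # lazily create the accumulators
                    state["b2"] = torch.zeros_like(p)
                    state["m2"] = torch.zeros_like(p)
                b2, m2 = state["b2"], state["m2"]
                g2 = g * g
                b2.add_(g2)
                # Guard coordinates whose gradients have all been zero so far.
                denom = torch.where(b2 > 0, b2, torch.ones_like(b2))
                m2.add_(eta * g2 + g2 / denom)
                # No square root on b^2: step is lr * m / b^2 * g.
                p.sub_(lr * m2.sqrt() / denom * g)
        return loss


# Tiny check on f(p) = 0.5 * ||p||^2, whose gradient is p.
p = torch.tensor([2.0, -4.0], requires_grad=True)
opt = KATE([p], lr=1.0)
loss = 0.5 * (p ** 2).sum()
loss.backward()
opt.step()
print(p.detach())
```

In a training loop it would replace a built-in optimizer, for example `optimizer = KATE(model.parameters(), lr=1e-3)` instead of `torch.optim.Adam(model.parameters())`. Suitable values of `lr` and `eta` for each task would need tuning.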