Is it possible to have schedules for hyper-parameters such as momentum?
For example, the one-cycle momentum schedule from this paper: https://arxiv.org/abs/1708.07120. Schedules can also help for parameters like epsilon (acting as a trust-region / damping parameter), as highlighted in this blog post: http://zna.do/epsilon.
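To make the request concrete, here is a minimal pure-Python sketch of what a one-cycle momentum schedule could look like. It assumes a linear ramp: momentum starts high, falls while the learning rate would be rising, then climbs back. The 0.95/0.85 endpoints, the 50/50 split, and the names `one_cycle_momentum`, `sgd_momentum_step`, and `run_demo` are hypothetical choices for illustration. They are not taken from the paper or from any library's API.

```python
def one_cycle_momentum(step, total_steps, max_momentum=0.95,
                       min_momentum=0.85, pct_down=0.5):
    """Hypothetical one-cycle momentum schedule (linear down, then linear up)."""
    if total_steps <= 0 or not 0.0 < pct_down < 1.0:
        raise ValueError("need total_steps > 0 and 0 < pct_down < 1")
    step = min(max(step, 0), total_steps)  # clamp out-of-range steps
    down_steps = pct_down * total_steps
    span = max_momentum - min_momentum
    if step <= down_steps:
        # First phase: momentum decreases from max to min.
        return max_momentum - (step / down_steps) * span
    # Second phase: momentum increases back from min to max.
    return min_momentum + ((step - down_steps) / (total_steps - down_steps)) * span


def sgd_momentum_step(params, grads, velocity, lr, momentum):
    """Plain heavy-ball SGD update; `momentum` is supplied per step by the caller."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def run_demo(total_steps=100, lr=0.1):
    """Minimize f(x) = x**2 while looking up momentum from the schedule each step."""
    params, velocity = [1.0], [0.0]
    for t in range(total_steps):
        grads = [2.0 * p for p in params]  # gradient of f(x) = x**2
        beta = one_cycle_momentum(t, total_steps)
        params, velocity = sgd_momentum_step(params, grads, velocity, lr, beta)
    return params[0]
```

The point of the sketch is that the optimizer only needs the hyper-parameter to be a function of the step count rather than a constant. The same step-indexed pattern could in principle drive epsilon as well.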