
back-gradient optimization technique #31

Closed
dm-medvedev opened this issue May 5, 2020 · 1 comment
@dm-medvedev

Hello!

I have a question about the back-gradient optimization technique. Your paper mentions this article, but reading the source code in train_distill_image.py, I noticed that you could not use SGD with momentum (because of the influence of previous learning rates) and so had to save the neural network parameters at each forward step. So what is the advantage of your scheme over ordinary backpropagation?
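For context, here is a minimal sketch of what I mean by backpropagating through the unrolled inner SGD steps (a toy linear model with hypothetical names, not the actual code in train_distill_image.py): the parameters produced at every step have to stay alive in the graph so the outer gradient can flow back to the distilled data.

```python
import torch
import torch.nn.functional as F

def unrolled_sgd(distilled_x, distilled_y, real_x, real_y, w, lr=0.01, steps=5):
    # w: parameters of a toy linear model, kept as a plain tensor.
    for _ in range(steps):
        inner = F.cross_entropy(distilled_x @ w, distilled_y)
        # create_graph=True keeps each step inside the autograd graph, so the
        # parameters produced at every step stay alive for the outer backward.
        (g,) = torch.autograd.grad(inner, w, create_graph=True)
        w = w - lr * g  # one unrolled plain-SGD step (no momentum)
    # Outer objective: performance of the trained weights on real data.
    return F.cross_entropy(real_x @ w, real_y)

distilled_x = torch.randn(10, 8, requires_grad=True)
distilled_y = torch.randint(0, 3, (10,))
real_x, real_y = torch.randn(64, 8), torch.randint(0, 3, (64,))
w0 = torch.randn(8, 3, requires_grad=True)

unrolled_sgd(distilled_x, distilled_y, real_x, real_y, w0).backward()
print(distilled_x.grad.shape)  # gradients w.r.t. the distilled images
```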

@ssnl
Owner

ssnl commented May 22, 2020

These are two different issues though.

  1. Re momentum: There is nothing in our paper's framework that prevents using momentum. One just needs to add the forward and backward logic; momentum is computed independently of the learning rate (see the sketch after this list).
  2. Re back-gradient vs backprop: The literature has been using "back-gradient" to refer to backpropagation through optimization steps, often done with jvps. So there is no difference between back-gradient and backprop through optimization steps. The article you cited is just an instance of this technique, with improvements that make it more numerically stable by exploiting momentum. Of course, you can also use autograd/autodiff systems to do this; it's just that the common jvp technique makes it more efficient.
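To make point 1 concrete, here is the same kind of toy sketch as in the question, with a momentum buffer added (hypothetical names, not the repository's actual code): the buffer update uses only the momentum coefficient, not the learning rate, and an autodiff system differentiates through the unrolled steps without any hand-derived backward pass.

```python
import torch
import torch.nn.functional as F

def unrolled_sgd_momentum(distilled_x, distilled_y, real_x, real_y, w,
                          lr=0.01, mu=0.9, steps=5):
    v = torch.zeros_like(w)  # momentum buffer
    for _ in range(steps):
        inner = F.cross_entropy(distilled_x @ w, distilled_y)
        (g,) = torch.autograd.grad(inner, w, create_graph=True)
        v = mu * v + g   # momentum update: the learning rate is not involved
        w = w - lr * v   # only the parameter update uses the learning rate
    return F.cross_entropy(real_x @ w, real_y)

distilled_x = torch.randn(10, 8, requires_grad=True)
distilled_y = torch.randint(0, 3, (10,))
real_x, real_y = torch.randn(64, 8), torch.randint(0, 3, (64,))
w0 = torch.randn(8, 3, requires_grad=True)

unrolled_sgd_momentum(distilled_x, distilled_y, real_x, real_y, w0).backward()
print(distilled_x.grad.norm())  # outer gradient flows back through the momentum buffer
```

Here the reverse pass is handled entirely by autograd; a hand-written back-gradient implementation would replace it with the (Hessian-)vector products mentioned above.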

ssnl closed this as completed May 27, 2020