
Simplify training: use Adam and remove the need for manual tuning #58

Closed · 3 tasks done

daniel-j-h opened this issue Jun 30, 2018 · 0 comments

daniel-j-h (Collaborator) commented Jun 30, 2018

At the moment we are using stochastic gradient descent (SGD) with a multi-step learning-rate decay schedule.

In this setup the user has to set

  • the initial SGD learning rate
  • the SGD momentum
  • the learning-rate decay milestones
  • the learning-rate decay factor (gamma)
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

optimizer = SGD(net.parameters(), lr=model["opt"]["lr"], momentum=model["opt"]["momentum"])
scheduler = MultiStepLR(optimizer, milestones=model["opt"]["milestones"], gamma=model["opt"]["gamma"])

While this allows for great flexibility and control over details, it might be too complicated for our users. We should look into replacing our current setup, e.g. with the Adam optimizer, where only the initial learning rate and the weight decay need to be set.

We can then set these two values to reasonable defaults, so users can get started without thinking too much about hyperparameters and without having to run multiple experiments just to get the basics figured out.
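A minimal sketch of what the simplified setup could look like; the config keys and the default values below are placeholders for illustration, not decided defaults:

from torch.optim import Adam

# Hypothetical simplified optimizer config: two knobs only, placeholder values.
opt = {"lr": 1e-4, "weight_decay": 1e-4}

# Adam adapts per-parameter step sizes, so no momentum or milestone schedule is needed.
optimizer = Adam(net.parameters(), lr=opt["lr"], weight_decay=opt["weight_decay"])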

Tasks

  • Implement Adam optimizer with learning rate and weight decay
  • Benchmark and check results; if they look reasonable, go for it
  • Remove the SGD parameters from the config; use learning rate and weight decay only (see the config sketch below)
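Roughly, the optimizer section of the config would shrink as sketched here; the key names mirror the snippets above and all values are placeholders:

# Before: four SGD/schedule knobs (values are placeholders).
opt_before = {"lr": 0.01, "momentum": 0.9, "milestones": [60, 90], "gamma": 0.1}

# After: only the two Adam knobs remain.
opt_after = {"lr": 1e-4, "weight_decay": 1e-4}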