
SWATS optimizer #42

Merged
merged 7 commits into mlpack:master on Dec 18, 2018

Conversation

@zoq
Member

commented Oct 28, 2018

Implementation of "Improving Generalization Performance by Switching from Adam to SGD" by Nitish Shirish Keskar and Richard Socher.
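
For anyone who wants to try it out, here is a rough sketch of how the new optimizer could be used through ensmallen's usual interface. The objective class name is a placeholder, and the constructor arguments below (step size, batch size, beta1, beta2, epsilon, maximum iterations, tolerance) are assumed to follow the other Adam-style optimizers, so check the header for the final signature.

#include <ensmallen.hpp>

// Any differentiable separable function works here; MyObjective is just a
// placeholder for a class implementing NumFunctions(), Evaluate(), and
// Gradient().
MyObjective f;
arma::mat coordinates = f.GetInitialPoint();

// Assumed parameter order: step size, batch size, beta1, beta2, epsilon,
// maximum iterations, tolerance.
ens::SWATS optimizer(0.001, 32, 0.9, 0.999, 1e-16, 100000, 1e-5);
optimizer.Optimize(f, coordinates);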

zoq added some commits Oct 28, 2018

@zoq referenced this pull request Oct 28, 2018

Closed

Swats optimizer #1464

@rcurtin
Member

left a comment

Looks good to me. 👍 It was an interesting paper to read, although I actually found the results to be a little unconvincing (that doesn't mean we shouldn't merge this or anything). What I am most interested in is an optimizer that can reach a good optimum with fewer epochs of training, but in each case here the best results only come after the step size is significantly reduced (i.e., at epoch 150 in Figure 6). I have to wonder: would it be possible with an improved optimizer like this to reduce the step size earlier (i.e., perhaps at epoch 100 instead of epoch 150) and still achieve the same generalization results? I suppose an experiment like that might be a little hard to put in a paper, but still, to me, that would be the most interesting outcome of this line of work.

Also, 3 weeks of computation on 16 GPUs?? I can only imagine how expensive this paper was to produce (in terms of power cost)...

I do think we should add some documentation to optimizers.md and a link to this optimizer in the differentiable separable function documentation in function_types.md.

include/ensmallen_bits/swats/swats_update.hpp (resolved, outdated)
include/ensmallen_bits/swats/swats_update.hpp (resolved)
tests/swats_test.cpp (resolved, outdated)
@zoq

Member Author

commented Dec 5, 2018

I have to wonder, would it be possible with an improved optimizer like this to reduce the step size earlier (i.e., perhaps at epoch 100 instead of epoch 150) and still achieve the same generalization results? I suppose an experiment like that might be a little hard to put in a paper, but still, to me, that would be the most interesting outcome of this line of work.

Agreed; the basic idea is really simple, and I think it would be quite interesting to test some of the other optimizers and see if the parameters could be reduced.

zoq added some commits Dec 6, 2018

@zoq

Member Author

commented Dec 7, 2018

This should be ready as well.

@rcurtin
Member

left a comment

Looks good to me; merge whenever you're ready. 👍 If you can add documentation to optimizers.md and update HISTORY.md before merge, that would be great, otherwise we can do it before the next release. :)

* such as NCA, NumFunctions() should return the number of points in the
* dataset, and Evaluate(coordinates, 0) will evaluate the objective function on
* the first point in the dataset (presumably, the dataset is held internally in
* the DecomposableFunctionType).

@rcurtin
Member

commented Dec 10, 2018

We can replace this with something like

 * SWATS can optimize differentiable separable functions.  For more details, see
 * the documentation on function types included with this distribution or on the
 * ensmallen website.

zoq added some commits Dec 14, 2018

@zoq merged commit eca1ec8 into mlpack:master on Dec 18, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed