Add norm constraints #8
Conversation
This includes a definition for an L2 constraint as used in many recent papers.
It also adds a README for the MNIST examples.
# This difference is likely due to slight differences in the
# learning parameters. Also note that our hyperparameters
# are not chosen using a validation set, as one would do
# for a paper.
############################################################
I just want to add a comment here that the script used to convert the MNIST dataset to HDF5 format does some randomization of the order of the data samples, and I didn't fix the random seed there. So other people might not get exactly the same results if they prepare their HDF5 MNIST dataset separately. A fix might be to simply fix the random seed in the data conversion script.
Agreed. I think I'll simply fix the random seed in the conversion script then, as it is quite useful to be able to reproduce exact results. I'll still add a comment here, though, since different GPUs, CUDA versions, etc. could potentially lead to small changes as well.
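Something along the lines of the following would make the shuffle reproducible. This is a minimal sketch in current Julia syntax; the function name, dataset layout, and seed value are assumptions for illustration, not the actual conversion script:

```julia
using HDF5, Random

function convert_to_hdf5(images::Array{Float64,4}, labels::Vector, output_fn::AbstractString)
    Random.seed!(12345)               # fixed seed => same sample order on every run
    perm = randperm(size(images, 4))  # samples are shuffled along the last dimension
    h5open(output_fn, "w") do h5
        write(h5, "data", images[:, :, :, perm])
        write(h5, "label", labels[perm])
    end
end
```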
@stokasto Thank you very much for this PR! This is a great add-on. I will merge this PR later after I take a closer look. But I do want to mention one thing that I dislike: I don't like allocating a new blob and destroying it every iteration. I will look at that after merging. Maybe we could either eliminate those temp blobs or, if possible, make norm constraints like data transformers: they would have a state where they could store and re-use temp blobs.
@pluskid yes, I am not very fond of the blob allocation in each iteration either :).
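For illustration, a rough sketch of that stateful-constraint idea, where the constraint owns a scratch buffer that is re-used across iterations instead of allocating and destroying a temp blob every time. This is not Mocha's actual API; the type and function names are made up:

```julia
mutable struct L2ConsState
    threshold::Float64
    norms::Vector{Float64}    # cached scratch space for the per-unit norms
end

L2ConsState(threshold::Real) = L2ConsState(threshold, Float64[])

function constrain!(state::L2ConsState, W::AbstractMatrix)
    n_units = size(W, 2)
    # (re)allocate the scratch buffer only when the parameter shape changes,
    # not on every iteration
    length(state.norms) == n_units || (state.norms = zeros(n_units))
    for j in 1:n_units
        state.norms[j] = sqrt(sum(abs2, view(W, :, j)))
        if state.norms[j] > state.threshold
            W[:, j] .*= state.threshold / state.norms[j]
        end
    end
    return W
end
```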
I am letting the example run again with the fixed random seed and will then adapt the description in the script.
OK, changed the script to reflect the new behavior, should be good to go!
@stokasto I'm merging this PR. Could you please isolate the code you used to benchmark different ways of implementing the norm constraint in CUDA and put it in the …
@stokasto BTW: thanks for fixing the data conversion script. The reproducibility could also act as a regression test for Mocha, to make sure we did not break things when new stuff is introduced.
This pull request implements a constraints mechanism for the solver, effectively allowing us to implement e.g. norm constraints, which are often used either in conjunction with or instead of L2/L1 regularization of the weights.
I also added an implementation of an L2 norm constraint on the weights, as this is often used in combination with dropout (e.g. see the dropout/maxout papers).
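For reference, the core operation such an L2 (max-norm) constraint performs is a projection: whenever the weight vector of a unit grows beyond a threshold c, it is rescaled back onto the L2 ball of radius c, unlike an L2 penalty, which only shrinks the weights a little on each update. A minimal sketch in plain Julia (not the code in this PR):

```julia
using LinearAlgebra

# Project a single unit's weight vector back onto the L2 ball of radius c
# whenever the constraint is violated.
function project_l2!(w::AbstractVector, c::Real)
    n = norm(w)                 # current L2 norm of the unit's weights
    n > c && (w .*= c / n)      # rescale only if the norm exceeds the threshold
    return w
end

# Example: a vector with norm 5 is scaled back to norm 3.
w = [3.0, 4.0]
project_l2!(w, 3.0)             # now norm(w) ≈ 3.0
```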