Add norm constraints #8

Merged

merged 9 commits into pluskid:master from stokasto:add_constraints on Dec 1, 2014

Conversation

stokasto (Contributor)

This pull request implements a constraints mechanism for the solver, effectively allowing us to implement e.g. norm constraints, which are often used either in conjunction with or instead of L2/L1 regularization on the weights.
I also added an implementation of an L2 norm constraint on the weights, as this is often used in combination with dropout (see e.g. the dropout/maxout papers).
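
For readers who haven't used the technique: a hard L2 (max-norm) constraint projects each weight vector back onto an L2 ball after every update. A minimal Julia sketch, assuming a plain weight matrix `W` and radius `threshold` (illustrative names, not Mocha's actual API):

```julia
# Minimal sketch of a hard L2 (max-norm) weight constraint.
function project_max_norm!(W::Matrix{Float64}, threshold::Float64)
    for j in 1:size(W, 2)               # one column per output unit
        nrm = sqrt(sum(W[:, j] .^ 2))   # L2 norm of the column
        if nrm > threshold
            W[:, j] *= threshold / nrm  # project back onto the ball
        end
    end
    return W
end

# Hypothetical use inside a solver loop: take the gradient step first,
# then enforce the constraint on the updated weights:
#   sgd_update!(W, grad, lr)    # (hypothetical update function)
#   project_max_norm!(W, 3.0)
```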

@coveralls

Coverage increased (+0.11%) when pulling 8716cf7 on stokasto:add_constraints into 77274da on pluskid:master.

[Review thread on a comment block in the MNIST example script:]

# This difference is likely due to slight differences in the
# learning parameters. Also note that our hyperparameters
# are not chosen using a validation set, as one would do
# for a paper.
############################################################
pluskid (Owner)


I just want to add a comment here that the script used to convert the MNIST dataset to HDF5 format does some randomization of the order of the data samples, and I didn't fix the random seed there. So other people might not get exactly the same results if they prepare their HDF5 MNIST dataset separately. A fix might be to simply fix the random seed in the data conversion script.

stokasto (Contributor, Author)


Agreed. I think I'll simply fix the random seed in the conversion script then, as it is quite useful to be able to reproduce exact results. I'll still add a comment here though, since different GPUs/CUDA versions etc. could potentially lead to small changes as well.

pluskid (Owner) commented Dec 1, 2014

@stokasto Thank you very much for this PR! This is a great add-on. I will merge this PR after I take a closer look. But I do want to mention one thing that I dislike: I don't like allocating a new blob and destroying it every iteration. I will look at that after merging. Maybe we could either eliminate those temp blobs or, if that is possible, make norm constraints work like data-transformers: they would have a state where they could store and re-use temp blobs.
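
A rough sketch of that idea (all names hypothetical; this is not the merged code): the constraint owns its scratch buffer the way a data-transformer owns its state, so the allocation happens once at setup time rather than once per iteration:

```julia
# Hypothetical stateful constraint: the scratch vector is allocated exactly
# once, outside the solver loop, and reused on every call.
mutable struct L2ConstraintState
    threshold :: Float64
    col_norms :: Vector{Float64}   # persistent scratch, one entry per column
end

setup_constraint(threshold::Float64, n_cols::Int) =
    L2ConstraintState(threshold, zeros(n_cols))   # one-time allocation

function constrain!(st::L2ConstraintState, W::Matrix{Float64})
    for j in 1:size(W, 2)
        st.col_norms[j] = sqrt(sum(W[:, j] .^ 2))  # norms land in the reused buffer
    end
    for j in 1:size(W, 2)
        if st.col_norms[j] > st.threshold
            W[:, j] *= st.threshold / st.col_norms[j]
        end
    end
    return W
end
```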

stokasto (Contributor, Author) commented Dec 1, 2014

@pluskid yes, I am not very fond of the blob allocation in each iteration either :).
However, as I wrote in the code, simply allocating a blob statically for each weight blob beforehand would double the memory footprint of the model! If we (hopefully soon) try larger models (e.g. AlexNet, the VGG nets, etc.) this will be a big problem.
I quickly added the cublasSnrm2_v2 function and computed the norm inside the for loop for each column, but that turned out to be quite slow for some reason. Do you have any other idea how to solve this better?
EDIT: One option would of course be to write a separate kernel for doing the normalization, since in principle it does not have to use additional memory.
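
For illustration, a CPU-side Julia sketch of what such a fused kernel would compute: two passes per column, a reduction for the squared norm followed by an in-place rescale, so no memory beyond the weights themselves is touched. On a GPU this could map to e.g. one thread block per column; all names here are illustrative:

```julia
# CPU sketch of an allocation-free column renormalization, the work a
# dedicated CUDA kernel could do without any temporary blob.
function renorm_columns!(W::Matrix{Float32}, threshold::Float32)
    rows, cols = size(W)
    for j in 1:cols
        s = 0.0f0
        for i in 1:rows          # pass 1: reduce to the squared column norm
            s += W[i, j] * W[i, j]
        end
        nrm = sqrt(s)
        if nrm > threshold
            scale = threshold / nrm
            for i in 1:rows      # pass 2: rescale the column in place
                W[i, j] *= scale
            end
        end
    end
    return W
end
```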

stokasto (Contributor, Author) commented Dec 1, 2014

I am letting the example run again with the fixed random seed and will then adapt the description in the script.

@coveralls

Coverage decreased (-0.01%) when pulling 78d6250 on stokasto:add_constraints into 77274da on pluskid:master.

stokasto (Contributor, Author) commented Dec 1, 2014

OK, I changed the script to reflect the new behavior; should be good to go!

@coveralls

Coverage decreased (-0.1%) when pulling cb3ffaf on stokasto:add_constraints into 77274da on pluskid:master.

pluskid (Owner) commented Dec 1, 2014

@stokasto I'm merging this PR.

Could you please isolate the code you used to benchmark the different ways of implementing the norm constraint in CUDA and put it in the benchmarks directory? I will try to look at it when I have time and see if using a custom CUDA kernel would be better. This is also related to the current implementation of the L2 norm regularizer, which is not very meaningful because the whole parameter blob is treated as a single vector.

pluskid merged commit 74dae49 into pluskid:master on Dec 1, 2014
pluskid (Owner) commented Dec 1, 2014

@stokasto BTW: thanks for fixing the data conversion script. The reproducibility could also act as a regression test for Mocha, making sure we do not break things when new stuff is introduced.
