
Add coffee break to allow for accumulation of statistics and reproduce ~100 errors on MNIST #6

Merged
merged 4 commits into pluskid:master from stokasto:accumulate_statistics
Nov 29, 2014

Conversation

stokasto
Contributor

This is a slightly more involved (but still small) pull request that changes the way the coffee breaks work:
-> each break now returns a dict in which statistics are recorded.
These dicts can then be recorded by the AccumulateStatistics break and saved to an HDF5 file.
I have also added a plot_statistics.jl script to tools with which such a statistics file can be plotted. This is quite useful for monitoring training and resembles the way pylearn2 handles plotting, which I quite like :).
I have also added a first example for training a fully connected network on MNIST with dropout that gives ~103 errors and thus reproduces the results from the dropout paper.
Let me know if you prefer a different way of handling the stats logging, or feel free to pull as is.
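For readers unfamiliar with the mechanism being described, here is a minimal plain-Julia sketch of the idea (the function names and HDF5 layout are illustrative assumptions, not the actual Mocha.jl API): a coffee break reports its statistics as a Dict, and an accumulator collects those Dicts and writes them to an HDF5 file.

```julia
# Hedged sketch, not the actual Mocha.jl API: a "coffee break" returns a Dict of
# statistics, and an accumulator collects these Dicts and dumps them to HDF5.
using HDF5

# Toy break: report the current iteration and objective value as a Dict.
function validation_break(iteration, obj_val)
    return Dict("iter" => iteration, "obj_val" => obj_val)
end

# Toy accumulator: gather the per-break Dicts and write them to an HDF5 file.
function accumulate_and_save(stats, filename)
    h5open(filename, "w") do file
        write(file, "iter",    Float64[s["iter"]    for s in stats])
        write(file, "obj_val", Float64[s["obj_val"] for s in stats])
    end
end

stats = [validation_break(i, 1.0 / i) for i in 100:100:500]
accumulate_and_save(stats, "statistics.hdf5")
```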

This commit changes the way coffee breaks work -> each break now returns a dict in which statistics are recorded.
These dicts can then be recorded by the AccumulateStatistics break and saved to an HDF5 file.
This is quite useful for monitoring training.
It can be used to plot data that was written by the AccumulateStatistics break and even works during training.
This is analogous to the plot_monitor.py script of pylearn2.
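A rough sketch of what such a plotting script might do, assuming the HDF5 layout from the sketch above (the dataset names are assumptions, not the actual plot_statistics.jl code): read the accumulated statistics back from the file and plot one curve.

```julia
# Hedged sketch of a plot_statistics.jl-style workflow: load the statistics
# written by the accumulator and plot the objective value over iterations.
using HDF5, Plots

function plot_statistics(filename)
    h5open(filename, "r") do file
        iters = read(file, "iter")
        vals  = read(file, "obj_val")
        plot(iters, vals; xlabel = "iteration", ylabel = "obj_val",
             label = basename(filename))
    end
end

plot_statistics("statistics.hdf5")
```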
…ishs dropout paper.

This is a first quick sketch that gives the desired result of ~100 errors on MNIST
using a fully connected network of 2 layers of 1200 units each with dropout.
NOTE: this is still missing the constraint on the L2 norm of the weights; I will add that in a later commit.
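As a rough illustration of the architecture described here, a plain-Julia forward pass of a 784 -> 1200 -> 1200 -> 10 network with dropout might look like the sketch below (the weight shapes, initialization, and 0.5 dropout ratio are assumptions; this is not the example's actual Mocha.jl code).

```julia
# Hedged sketch of the described architecture: two hidden layers of 1200 ReLU
# units with (inverted) dropout, followed by 10 output logits.
relu(x) = max.(x, 0)

function forward(x, W1, b1, W2, b2, W3, b3; p = 0.5, train = true)
    # Inverted dropout: zero out units with probability p and rescale at train time.
    dropout(h) = train ? h .* (rand(size(h)...) .> p) ./ (1 - p) : h
    h1 = dropout(relu(W1 * x  .+ b1))   # first hidden layer, 1200 units
    h2 = dropout(relu(W2 * h1 .+ b2))   # second hidden layer, 1200 units
    return W3 * h2 .+ b3                # 10 output logits
end

W1, b1 = 0.01 .* randn(1200, 784),  zeros(1200)
W2, b2 = 0.01 .* randn(1200, 1200), zeros(1200)
W3, b3 = 0.01 .* randn(10, 1200),   zeros(10)
logits = forward(rand(784), W1, b1, W2, b2, W3, b3)
```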
@stokasto
Contributor Author

Oh, and I have already added norm constraints (as mentioned in the commit for the dropout example), but I still need to add tests for them.
I'll open a separate pull request once these are ready.

@coveralls

Coverage Status

Coverage remained the same when pulling 09ae436 on stokasto:accumulate_statistics into 3392bb6 on pluskid:master.

@pluskid
Owner

pluskid commented Nov 28, 2014

@stokasto Nice job! Thanks! This is very cool! I have been thinking about allowing coffee breaks to return statistics and save them somewhere, but haven't arrived at a conclusion about the best way to do it yet. I will pull down your changes and try your example locally this afternoon, and then merge this PR.

Meanwhile, could you add a comment header to the new MNIST example describing it? It would be even better if you could provide a reference to the paper whose results we are reproducing here. I think I might try to convert your MNIST example to an IJulia notebook example at a later stage, since we now have the ability to show some cool plots of the training progress. :)

Again, good job! Thank you very much for the contributions!

@stokasto
Contributor Author

Sure, I'll do that, so maybe wait to pull until then.
Oh, by the way, there is one more change in here that I did not mention. I split the update_solver function in two, since the way it was before, the two breaks (Morning and Evening) would never happen in the same timestep, which is counter-intuitive and does not play nicely with logging :).
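The idea behind the split can be sketched as follows (the names below are made up, not Mocha.jl's actual internals): every iteration gets both a "morning" and an "evening" break phase, so both kinds of break can fire on the same timestep.

```julia
# Hedged sketch of a solver loop with two break phases per iteration.
struct PrintBreak end

morning(::PrintBreak, iter) = println("morning of iteration $iter")
evening(::PrintBreak, iter) = println("evening of iteration $iter")

function solve(coffee_breaks; max_iter = 3)
    for iter in 1:max_iter
        foreach(cb -> morning(cb, iter), coffee_breaks)  # e.g. load snapshots, log setup
        # ... one parameter update would happen here ...
        foreach(cb -> evening(cb, iter), coffee_breaks)  # e.g. record statistics, save snapshots
    end
end

solve([PrintBreak()])
```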

@pluskid
Owner

pluskid commented Nov 29, 2014

@stokasto Yes, I noticed that. I was struggling with it myself, mainly with how to allow the snapshot coffee break to load at iteration 0 yet not save at iteration 0. I ended up making the morning coffee break one "day" before the evening coffee break. But I agree with you that making them the same day is more consistent. I will do some refactoring to deal with the snapshot coffee break later.

BTW: your new MNIST example runs smoothly for me. I'm ready to merge this PR after you add a short comment to the example file. Thanks again!

@coveralls

Coverage Status

Coverage remained the same when pulling b7a6053 on stokasto:accumulate_statistics into 3392bb6 on pluskid:master.

@stokasto
Contributor Author

I added a short description to the example file.
I also changed the plot script so that it can handle multiple statistics files at the same time, which is what I usually want to do in practice :).

@stokasto
Contributor Author

Oh, and by the way, I actually don't think it is too bad to save a model with iteration count 0.

pluskid added a commit that referenced this pull request Nov 29, 2014
Add coffee break to allow for accumulation of statistics and reproduce ~100 errors on MNIST
@pluskid merged commit 77274da into pluskid:master on Nov 29, 2014
@pluskid
Owner

pluskid commented Nov 29, 2014

@stokasto Awesome!

@pluskid
Owner

pluskid commented Nov 29, 2014

@stokasto I am thinking of doing some refactoring of your statistics accumulation code. Currently statistics are put explicitly into a container, AccumulateStatistics. I'm thinking of making the coffee-break interface capable of storing statistics for any coffee break, so each coffee break would operate independently as before (e.g. obj-val could have a higher frequency and validation-set accuracy a lower frequency). But this is not urgent. I can work on something else if you are still working on this part (let me know), since two people working on the same component would make merging a bit painful.
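A minimal sketch of what such an interface could look like (hypothetical names, not a concrete proposal for the Mocha.jl API): each coffee break carries its own interval, and the solver asks every break independently whether it should run at the current iteration.

```julia
# Hedged sketch: coffee breaks with independent frequencies.
struct IntervalBreak
    name::String
    every_n_iter::Int
end

should_fire(cb::IntervalBreak, iter) = iter % cb.every_n_iter == 0

# e.g. objective value logged often, validation accuracy logged rarely.
breaks = [IntervalBreak("obj-val", 100), IntervalBreak("val-accuracy", 1000)]
for iter in 1:2000
    for cb in breaks
        should_fire(cb, iter) && println("iteration $iter: run $(cb.name)")
    end
end
```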

@stokasto
Contributor Author

Sure, sounds good. I just added it this way because it was easiest and I wanted to get some plots quickly :).
I'll open another pull request with a small change that cleans up the plot script at least a bit, and then you can take over!
