Dropout rate should be set to 0 if not using dropout #14

Open
droid666 opened this issue Jul 13, 2015 · 3 comments

@droid666

Hi,
Thanks for this code.
From looking at it, I think the dropout rate should be set to 0 when dropout is not being used, since the code scales the weights W by that rate regardless of whether dropout is actually applied.
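
To make the point concrete, here is a rough sketch (hypothetical shapes and values, not code from this repo) of what happens to the plain net's weights when a nonzero rate is left in dropout_rates:

```python
import numpy as np

# Illustrative sketch only; the numbers are made up.
rng = np.random.RandomState(0)
W_dropout = rng.randn(784, 500) * 0.01    # carefully chosen initial scale

dropout_rate = 0.5                        # leftover value, even though dropout is "off"
W_plain = W_dropout * (1 - dropout_rate)  # plain net's weights are halved anyway

print(W_plain.std() / W_dropout.std())    # ~0.5: the intended initial scale is lost
```

With dropout_rate = 0 the scaling becomes a no-op and the plain net keeps its intended initialization.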

@yliu90

yliu90 commented Jan 25, 2016

I checked the dropout code and the same issue bothers me. The dropout flag seems to switch how the model computes the gradient, but I think the Theano graph for the backprop path is actually wrong ...

@mdenil
Owner

mdenil commented Jan 25, 2016

Hi,

I don't understand the problem you are identifying. Can you explain in more detail?

The code always builds the Theano graph for both the dropout and no-dropout networks, regardless of whether dropout will be used during training. The dropout flag comes in on this line: https://github.com/mdenil/dropout/blob/master/mlp.py#L314 and chooses which part of the graph is used to compute the updates.
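
As a simplified sketch of that structure (hypothetical toy costs, not the actual mlp.py code), both branches exist in the graph and the Python flag only decides which one the updates are derived from:

```python
import numpy as np
import theano
import theano.tensor as T

x = T.dvector('x')
W = theano.shared(np.ones(3), name='W')

cost = T.sum((W * x) ** 2)                # "no dropout" branch of the graph
dropout_cost = T.sum((0.5 * W * x) ** 2)  # "dropout" branch of the graph

dropout = True                            # build-time flag, analogous to mlp.py#L314
chosen_cost = dropout_cost if dropout else cost
updates = [(W, W - 0.1 * T.grad(chosen_cost, W))]

# Only the chosen branch contributes to the updates; the other branch still
# exists in the graph but is simply not used for training.
train = theano.function([x], chosen_cost, updates=updates)
```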

@droid666
Author

droid666 commented Apr 25, 2016

Sorry, I missed your message.

The problem is the case where you don't want to use dropout but still have nonzero values in dropout_rates. In that case the normal net scales down its (initial) output per layer by the given dropout rates.

Because this is done in the normal net:

W = next_dropout_layer.W * (1 - dropout_rates[layer_counter])

This is not a real problem; it will still learn fine. It is just not intuitive: the initialization of W gets thrown off a little if you chose it carefully but then multiply it by whatever value happens to be left in dropout_rates.

In my local fork I removed that, so the normal net is a "pure net". Instead I do the adaptation on the dropout net:

W = layer.W / (1 - dropout_rates[layer_counter])

But then layer.W must be used in params instead of W.
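
As a toy comparison (hypothetical names and values, not the fork's actual code), the two conventions look like this:

```python
import numpy as np

p = 0.5                                   # dropout rate for this layer
rng = np.random.RandomState(0)
W_init = rng.randn(4, 3) * 0.01           # carefully chosen initialization

# Convention in this repo: the dropout net owns W, and the normal net
# uses a down-scaled copy, so its initialization is shrunk by (1 - p).
W_dropout_net = W_init
W_normal_net = W_dropout_net * (1 - p)

# Convention in my fork: the normal net keeps W untouched ("pure net"),
# and the dropout net uses an up-scaled copy instead.
W_normal_net_pure = W_init
W_dropout_net_scaled = W_normal_net_pure / (1 - p)
```

In both conventions the dropout net's expected pre-activation matches the normal net's, but only the second leaves the normal net's initialization exactly as chosen.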
