Dropout performance #64
I don't recall if there are any other reasons to do it this way. At any rate, I usually pay attention to using symbolic shapes wherever possible.
Yes, my own code used a nonsymbolic shape.
I'm not sure if the performance difference would be notable in practice, but at least the uncomfortable `MRG_RandomStreams` warning could be avoided.
Yeah, I guess it would be nice to get rid of that warning :) So feel free to modify it to use the compile-time shape, although it would still be good to keep the "unknown batch size" use case in mind; we'd then definitely need to check for it and fall back to the symbolic shape if necessary.
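For illustration, here is a small self-contained sketch of the trade-off being discussed (the shapes and seed are made up, and this is not the actual nntools code): with a fully known compile-time shape, `MRG_RandomStreams` can derive the number of streams from the size, while an unknown batch size forces a fall-back to the symbolic shape and the default number of streams.

```python
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

srng = MRG_RandomStreams(seed=42)
x = T.matrix('x')

# e.g. what self.input_layer.get_output_shape() might return;
# a None entry would mean "unknown at compile time" (variable batch size)
compile_time_shape = (128, 512)

if any(s is None for s in compile_time_shape):
    size = x.shape             # unknown dimension: fall back to the symbolic shape
else:
    size = compile_time_shape  # fully known: MRG can pick the number of streams

mask = srng.binomial(size, p=0.5, dtype=theano.config.floatX)
f = theano.function([x], x * mask)
```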
Related (maybe) issue I have been debugging today: it seems like there's a memory leak when using the dropout layer:

```python
import numpy as np
import theano
import theano.tensor as T
import theano.sandbox.cuda
import nntools

INPUT_DIM = 4
BATCH_SIZE = 10

def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_hidden1_dropout = nntools.layers.DropoutLayer(l_hidden1, p=0.5)
    l_out = nntools.layers.DenseLayer(l_hidden1_dropout, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)

x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```

yields output in which the free memory reported by `mem_info()` decreases on every call.
If I remove the dropout layer:

```python
import numpy as np
import theano
import theano.tensor as T
import theano.sandbox.cuda
import nntools

INPUT_DIM = 4
BATCH_SIZE = 10

def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_out = nntools.layers.DenseLayer(l_hidden1, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)

x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```

it yields output with no comparable decrease.
The amount of memory leaked is of course larger when the network is bigger. Not sure if this is an nntools problem or a Theano problem.
@craffel Does it still happen with the new version without the superfluous cast? Also, what Theano version are you using? Maybe they've fixed this in the latest version from git?
Yeah, sorry, should have specified. Using the latest nntools (including the removal of the superfluous cast), and using the latest Theano from GitHub (well, latest as of a few hours ago).
Interesting! That seems more likely to be a Theano problem than an nntools problem though. Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, that could make a difference. But then there's still a bug somewhere.
Yeah, the use of the symbolic shape is my only guess in terms of it being an nntools problem.
Even if that fixes it, we should probably send a bug report, because it should really just work as it is.
I think we probably shouldn't do that unconditionally. Let's instead decide on how to handle switching between compile-time shape and runtime shape in the DropoutLayer, convolutional layers and anything else that may need it. As I said, the convolutional layers currently have an extra optional `input_shape` argument in `get_output_for()` for this.
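To make the two options concrete, here is a rough sketch with hypothetical names (`SketchDropoutLayer`, `use_runtime_shape`); it is not the actual nntools class, just one way the optional `get_output_for` argument and the constructor flag could look:

```python
import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams


class SketchDropoutLayer(object):
    def __init__(self, input_layer, p=0.5, use_runtime_shape=False):
        self.input_layer = input_layer
        self.p = p
        # option 2: a constructor flag fixing the behaviour per layer
        self.use_runtime_shape = use_runtime_shape
        self._srng = MRG_RandomStreams(seed=42)

    def get_output_for(self, input, input_shape=None):
        # option 1: an explicit input_shape argument, as in the convolutional
        # layers, overriding whatever the layer would pick on its own
        if input_shape is None:
            if self.use_runtime_shape:
                input_shape = input.shape
            else:
                input_shape = self.input_layer.get_output_shape()
        retain_prob = 1 - self.p
        mask = self._srng.binomial(input_shape, p=retain_prob,
                                   dtype=theano.config.floatX)
        # rescale so the expected activation is unchanged ("inverted dropout")
        return input * mask / retain_prob
```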
@craffel: To check for that, just replace the symbolic `input.shape` with the compile-time shape and see whether the leak goes away. But in any case, as Sander said, we should write a test case that doesn't use nntools and file a Theano issue.
Right, sorry, should have specified that I have already tried that.
OK, do you still observe the leak when you remove all the layers except for the dropout layer? If so, you can copy out the implementation of the dropout layer and have a somewhat minimal example that does not use nntools. If you create an issue for Theano, Frédéric will probably be able to track down the problem.
I was curious and can confirm the leak. I've been able to minimize it to:

```python
import theano
T = theano.tensor
import numpy as np
import theano.sandbox.cuda
import nntools

def build_net():
    l_in = nntools.layers.InputLayer(shape=(10, 4))
    l_out = nntools.layers.dropout(l_in, p=0.5)
    fn = theano.function([l_in.input_var], l_out.get_output())

for n in xrange(10):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net()
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```

That is, the sheer compilation triggers the leak; the function does not need to be run at all.
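For reference, an nntools-free reproduction along the lines suggested above might look roughly like this (a hypothetical script, not the exact one attached to the Theano report; it assumes the old CUDA backend so that `mem_info()` is available):

```python
import theano
import theano.tensor as T
import theano.sandbox.cuda
from theano.sandbox.rng_mrg import MRG_RandomStreams


def compile_dropout_like_fn():
    # mirrors what the DropoutLayer builds: an MRG binomial mask with a
    # symbolic size, multiplied onto the input
    srng = MRG_RandomStreams(seed=123)
    x = T.matrix('x')
    mask = srng.binomial(x.shape, p=0.5, dtype=theano.config.floatX)
    return theano.function([x], x * mask)

for n in xrange(10):
    print 'Call {}, free memory: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    compile_dropout_like_fn()
```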
Reported as Theano/Theano#2335.
I will close this, as the spurious cast has been removed and the other issue is being taken care of at the Theano level.
The second point of my initial post is still open, though: we should use a nonsymbolic shape for the dropout mask when possible. I'm not sure what "when possible" should mean, however. "When there is no None in the output shape"?
Good point, I missed that. I guess we can use it whenever the necessary shape info is specified? I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.
But isn't the shape always specified? Won't the InputLayer always provide a full shape?
Well, I figured we want to support at least the case of a variable batch size, so it would be okay to specify `None` for the batch size in the input shape.
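As a tiny illustration of that convention (assuming the InputLayer accepts `None` for the batch dimension, which is exactly what is being proposed here, and mirroring the snippets above):

```python
import nntools

# variable batch size, 4 input features: only the feature dimension is fixed
l_in_variable = nntools.layers.InputLayer(shape=(None, 4))
l_drop = nntools.layers.dropout(l_in_variable, p=0.5)

# fixed batch size: promises that every batch has exactly 10 rows
l_in_fixed = nntools.layers.InputLayer(shape=(10, 4))
```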
For the record (search engines): for those like me who (wrongly) set a fixed batch size in the input layer but were passing batches of variable size into the network, this change will break that usage.
The right thing to do is to set the batch size to `None`. @benanne I tested both
@dnouri do you mean |
@benanne Yes, I meant |
Looking at the dropout code, I see two potential performance problems:

1. The mask is first sampled in the default dtype of `binomial()` and then cast back to 'floatX'. It would be better to set the dtype to `theano.config.floatX` right away, or was there a reason?
2. The mask shape is given symbolically (`input.shape`) rather than explicitly (`self.input_layer.get_output_shape()`). This way `MRG_RandomStreams` cannot determine the number of streams to use and falls back to a default value. Is there a reason against using the explicit shape? Does `DropoutLayer` have to support shapes different from the compile-time shape? Should we support this via an extra optional `input_shape` argument in `get_output_for`, similar to the convolutional layers, or maybe via a flag in the constructor (`use_runtime_shape=True`, `fixed_shape=False` or something)?
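As a minimal sketch of the first point (made-up variables, not the actual nntools code): sampling the mask with `binomial()`'s default dtype forces an extra cast, while passing `dtype=theano.config.floatX` avoids it.

```python
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

srng = MRG_RandomStreams(seed=42)
x = T.matrix('x')
p = 0.5  # dropout probability

# default dtype: the integer mask has to be cast back to floatX before use
mask_int = srng.binomial(x.shape, p=1 - p)
y_with_cast = x * T.cast(mask_int, theano.config.floatX)

# setting the dtype right away removes the superfluous cast
mask = srng.binomial(x.shape, p=1 - p, dtype=theano.config.floatX)
y_without_cast = x * mask
```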