
Dropout performance #64

Closed
f0k opened this issue Dec 10, 2014 · 22 comments
f0k commented Dec 10, 2014

Looking at the dropout code, I see two potential performance problems:

return input * utils.floatX(_srng.binomial(input.shape, p=retain_prob, dtype='int32'))
  1. If you follow the implementation, the binomial is computed in 'floatX', then cast to 'int32' (inside binomial()) and then cast back to 'floatX'. It would be better to set the dtype to theano.config.floatX right away -- or was there a reason for the detour?
  2. The size is symbolic (input.shape) rather than explicit (self.input_layer.get_output_shape()). This way MRG_RandomStreams cannot determine the number of streams to use and falls back to a default value. Is there a reason against using the explicit shape? Does DropoutLayer have to support shapes different from the compile-time shape? Should we support this via an extra optional input_shape argument in get_output_for, similar to the convolutional layers, or maybe via a flag in the constructor (use_runtime_shape=True, fixed_shape=False or something)?
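To illustrate point 1 outside of Theano, here is a NumPy sketch (an analogy only, assuming 'float32' stands in for floatX, with hypothetical variable names): generating the mask in the target float dtype directly produces the same 0/1 mask as the integer detour, but skips the two casts.

```python
import numpy as np

rng = np.random.default_rng(0)
retain_prob = 0.5
shape = (10, 4)

# Detour: integer mask, then cast back to the float dtype (two casts)
mask_int = rng.binomial(1, retain_prob, size=shape).astype(np.int32)
mask_roundtrip = mask_int.astype(np.float32)

# Direct: threshold a uniform sample and keep the float dtype throughout
mask_direct = (rng.random(shape, dtype=np.float32) < retain_prob).astype(np.float32)

# Both are 0/1 masks of the requested float dtype
assert mask_roundtrip.dtype == np.float32 and mask_direct.dtype == np.float32
assert set(np.unique(mask_direct)) <= {0.0, 1.0}
```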
benanne commented Dec 10, 2014

  1. I think I put int32 there because _srng.binomial would output int64, and I figured it needed to be 32-bit to be able to live on the GPU. Since floatX cannot be assumed to be 'float32' here, that is not really correct. I guess I figured using floatX as the target dtype wouldn't work, but if it does, we should just do that. Have you tested it?
  2. As far as I can remember, the main reason to use the symbolic shape as much as possible in get_output_for() is that we want to support the case where the batch size is undefined (i.e. None) at compile time. But I guess we could just check inside the layer whether the input shape is available and use that, or fall back to the symbolic shape otherwise.

I don't recall if there are any other reasons to do it this way. At any rate I usually pay attention to using symbolic shapes in get_output_for, for the "unknown batch size" use case. If this has performance implications we should probably add some more checks for this and use the shapes specified at compile time where possible.

f0k commented Dec 10, 2014

> I guess I figured using floatX as the target dtype wouldn't work, but if it does I see no reason not to do that. Have you tested it?

Yes, my own code used a nonsymbolic shape and dtype=theano.config.floatX, which is how I noticed in the first place. If you specify the dtype to be 'int64', it will compute a uniform distribution in floatX, threshold it, and cast the result to 'int64'. If you specify the dtype to be floatX, it will compute the uniform distribution in floatX, threshold it, and cast the result to floatX, which is what we want.

> If this has performance implications we should probably add some more checks for this and use the shapes specified at compile time where possible.

I'm not sure if the performance difference would be notable in practice, but at least the uncomfortable `UserWarning: MRG_RandomStreams Can't determine #streams from size (Shape.0), guessing 60*256` disappears if you specify a fixed shape at compile time.

benanne commented Dec 10, 2014

Yeah, it would be nice to get rid of that warning :) So feel free to modify it to use the compile-time shape. It would still be good to keep the "unknown batch size" use case in mind, though, so we definitely need to check for it and fall back to the symbolic shape if necessary.

craffel commented Dec 10, 2014

Related (maybe) issue I have been debugging today: It seems like there's a memory leak when using DropoutLayer. Here's an example:

INPUT_DIM = 4
BATCH_SIZE = 10
def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_hidden1_dropout = nntools.layers.DropoutLayer(l_hidden1, p=0.5)
    l_out = nntools.layers.DenseLayer(l_hidden1_dropout, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)
x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after:  {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])

yields

Call 0, before: 3108216832
Call 0, after:  3108216832
Call 1, before: 3108216832
Call 1, after:  3107168256
Call 2, before: 3107168256
Call 2, after:  3107168256
Call 3, before: 3107168256
Call 3, after:  3106119680

If I remove the dropout layer:

INPUT_DIM = 4
BATCH_SIZE = 10
def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_out = nntools.layers.DenseLayer(l_hidden1, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)
x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after:  {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])

it yields

Call 0, before: 3106119680
Call 0, after:  3106119680
Call 1, before: 3106119680
Call 1, after:  3106119680
Call 2, before: 3106119680
Call 2, after:  3106119680
Call 3, before: 3106119680
Call 3, after:  3106119680

The amount of memory leaked is of course larger when the network is bigger. Not sure if this is a theano bug or a nntools bug, but it came up when I was doing a large hyperparameter search (so building and training lots of networks sequentially in a for loop), and the memory leaks accumulated over time until I ran out of memory on my GPU.

benanne commented Dec 10, 2014

@craffel Does it still happen with the new version without the superfluous cast? Also, what Theano version are you using? Maybe they've fixed this in the latest version from git?

craffel commented Dec 10, 2014

Yeah, sorry, should have specified. Using the latest nntools (including the removal of the superfluous cast), and using the latest Theano from github (well, latest as of a few hours ago).

benanne commented Dec 10, 2014

Interesting! That seems more likely to be a Theano problem than an nntools problem though. Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, that could make a difference. But then there's still a bug somewhere.

craffel commented Dec 10, 2014

Yeah, the use of the symbolic shape is my only guess in terms of it being an nntools issue. If it turns out not to be that, I can try to distill it into pure Theano to create a bug request over there.

benanne commented Dec 10, 2014

Even if that fixes it we should probably send a bug report, because it should really just work as it is.

craffel added a commit to craffel/midi-dataset that referenced this issue Dec 10, 2014
f0k commented Dec 10, 2014

> Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, [...]

I think we probably shouldn't do that unconditionally. Let's instead decide on how to handle switching between compile-time shape and runtime shape in the DropoutLayer, convolutional layers and anything else that may need it. As I said, the convolutional layers currently have an extra input_shape parameter in get_output_for -- should we do the same in the dropout layer to allow overriding the shape it would use?

> [...] that could make a difference.

@craffel: To check for that, just replace `input.shape` by `self.input_layer.get_output_shape()` and rerun. You could also try `THEANO_FLAGS=allow_gc=1` in case you disabled garbage collection.

But in any case, as Sander said, we should write a test case that doesn't use nntools and file a Theano issue.

craffel commented Dec 11, 2014

Right, sorry, should have specified: I have `allow_gc=True` and `linker=cvm` (you can also disable garbage collection via the linker config setting). And just confirming that changing `DropoutLayer` to use `self.input_layer.get_output_shape()` doesn't fix the memory leak.

f0k commented Dec 11, 2014

OK, do you still observe the leak when you remove all the layers except for the dropout layer? If so, you can copy out the implementation of the dropout layer and have a somewhat minimal example that does not use nntools. If you create an Issue for Theano, Frédéric will probably be able to track down the problem.

f0k commented Dec 11, 2014

I was curious and can confirm the leak. I've been able to minimize it to:

import theano
T = theano.tensor
import numpy as np
import nntools

def build_net():
    l_in = nntools.layers.InputLayer(shape=(10,4))
    l_out = nntools.layers.dropout(l_in, p=0.5)
    fn = theano.function([l_in.input_var], l_out.get_output())

for n in xrange(10):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net()
    print 'Call {}, after:  {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])

That is, compilation alone triggers the leak; the function does not need to be run at all.

f0k commented Dec 11, 2014

Reported as Theano/Theano#2335.

benanne commented Dec 18, 2014

I will close this, as the spurious cast has been removed and the other issue is being taken care of at the Theano level.

@benanne benanne closed this as completed Dec 18, 2014
f0k commented Dec 25, 2014

The second point of my initial post is still open, though -- we should use a nonsymbolic shape for the dropout mask when possible. I'm not sure what "when possible" should mean, though. "when there is no None in the input layer's output shape"? "when the user did not tell otherwise in the constructor"? "when the user did not specify a shape in get_output_for"?

benanne commented Dec 25, 2014

Good point, I missed that. I guess we can use it whenever the necessary shape info is specified? I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.
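A minimal sketch of that policy (with a hypothetical helper name, not actual Lasagne code): prefer the compile-time shape whenever it is fully specified, and fall back to the symbolic shape when any dimension is None.

```python
def choose_mask_shape(compile_time_shape, symbolic_shape):
    """Use the fixed shape if fully specified, else the symbolic one.

    compile_time_shape: tuple of ints and/or None
    symbolic_shape: placeholder for the runtime shape (e.g. input.shape)
    """
    if any(dim is None for dim in compile_time_shape):
        return symbolic_shape   # unknown batch size: stay symbolic
    return compile_time_shape   # fixed shape: lets MRG pick its streams

# Fully specified -> use the fixed shape
assert choose_mask_shape((10, 4), "input.shape") == (10, 4)
# Variable batch size -> symbolic fallback
assert choose_mask_shape((None, 4), "input.shape") == "input.shape"
```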

@benanne benanne reopened this Dec 25, 2014
f0k commented Dec 25, 2014

> I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.

But isn't the shape always specified? Won't self.input_layer.get_output_shape() always return a fully-specified nonsymbolic shape? Is there a way to create an InputLayer with a partially-defined shape?

benanne commented Dec 25, 2014

well, I figured we want to support at least the case of a variable batch size, so it would be okay to specify None for the batch size. I don't know how well this is supported at the moment (the convolution layers might fail I guess).

dnouri commented Jan 2, 2015

For the record (and for search engines): for those who, like me, (wrongly) set a fixed batch size in the input layer but passed variable-size batches into the network, this change in DropoutLayer may trigger an error like:

ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[0] == 128, but the output's size on that axis is <SOME_NUMBER>.

The right thing to do is to set the batch size to None when specifying the shape of your input layers; this is what's referred to in the previous comment.
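The error is the usual element-wise shape mismatch; here is a toy NumPy analogue (not the actual GPU code) of multiplying a batch by a mask that was built for a different, fixed batch size:

```python
import numpy as np

mask = np.ones((128, 4), dtype=np.float32)   # mask fixed for batch size 128
batch = np.ones((64, 4), dtype=np.float32)   # actual batch is smaller

try:
    batch * mask                             # shapes (64, 4) vs (128, 4)
except ValueError:
    print("shape mismatch, as in the GpuElemwise error above")
```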

@benanne I tested both layers.Conv2DCCLayer and layers.cuda_convnet.Conv2DCCLayer, and both seem to be okay with a variable batch size indicated by None.

benanne commented Jan 2, 2015

@dnouri do you mean layers.Conv2DLayer? Because those two are the same now :) The cuda-convnet wrappers don't require any shape info so I'm sure they're fine. The regular Conv2DLayer might or might not be (it shouldn't be too hard to fix that anyway). We'll need to add tests for this use case because it seems easy to forget about it.

dnouri commented Jan 2, 2015

@benanne Yes, I meant layers.Conv2DLayer; it just works. Haven't tried with dnn_conv or GpuCorrMM yet; these are the defaults in newer versions of Theano.

craffel added a commit to craffel/dhs that referenced this issue Jan 17, 2016