
Dropout performance #64

Closed
f0k opened this issue Dec 10, 2014 · 22 comments
f0k commented Dec 10, 2014

Looking at the dropout code, I see two potential performance problems:

return input * utils.floatX(_srng.binomial(input.shape, p=retain_prob, dtype='int32'))
  1. If you follow the implementation, the binomial is computed in 'floatX', then cast to 'int32' (inside binomial()) and then cast back to 'floatX'. It would be better to set the dtype to theano.config.floatX right away -- or was there a reason for the detour?
  2. The size is symbolic (input.shape) rather than explicit (self.input_layer.get_output_shape()). This way MRG_RandomStreams cannot determine the number of streams to use and falls back to a default value. Is there a reason against using the explicit shape? Does DropoutLayer have to support shapes different from the compile-time shape? Should we support this via an extra optional input_shape argument in get_output_for, similar to the convolutional layers, or maybe via a flag in the constructor (use_runtime_shape=True, fixed_shape=False or something)?
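To illustrate point 1 outside of Theano, here is a NumPy sketch (an analogy only, assuming 'float32' stands in for floatX, with hypothetical variable names): generating the mask in the target float dtype directly produces the same 0/1 mask as the integer detour, but skips the two casts.

```python
import numpy as np

rng = np.random.default_rng(0)
retain_prob = 0.5
shape = (10, 4)

# Detour: integer mask, then cast back to the float dtype (two casts)
mask_int = rng.binomial(1, retain_prob, size=shape).astype(np.int32)
mask_roundtrip = mask_int.astype(np.float32)

# Direct: threshold a uniform sample and keep the float dtype throughout
mask_direct = (rng.random(shape, dtype=np.float32) < retain_prob).astype(np.float32)

# Both are 0/1 masks of the requested float dtype
assert mask_roundtrip.dtype == np.float32 and mask_direct.dtype == np.float32
assert set(np.unique(mask_direct)) <= {0.0, 1.0}
```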
benanne commented Dec 10, 2014

  1. I think I put int32 there because _srng.binomial would output int64, and I figured it needed to be 32-bit to be able to live on the GPU. Since floatX cannot be assumed to be 'float32' here, that is not really correct. I guess I figured using floatX as the target dtype wouldn't work, but if it does, we should just do that. Have you tested it?
  2. As far as I can remember, the main reason to use the symbolic shape as much as possible in get_output_for() is that we want to support the case where the batch size is undefined (i.e. None) at compile time. But I guess we could just check inside the layer whether the input shape is available and use that, or fall back to the symbolic shape otherwise.

I don't recall if there are any other reasons to do it this way. At any rate I usually pay attention to using symbolic shapes in get_output_for, for the "unknown batch size" use case. If this has performance implications we should probably add some more checks for this and use the shapes specified at compile time where possible.

f0k commented Dec 10, 2014

> I guess I figured using floatX as the target dtype wouldn't work, but if it does I see no reason not to do that. Have you tested it?

Yes, my own code used a nonsymbolic shape and dtype=theano.config.floatX, which is how I noticed in the first place. If you specify the dtype to be 'int64', it will compute a uniform distribution in floatX, threshold it, and cast the result to 'int64'. If you specify the dtype to be floatX, it will compute the uniform distribution in floatX, threshold it, and cast the result to floatX, which is what we want.

> If this has performance implications we should probably add some more checks for this and use the shapes specified at compile time where possible.

I'm not sure if the performance difference would be notable in practice, but at least the uncomfortable `UserWarning: MRG_RandomStreams Can't determine #streams from size (Shape.0), guessing 60*256` disappears if you specify a fixed shape at compile time.

benanne commented Dec 10, 2014

Yeah, it would be nice to get rid of that warning :) So feel free to modify it to use the compile-time shape. It would still be good to keep the "unknown batch size" use case in mind, though, so we definitely need to check for it and fall back to the symbolic shape if necessary.

craffel commented Dec 10, 2014

Related (maybe) issue I have been debugging today: It seems like there's a memory leak when using DropoutLayer. Here's an example:

INPUT_DIM = 4
BATCH_SIZE = 10
def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_hidden1_dropout = nntools.layers.DropoutLayer(l_hidden1, p=0.5)
    l_out = nntools.layers.DenseLayer(l_hidden1_dropout, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)
x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after:  {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])

yields

Call 0, before: 3108216832
Call 0, after:  3108216832
Call 1, before: 3108216832
Call 1, after:  3107168256
Call 2, before: 3107168256
Call 2, after:  3107168256
Call 3, before: 3107168256
Call 3, after:  3106119680

If I remove the dropout layer:

INPUT_DIM = 4
BATCH_SIZE = 10
def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_out = nntools.layers.DenseLayer(l_hidden1, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)
x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after:  {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])

it yields

Call 0, before: 3106119680
Call 0, after:  3106119680
Call 1, before: 3106119680
Call 1, after:  3106119680
Call 2, before: 3106119680
Call 2, after:  3106119680
Call 3, before: 3106119680
Call 3, after:  3106119680

The amount of memory leaked is of course larger when the network is bigger. Not sure if this is a theano bug or a nntools bug, but it came up when I was doing a large hyperparameter search (so building and training lots of networks sequentially in a for loop), and the memory leaks accumulated over time until I ran out of memory on my GPU.

benanne commented Dec 10, 2014

@craffel Does it still happen with the new version without the superfluous cast? Also, what Theano version are you using? Maybe they've fixed this in the latest version from git?

craffel commented Dec 10, 2014

Yeah, sorry, should have specified. Using the latest nntools (including the removal of the superfluous cast), and using the latest Theano from github (well, latest as of a few hours ago).

benanne commented Dec 10, 2014

Interesting! That seems more likely to be a Theano problem than an nntools problem though. Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, that could make a difference. But then there's still a bug somewhere.

craffel commented Dec 10, 2014

Yeah, the use of the symbolic shape is my only guess in terms of it being an nntools issue. If it turns out not to be that, I can try to distill it into pure Theano to create a bug request over there.

benanne commented Dec 10, 2014

Even if that fixes it we should probably send a bug report, because it should really just work as it is.

craffel added a commit to craffel/midi-dataset that referenced this issue Dec 10, 2014
f0k commented Dec 10, 2014

> Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, [...]

I think we probably shouldn't do that unconditionally. Let's instead decide on how to handle switching between compile-time shape and runtime shape in the DropoutLayer, convolutional layers and anything else that may need it. As I said, the convolutional layers currently have an extra input_shape parameter in get_output_for -- should we do the same in the dropout layer to allow overriding the shape it would use?

> [...] that could make a difference.

@craffel: To check for that, just replace `input.shape` by `self.input_layer.get_output_shape()` and rerun. You could also try `THEANO_FLAGS=allow_gc=1` in case you disabled garbage collection.

But in any case, as Sander said, we should write a test case that doesn't use nntools and file a Theano issue.

craffel commented Dec 11, 2014

Right, sorry, should have specified: I have `allow_gc=True` and `linker=cvm` (you can also disable garbage collection via the linker config setting). And just confirming that changing `DropoutLayer` to use `self.input_layer.get_output_shape()` doesn't fix the memory leak.

f0k commented Dec 11, 2014

OK, do you still observe the leak when you remove all the layers except for the dropout layer? If so, you can copy out the implementation of the dropout layer and have a somewhat minimal example that does not use nntools. If you create an Issue for Theano, Frédéric will probably be able to track down the problem.

f0k commented Dec 11, 2014

I was curious and can confirm the leak. I've been able to minimize it to:

import theano
T = theano.tensor
import numpy as np
import nntools

def build_net():
    l_in = nntools.layers.InputLayer(shape=(10,4))
    l_out = nntools.layers.dropout(l_in, p=0.5)
    fn = theano.function([l_in.input_var], l_out.get_output())

for n in xrange(10):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net()
    print 'Call {}, after:  {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])

That is, compilation alone triggers the leak; the function does not need to be run at all.

f0k commented Dec 11, 2014

Reported as Theano/Theano#2335.

benanne commented Dec 18, 2014

I will close this, as the spurious cast has been removed and the other issue is being taken care of at the Theano level.

@benanne benanne closed this as completed Dec 18, 2014
f0k commented Dec 25, 2014

The second point of my initial post is still open, though -- we should use a nonsymbolic shape for the dropout mask when possible. I'm not sure what "when possible" should mean, though. "when there is no None in the input layer's output shape"? "when the user did not tell otherwise in the constructor"? "when the user did not specify a shape in get_output_for"?

benanne commented Dec 25, 2014

Good point, I missed that. I guess we can use it whenever the necessary shape info is specified? I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.
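A minimal sketch of that policy (with a hypothetical helper name, not actual Lasagne code): prefer the compile-time shape whenever it is fully specified, and fall back to the symbolic shape when any dimension is None.

```python
def choose_mask_shape(compile_time_shape, symbolic_shape):
    """Use the fixed shape if fully specified, else the symbolic one.

    compile_time_shape: tuple of ints and/or None
    symbolic_shape: placeholder for the runtime shape (e.g. input.shape)
    """
    if any(dim is None for dim in compile_time_shape):
        return symbolic_shape   # unknown batch size: stay symbolic
    return compile_time_shape   # fixed shape: lets MRG pick its streams

# Fully specified -> use the fixed shape
assert choose_mask_shape((10, 4), "input.shape") == (10, 4)
# Variable batch size -> symbolic fallback
assert choose_mask_shape((None, 4), "input.shape") == "input.shape"
```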

@benanne benanne reopened this Dec 25, 2014
f0k commented Dec 25, 2014

> I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.

But isn't the shape always specified? Won't self.input_layer.get_output_shape() always return a fully-specified nonsymbolic shape? Is there a way to create an InputLayer with a partially-defined shape?

benanne commented Dec 25, 2014

well, I figured we want to support at least the case of a variable batch size, so it would be okay to specify None for the batch size. I don't know how well this is supported at the moment (the convolution layers might fail I guess).

dnouri commented Jan 2, 2015

For the record (and for search engines): for those who, like me, (wrongly) set a fixed batch size in the input layer but passed variable-size batches into the network, this change in DropoutLayer may trigger an error like:

ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[0] == 128, but the output's size on that axis is <SOME_NUMBER>.

The right thing to do is to set the batch size to None when specifying the shape of your input layers; this is what's referred to in the previous comment.
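The error is the usual element-wise shape mismatch; here is a toy NumPy analogue (not the actual GPU code) of multiplying a batch by a mask that was built for a different, fixed batch size:

```python
import numpy as np

mask = np.ones((128, 4), dtype=np.float32)   # mask fixed for batch size 128
batch = np.ones((64, 4), dtype=np.float32)   # actual batch is smaller

try:
    batch * mask                             # shapes (64, 4) vs (128, 4)
except ValueError:
    print("shape mismatch, as in the GpuElemwise error above")
```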

@benanne I tested both layers.Conv2DCCLayer and layers.cuda_convnet.Conv2DCCLayer, and both seem to be okay with a variable batch size indicated by None.

benanne commented Jan 2, 2015

@dnouri do you mean layers.Conv2DLayer? Because those two are the same now :) The cuda-convnet wrappers don't require any shape info so I'm sure they're fine. The regular Conv2DLayer might or might not be (it shouldn't be too hard to fix that anyway). We'll need to add tests for this use case because it seems easy to forget about it.

dnouri commented Jan 2, 2015

@benanne Yes, I meant layers.Conv2DLayer; it just works. Haven't tried with dnn_conv or GpuCorrMM yet; these are the defaults in newer versions of Theano.

craffel added a commit to craffel/dhs that referenced this issue Jan 17, 2016