
A discussion on out of memory error! #5199

Closed
meysamgolm opened this issue Jan 27, 2017 · 7 comments

Comments

@meysamgolm

I am running a CNN/LSTM model on biomedical data and I am getting an out of memory error. I am aware of all the discussions on this issue, like here and here. I know that the solution is to use a smaller network or batch size. But let's analyze the problem in this thread, because when I do the calculations, I don't see why my GPU should be out of memory. I am using a GeForce GTX 980 Ti, which has 6 GB of memory; the most advanced GPU, the NVIDIA TITAN X, has 12 GB. Not a big difference! So please find the paradox in my calculations below. Here are my network parameters:

input_shape = (None, 210, 22, 26, 1)
total_params = 62545
batch_size = 64

So the memory for this network should be:

network_memory = total_params * batch_size * 4 (bytes) = 62545 * 64 * 4 = 16,011,520.

If we add the size of each batch of data then we have:

data_memory = 210 * 22 * 26 * 1 * 64 * 4 = 30,750,720.

So the total memory used is about 47 MB, which is very small compared with the 6 GB of memory I have on the cluster. I am also sure that the GPU memory is not being used by other processes. So why am I getting an out of memory error?
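For completeness, here is the same estimate in code (a quick sketch, assuming float32, i.e. 4 bytes per value):

total_params = 62545
batch_size = 64
bytes_per = 4

network_memory = total_params * batch_size * bytes_per    # 16,011,520
data_memory = 210 * 22 * 26 * 1 * batch_size * bytes_per  # 30,750,720
print(network_memory + data_memory)                       # 46,762,240 bytes, ~47 MB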

This is the error that I am getting:

MemoryError: Error allocating 123002880 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuContiguous(GpuDimShuffle{0,3,1,2}.0)
Toposort index: 562
Inputs types: [CudaNdarrayType(float32, 4D)]
Inputs shapes: [(13440, 16, 11, 13)]
Inputs strides: [(2288, 1, 208, 16)]
Inputs values: ['not shown']
Outputs clients: [[GpuDnnPoolGrad{mode='max'}(GpuContiguous.0, GpuContiguous.0, GpuContiguous.0, TensorConstant{(2,) of 2}, TensorConstant{(2,) of 2}, TensorConstant{(2,) of 0})]]

This is my network just in case:

____________________________________________________________________________________________________
Layer (type)                        Output Shape             Param #     Connected to
====================================================================================================
timedistributed_1 (TimeDistributed) (None, 210, 22, 26, 16)  160         timedistributed_input_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation)           (None, 210, 22, 26, 16)  0           timedistributed_1[0][0]
____________________________________________________________________________________________________
timedistributed_2 (TimeDistributed) (None, 210, 11, 13, 16)  0           activation_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)                 (None, 210, 11, 13, 16)  0           timedistributed_2[0][0]
____________________________________________________________________________________________________
timedistributed_3 (TimeDistributed) (None, 210, 11, 13, 32)  4640        dropout_1[0][0]
____________________________________________________________________________________________________
activation_2 (Activation)           (None, 210, 11, 13, 32)  0           timedistributed_3[0][0]
____________________________________________________________________________________________________
timedistributed_4 (TimeDistributed) (None, 210, 5, 6, 32)    0           activation_2[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout)                 (None, 210, 5, 6, 32)    0           timedistributed_4[0][0]
____________________________________________________________________________________________________
timedistributed_5 (TimeDistributed) (None, 210, 5, 6, 64)    18496       dropout_2[0][0]
____________________________________________________________________________________________________
activation_3 (Activation)           (None, 210, 5, 6, 64)    0           timedistributed_5[0][0]
____________________________________________________________________________________________________
timedistributed_6 (TimeDistributed) (None, 210, 2, 3, 64)    0           activation_3[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout)                 (None, 210, 2, 3, 64)    0           timedistributed_6[0][0]
____________________________________________________________________________________________________
timedistributed_7 (TimeDistributed) (None, 210, 384)         0           dropout_3[0][0]
____________________________________________________________________________________________________
dropout_4 (Dropout)                 (None, 210, 384)         0           timedistributed_7[0][0]
____________________________________________________________________________________________________
convolution1d_1 (Convolution1D)     (None, 210, 16)          18448       dropout_4[0][0]
____________________________________________________________________________________________________
maxpooling1d_1 (MaxPooling1D)       (None, 26, 16)           0           convolution1d_1[0][0]
____________________________________________________________________________________________________
dropout_5 (Dropout)                 (None, 26, 16)           0           maxpooling1d_1[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                       (None, 64)               20736       dropout_5[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                     (None, 1)                65          lstm_1[0][0]
====================================================================================================
Total params: 62545
____________________________________________________________________________________________________
@gvtulder
Contributor

The parameter values and the input data are only a small part of your memory requirements. You'll need far more memory to store the intermediate results, i.e., the outputs of each layer. For non-trivial models, this is where most of the memory goes (you could use the numbers from the "Output Shape" column to get an estimate). Theano will try to use the available memory somewhat efficiently -- running some operations in-place, for instance -- but many intermediate results need to be kept in memory to be able to compute the gradients.
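Your own error message illustrates this: the failing tensor has shape (13440, 16, 11, 13) in float32, and 13440 * 16 * 11 * 13 * 4 = 123,002,880 bytes, i.e. a single intermediate buffer of ~117 MiB. (13440 is just your batch_size * timesteps, 64 * 210, after TimeDistributed flattening.) That one buffer already exceeds your whole 47 MB estimate.

A minimal sketch of such an estimate for a Keras 1.x model (a rough lower bound only: Theano reuses some buffers in-place, cuDNN needs extra workspace, and training roughly doubles this for the gradients):

import numpy as np

def estimate_activation_bytes(model, batch_size, bytes_per=4):
    # One float32 buffer per layer output; ignores in-place reuse.
    total = 0
    for layer in model.layers:
        shape = layer.output_shape          # e.g. (None, 210, 22, 26, 16)
        elems = np.prod([d for d in shape if d is not None])
        total += int(elems) * batch_size * bytes_per
    return total

# e.g. estimate_activation_bytes(model, batch_size=64) / 1e6  ->  MB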

@nouiz
Contributor

nouiz commented Jan 27, 2017 via email

@gvtulder
Contributor

gvtulder commented Jan 27, 2017

It might also help to make sure that you're using the Theano image_dim_ordering = 'th' in your Keras config and in your code. It looks like you might be using the TensorFlow ordering now; is that correct? If you are, switching to the Theano ordering might avoid some copying of your data.
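For reference, one way to force it in code (Keras 1.x API; setting "image_dim_ordering": "th" in ~/.keras/keras.json does the same):

from keras import backend as K
K.set_image_dim_ordering('th')   # use the Theano (channels-first) ordering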

The error message you posted failed at GpuContiguous and GpuDimShuffle, which is a combination that is introduced when converting your data from the TensorFlow ordering to the Theano ordering and back.

MemoryError: Error allocating 123002880 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuContiguous(GpuDimShuffle{0,3,1,2}.0)
Toposort index: 562
Inputs types: [CudaNdarrayType(float32, 4D)]
Inputs shapes: [(13440, 16, 11, 13)]
Inputs strides: [(2288, 1, 208, 16)]
Inputs values: ['not shown']
Outputs clients: [[GpuDnnPoolGrad{mode='max'}(GpuContiguous.0, GpuContiguous.0, GpuContiguous.0, TensorConstant{(2,) of 2}, TensorConstant{(2,) of 2}, TensorConstant{(2,) of 0})]]

@irisliucy

I don't think this error is caused by TensorFlow or Theano specifically; I got the same error even though I use Theano as the backend.

{
    "image_dim_ordering": "th", 
    "epsilon": 1e-07, 
    "floatx": "float32", 
    "backend": "theano"
}
MemoryError: Error allocating 4678800000 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuAdvancedSubtensor1(embedding_1_W, Elemwise{Cast{int64}}.0)
Toposort index: 6
Inputs types: [CudaNdarrayType(float32, matrix), TensorType(int64, vector)]
Inputs shapes: [(20001, 100), (11697000,)]
Inputs strides: [(100, 1), (8,)]
Inputs values: ['not shown', 'not shown']
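If I read the shapes right, the failing allocation is exactly the output of that embedding lookup: 11,697,000 indices * 100 dimensions * 4 bytes = 4,678,800,000 bytes, so the gathered embeddings alone need ~4.7 GB regardless of the backend.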

@maderafunk

I'm also trying to calculate the memory used, but there are many factors that are difficult to account for, and I could not find a pattern. Besides the number of parameters and the batch size, the choice of optimizer also has a big effect on memory usage (SGD seems to consume much less than, for example, Adadelta). But sometimes the memory usage just seems to be quite random with Theano: it might use 2500 MB in one run, 5000 MB in another, and be back to 2500 MB at a different time.
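The optimizer part at least has a simple explanation (a back-of-envelope sketch, assuming float32 parameters):

n_params = 62545           # e.g. the model above
bytes_per = 4

sgd      = 0 * n_params * bytes_per   # plain SGD keeps no per-parameter state
momentum = 1 * n_params * bytes_per   # SGD + momentum: one velocity buffer
adadelta = 2 * n_params * bytes_per   # Adadelta: two accumulators per weight

So Adadelta roughly triples the parameter-related memory; negligible for a model this small, but it matters for large ones.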

@nouiz
Contributor

nouiz commented May 29, 2017 via email

@stale

stale bot commented Aug 27, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
