
A discussion on out of memory error! #5199

Closed
meysamgolm opened this issue Jan 27, 2017 · 7 comments

Comments

@meysamgolm

I am running a CNN/LSTM model on biomedical data and I am getting an out of memory error. I am aware of all the discussions on this issue, like here and here. I know that the solution is to use a smaller network or batch size. But let's analyze the problem in this thread, because when I do the calculations, I don't see why my GPU should be out of memory. I am using a GeForce GTX 980 Ti, which has 6 GB of memory; the most advanced GPU, the NVIDIA TITAN X, has 12 GB. Not a big difference! So please find the paradox in my calculations below. Here are my network parameters:

input_shape = (None, 210, 22, 26, 1)
total_params = 62545
batch_size = 64

So the memory for this network should be:

network_memory = total_params * batch_size * 4 (bytes) = 62545 * 64 * 4 = 16,011,520.

If we add the size of each batch of data then we have:

data_memory = 210 * 22 * 26 * 1 * 64 * 4 = 30,750,720.

So the total memory used is about 47 MB, which is very small compared with the 6 GB of memory I have on the cluster. I am also sure that the GPU memory is not being used by other processes. So why am I getting an out of memory error?
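For completeness, here is the same estimate in code (a quick sketch, assuming float32, i.e. 4 bytes per value):

total_params = 62545
batch_size = 64
bytes_per = 4

network_memory = total_params * batch_size * bytes_per    # 16,011,520
data_memory = 210 * 22 * 26 * 1 * batch_size * bytes_per  # 30,750,720
print(network_memory + data_memory)                       # 46,762,240 bytes, ~47 MB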

This is the error that I am getting:

MemoryError: Error allocating 123002880 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuContiguous(GpuDimShuffle{0,3,1,2}.0)
Toposort index: 562
Inputs types: [CudaNdarrayType(float32, 4D)]
Inputs shapes: [(13440, 16, 11, 13)]
Inputs strides: [(2288, 1, 208, 16)]
Inputs values: ['not shown']
Outputs clients: [[GpuDnnPoolGrad{mode='max'}(GpuContiguous.0, GpuContiguous.0, GpuContiguous.0, TensorConstant{(2,) of 2}, TensorConstant{(2,) of 2}, TensorConstant{(2,) of 0})]]

This is my network just in case:

____________________________________________________________________________________________________
Layer (type)                        Output Shape             Param #     Connected to
====================================================================================================
timedistributed_1 (TimeDistributed) (None, 210, 22, 26, 16)  160         timedistributed_input_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation)           (None, 210, 22, 26, 16)  0           timedistributed_1[0][0]
____________________________________________________________________________________________________
timedistributed_2 (TimeDistributed) (None, 210, 11, 13, 16)  0           activation_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)                 (None, 210, 11, 13, 16)  0           timedistributed_2[0][0]
____________________________________________________________________________________________________
timedistributed_3 (TimeDistributed) (None, 210, 11, 13, 32)  4640        dropout_1[0][0]
____________________________________________________________________________________________________
activation_2 (Activation)           (None, 210, 11, 13, 32)  0           timedistributed_3[0][0]
____________________________________________________________________________________________________
timedistributed_4 (TimeDistributed) (None, 210, 5, 6, 32)    0           activation_2[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout)                 (None, 210, 5, 6, 32)    0           timedistributed_4[0][0]
____________________________________________________________________________________________________
timedistributed_5 (TimeDistributed) (None, 210, 5, 6, 64)    18496       dropout_2[0][0]
____________________________________________________________________________________________________
activation_3 (Activation)           (None, 210, 5, 6, 64)    0           timedistributed_5[0][0]
____________________________________________________________________________________________________
timedistributed_6 (TimeDistributed) (None, 210, 2, 3, 64)    0           activation_3[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout)                 (None, 210, 2, 3, 64)    0           timedistributed_6[0][0]
____________________________________________________________________________________________________
timedistributed_7 (TimeDistributed) (None, 210, 384)         0           dropout_3[0][0]
____________________________________________________________________________________________________
dropout_4 (Dropout)                 (None, 210, 384)         0           timedistributed_7[0][0]
____________________________________________________________________________________________________
convolution1d_1 (Convolution1D)     (None, 210, 16)          18448       dropout_4[0][0]
____________________________________________________________________________________________________
maxpooling1d_1 (MaxPooling1D)       (None, 26, 16)           0           convolution1d_1[0][0]
____________________________________________________________________________________________________
dropout_5 (Dropout)                 (None, 26, 16)           0           maxpooling1d_1[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                       (None, 64)               20736       dropout_5[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                     (None, 1)                65          lstm_1[0][0]
====================================================================================================
Total params: 62545
____________________________________________________________________________________________________
@gvtulder
Contributor

The parameter values and the input data are only a small part of your memory requirements. You'll need far more memory to store the intermediate results, i.e., the outputs of each layer. For non-trivial models, this is where most of the memory goes (you could use the numbers from the "Output Shape" column to get an estimate). Theano will try to use the available memory somewhat efficiently -- running some operations in-place, for instance -- but many intermediate results need to be kept in memory to be able to compute the gradients.
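Your own error message illustrates this: the failing tensor has shape (13440, 16, 11, 13) in float32, and 13440 * 16 * 11 * 13 * 4 = 123,002,880 bytes, i.e. a single intermediate buffer of ~117 MiB. (13440 is just your batch_size * timesteps, 64 * 210, after TimeDistributed flattening.) That one buffer already exceeds your whole 47 MB estimate.

A minimal sketch of such an estimate for a Keras 1.x model (a rough lower bound only: Theano reuses some buffers in-place, cuDNN needs extra workspace, and training roughly doubles this for the gradients):

import numpy as np

def estimate_activation_bytes(model, batch_size, bytes_per=4):
    # One float32 buffer per layer output; ignores in-place reuse.
    total = 0
    for layer in model.layers:
        shape = layer.output_shape          # e.g. (None, 210, 22, 26, 16)
        elems = np.prod([d for d in shape if d is not None])
        total += int(elems) * batch_size * bytes_per
    return total

# e.g. estimate_activation_bytes(model, batch_size=64) / 1e6  ->  MB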

@nouiz
Contributor

nouiz commented Jan 27, 2017 via email

@gvtulder
Contributor

gvtulder commented Jan 27, 2017

It might also help to make sure that you're using the Theano image_dim_ordering = 'th' in your Keras config and in your code. It looks like you might be using the TensorFlow ordering now; is that correct? If you are, switching to the Theano ordering might avoid some copying of your data.
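For reference, one way to force it in code (Keras 1.x API; setting "image_dim_ordering": "th" in ~/.keras/keras.json does the same):

from keras import backend as K
K.set_image_dim_ordering('th')   # use the Theano (channels-first) ordering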

The error message you posted failed at GpuContiguous and GpuDimShuffle, which is a combination that is introduced when converting your data from the TensorFlow ordering to the Theano ordering and back.

MemoryError: Error allocating 123002880 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuContiguous(GpuDimShuffle{0,3,1,2}.0)
Toposort index: 562
Inputs types: [CudaNdarrayType(float32, 4D)]
Inputs shapes: [(13440, 16, 11, 13)]
Inputs strides: [(2288, 1, 208, 16)]
Inputs values: ['not shown']
Outputs clients: [[GpuDnnPoolGrad{mode='max'}(GpuContiguous.0, GpuContiguous.0, GpuContiguous.0, TensorConstant{(2,) of 2}, TensorConstant{(2,) of 2}, TensorConstant{(2,) of 0})]]

@irisliucy

I don't think this error is caused by TensorFlow or Theano specifically; I got the same error even though I use Theano as the backend.

{
    "image_dim_ordering": "th", 
    "epsilon": 1e-07, 
    "floatx": "float32", 
    "backend": "theano"
}
MemoryError: Error allocating 4678800000 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuAdvancedSubtensor1(embedding_1_W, Elemwise{Cast{int64}}.0)
Toposort index: 6
Inputs types: [CudaNdarrayType(float32, matrix), TensorType(int64, vector)]
Inputs shapes: [(20001, 100), (11697000,)]
Inputs strides: [(100, 1), (8,)]
Inputs values: ['not shown', 'not shown']
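If I read the shapes right, the failing allocation is exactly the output of that embedding lookup: 11,697,000 indices * 100 dimensions * 4 bytes = 4,678,800,000 bytes, so the gathered embeddings alone need ~4.7 GB regardless of the backend.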

@maderafunk

I'm also trying to calculate the memory used, but there are many factors that are difficult to account for, and I could not find a pattern. Besides the number of parameters and the batch size, the choice of optimizer also has a big effect on memory usage (SGD seems to consume much less than, for example, Adadelta). But sometimes the memory usage just seems to be quite random with Theano: it might use 2500 MB in one run, 5000 MB in another, and be back to 2500 MB at a different time.
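The optimizer part at least has a simple explanation (a back-of-envelope sketch, assuming float32 parameters):

n_params = 62545           # e.g. the model above
bytes_per = 4

sgd      = 0 * n_params * bytes_per   # plain SGD keeps no per-parameter state
momentum = 1 * n_params * bytes_per   # SGD + momentum: one velocity buffer
adadelta = 2 * n_params * bytes_per   # Adadelta: two accumulators per weight

So Adadelta roughly triples the parameter-related memory; negligible for a model this small, but it matters for large ones.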

@nouiz
Contributor

nouiz commented May 29, 2017 via email

@stale

stale bot commented Aug 27, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
