ResourceExhaustedError MultiResUNet3D #4
In our experiments with the MultiResUNet 3D we used 3D MRI images of dimension 80x80x48x4 and batch size = 2, on a Titan Xp GPU with 12 GB memory. It would seem that your input images are too large to fit the model into an 11 GB GPU. Perhaps you can reduce the size of the images as we did to overcome the memory constraint, as the 3D model is indeed quite expensive. Alternatively, you can reduce the layers and/or kernels to fit the model with the given image size into your GPU. As the get_model_memory_usage() function reports, you would need 21.628 GB of memory to fit the model on your GPU.
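The formula behind a `get_model_memory_usage()`-style estimate can be sketched in pure Python, without Keras: roughly 4 bytes per float32 for every layer activation (per batch element) plus every parameter. The function below is a simplified illustrative sketch, not the repository's implementation; the shapes and parameter counts passed in are assumptions you would read off `model.summary()`.

```python
from functools import reduce

def estimate_model_memory_gb(batch_size, output_shapes, n_params, bytes_per_float=4):
    """Rough single-precision memory estimate for a Keras-style model.

    output_shapes: per-layer output shapes WITHOUT the batch dimension,
                   e.g. [(64, 288, 288), ...], read off model.summary().
    n_params: total trainable + non-trainable parameter count.
    """
    # Activations: one float per element of every layer output, per batch item.
    activations = sum(reduce(lambda a, b: a * b, shape, 1) for shape in output_shapes)
    total_bytes = bytes_per_float * (batch_size * activations + n_params)
    return round(total_bytes / (1024 ** 3), 3)

# A single activation tensor like the one in the OOM traceback below,
# shape [10, 64, 288, 288], already accounts for about 0.2 GB on its own:
print(estimate_model_memory_gb(batch_size=10, output_shapes=[(64, 288, 288)], n_params=0))  # → 0.198
```

Note that such estimates ignore gradients, optimizer slots (Adam keeps two extra copies of every parameter), and workspace buffers, which is one reason real usage can exceed the reported number.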
I have a similar problem: even with severely reduced image resolution and batch size I have 46.409 GB model memory usage. But I also have three graphics cards with 32 GB memory each. Do you have a hint how I can make your model use all of them together?

```
Traceback (most recent call last):
  File "/home/x/PycharmProjects/MultiResUNet/run_on_2D.py", line 95, in <module>
    model_dir=model_dir2)
  File "/home/x/PycharmProjects/MultiResUNet/mrun_functions.py", line 298, in train_step
    callbacks=[es, TqdmCallback(verbose=1)])
  File "/home/x/anaconda3/envs/MultiResUNet/lib/python3.7/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/home/x/anaconda3/envs/MultiResUNet/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/home/x/anaconda3/envs/MultiResUNet/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/home/x/anaconda3/envs/MultiResUNet/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[10,64,288,288] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    [[{{node training/Adam/gradients/zeros_86}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    [[metrics/dice_coef/Identity/_2691]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  (1) Resource exhausted: OOM when allocating tensor with shape[10,64,288,288] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    [[{{node training/Adam/gradients/zeros_86}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.

Process finished with exit code 1
```

Edit: In my case this seems to help:

```python
from keras.utils import multi_gpu_model
model = multi_gpu_model(model, gpus=3)
```
Sometimes I get the above-mentioned error even though get_model_memory_usage had told me that it should easily fit. Anyone else?
Not sure what may be happening, but perhaps some GPU memory is already allocated for caching? You may have a look here.
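One likely contributor (an assumption on my part, since the linked page isn't quoted in this thread): TensorFlow 1.x reserves nearly all GPU memory up front by default, so the reported usable memory can be smaller than expected. With the TF 1.x / standalone Keras stack shown in the traceback above, memory can instead be allocated on demand via a session-configuration fragment like this sketch:

```python
# Sketch for the TF 1.x / standalone Keras stack used in this thread;
# on TF 2.x you would use tf.config.experimental.set_memory_growth instead.
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow the allocation on demand
# Optionally cap the fraction of GPU memory TF may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.9
K.set_session(tf.Session(config=config))
```

This does not reduce what the model itself needs, but it avoids the allocator grabbing memory that then appears "missing" to your own estimates.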
Unfortunately, I wouldn't know how to restart the python interpreter in a python loop for leave-one-out cross-validation. Perhaps the above-mentioned function for calculating the required memory is simply wrong. (I haven't found a better one yet either, though.) When I select batch size and image size so that it should just fit in the graphics memory, it basically never works. I always use one of my three graphics cards as a buffer, and even then it doesn't always work. |
@saskra if you are running multiple training sessions in one Python program, like doing LOOCV or 5-fold CV, I follow a kind of hack. I don't know if it is general or not, so please don't quote me on this 😅 I too found that the memory crashes in such cases. So, after a training session has completed, I put the following lines in the code.
This somewhat frees the cache (at least in my experience). You may try it and see whether it works for you or not, as this is just a kind of hack.
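The exact lines are not preserved in this thread, but based on the follow-up comments (gc and time.sleep are both mentioned, and clearing the Keras session is the standard way to release graph memory) the hack presumably looked something like this hypothetical reconstruction, meant to be pasted at the end of each cross-validation fold:

```python
# Hypothetical reconstruction of the between-folds cleanup; the original
# lines are not preserved in this thread. Assumes `model` is the trained
# Keras model from the fold that just finished.
import gc
import time
from keras import backend as K

del model          # drop the Python reference to the trained model
K.clear_session()  # tear down the TF graph and release its GPU allocations
gc.collect()       # free host-side memory held by stale references
time.sleep(10)     # give the allocator a moment before the next fold starts
```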
Thank you, I will try that out! But does the gc actually free GPU memory? Nevertheless, the calculation of the model's required memory seems to be wrong, because the crash can also occur on the first run of the loop.
No @saskra, GC is unlikely to free GPU memory. I think the time.sleep() does the trick mostly. Nevertheless, I put it there in case it frees some RAM.
When I try to train the MultiResUNet3D model with input shape = (128, 128, 128, 1) and batch size = 1, Keras raises this exception:
Also, I used the following function to calculate the GPU memory that Keras needs:
and this is the output when I execute the following code:
Is it normal that the model does not run?
How much GPU memory do I need?
My GPU is a GTX 1080 Ti (11 GB).
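As a rough sanity check on why a (128, 128, 128, 1) input is so expensive: even a single float32 activation map at full resolution costs a quarter of a gigabyte once it has a few dozen channels, and a 3D U-Net keeps many such tensors (plus their gradients) alive at once. The arithmetic below is purely illustrative; the 32-filter first layer is a hypothetical assumption, not this model's actual configuration.

```python
# Memory of ONE float32 activation tensor at the input resolution,
# assuming a hypothetical 32-filter first layer (illustrative only).
voxels = 128 * 128 * 128      # spatial elements per channel
channels = 32
bytes_per_float = 4           # float32
tensor_gb = voxels * channels * bytes_per_float / 1024 ** 3
print(tensor_gb)  # → 0.25
```

Multiply that by the dozens of intermediate tensors in an encoder-decoder at various resolutions, then roughly double it for the gradients, and exceeding 11 GB at this input size is unsurprising.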