OOM Error #4

Closed
nerdogram opened this issue Dec 3, 2019 · 2 comments

nerdogram commented Dec 3, 2019

I get OOM errors when an input image is larger than about 1200 pixels on each side (the exact limit varies by image for some reason). Can you help me understand why the model breaks on these inputs? Is it the shape of the model or some other issue, and is there anything we can configure?

Thanks!

This is the error:

2019-12-03 10:39:21.705187: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *_________________****************************__________***************************_________________
2019-12-03 10:39:21.705596: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at transpose_op.cc:198 : Resource exhausted: OOM when allocating tensor with shape[1,4800,4800,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "infer.py", line 50, in <module>
    main()
  File "infer.py", line 37, in main
    sr = model.predict(np.expand_dims(low_res, axis=0))[0]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 908, in predict
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 723, in predict
    callbacks=callbacks)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 394, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[1,4800,4800,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node model_2/p_re_lu_2/Relu_1-0-0-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[model_2/conv2d_13/Tanh/_743]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[1,4800,4800,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node model_2/p_re_lu_2/Relu_1-0-0-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
HasnainRaz (Owner) commented Dec 3, 2019

Your GPU is not big enough to hold the model + feature maps + output. You can see that the OOM happens at the final layers, where the tensor has shape 4800x4800x32. You can try running inference on smaller images, or try splitting the model across multiple GPUs.
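For the smaller-images route, one option is to split the input into tiles, super-resolve each tile, and stitch the outputs back together. A rough sketch (not code from this repo; it assumes a fully convolutional Keras model with a 4x upscale factor, which matches the 1200px-in / 4800px-out shapes above, and it ignores tile-border artifacts since there is no overlap/blending):

```python
import numpy as np

def tile_predict(model, low_res, tile=256, scale=4):
    """Super-resolve a (H, W, 3) image tile by tile to limit peak GPU memory."""
    h, w, _ = low_res.shape
    out = np.zeros((h * scale, w * scale, 3), dtype=np.float32)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = low_res[y:y + tile, x:x + tile]
            # Each predict call only allocates feature maps for one tile.
            sr = model.predict(np.expand_dims(patch, axis=0))[0]
            out[y * scale:y * scale + sr.shape[0],
                x * scale:x * scale + sr.shape[1]] = sr
    return out
```

Something like `sr = tile_predict(model, low_res)` would then replace the single `model.predict(np.expand_dims(low_res, axis=0))[0]` call shown in the traceback from infer.py.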

If you have more system RAM than GPU memory, you can try running inference on the CPU (which allocates tensors in RAM instead). It will be slow, but at least it will work. You can see how to do this here: link
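As a minimal sketch of one way to do this, you can hide the GPU from TensorFlow before importing it (the model file name below is just a placeholder, and the random array stands in for your preprocessed image):

```python
import os

# Hide all GPUs so TensorFlow falls back to CPU (and system RAM) for every op.
# This must be set before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("model.h5", compile=False)  # placeholder path

low_res = np.random.rand(1, 1500, 1500, 3).astype(np.float32)  # stand-in input batch
sr = model.predict(low_res)[0]
```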

nerdogram (Author)

I see. Thanks for replying so quickly.
