OOM Error #4

Closed
nerdogram opened this issue Dec 3, 2019 · 2 comments

nerdogram commented Dec 3, 2019

I get OOM errors when an input image is larger than about 1200 pixels on each side (the exact limit varies by image for some reason). Can you help me understand why the model breaks on these inputs? Is it the shape of the model or some other issue, and is there anything we can configure?

Thanks!

This is the error:

2019-12-03 10:39:21.705187: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *_________________****************************__________***************************_________________
2019-12-03 10:39:21.705596: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at transpose_op.cc:198 : Resource exhausted: OOM when allocating tensor with shape[1,4800,4800,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "infer.py", line 50, in <module>
    main()
  File "infer.py", line 37, in main
    sr = model.predict(np.expand_dims(low_res, axis=0))[0]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 908, in predict
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 723, in predict
    callbacks=callbacks)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 394, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[1,4800,4800,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node model_2/p_re_lu_2/Relu_1-0-0-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[model_2/conv2d_13/Tanh/_743]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[1,4800,4800,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node model_2/p_re_lu_2/Relu_1-0-0-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
HasnainRaz (Owner) commented Dec 3, 2019

Your GPU is not big enough to hold the model + feature maps + output. You can see that the OOM happens at the final layers, where the tensor has shape 4800x4800x32. You can try running inference on smaller images, or try splitting the model across multiple GPUs.
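For the smaller-images route, one option is to split the input into tiles, super-resolve each tile, and stitch the outputs back together. A rough sketch (not code from this repo; it assumes a fully convolutional Keras model with a 4x upscale factor, which matches the 1200px-in / 4800px-out shapes above, and it ignores tile-border artifacts since there is no overlap/blending):

```python
import numpy as np

def tile_predict(model, low_res, tile=256, scale=4):
    """Super-resolve a (H, W, 3) image tile by tile to limit peak GPU memory."""
    h, w, _ = low_res.shape
    out = np.zeros((h * scale, w * scale, 3), dtype=np.float32)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = low_res[y:y + tile, x:x + tile]
            # Each predict call only allocates feature maps for one tile.
            sr = model.predict(np.expand_dims(patch, axis=0))[0]
            out[y * scale:y * scale + sr.shape[0],
                x * scale:x * scale + sr.shape[1]] = sr
    return out
```

Something like `sr = tile_predict(model, low_res)` would then replace the single `model.predict(np.expand_dims(low_res, axis=0))[0]` call shown in the traceback from infer.py.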

If you have more system RAM than GPU memory, you can try running inference on the CPU (which allocates tensors in RAM instead). It will be slow, but at least it will work. You can see how to do this here: link
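As a minimal sketch of one way to do this, you can hide the GPU from TensorFlow before importing it (the model file name below is just a placeholder, and the random array stands in for your preprocessed image):

```python
import os

# Hide all GPUs so TensorFlow falls back to CPU (and system RAM) for every op.
# This must be set before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("model.h5", compile=False)  # placeholder path

low_res = np.random.rand(1, 1500, 1500, 3).astype(np.float32)  # stand-in input batch
sr = model.predict(low_res)[0]
```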

nerdogram (Author)

I see. Thanks for replying so quickly.
