Question: RuntimeError: CUDA out of memory. #7
I got the same error even after resizing the image to 360p.

```
[12/26 14:49:07 d2.engine.train_loop]: Starting training from iteration 0
ERROR [12/26 14:49:08 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/numai/lib/maskal/detectron2/engine/train_loop.py", line 140, in train
    self.run_step()
  File "/home/numai/lib/maskal/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/home/numai/lib/maskal/detectron2/engine/train_loop.py", line 242, in run_step
    losses.backward()
  File "/home/numai/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/numai/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 5.81 GiB total capacity; 3.61 GiB already allocated; 77.75 MiB free; 3.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[12/26 14:49:08 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[12/26 14:49:08 d2.utils.events]: iter: 1 total_loss: 3.166 loss_cls: 1.695 loss_box_reg: 0.6886 loss_mask: 0.6974 loss_rpn_cls: 0.05768 loss_rpn_loc: 0.02732 data_time: 0.0986 lr: 1e-05 max_mem: 3713M
```
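As the error message itself suggests, the allocator hint can be tried by setting `PYTORCH_CUDA_ALLOC_CONF` before torch initializes CUDA. A minimal sketch (the value `128` is an illustrative assumption, not a tuned setting):

```python
import os

# Must be set before the first CUDA allocation (ideally before `import torch`).
# max_split_size_mb caps the block size the caching allocator will split,
# which can reduce fragmentation when reserved memory >> allocated memory.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Alternatively, export the variable in the shell before launching training: `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128`.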
I have verified that torch recognizes the CUDA device:

```
>>> print(torch.cuda.is_available())
True
>>> print(torch.cuda.current_device())
0
>>> print(torch.cuda.get_device_name())
NVIDIA GeForce RTX 3060 Laptop GPU
>>> print(torch.cuda.get_device_capability())
(8, 6)
```
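As a side note on the "resizing to 360p" step above: 1920x1080 downscaled to a height of 360 gives 640x360, preserving the 16:9 aspect ratio. A small helper (hypothetical, not part of maskal) for computing such dimensions:

```python
def scaled_size(width: int, height: int, target_height: int) -> tuple:
    """Return (width, height) scaled so the height equals target_height,
    preserving the aspect ratio."""
    scale = target_height / height
    return round(width * scale), target_height

print(scaled_size(1920, 1080, 360))  # (640, 360)
```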
@nyxrobotics that's unfortunate. Honestly, I've never tested maskal or detectron2 on a GPU with less than 8 GB of memory; a 6 GB GPU might be too small for detectron2. Besides lowering the image resolution, you can use a smaller Mask R-CNN architecture, such as one with a ResNet-50 backbone. In maskal.yaml, change both network_config and pretrained_weights to a Mask R-CNN path with a ResNet-50 backbone, for example mask_rcnn_R_50_FPN_3x.yaml. That might help reduce the GPU memory load...
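For reference, the maskal.yaml change might look something like this. The exact paths are illustrative assumptions; use whatever mask_rcnn_R_50_FPN_3x.yaml path is valid in your detectron2 model zoo:

```yaml
# Illustrative sketch of the maskal.yaml change; key values depend on
# your local detectron2 installation and model zoo layout.
network_config: COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml
pretrained_weights: COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml
```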
mask_rcnn_R_50_FPN_3x.yaml worked for me.
Please excuse the basic question; it is not a problem with this repository.
I want to keep my labelme annotations on 1920x1080 images.
However, active learning runs out of GPU memory.
So I would like to reduce the resolution to 640x360 only during training.
Can you give me some advice on where to change this?
Here is my nvidia-smi output:
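Since maskal builds on detectron2, one place to look is detectron2's input-resizing options, which are applied at training time without touching the annotation files. A hedged sketch of the relevant config keys (whether maskal exposes these directly is an assumption to verify):

```yaml
# detectron2 training-time input resizing; values are illustrative.
INPUT:
  MIN_SIZE_TRAIN: (360,)  # shortest edge resized to 360 px during training
  MAX_SIZE_TRAIN: 640     # cap on the longest edge during training
```

Because detectron2 rescales the ground-truth boxes and masks along with the image, the 1920x1080 labelme annotations themselves would not need to change.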