Question: RuntimeError: CUDA out of memory. #7

Closed
nyxrobotics opened this issue Dec 26, 2022 · 4 comments

Comments

@nyxrobotics

Please excuse the basic question; it is not a problem with this repository.
I want to annotate 1920x1080 images with labelme.
However, active learning runs out of GPU memory.
So I want to reduce the resolution to 640x360 only during training.
Can you give me some advice on where to change this?
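For what it's worth, here is a minimal sketch of the standard Detectron2 config keys that control training resolution (the keys themselves are standard Detectron2; exactly where maskal builds its cfg is an assumption, so this would go wherever the config is created before training):

```python
# Sketch only: lower the training/inference resolution via standard Detectron2 keys.
# The ResizeShortestEdge augmentation shown in the log below is driven by these values.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TRAIN = (360,)  # shortest edge resized to 360 px during training
cfg.INPUT.MAX_SIZE_TRAIN = 640     # longest edge capped at 640 px during training
cfg.INPUT.MIN_SIZE_TEST = 360      # same idea at inference time
cfg.INPUT.MAX_SIZE_TEST = 640
```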

[12/26 13:00:27 d2.data.datasets.coco]: Loaded 5 images in COCO format from ./noodle2/datasets/train.json
[12/26 13:00:27 d2.data.build]: Removed 0 images with no usable annotations. 5 images left.
[12/26 13:00:27 d2.data.build]: Distribution of instances among all 3 categories:
|   category    | #instances   |   category    | #instances   |   category    | #instances   |
|:-------------:|:-------------|:-------------:|:-------------|:-------------:|:-------------|
| curry_noodles | 28           | seafood_noo.. | 29           | soy_sauce_n.. | 0            |
|               |              |               |              |               |              |
|     total     | 57           |               |              |               |              |
[12/26 13:00:27 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[12/26 13:00:27 d2.data.build]: Using training sampler TrainingSampler
[12/26 13:00:27 d2.data.common]: Serializing 5 elements to byte tensors and concatenating them all ...
[12/26 13:00:27 d2.data.common]: Serialized dataset takes 0.01 MiB
Skip loading parameter 'roi_heads.box_predictor.cls_score.weight' to the model due to incompatible shapes: (81, 1024) in the checkpoint but (4, 1024) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.cls_score.bias' to the model due to incompatible shapes: (81,) in the checkpoint but (4,) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.bbox_pred.weight' to the model due to incompatible shapes: (320, 1024) in the checkpoint but (12, 1024) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.bbox_pred.bias' to the model due to incompatible shapes: (320,) in the checkpoint but (12,) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.mask_head.predictor.weight' to the model due to incompatible shapes: (80, 256, 1, 1) in the checkpoint but (3, 256, 1, 1) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.mask_head.predictor.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (3,) in the model! You might want to double check if this is expected.
[12/26 13:00:28 d2.engine.train_loop]: Starting training from iteration 0
ERROR [12/26 13:00:29 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/numai/lib/maskal/detectron2/engine/train_loop.py", line 140, in train
    self.run_step()
  File "/home/numai/lib/maskal/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/home/numai/lib/maskal/detectron2/engine/train_loop.py", line 242, in run_step
    losses.backward()
  File "/home/numai/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/numai/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 90.00 MiB (GPU 0; 5.81 GiB total capacity; 3.16 GiB already allocated; 96.25 MiB free; 3.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Here is my nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   49C    P5    14W /  N/A |    837MiB /  5946MiB |     26%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
@nyxrobotics
Author

I got the same error even after resizing the images to 360p.
Is there a way to reduce GPU memory usage other than the image size?

[12/26 14:49:07 d2.engine.train_loop]: Starting training from iteration 0
ERROR [12/26 14:49:08 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/numai/lib/maskal/detectron2/engine/train_loop.py", line 140, in train
    self.run_step()
  File "/home/numai/lib/maskal/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/home/numai/lib/maskal/detectron2/engine/train_loop.py", line 242, in run_step
    losses.backward()
  File "/home/numai/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/numai/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 5.81 GiB total capacity; 3.61 GiB already allocated; 77.75 MiB free; 3.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[12/26 14:49:08 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[12/26 14:49:08 d2.utils.events]:  iter: 1  total_loss: 3.166  loss_cls: 1.695  loss_box_reg: 0.6886  loss_mask: 0.6974  loss_rpn_cls: 0.05768  loss_rpn_loc: 0.02732  data_time: 0.0986  lr: 1e-05  max_mem: 3713M
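The error message itself points at one knob to try: PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of that, together with a couple of standard Detectron2 settings that are commonly lowered to reduce peak GPU memory (whether maskal exposes these directly is an assumption; the keys are standard Detectron2 config options):

```python
# Sketch only: allocator hint from the OOM message plus standard Detectron2 memory knobs.
# The environment variable must be set before any CUDA tensor is created,
# ideally at the very top of the training script.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

from detectron2.config import get_cfg

cfg = get_cfg()
cfg.SOLVER.IMS_PER_BATCH = 1                    # fewer images per training step
cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 128        # fewer sampled anchors (default 256)
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256  # fewer sampled RoIs (default 512)
```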

@nyxrobotics
Author

I have verified that torch recognizes the CUDA device:

>>> print(torch.cuda.is_available())
True
>>> print(torch.cuda.current_device())
0
>>> print(torch.cuda.get_device_name())
NVIDIA GeForce RTX 3060 Laptop GPU
>>> print(torch.cuda.get_device_capability())
(8, 6)

@pieterblok
Owner

pieterblok commented Dec 26, 2022

I got the same error even after resizing the images to 360p.

Is there a way to reduce GPU memory usage other than the image size?

@nyxrobotics that's unfortunate. Quite honestly, I've never tested maskal or detectron2 on a GPU with less than 8 GB of memory. A 6 GB GPU might be too small for detectron2.

What you can do, besides lowering the image resolution, is to use a smaller Mask R-CNN architecture, such as the one with a ResNet-50 backbone. In maskal.yaml, change both the network_config and pretrained_weights to a Mask R-CNN config with a ResNet-50 backbone, for example: mask_rcnn_R_50_FPN_3x.yaml

That might help to reduce GPU memory load...
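For reference, assuming maskal.yaml points at the Detectron2 model zoo, the ResNet-50 entry that network_config and pretrained_weights would refer to can be looked up like this (a sketch; the exact layout of maskal.yaml is not shown here):

```python
# Sketch only: resolve the ResNet-50 Mask R-CNN entry in the Detectron2 model zoo,
# i.e. the config file and pretrained weights a smaller backbone would use.
from detectron2 import model_zoo

r50 = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
print(model_zoo.get_config_file(r50))      # local path to the config file
print(model_zoo.get_checkpoint_url(r50))   # URL of the pretrained R-50 weights
```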

@nyxrobotics
Author

mask_rcnn_R_50_FPN_3x.yaml worked for me.
Thank you!
