Tue Nov 28 23:18:02 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   58C    P8              10W /  60W |      9MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2686      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
I encountered an error while trying to train my first model, YOLO3D.
I simply followed the instructions in docs/mono3d.md.
After entering this command:
./launchers/train.sh config/CONFIG_FILE_YOLO.py 0 proba
the program crashed with the error below:
Traceback (most recent call last):
File "/home/einhart/visualDet3D/scripts/train.py", line 199, in <module>
Fire(main)
File "/home/einhart/anaconda3/envs/visualDet3D/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/einhart/anaconda3/envs/visualDet3D/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/einhart/anaconda3/envs/visualDet3D/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/einhart/visualDet3D/scripts/train.py", line 150, in main
training_dection(data, detector, optimizer, writer, training_loss_logger, global_step, epoch_num, cfg)
File "/home/einhart/visualDet3D/visualDet3D/networks/pipelines/trainers.py", line 35, in train_mono_detection
classification_loss, regression_loss, loss_dict = module(
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/einhart/visualDet3D/visualDet3D/networks/detectors/yolomono3d_detector.py", line 126, in forward
return self.training_forward(img_batch, annotations, calib)
File "/home/einhart/visualDet3D/visualDet3D/networks/detectors/yolomono3d_detector.py", line 91, in training_forward
features = self.core(dict(image=img_batch, P2=P2))
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/einhart/visualDet3D/visualDet3D/networks/detectors/yolomono3d_core.py", line 16, in forward
x = self.backbone(x['image'])
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/einhart/visualDet3D/visualDet3D/networks/backbones/resnet.py", line 195, in forward
x = layer(x)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
input = module(input)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/einhart/visualDet3D/visualDet3D/networks/backbones/resnet.py", line 82, in forward
out = self.conv3(out)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/einhart/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacty of 3.81 GiB of which 6.31 MiB is free. Including non-PyTorch memory, this process has 3.79 GiB memory in use. Of the allocated memory 3.60 GiB is allocated by PyTorch, and 89.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I tried reducing batch_size from 8 to 2 in CONFIG_FILE_YOLO.py, which was generated by this command:
cp Yolo3D_example $CONFIG_FILE.py
Do you have any idea how to reduce the allocated memory?
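As a rough check on the figures in the error message, the sketch below parses the sizes and works out how much headroom is left on the card. The names `OOM_MESSAGE` and `sizes_in_mib` are hypothetical helpers for illustration only; they are not part of PyTorch or visualDet3D.

```python
import re

# Figures copied from the OOM message in the traceback (typo kept verbatim).
OOM_MESSAGE = (
    "Tried to allocate 90.00 MiB. GPU 0 has a total capacty of 3.81 GiB "
    "of which 6.31 MiB is free. Including non-PyTorch memory, this process "
    "has 3.79 GiB memory in use. Of the allocated memory 3.60 GiB is "
    "allocated by PyTorch, and 89.33 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def sizes_in_mib(message):
    """Return every '<number> MiB|GiB' figure in the message, converted to MiB."""
    pairs = re.findall(r"(\d+(?:\.\d+)?) (MiB|GiB)", message)
    return [float(value) * UNIT_TO_MIB[unit] for value, unit in pairs]


(requested, total, free, process_in_use,
 allocated, reserved_unallocated) = sizes_in_mib(OOM_MESSAGE)

# Best case: all free memory plus all reserved-but-unallocated memory is usable.
reclaimable = free + reserved_unallocated
headroom_after_request = reclaimable - requested
allocated_fraction = allocated / total

print(f"allocated by PyTorch: {allocated_fraction:.1%} of total")
print(f"best-case reclaimable: {reclaimable:.2f} MiB")
print(f"headroom after the 90 MiB request: {headroom_after_request:.2f} MiB")
```

Under these figures, PyTorch already holds about 94% of the card, and even in the best case only a few MiB would remain after the 90 MiB request. That suggests setting max_split_size_mb alone is unlikely to be enough, although this is an inference from the message rather than a measured result.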
I have not tried training the network with 4 GB of memory. You could try changing the backbone to ResNet-50, or reducing the batch size further to 1 (you may need to tune the learning rate after this).
However, it will be difficult to reproduce the full results.
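One possible starting point for the learning-rate tuning mentioned above is the common linear scaling heuristic, which keeps the ratio of learning rate to batch size constant. The sketch below is an assumption, not the project's recommended procedure: the function names are hypothetical, and the base learning rate of 1e-4 is a placeholder rather than a value read from CONFIG_FILE_YOLO.py. The second helper computes how many micro-batches would be needed if gradients were accumulated to emulate the original batch size of 8; whether the training script supports gradient accumulation is not confirmed here.

```python
def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: keep lr / batch_size constant."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


def accumulation_steps(target_batch_size, micro_batch_size):
    """Micro-batches needed so accumulated gradients cover the target batch size."""
    if target_batch_size <= 0 or micro_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    # Ceiling division, so a partial final micro-batch still counts as a step.
    return -(-target_batch_size // micro_batch_size)


base_lr = 1e-4  # placeholder; substitute the value from your own config
print(scaled_learning_rate(base_lr, base_batch_size=8, new_batch_size=1))
print(accumulation_steps(target_batch_size=8, micro_batch_size=1))
```

Treat the scaled value only as an initial guess; with a batch size of 1 the batch-norm statistics also become noisy, so some further tuning is likely to be needed.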
Dist: Pop OS 22.04