
RuntimeError: CUDA error: out of memory #19

Closed
albertchristian92 opened this issue Jan 11, 2021 · 2 comments

albertchristian92 commented Jan 11, 2021

Hi, thank you for your work. I am interested in this project, but when I tried to start training your code using Docker, I ran into RuntimeError: CUDA error: out of memory, as shown here:
[screenshot: out-of-memory traceback]

I am using multiple GeForce GTX 1080 GPUs, as shown below:
[screenshot: nvidia-smi output]

Here is how I run your code:

python3 main.py \
    ddd \
    --exp_id centerfusion \
    --shuffle_train \
    --train_split mini_train \
    --val_split mini_val \
    --val_intervals 1 \
    --run_dataset_eval \
    --nuscenes_att \
    --velocity \
    --batch_size 4 \
    --lr 2.5e-4 \
    --num_epochs 60 \
    --lr_step 50 \
    --save_point 20,40,50 \
    --gpus 0,2,3 \
    --not_rand_crop \
    --flip 0.5 \
    --shift 0.1 \
    --pointcloud \
    --radar_sweeps 6 \
    --pc_z_offset 0.0 \
    --pillar_dims 1.0,0.2,0.2 \
    --max_pc_dist 60.0 \
    --num_workers 0 \
    --load_model ../models/centerfusion_e60.pth

Please give any suggestions regarding this issue. Thank you very much.
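A general first check (not from the original report, just a common cause of this error with a multi-GPU --gpus 0,2,3 setup) is to confirm that the selected GPUs actually have free memory before launching training, since another process already occupying one of them can trigger the same RuntimeError regardless of the batch size:

# Shows per-GPU memory usage; purely informational, changes nothing.
nvidia-smi --query-gpu=index,name,memory.used,memory.free --format=csv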

fabrizioschiano (Contributor) commented Oct 8, 2021

Hi @albertchristian92, how did you solve this problem?

My solution was to decrease the batch_size parameter to 4, but I see that you have already done that.
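If a batch size of 4 still does not fit, a further reduction and/or training on a single idle GPU is the usual next step. The sketch below is only illustrative: the GPU index and batch size are assumptions, and the other options from the original command are omitted for brevity and can be appended unchanged.

# Illustrative sketch: expose one (assumed idle) physical GPU to the process and
# halve the batch size; inside the process that GPU is then addressed as index 0.
CUDA_VISIBLE_DEVICES=1 \
python3 main.py ddd \
    --exp_id centerfusion \
    --batch_size 2 \
    --gpus 0 \
    --num_workers 0 \
    --load_model ../models/centerfusion_e60.pth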

fabrizioschiano commented

The train.sh script is now running, and I have the following configuration:

nvidia-smi
Fri Oct  8 17:22:38 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   68C    P0    74W /  N/A |   6350MiB /  7973MiB |     93%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1043      G   /usr/lib/xorg/Xorg                102MiB |
|    0   N/A  N/A      1656      G   /usr/lib/xorg/Xorg                442MiB |
|    0   N/A  N/A      1787      G   /usr/bin/gnome-shell               54MiB |
|    0   N/A  N/A     25109      G   ...Webex/bin/CiscoCollabHost       20MiB |
|    0   N/A  N/A     88863      G   .../debug.log --shared-files       13MiB |
|    0   N/A  N/A     88929      G   ...AAAAAAAAA= --shared-files      117MiB |
|    0   N/A  N/A    113326      C   python                           5579MiB |
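One reading of this output (my interpretation, not stated in the thread): the card has roughly 8 GiB in total, and the desktop processes (Xorg, gnome-shell, a browser and Webex) already hold several hundred MiB, so the python training process at 5579 MiB leaves little headroom. A simple way to watch the remaining margin while training runs, assuming a standard Linux install with watch available:

# Refresh per-GPU memory usage every 2 seconds in a second terminal.
watch -n 2 'nvidia-smi --query-gpu=index,memory.used,memory.free --format=csv'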
