
Got 'RuntimeError: CUDA out of memory' #63

Closed
ntyoshi opened this issue Mar 19, 2021 · 6 comments

Comments

@ntyoshi

ntyoshi commented Mar 19, 2021

Hi there,

I tried bash launch_dns.sh with the default parameters you gave us but I got the error messages below:

$ bash launch_dns.sh 
[2021-03-19 00:39:32,334][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_demucs.causal=1,demucs.hidden=64,demucs.resample=4,dset=dns
[2021-03-19 00:39:35,850][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-03-19 00:39:35,850][denoiser.solver][INFO] - Training...
Warning: Error detected in GluBackward. No forward pass information available. Enable detect anomaly during forward pass for more information. (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:42)
[2021-03-19 00:39:38,123][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 226, in _run_one_epoch
    loss.backward()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.96 GiB (GPU 0; 23.65 GiB total capacity; 20.24 GiB already allocated; 1.53 GiB free; 21.27 GiB reserved in total by PyTorch) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:289)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f79e7c5d536 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1cf1e (0x7f79e7ea6f1e in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1df9e (0x7f79e7ea7f9e in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: THCStorage_resize + 0x96 (0x7f79e911c3d6 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: THCTensor_resizeNd + 0x441 (0x7f79e912d591 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: THNN_CudaGatedLinear_updateGradInput + 0x100 (0x7f79e99dacb0 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0x100a4f6 (0x7f79e90c34f6 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xf95376 (0x7f79e904e376 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0x10c25b3 (0x7f7a2598d5b3 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x2d39136 (0x7f7a27604136 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x10c25b3 (0x7f7a2598d5b3 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::generated::GluBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x19d (0x7f7a2716f5cd in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x2d89705 (0x7f7a27654705 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f7a27651a03 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f7a276527e2 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f7a2764ae59 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f7a33f92ac8 in /home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #17: <unknown function> + 0xbd6df (0x7f7a34e426df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #18: <unknown function> + 0x76db (0x7f7a37a466db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #19: clone + 0x3f (0x7f7a3776f71f in /lib/x86_64-linux-gnu/libc.so.6)

The GPU I'm using is a Quadro RTX 6000 (24 GB of memory).
I also tried using 2 GPUs, with batch_size=1 and segment=1 (I've seen #19), but I got the same error.

Is training possible on this setup?
I'd appreciate any advice on how to resolve this error.
Thank you!

@adefossez
Contributor

Are the GPUs completely empty when you start training? 24 GB should be sufficient, especially with a small segment or batch size.
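
For reference, a quick way to check this is a standard nvidia-smi query (nothing specific to this repo):

$ nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
$ watch -n 1 nvidia-smi   # live view of memory usage while a job is running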

@ntyoshi
Author

ntyoshi commented Mar 19, 2021

Technically one process is using about 4 MB, but I don't think that affects the computation.
I was able to run launch_valentini.sh to completion, so I suspect something is wrong with the settings or parameters of the DNS script, or with the dataset itself.
Can you tell what the cause is from the error message?

@adefossez
Contributor

When you said you tried batch_size=1, did you edit it in the script or in the original config file? You need to edit it in the script, as it will be overridden otherwise. If you also pass verbose=1 in the script, it will print more debug information that could help me help you. Sorry for not replying sooner.
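
For example, the train.py call inside launch_dns.sh would end up looking roughly like this (a sketch only, showing just the overrides mentioned in this thread; the real script may pass more options):

# hypothetical excerpt of launch_dns.sh; only the overrides discussed here are shown
python train.py \
  dset=dns \
  demucs.causal=1 demucs.hidden=64 demucs.resample=4 \
  batch_size=1 \
  segment=1 \
  verbose=1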

@ntyoshi
Author

ntyoshi commented Apr 8, 2021

I edited launch_dns.sh and the config file, and it worked! I set batch_size=50.
One thing I'm concerned about is that, compared to Valentini (batch_size=128), training is very slow: one epoch took 6405.02 s.
The hardware environments are exactly the same, and I used the default settings of launch_dns.sh and launch_valentini.sh, except for batch_size in the DNS case.
I suspect the speed difference comes from the dataset, but does it look odd to you?

@adefossez
Contributor

Yes, epochs are long on DNS because the dataset is quite large. I think we trained on 8 to 16 GPUs, and even then it took a few days to fully converge, so on 2 GPUs I would expect it to be even slower.

@ntyoshi
Author

ntyoshi commented Apr 8, 2021

@adefossez
I see. Thank you!

@ntyoshi ntyoshi closed this as completed Apr 8, 2021