
I am running out of memory on gpu #31

Closed
pythonmobile opened this issue Dec 27, 2020 · 3 comments

@pythonmobile

Any ideas what I am doing wrong?

(For the val parameters in the options I have provided Set14: Set14/LRbicx4 and Set14/original.)
I also commented out pretrain_model_G in the options (I do not have RRDB_PSNR_x4.pth).

20-12-26 19:11:14.119 - INFO: Random seed: 9
20-12-26 19:11:14.124 - INFO: Dataset [LRHRDataset - DIV2K] is created.
20-12-26 19:11:14.124 - INFO: Number of train images: 900, iters: 30
20-12-26 19:11:14.125 - INFO: Total epochs needed: 16667 for iters 500,000
20-12-26 19:11:14.125 - INFO: Dataset [LRHRDataset - Set14] is created.
20-12-26 19:11:14.125 - INFO: Number of val images in [Set14]: 14
20-12-26 19:11:14.275 - INFO: Initialization method [kaiming]
20-12-26 19:11:16.317 - INFO: Initialization method [kaiming]
20-12-26 19:11:16.482 - INFO: Initialization method [kaiming]
20-12-26 19:11:17.393 - WARNING: Params [module.get_g_nopadding.weight_h] will not optimize.
20-12-26 19:11:17.393 - WARNING: Params [module.get_g_nopadding.weight_v] will not optimize.
20-12-26 19:11:17.397 - INFO: Model [SPSRModel] is created.
20-12-26 19:11:17.397 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "train.py", line 182, in <module>
    main()
  File "train.py", line 105, in main
    model.optimize_parameters(current_step)
  File "/home/joe/prj/SPSR/code/models/SPSR_model.py", line 251, in optimize_parameters
    self.fake_H_branch, self.fake_H, self.grad_LR = self.netG(self.var_L)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/architecture.py", line 191, in forward
    x_f_cat = self.f_block(x_f_cat)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/block.py", line 229, in forward
    out = self.RDB3(out)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/block.py", line 206, in forward
    x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
RuntimeError: CUDA out of memory. Tried to allocate 480.00 MiB (GPU 0; 10.76 GiB total capacity; 6.11 GiB already allocated; 274.69 MiB free; 6.97 GiB reserved in total by PyTorch)
make: *** [Makefile:2: all] Error 1

Sometimes I see this after a few epochs.
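
For readers hitting the same error: PyTorch exposes simple memory queries that show how close the process is to the card's capacity right before the failing step. A minimal diagnostic sketch (standard PyTorch calls, not part of the SPSR code):

```python
import torch

# How much memory the tensors themselves hold vs. what the caching
# allocator has reserved (these match the numbers in the error message).
print(f"{torch.cuda.memory_allocated() / 2**30:.2f} GiB allocated")
print(f"{torch.cuda.memory_reserved() / 2**30:.2f} GiB reserved by PyTorch")

# Detailed per-pool breakdown, useful for spotting fragmentation.
print(torch.cuda.memory_summary())
```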

@Maclory
Owner

Maclory commented Dec 30, 2020

Hi, it seems that there is not enough GPU memory. Reducing the batch size may be a solution.
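
A practical way to act on this advice is to probe downward until one training step fits. A minimal sketch with a stand-in network (`largest_fitting_batch` is hypothetical, not part of SPSR, and SPSR's generator returns a tuple, so the dummy-loss line would need a small adaptation there):

```python
import torch
import torch.nn as nn

def largest_fitting_batch(model, sample_shape, start=16):
    """Halve the batch size until one forward/backward pass fits on the GPU."""
    bs = start
    while bs >= 1:
        try:
            x = torch.randn(bs, *sample_shape, device="cuda")
            model(x).sum().backward()  # dummy loss, just to allocate gradients
            model.zero_grad()
            return bs
        except RuntimeError:           # typically "CUDA out of memory"
            x = None                   # drop the failed batch before retrying
            torch.cuda.empty_cache()
            bs //= 2
    raise RuntimeError("even batch size 1 does not fit")

# Stand-in network; with SPSR you would probe netG on an LR-sized input.
net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1),
                    nn.Conv2d(64, 3, 3, padding=1)).cuda()
print(largest_fitting_batch(net, (3, 128, 128)))
```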

@pythonmobile
Author

Thanks @Maclory. What card do you recommend for training? My input sizes are typically 256**2.

@Maclory
Owner

Maclory commented Jan 2, 2021

Do you mean the GPU by "card"? Actually, I don't know the exact amount of GPU memory required. If the image size is too large for your GPU, you can try running inference on the CPU or reducing the input size.
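
For completeness, a minimal sketch of the CPU fallback. The three-tuple output matches the traceback above; the tiny network and the input tensor here are stand-ins, not the repo's actual generator:

```python
import torch
import torch.nn as nn

# Stand-in generator mimicking SPSR's interface: netG(x) returns
# (fake_H_branch, fake_H, grad_LR), as seen in the traceback.
class TinySR(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x):
        y = self.body(x)
        return y, y, y

device = torch.device("cpu")     # CPU fallback: slower, but not VRAM-limited
netG = TinySR().to(device).eval()
lr = torch.randn(1, 3, 64, 64)   # hypothetical low-resolution input
with torch.no_grad():            # no autograd buffers at inference time
    sr_branch, sr, grad_lr = netG(lr.to(device))
```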

Maclory closed this as completed Jan 18, 2021