
I am running out of memory on gpu #31

Closed
pythonmobile opened this issue Dec 27, 2020 · 3 comments

@pythonmobile

Any ideas what I am doing wrong?

(For the val parameters in the options I have provided Set14: Set14/LRbicx4 and Set14/original.)
I also commented out pretrain_model_G in the options (I do not have RRDB_PSNR_x4.pth).

20-12-26 19:11:14.119 - INFO: Random seed: 9
20-12-26 19:11:14.124 - INFO: Dataset [LRHRDataset - DIV2K] is created.
20-12-26 19:11:14.124 - INFO: Number of train images: 900, iters: 30
20-12-26 19:11:14.125 - INFO: Total epochs needed: 16667 for iters 500,000
20-12-26 19:11:14.125 - INFO: Dataset [LRHRDataset - Set14] is created.
20-12-26 19:11:14.125 - INFO: Number of val images in [Set14]: 14
20-12-26 19:11:14.275 - INFO: Initialization method [kaiming]
20-12-26 19:11:16.317 - INFO: Initialization method [kaiming]
20-12-26 19:11:16.482 - INFO: Initialization method [kaiming]
20-12-26 19:11:17.393 - WARNING: Params [module.get_g_nopadding.weight_h] will not optimize.
20-12-26 19:11:17.393 - WARNING: Params [module.get_g_nopadding.weight_v] will not optimize.
20-12-26 19:11:17.397 - INFO: Model [SPSRModel] is created.
20-12-26 19:11:17.397 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "train.py", line 182, in <module>
    main()
  File "train.py", line 105, in main
    model.optimize_parameters(current_step)
  File "/home/joe/prj/SPSR/code/models/SPSR_model.py", line 251, in optimize_parameters
    self.fake_H_branch, self.fake_H, self.grad_LR = self.netG(self.var_L)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/architecture.py", line 191, in forward
    x_f_cat = self.f_block(x_f_cat)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/block.py", line 229, in forward
    out = self.RDB3(out)
  File "/home/joe/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joe/prj/SPSR/code/models/modules/block.py", line 206, in forward
    x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
RuntimeError: CUDA out of memory. Tried to allocate 480.00 MiB (GPU 0; 10.76 GiB total capacity; 6.11 GiB already allocated; 274.69 MiB free; 6.97 GiB reserved in total by PyTorch)
make: *** [Makefile:2: all] Error 1

Sometimes I see this after a few epochs.
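
For readers hitting the same error: PyTorch exposes simple memory queries that show how close the process is to the card's capacity right before the failing step. A minimal diagnostic sketch (standard PyTorch calls, not part of the SPSR code):

```python
import torch

# How much memory the tensors themselves hold vs. what the caching
# allocator has reserved (these match the numbers in the error message).
print(f"{torch.cuda.memory_allocated() / 2**30:.2f} GiB allocated")
print(f"{torch.cuda.memory_reserved() / 2**30:.2f} GiB reserved by PyTorch")

# Detailed per-pool breakdown, useful for spotting fragmentation.
print(torch.cuda.memory_summary())
```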

@Maclory
Owner

Maclory commented Dec 30, 2020

Hi, it seems that there is not enough GPU memory. Reducing the batch size may be a solution.
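
A practical way to act on this advice is to probe downward until one training step fits. A minimal sketch with a stand-in network (`largest_fitting_batch` is hypothetical, not part of SPSR, and SPSR's generator returns a tuple, so the dummy-loss line would need a small adaptation there):

```python
import torch
import torch.nn as nn

def largest_fitting_batch(model, sample_shape, start=16):
    """Halve the batch size until one forward/backward pass fits on the GPU."""
    bs = start
    while bs >= 1:
        try:
            x = torch.randn(bs, *sample_shape, device="cuda")
            model(x).sum().backward()  # dummy loss, just to allocate gradients
            model.zero_grad()
            return bs
        except RuntimeError:           # typically "CUDA out of memory"
            x = None                   # drop the failed batch before retrying
            torch.cuda.empty_cache()
            bs //= 2
    raise RuntimeError("even batch size 1 does not fit")

# Stand-in network; with SPSR you would probe netG on an LR-sized input.
net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1),
                    nn.Conv2d(64, 3, 3, padding=1)).cuda()
print(largest_fitting_batch(net, (3, 128, 128)))
```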

@pythonmobile
Author

Thanks @Maclory. What card do you recommend for training? My input sizes are typically 256**2.

@Maclory
Owner

Maclory commented Jan 2, 2021

Do you mean the GPU by "card"? Actually, I don't know the exact amount of GPU memory required. If the image size is too large for your GPU, you can try running inference on the CPU or reducing the input size.
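
For completeness, a minimal sketch of the CPU fallback. The three-tuple output matches the traceback above; the tiny network and the input tensor here are stand-ins, not the repo's actual generator:

```python
import torch
import torch.nn as nn

# Stand-in generator mimicking SPSR's interface: netG(x) returns
# (fake_H_branch, fake_H, grad_LR), as seen in the traceback.
class TinySR(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x):
        y = self.body(x)
        return y, y, y

device = torch.device("cpu")     # CPU fallback: slower, but not VRAM-limited
netG = TinySR().to(device).eval()
lr = torch.randn(1, 3, 64, 64)   # hypothetical low-resolution input
with torch.no_grad():            # no autograd buffers at inference time
    sr_branch, sr, grad_lr = netG(lr.to(device))
```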

Maclory closed this as completed Jan 18, 2021