Training on a pre-trained model: RuntimeError: CUDA error: out of memory #238
Comments
If you use a single GPU to train a model, you should change the |
As @zimenglan-sysu-512 pointed out, you are training on a single GPU with a batch size of 10, which is quite large in general. Try decreasing the batch size. |
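When the batch size changes, the learning rate and schedule usually need to change with it. The sketch below applies the linear scaling rule (learning rate proportional to batch size, iteration counts inversely proportional). The reference schedule of 16 images per batch, the `SolverSchedule` class, and the helper names are assumptions made for illustration, not something stated in this thread. Under that assumption, the learning rate, iteration count, and steps in the reported command are the values this rule gives for 2 images per batch, not 10.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SolverSchedule:
    """Hypothetical container for the solver options passed on the command line."""
    ims_per_batch: int
    base_lr: float
    max_iter: int
    steps: Tuple[int, ...]


# ASSUMPTION: a reference schedule tuned for 16 images per batch.
# Replace these numbers with the ones from your own config file.
REFERENCE = SolverSchedule(
    ims_per_batch=16, base_lr=0.02, max_iter=90000, steps=(60000, 80000)
)


def rescale(reference: SolverSchedule, ims_per_batch: int) -> SolverSchedule:
    """Apply the linear scaling rule to a new total batch size."""
    if ims_per_batch <= 0:
        raise ValueError("ims_per_batch must be positive")
    factor = ims_per_batch / reference.ims_per_batch
    return SolverSchedule(
        ims_per_batch=ims_per_batch,
        base_lr=reference.base_lr * factor,            # smaller batch, smaller LR
        max_iter=round(reference.max_iter / factor),   # smaller batch, more iterations
        steps=tuple(round(s / factor) for s in reference.steps),
    )


def to_cli_overrides(schedule: SolverSchedule) -> str:
    """Format the schedule as KEY VALUE overrides like those used in the issue."""
    steps = ", ".join(str(s) for s in schedule.steps)
    return (
        f"SOLVER.IMS_PER_BATCH {schedule.ims_per_batch} "
        f"SOLVER.BASE_LR {schedule.base_lr:g} "
        f"SOLVER.MAX_ITER {schedule.max_iter} "
        f'SOLVER.STEPS "({steps})"'
    )


if __name__ == "__main__":
    print(to_cli_overrides(rescale(REFERENCE, 2)))
```

Running the script prints the overrides for a batch of 2 images, which can be appended to the training command after the `--config-file` argument.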
Actually, I also tried with this command line (setting
|
Can you run the following two lines in your interpreter?
import torch
print(torch.rand(1, device="cuda")) |
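A slightly fuller version of this check can help separate "the model is too large" from "the GPU cannot be used at all". The following is only a sketch: the function name `cuda_sanity_check` and the report keys are made up for illustration, and it assumes nothing beyond `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.rand`.

```python
def cuda_sanity_check() -> dict:
    """Try the smallest possible CUDA allocation and report what happened."""
    report = {
        "torch_imported": False,
        "cuda_available": False,
        "device_count": 0,
        "allocation_ok": False,
        "error": None,
    }
    try:
        import torch
    except (ImportError, OSError) as exc:
        report["error"] = f"could not import torch: {exc}"
        return report

    report["torch_imported"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if not report["cuda_available"]:
        return report

    report["device_count"] = torch.cuda.device_count()
    try:
        # Same test as in the comment above: a single-element tensor on the GPU.
        x = torch.rand(1, device="cuda")
        report["allocation_ok"] = x.numel() == 1
    except RuntimeError as exc:
        # An out-of-memory error here points at the setup or at other
        # processes holding the GPU, not at the training batch size.
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in cuda_sanity_check().items():
        print(f"{key}: {value}")
```

If `allocation_ok` is false while `cuda_available` is true, reducing the batch size is unlikely to help on its own.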
Hm, interesting, it returns
|
only
works |
It looks like there is a problem with your setup / GPU. Maybe a reboot would help? |
That was it, thank you! |
Do you have a solution? |
I ran into the same problem. I can't reboot the machine because it's a public server. |
Are we rebooting the GPU? How do you do that safely? Is there a solution, by chance? |
I solved this problem by rebooting the server. |
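On a shared server where a reboot is not an option, it may be worth first checking whether other processes are holding GPU memory. The sketch below assumes `nvidia-smi` is on the `PATH` and uses its `--query-compute-apps` CSV output. The `GpuProcess` type and both function names are hypothetical. Nothing in this thread establishes that leftover processes were the cause here; the confirmed fix was a reboot.

```python
import shutil
import subprocess
from typing import List, NamedTuple, Optional


class GpuProcess(NamedTuple):
    pid: int
    name: str
    used_mib: Optional[int]  # None when the driver does not report a number


def parse_compute_apps(csv_text: str) -> List[GpuProcess]:
    """Parse 'pid, process_name, used_memory' rows (no header, no units)."""
    processes = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        memory = parts[-1]
        processes.append(
            GpuProcess(
                pid=int(parts[0]),
                name=",".join(parts[1:-1]),
                used_mib=int(memory) if memory.isdigit() else None,
            )
        )
    return processes


def list_gpu_processes() -> List[GpuProcess]:
    """Return the compute processes reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc)
```

If the list shows processes you own that should no longer be running, stopping them is a less disruptive first step than a reboot. Processes owned by other users are a matter for the server administrator.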
🐛 Bug
I am launching training from a pre-trained model on a 2-class, COCO-like dataset.
To Reproduce
Steps to reproduce the behavior:
python tools/train_net.py --config-file "configs/myconfig.yaml" SOLVER.IMS_PER_BATCH 10 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
Where myconfig.yaml points to mymodel.pth like this:
WEIGHT: "/Users/karimimohammedbelhal/.torch/models/mymodel"
And mymodel.pth is a pre-trained model with the right keys deleted, as suggested in #15.
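For readers unfamiliar with that step, deleting keys from a checkpoint amounts to filtering a dictionary of parameter names. The sketch below shows only the filtering. The helper name and the key prefixes are hypothetical placeholders, since the issue does not say which keys were removed. In practice the dictionary would typically come from `torch.load` and be written back with `torch.save`.

```python
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")


def drop_keys_with_prefixes(
    state_dict: Dict[str, T], prefixes: Iterable[str]
) -> Dict[str, T]:
    """Return a copy of state_dict without entries whose key starts with a prefix."""
    prefixes = tuple(prefixes)
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


if __name__ == "__main__":
    # HYPOTHETICAL key names, used only to show the behaviour.
    checkpoint = {
        "backbone.conv1.weight": "tensor-a",
        "head.cls_score.weight": "tensor-b",
        "head.cls_score.bias": "tensor-c",
    }
    trimmed = drop_keys_with_prefixes(checkpoint, ["head.cls_score"])
    print(sorted(trimmed))
```

The layers whose keys are removed get freshly initialised when training starts, which is what makes it possible to fine-tune on a dataset with a different number of classes.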
Expected behavior
Training should start and complete.
Environment
PyTorch version: 1.0.0.dev20181123
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
Nvidia driver version: 396.51
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
Versions of relevant libraries:
[pip3] numpy (1.13.3)
[pip3] torch (0.4.1)
[pip3] torchvision (0.2.1)
[conda] pytorch-nightly 1.0.0.dev20181123 py3.7_cuda9.0.176_cudnn7.4.1_0 pytorch
Returned Error