
How much memory is needed for infer? #51

Open
huangluyao opened this issue Jun 7, 2021 · 3 comments

@huangluyao

My graphics card is a GTX 1660 Ti with 6 GB of memory.
When I run the code, it reports this error:
RuntimeError: CUDA out of memory. Tried to allocate 1.00 GiB (GPU 0; 5.81 GiB total capacity; 2.90 GiB already allocated; 420.50 MiB free; 3.84 GiB reserved in total by PyTorch)

@selimlouis

I have a similar problem. I just want to test the whole thing on my GTX 970 with 4 GB of memory.

I get:

Traceback (most recent call last):
  File "fsod_train_net.py", line 118, in <module>
    args=(args,),
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "fsod_train_net.py", line 106, in main
    return trainer.train()
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 431, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 138, in train
    self.run_step()
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 232, in run_step
    loss_dict = self.model(data)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/FewShot/FewX/fewx/modeling/fsod/fsod_rcnn.py", line 153, in forward
    support_features = self.backbone(support_images)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/modeling/backbone/resnet.py", line 444, in forward
    x = self.stem(x)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/modeling/backbone/resnet.py", line 355, in forward
    x = self.conv1(x)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/layers/wrappers.py", line 88, in forward
    x = self.norm(x)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/layers/batch_norm.py", line 65, in forward
    eps=self.eps,
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/functional.py", line 2058, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 1000.00 MiB (GPU 0; 3.94 GiB total capacity; 2.15 GiB already allocated; 340.25 MiB free; 2.79 GiB reserved in total by PyTorch)

I tried halving the BATCH_SIZE_PER_IMAGE and IMS_PER_BATCH settings in the config, but I still get memory problems. I don't want to make them too small, since I think that would lead to bad results. Not an expert though.
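For reference, those are standard detectron2 config keys. A minimal sketch of what halving them can look like when the config is handled in Python; the defaults of 16 and 512 are detectron2's usual values and an assumption here, and in FewX the config should really be loaded through the project's own setup (fsod_train_net.py) rather than this bare snippet:

from detectron2.config import get_cfg

# Minimal sketch, not the FewX entry point: halve the settings mentioned above.
# BATCH_SIZE_PER_IMAGE exists under both MODEL.RPN and MODEL.ROI_HEADS in
# detectron2; the ROI_HEADS one is shown here.
cfg = get_cfg()
cfg.SOLVER.IMS_PER_BATCH = 8                    # images per batch (detectron2 default: 16)
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256  # RoIs sampled per image (default: 512)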

Did anyone find a solution?

@selimlouis

OK, so I continued trying to get this to work.

I found success when setting SOLVER.IMS_PER_BATCH to 1 in configs/fsod/Base-FSOD-C4.yaml.

I did not run a complete training process, since it would have taken me 2 days and 11 hours, but it started training without issues.
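If you would rather not edit the shared yaml, the same override can presumably also be applied in code once the config has been loaded, assuming it is a standard detectron2/yacs CfgNode as in stock detectron2:

# Equivalent to writing IMS_PER_BATCH: 1 under SOLVER in
# configs/fsod/Base-FSOD-C4.yaml, applied on an already-loaded cfg object.
cfg.merge_from_list(["SOLVER.IMS_PER_BATCH", 1])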
Hope this helps someone else too

@xiaohei1001

It depends on your support set. Maybe you can try making RPN.POST_NMS_TOPK_TEST smaller.
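For context, MODEL.RPN.POST_NMS_TOPK_TEST is the number of proposals detectron2 keeps after NMS at test time (1000 by default), so lowering it shrinks the tensors that reach the RoI heads. A hedged sketch, with 500 as an arbitrary example value:

# Keep fewer proposals after NMS at inference time; fewer proposals means
# smaller per-image tensors in the box head and therefore less GPU memory.
# 500 is only an example value, not a recommendation from this thread.
cfg.MODEL.RPN.POST_NMS_TOPK_TEST = 500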
