getting memory error on Tesla K80 #23

sujit420 · 2018-06-11T11:47:38Z

Getting an error while loading the data:
Traceback (most recent call last):
File "main.py", line 33, in <module>
train_dset = VQAFeatureDataset('train', dictionary)
File "/home/sujitmishra/bottom-up-attention-vqa/dataset.py", line 120, in __init__
self.features = np.array(hf.get('image_features'))
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/sujitmishra/py2/local/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 690, in __array__
arr = numpy.empty(self.shape, dtype=self.dtype if dtype is None else dtype)
MemoryError

How much GPU memory does it need for training?
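
For context, the failing call is `numpy.empty(self.shape, ...)`, which has to reserve the whole dataset in host RAM at once, so the error concerns system memory rather than GPU memory. A rough sketch of the arithmetic follows; the shape is a hypothetical placeholder and the helper name `array_nbytes` is not from the repository.

```python
from math import prod


def array_nbytes(shape, itemsize):
    """Bytes needed to hold a dense array with this shape and element size."""
    return prod(shape) * itemsize


# Hypothetical numbers for illustration only. Read the real shape and dtype
# from the HDF5 file (for example hf['image_features'].shape) before relying
# on the result.
example_shape = (1000, 36, 2048)  # (images, boxes per image, feature size)
float32_size = 4                  # bytes per float32 element

needed = array_nbytes(example_shape, float32_size)
print("about %.2f GiB" % (needed / 2**30))
```

If the figure computed for the real dataset exceeds free RAM plus swap, `np.array(hf.get('image_features'))` can be expected to raise `MemoryError`.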

ZhuFengdaaa · 2018-06-11T15:21:34Z

The problem is that you need more memory and swap space, ~~which must add up to at least 50G~~.
Correction: total memory (RAM plus swap) must add up to at least 80G.

sujit420 · 2018-06-12T01:32:59Z

Thanks for your response, @ZhuFengdaaa. I increased my swap space to a total of more than 50G.
Now I am getting a different error:
Traceback (most recent call last):
File "main.py", line 45, in <module>
train(model, train_loader, eval_loader, args.epochs, args.output)
File "/home/sujitmishra/bottom-up-attention-vqa/train.py", line 36, in train
for i, (v, b, q, a) in enumerate(train_loader):
File "/home/sujitmishra/py2/local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 417, in __iter__
return DataLoaderIter(self)
File "/home/sujitmishra/py2/local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 234, in __init__
w.start()
File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

How much memory does it need to train? Or is there a pretrained model available for evaluation/inference?

ZhuFengdaaa · 2018-06-13T06:26:15Z

I used htop to check the training process, and its virtual memory usage is 77.2G. You can try again. I encountered exactly the same error, OSError: [Errno 12] Cannot allocate memory. It goes away once you have enough virtual memory.

I have 40G physical memory so I just created 50G swap. You might need more.
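
A minimal sketch of checking whether RAM plus swap reaches the 80G mentioned above, assuming a Linux host where `/proc/meminfo` reports sizes in kB and treating "G" as GiB. The function names are illustrative and not part of the repository.

```python
import os


def parse_meminfo(text):
    """Map field name -> integer value from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def total_virtual_gib(fields):
    """Physical RAM plus swap, converted from kB (as reported) to GiB."""
    return (fields.get("MemTotal", 0) + fields.get("SwapTotal", 0)) / float(2**20)


def has_enough(fields, required_gib=80):
    """True if RAM plus swap meets the required total."""
    return total_virtual_gib(fields) >= required_gib


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("RAM + swap: %.1f GiB, enough: %s"
          % (total_virtual_gib(info), has_enough(info)))
```

With 40G of RAM and 50G of swap, as in the comment above, the total is 90G and the check passes.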

sujit420 · 2018-06-13T12:03:41Z

Thanks a lot, @ZhuFengdaaa. Training is pretty slow, but it's running.
I will ask if I encounter further errors.

DaddyWesker · 2018-06-30T07:03:06Z

So, if I understand correctly, I need to create more virtual memory? Can you tell me how to do that? And, as far as I can see, I need more than 50 GB of virtual memory?

YuanEZhou · 2018-11-28T09:26:43Z

I encountered the same problem and fixed it by modifying the dataset.py file as follows:
dataset.py.txt
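
The attachment's contents are not reproduced in this thread, so the following is only a guess at the kind of change that lowers memory use: keep the HDF5 dataset on disk and read one image's features per `__getitem__` call instead of copying everything with `np.array(...)`. The class name `LazyFeatures` and its arguments are hypothetical; this is a sketch, not YuanEZhou's actual patch.

```python
import h5py
import numpy as np


class LazyFeatures(object):
    """Hypothetical wrapper that reads rows from an HDF5 dataset on demand."""

    def __init__(self, h5_path, key="image_features"):
        self.h5_path = h5_path
        self.key = key
        self._file = None  # the file is opened on first access, not here

    def _dataset(self):
        if self._file is None:
            self._file = h5py.File(self.h5_path, "r")
        return self._file[self.key]

    def __len__(self):
        return self._dataset().shape[0]

    def __getitem__(self, index):
        # Indexing an h5py dataset reads only the requested row from disk,
        # so the full array is never materialized in RAM.
        return np.asarray(self._dataset()[index])
```

The trade-off is that every sample now costs a disk read, so throughput depends on storage speed rather than on having enough RAM and swap.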

sujit420 closed this as completed Jun 13, 2018

ZhuFengdaaa mentioned this issue Jun 18, 2018

Cannot allocate memory #25

Closed

ZhuFengdaaa mentioned this issue Jun 30, 2018

Memory problem while training process SinghJasdeep/Attention-on-Attention-for-VQA#3

Closed

ArnaudVella added a commit to ArnaudVella/bottom-up-vqa that referenced this issue Jul 20, 2019

Update dataset.py

7eb0cd6

RAM usage cut by a factor of 10: from ~90GB down to 9GB. Fix proposed by YuanEZhou here: hengyuan-hu/bottom-up-attention-vqa#23
