OSError: [Errno 12] Cannot allocate memory #5

Closed
nbansal90 opened this issue Jan 19, 2018 · 13 comments

Comments

@nbansal90

Hello,
I am getting a "Cannot allocate memory" error. I understand this is something related to my GPU, but it is quite surprising that I should get this error, because I am training on three 1080 Ti GPUs with a batch size of 64.

Traceback (most recent call last):
  File "train.py", line 162, in <module>
    train()
  File "train.py", line 113, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/usr/local/torch3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 310, in __iter__
    return DataLoaderIter(self)
  File "/usr/local/torch3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 167, in __init__
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

CUDA_VISIBLE_DEVICES=0,1,2 python train.py ~/DATASETS/cifar.python cifar10 -s ./snapshots --log ./logs --ngpu 3 --learning_rate 0.05 -b 64

Please suggest what I could do to avoid this issue.
Thank You!

prlz77 (Owner) commented Jan 19, 2018

OSError: [Errno 12] Cannot allocate memory sounds more like a RAM problem than a GPU problem. Check that you have enough RAM/swap and the correct user permissions.
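For example, a quick stand-alone check along these lines (not part of this repo; Linux-only, it just parses /proc/meminfo) shows how much RAM and swap are actually free before training starts:

# Stand-alone, Linux-only sketch: read /proc/meminfo to see how much RAM and
# swap are available. Uses only the standard library.

def meminfo_gib(*keys):
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            name, amount = line.split(":", 1)
            if name in keys:
                values[name] = int(amount.split()[0]) / 1024**2  # kB -> GiB
    return values

for name, gib in meminfo_gib("MemTotal", "MemAvailable",
                             "SwapTotal", "SwapFree").items():
    print("{}: {:.1f} GiB".format(name, gib))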

nbansal90 (Author) commented Jan 19, 2018

Yeah, exactly!
Meanwhile, I collected the output of the lspci command (for the NVIDIA 1080 Ti):

bansa01@vita:~/pytorch_resnext/tmp$ lspci -v -s 89:00.0
89:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ZOTAC International (MCO) Ltd. Device 1470
        Flags: bus master, fast devsel, latency 0, IRQ 105
        Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 2ff80000000 (64-bit, prefetchable) [size=256M]
        Memory at 2ff90000000 (64-bit, prefetchable) [size=32M]
        I/O ports at b000 [size=128]
        [virtual] Expansion ROM at f5000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_384_drm, nvidia_384

Does this give us any information as to where we might be going wrong? Can I change anything myself (given that I have root permission) which could help me prevent this issue?

ZhuFengdaaa commented Apr 4, 2018

Have you fixed that? I am facing the same issue.

@penguinshin

I am also running into the same problem, although I am running everything on the CPU. I have more than enough memory (the error occurs when I'm using only 10 GB out of 32 GB).

@ZhuFengdaaa

@penguinshin I fixed this by adding 64 GB of swap memory. When the data loader forks workers to load data, memory usage increases rapidly. You can try setting num_workers=1 first, and then try allocating a larger swap space.
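To illustrate where that setting lives, here is a minimal, self-contained sketch; the random tensor dataset is just a placeholder, not the CIFAR pipeline from train.py:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the CIFAR data loaded by train.py.
dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                        torch.randint(0, 10, (1024,)))

# Each worker is a forked copy of the main process, so every extra worker
# adds to peak host RAM. Start with num_workers=1 (or 0 to load data in the
# main process) and raise it only if RAM/swap can absorb the extra copies.
train_loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=1)

for batch_idx, (data, target) in enumerate(train_loader):
    pass  # the training step would go here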

prlz77 (Owner) commented Apr 5, 2018

Hi! As @ZhuFengdaaa confirms, it seems to be a peak-memory problem, although I am not able to reproduce it. Again, as @ZhuFengdaaa suggests, this seems to be linked to the number of data-loading workers (also see https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813/6).

prlz77 (Owner) commented Apr 5, 2018

Another related thread: ruotianluo/self-critical.pytorch#11

prlz77 closed this as completed May 10, 2018
@henryccl

OSError: [Errno 12] Cannot allocate memory sounds more like a RAM problem than a GPU problem. Check that you have enough RAM/swap and the correct user permissions.

Why does this have anything to do with permissions? And what should I do about permissions?

@rajatagrawal193

Fixed this by allocating 4 GB of swap memory. You can try allocating more if 4 GB does not suffice.
Follow this blog to allocate swap memory on your device:
https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-16-04

monacv commented Feb 18, 2021

Fixed the problem by allocating 64 GB of swap memory on an external disk.

BlueskyFR commented Jul 13, 2022

Why would any swap be needed? It slows everything down.
I am having the same problem while trying to allocate 30 GB even though I have 1 TB free...

usamec commented Jul 21, 2022

The fix is almost always:
echo 1 > /proc/sys/vm/overcommit_memory
(os.fork() has to account for a full copy of the parent process's address space, even though copy-on-write means most of it is never actually touched, so with a strict overcommit policy the fork can fail with ENOMEM despite plenty of free RAM.)

https://stackoverflow.com/a/52311756/1391392

jwr0218 commented Jun 26, 2023

This problem comes from CPU memory allocation; check your CPU RAM usage.
