OSError: [Errno 12] Cannot allocate memory #5

Closed
nbansal90 opened this issue Jan 19, 2018 · 13 comments

Comments

@nbansal90

Hello,
I am getting a "Cannot allocate memory" error. I understand this is something related to my GPU, but it is quite surprising that I should get this error, because I am training on three 1080 Ti GPUs with a batch size of 64.

Traceback (most recent call last):
  File "train.py", line 162, in <module>
    train()
  File "train.py", line 113, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/usr/local/torch3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 310, in __iter__
    return DataLoaderIter(self)
  File "/usr/local/torch3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 167, in __init__
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

CUDA_VISIBLE_DEVICES=0,1,2 python train.py ~/DATASETS/cifar.python cifar10 -s ./snapshots --log ./logs --ngpu 3 --learning_rate 0.05 -b 64

Please suggest what I could do to avoid this issue.
Thank You!

prlz77 (Owner) commented Jan 19, 2018

OSError: [Errno 12] Cannot allocate memory sounds more like a RAM problem than a GPU problem. Check that you have enough RAM/swap and the correct user permissions.
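For example, a quick stand-alone check along these lines (not part of this repo; Linux-only, it just parses /proc/meminfo) shows how much RAM and swap are actually free before training starts:

# Stand-alone, Linux-only sketch: read /proc/meminfo to see how much RAM and
# swap are available. Uses only the standard library.

def meminfo_gib(*keys):
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            name, amount = line.split(":", 1)
            if name in keys:
                values[name] = int(amount.split()[0]) / 1024**2  # kB -> GiB
    return values

for name, gib in meminfo_gib("MemTotal", "MemAvailable",
                             "SwapTotal", "SwapFree").items():
    print("{}: {:.1f} GiB".format(name, gib))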

nbansal90 (Author) commented Jan 19, 2018

Yeah, exactly!
Meanwhile, I collected the output of the lspci command (for the NVIDIA 1080 Ti):

bansa01@vita:~/pytorch_resnext/tmp$ lspci -v -s 89:00.0
89:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ZOTAC International (MCO) Ltd. Device 1470
        Flags: bus master, fast devsel, latency 0, IRQ 105
        Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 2ff80000000 (64-bit, prefetchable) [size=256M]
        Memory at 2ff90000000 (64-bit, prefetchable) [size=32M]
        I/O ports at b000 [size=128]
        [virtual] Expansion ROM at f5000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_384_drm, nvidia_384

Does this give us any information as to where we might be going wrong? Can I change anything myself (given that I have root permission) which could help me prevent this issue?

ZhuFengdaaa commented Apr 4, 2018

Have you fixed that? I am facing the same issue.

@penguinshin

I am also running into the same problem, although I am running everything on the CPU. I have more than enough memory (the error occurs when I'm using only 10 GB out of 32 GB).

@ZhuFengdaaa

@penguinshin I fixed this by adding 64 GB of swap memory. When the data loader forks workers to load data, memory usage increases rapidly. You can try setting num_workers=1 first, and then try allocating a larger swap space.
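To illustrate where that setting lives, here is a minimal, self-contained sketch; the random tensor dataset is just a placeholder, not the CIFAR pipeline from train.py:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the CIFAR data loaded by train.py.
dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                        torch.randint(0, 10, (1024,)))

# Each worker is a forked copy of the main process, so every extra worker
# adds to peak host RAM. Start with num_workers=1 (or 0 to load data in the
# main process) and raise it only if RAM/swap can absorb the extra copies.
train_loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=1)

for batch_idx, (data, target) in enumerate(train_loader):
    pass  # the training step would go here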

prlz77 (Owner) commented Apr 5, 2018

Hi! As @ZhuFengdaaa confirms, it seems to be a peak-memory problem, although I am not able to reproduce it. Again, as @ZhuFengdaaa suggests, this seems to be linked to the number of data-loading workers (also see https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813/6).

prlz77 (Owner) commented Apr 5, 2018

Another related thread: ruotianluo/self-critical.pytorch#11

prlz77 closed this as completed May 10, 2018
@henryccl

OSError: [Errno 12] Cannot allocate memory sounds more like a RAM problem than a GPU problem. Check that you have enough RAM/swap and the correct user permissions.

Why does this have anything to do with permissions? And what should I do about permissions?

@rajatagrawal193

Fixed this by allocating 4 GB of swap memory. You can try allocating more if 4 GB does not suffice.
Follow this blog to allocate swap memory on your device:
https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-16-04

monacv commented Feb 18, 2021

Fixed the problem by allocating 64 GB of swap memory on an external disk.

BlueskyFR commented Jul 13, 2022

Why would any swap be needed? It slows everything down.
I am having the same problem while trying to allocate 30 GB even though I have 1 TB free...

usamec commented Jul 21, 2022

The fix is almost always:
echo 1 > /proc/sys/vm/overcommit_memory
(os.fork() has to account for a full copy of the parent process's address space, even though copy-on-write means most of it is never actually touched, so with a strict overcommit policy the fork can fail with ENOMEM despite plenty of free RAM.)

https://stackoverflow.com/a/52311756/1391392

jwr0218 commented Jun 26, 2023

This problem comes from CPU memory allocation; check your CPU RAM usage.
