
OOM kill on pip install #1022

Closed
ezyang opened this issue Mar 17, 2017 · 12 comments

Comments

@ezyang
Contributor

ezyang commented Mar 17, 2017

When I attempt to install Torch with pip, the process gets OOM killed:

ezyang@sabre:~/Dev$ pip install https://download.pytorch.org/whl/cu75/torch-0.1.10.post2-cp27-none-linux_x86_64.whl 
Collecting torch==0.1.10.post2 from https://download.pytorch.org/whl/cu75/torch-0.1.10.post2-cp27-none-linux_x86_64.whl
  Downloading https://download.pytorch.org/whl/cu75/torch-0.1.10.post2-cp27-none-linux_x86_64.whl (360.3MB)
    99% |████████████████████████████████| 360.3MB 32.1MB/s eta 0:00:01Killed

dmesg:

[326093.958653] Out of memory: Kill process 14093 (pip) score 452 or sacrifice child
[326093.958669] Killed process 14093 (pip) total-vm:5029612kB, anon-rss:4217296kB, file-rss:4kB

Feel free to close this if it is by design that 4G is not enough memory to install Torch (yes, I don't have very much memory), but perhaps there is something here worth investigating?

@soumith
Member

soumith commented Mar 17, 2017

Thanks for reporting this.

We ship binaries, which means that all pip is doing is unzipping the package on your computer and moving the files into the right location.

With this context, there is very little I can do from the package side to improve this.

When I get time, I can try to simulate this in a limited-memory environment, find out the exact cause, and report it to the pip folks upstream. But since the task is not really actionable from my side, I'll close this issue.

@soumith soumith closed this as completed Mar 17, 2017
@diwu1989

diwu1989 commented Sep 7, 2017

You can wget the .whl package locally and then run pip install on it; that seems to use less memory.
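
For anyone copying that, a minimal sketch assuming the same CUDA 7.5 / Python 2.7 wheel from the original report (adjust the URL and filename for your own setup):

# download the wheel to disk first, then point pip at the local file
wget https://download.pytorch.org/whl/cu75/torch-0.1.10.post2-cp27-none-linux_x86_64.whl
pip install ./torch-0.1.10.post2-cp27-none-linux_x86_64.whl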

@rkingery

I'm still seeing this problem when trying to pip install torch inside the base Ubuntu Docker container. It gets to 99% installed and then gets killed. Other packages install fine.

@vikramriyer

I am facing the exact same issue as @rkingery.

@heiner

heiner commented Apr 8, 2019

Same issue here.

This can be fixed by increasing the Docker memory limit (e.g. https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container).
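
For reference, a minimal sketch of one way to do that from the command line; the 4g limit and the ubuntu:18.04 image are arbitrary examples, and Docker Desktop users would instead raise the limit in the GUI as the linked answer describes:

# start the container with a larger memory limit so pip's unpack step is not OOM-killed
docker run -it --memory=4g ubuntu:18.04 bash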

@diwu1989's comment is interesting, though: is there a less demanding way of downloading and installing PyTorch than pip?

@heiner

heiner commented Apr 12, 2019

(Turns out a pip download torch followed by pip install torch*.whl does not go OOM for me.)
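
A minimal sketch of that two-step workaround (the exact wheel filename depends on the torch version and platform, hence the glob):

# download the wheel(s) into the current directory without installing
pip download torch
# then install from the local file(s)
pip install torch*.whl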

@wpietri

wpietri commented Aug 22, 2020

In case others need a workaround, pip download torch OOMed for me. Instead I had to:

This suggests to me that the problem here is a pip bug; it must be allocating a lot of memory when it apparently doesn't need to.

@perber

perber commented Oct 6, 2020

I got the same issue. Another way to install the package is to use the --no-cache-dir option.
It worked in our environment.
pip --no-cache-dir install torch
Hopefully this helps someone.

@Horsmann

Same problem; @perber's solution worked for me in my Docker container.

@espears1

espears1 commented Nov 5, 2020

I am experiencing this same problem, but --no-cache-dir does not solve it.

@bthiban

bthiban commented May 3, 2021

Use RUN pip install -r requirements.txt --no-cache-dir when torch is listed in your requirements.txt.
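
A minimal Dockerfile sketch along those lines (the python:3.8-slim base image and the contents of requirements.txt are assumptions for illustration):

FROM python:3.8-slim
COPY requirements.txt .
# --no-cache-dir disables pip's on-disk cache, which earlier comments report avoids the OOM kill
RUN pip install -r requirements.txt --no-cache-dir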

@espears1

espears1 commented May 3, 2021

@bthiban I mentioned that in #1022 (comment), but unfortunately it did not solve the problem.
