Insufficient memory leads to package install failure with micromamba, but docker build still succeeds #1100

Closed
jli opened this issue Jul 23, 2021 · 2 comments · Fixed by #1250
Labels
type::bug Something isn't working

jli commented Jul 23, 2021

Issue

When there's not enough memory, micromamba install fails to install some packages but still exits with code 0. This can cause, e.g., a docker build to succeed and produce a broken image that's missing packages.

Steps to reproduce

The following example installs tensorflow with memory limited to 300MB. Note the /tmp/mambaf8guFlIGAYT: line 4: 57 Killed message near the end. I believe a process is being killed for exceeding the memory limit, which makes the install fail. But the exit code is 0.

$ cat pip.yaml 
name: base
channels:
  - conda-forge
dependencies:
  - pip
  - pip:
    - tensorflow
$ docker run -it --rm --memory=300m --memory-swap=300m --mount type=bind,source=$(pwd),target=/mnt mambaorg/micromamba:latest micromamba install -y -vvvv -f /mnt/pip.yaml

##### MUCH OUTPUT REMOVED ######

TRACE   hard-linked '/opt/conda/pkgs/wheel-0.36.2-pyhd3deb0d_0/site-packages/wheel/wheelfile.py'
         --> '/opt/conda/lib/python3.9/site-packages/wheel/wheelfile.py'
DEBUG   23 files linked
INFO    Compiling "lib/python3.9/site-packages/wheel/__pycache__/__init__.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/__pycache__/__main__.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/__pycache__/bdist_wheel.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/cli/__pycache__/__init__.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/cli/__pycache__/convert.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/cli/__pycache__/pack.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/cli/__pycache__/unpack.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/__pycache__/macosx_libfile.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/__pycache__/metadata.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/__pycache__/pkginfo.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/__pycache__/util.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/vendored/__pycache__/__init__.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/vendored/packaging/__pycache__/__init__.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/vendored/packaging/__pycache__/_typing.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/vendored/packaging/__pycache__/tags.cpython-39.pyc"
INFO    Compiling "lib/python3.9/site-packages/wheel/__pycache__/wheelfile.cpython-39.pyc"
INFO    Running wrapped python compilation command /opt/conda/bin/python3.9 -Wi -m compileall -q -l -i /tmp/mambafTl74sI4Jss -j0
INFO    noarch pyc compilation failed (cross-compiling?). Success
INFO    /tmp/mambafJxKSiatqAG: line 43: : command not found
        
TRACE   entry point path: "bin/wheel"
        
DEBUG   post-link script for 'wheel' does not exist ('/opt/conda/bin/.wheel-post-link.sh')
DEBUG   Finalizing linking
TRACE   Adding package to prefix metadata at '/opt/conda/conda-meta/wheel-0.36.2-pyhd3deb0d_0.json'
Transaction finished
INFO    Opening history file: "/opt/conda/conda-meta/history"

Installing pip packages: tensorflow
INFO    Calling: /opt/conda/bin/pip install -r /tmp/mambafvJT15WettK --no-input
/tmp/mambaf8guFlIGAYT: line 43: : command not found
Collecting tensorflow
  Downloading tensorflow-2.5.0-cp39-cp39-manylinux2010_x86_64.whl (454.4 MB)
     |██████████▍                     | 147.2 MB 220 bytes/s eta 16 days, 2:11:32/tmp/mambaf8guFlIGAYT: line 4:    57 Killed                  /opt/conda/bin/pip install -r /tmp/mambafvJT15WettK --no-input
INFO    Freeing transaction.
INFO    Freeing solver.
INFO    Freeing pool.
$ echo $?
0

Root cause / solution(?)

With some random poking, I found an example /tmp/mamba* file that looked like this:

lib/python3.8/site-packages/ansiwrap/__init__.py
lib/python3.8/site-packages/ansiwrap/ansistate.py
lib/python3.8/site-packages/ansiwrap/core.py
eval "$('/bin/micromamba' 'shell' 'hook' '-s' 'bash' '-p' '/opt/conda')"

It seems like the "line 4: ... Killed" message refers to the final line of a script like the one above. I think the system terminates the command on line 4 because it runs out of memory, but this failure isn't propagated, so micromamba install exits with code 0.

I believe including set -e or using exec in this script would cause the error to propagate to micromamba.
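
To illustrate the general shell behaviour this suggestion relies on, here is a minimal sketch. It is not micromamba's actual wrapper script: a child shell that SIGKILLs itself stands in for the OOM-killed pip, and `true` stands in for any command the wrapper might run afterwards (both are assumptions made for the example).

```shell
#!/usr/bin/env bash
# Sketch only: a child shell SIGKILLs itself to stand in for an OOM-killed pip,
# and `true` stands in for whatever the wrapper script runs afterwards.

# Without errexit the script keeps going, so its status is that of `true`.
status_plain=0
bash -c 'sh -c "kill -KILL \$\$"; true' 2>/dev/null || status_plain=$?

# With errexit the script stops at the killed command and reports 128 + 9.
status_errexit=0
bash -c 'set -e; sh -c "kill -KILL \$\$"; true' 2>/dev/null || status_errexit=$?

echo "plain=$status_plain errexit=$status_errexit"
```

On Linux this should print plain=0 errexit=137. Whether the real wrapper loses the status this way depends on what it runs after the pip command, which the snippet above does not show.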

Misc

Some instances of people saying they needed to increase the memory limits for docker build:

Thanks to @wholtz for the simple reproduction above (originally from mamba-org/micromamba-docker#29 (comment)).

@adriendelsalle
Member

Thanks for this issue @jli !
We will try to reproduce and understand that one.

@adriendelsalle adriendelsalle added the status::need-triage New feature proposal that have not been reviewed. label Jul 30, 2021
@adriendelsalle adriendelsalle added type::bug Something isn't working and removed status::need-triage New feature proposal that have not been reviewed. labels Nov 3, 2021
adriendelsalle commented Nov 3, 2021

When the sub-process run through reproc++ gets OOM-killed, the error code still reflects success even though the status is 137, which indicates a fatal termination by SIGKILL.
We could catch it using the status code, but it looks like the status code is supposed to be negative on error.

The sub-process call is done at:

auto [_, ec] = reproc::run(wrapped_command, options);

I opened an issue upstream: DaanDeMeyer/reproc#68
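
For reference, the 137 above follows the shell convention of 128 plus the signal number (9 for SIGKILL). A rough sketch of checking a status that way, written in shell rather than against the reproc++ API; describe_status is a hypothetical helper name, not something from micromamba or reproc:

```shell
#!/usr/bin/env bash
# Hypothetical helper: treat any non-zero status as a failure and name the
# signal when the status follows the 128 + N convention.
describe_status() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        # kill -l maps an exit status above 128 back to the signal name.
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "failed with exit code $status"
    fi
}

describe_status 0
describe_status 1
describe_status 137
```

The point of the sketch is only that a status of 137 is distinguishable from success on its own, independent of the error code returned alongside it.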
