How to use pipenv with multistage docker builds? #3160

Open
haizaar opened this Issue Nov 4, 2018 · 10 comments

haizaar commented Nov 4, 2018

Good day,

I'm exploring how to use pipenv with multi-stage docker builds. In a nutshell, the idea is to "compile" stuff in a base image and copy only the resulting artifacts to the final image.

With Python it gets tricky, since you need to copy package dependencies as well.

I've checked out several ideas, and it looks like pip install --user together with setting PYTHONUSERBASE is the simplest way to install dependencies to a side directory, e.g.:

FROM alpine AS builder
# Install your gcc, python3-dev, etc. here
RUN apk add --no-cache python3
COPY . /src/
WORKDIR /src
ENV PYROOT /pyroot
# Install the dependencies, then the app itself, under $PYROOT instead of the system site-packages
RUN PYTHONUSERBASE=$PYROOT pip3 install --user -r requirements.txt
RUN PYTHONUSERBASE=$PYROOT pip3 install --user .

# The final image
FROM alpine
RUN apk add --no-cache python3
ENV PYROOT /pyroot
# Copy only the installed packages from the builder stage
COPY --from=builder $PYROOT/lib/ $PYROOT/lib/

(The full story)

The problem is that pipenv disregards PYTHONUSERBASE:

$ docker run --rm -ti python:3.6-alpine sh
/ # pip install --upgrade pip; pip install pipenv==2018.10.13  # skipped output
/ # mkdir /tmp/foo; cd /tmp/foo
/tmp/foo # pipenv install requests  # skipped output
/tmp/foo # PYTHONUSERBASE=/pyroot pipenv install --system --deploy
Installing dependencies from Pipfile.lock (b14837)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:00:01
/tmp/foo # ls /pyroot
ls: /pyroot: No such file or directory

I found a workaround: run pipenv lock -r and then install the resulting requirements.txt as in my original idea. But I'm not sure this is the best way to go, particularly if I have custom (private) sources defined in my Pipfile; I don't want to replicate their configuration into pip.

Any other ideas?

haizaar commented Nov 4, 2018

Apologies if the original intent was not clear enough.

All I want is to install my package and its dependencies into a separate directory instead of python's site-packages; and then to copy that directory into the final docker image.
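
For reference, the directory that pip install --user targets under a given PYTHONUSERBASE can be checked from Python itself. The sketch below is only an illustration: the helper name user_site_for is made up for this example, and it asks a fresh interpreter because the running one may already have cached its user base at startup.

```python
import os
import subprocess
import sys


def user_site_for(userbase):
    """Return the user site-packages directory a fresh interpreter
    reports when PYTHONUSERBASE is set to `userbase`."""
    env = dict(os.environ, PYTHONUSERBASE=userbase)
    code = "import site; print(site.getusersitepackages())"
    out = subprocess.check_output(
        [sys.executable, "-c", code],
        env=env,
        universal_newlines=True,  # decode the output as text
    )
    return out.strip()


if __name__ == "__main__":
    # /pyroot is the side directory used throughout this thread.
    print(user_site_for("/pyroot"))
```

On a Linux build of Python 3.6 this is expected to point at /pyroot/lib/python3.6/site-packages; the exact suffix depends on the platform and the Python version, which is why both build stages should use the same interpreter version.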

Since the --system flag does not use a virtualenv (it seems), maybe it is worth supporting the PIP* env vars in that special case?

I'll try to elaborate on the problem: I want to keep my final, "production" docker image as small as possible. Therefore I need to build/install my app's dependencies in an earlier stage.
Having dependencies installed into the general site-packages (either the system one or that of a dedicated venv) is problematic, since I can't reliably pick out later the actual packages my app depends on.
Wheels do not help much either because: a) I don't know which wheels to pick up from the cache (some may belong to my app's requirements, and some may belong to, e.g., a code generation tool my app uses during install); and b) I need to copy the wheels to the final docker image prior to installing them there, meaning they will end up in the resulting docker image.

Member

techalchemy commented Nov 4, 2018

Pipenv unsets user settings because they are incompatible with virtualenv settings. I understand the approach you’re taking, but maybe you can say more about what problem you are trying to solve? You want the wheels and sdists or whatever? If so, on Linux they’re stored in ~/.cache/pipenv/wheels

haizaar commented Nov 4, 2018

@techalchemy My comment appeared before yours for some reason (is GitHub still recovering?). Just making sure you got the notification.

Member

techalchemy commented Nov 4, 2018

Ah okay, you may be able to just use PIPENV_VENV_IN_PROJECT=1 and copy the local .venv directory’s site packages

haizaar commented Nov 5, 2018

While it's nice to know about this option, I don't see how it helps me. Site packages from a vanilla venv weigh 12MB (because of pip and setuptools).

What I'm trying to do is to run

pipenv install --dev
pipenv install
pipenv install .    # May require stuff installed by --dev

and easily collect the results of only the last two lines. So another venv does not help me. I need to "split" the installation paths within the venv.

Do you think a PEEP that suggests obeying the PIP* options when the --system flag is used would fly?

Member

techalchemy commented Nov 5, 2018

Why not just specify your library dependencies in your package metadata and add uninstall steps for pip/setuptools via pipenv run pip uninstall setuptools? In any case, we already support any pip environment variable you set, besides the user one. I’m struggling to understand what you are gaining by doing all this extra work.

haizaar commented Nov 5, 2018

OK, I'll try once again to explain. Bear with me please.

Here is what I would have wanted to do (in a Dockerfile):

# Install all stuff in our system so we can install setup.py for our app later on
RUN pipenv install --deploy --system --dev
# Install our app LOCKED dependencies aside
RUN PYTHONUSERBASE=/pyroot pipenv install --deploy --system
# Install the app aside as well - all dependencies already present, so it's just the app
RUN PYTHONUSERBASE=/pyroot pip install --user .

# Take pyroot to the next docker image stage

This looks very clear and "human" (as in pipenv's motto) to me :)

Other solutions that require fiddling with a virtualenv look error-prone IMHO. I've tried the PIPENV_VENV_IN_PROJECT=1 approach plus uninstall, and there are a couple of issues:

Uninstall does not remove dependencies
(Skipped pipenv output for brevity)

$ export PIPENV_VENV_IN_PROJECT=1
$ pipenv run pip freeze  # empty: a fresh, vanilla venv
$ pipenv install --dev requests
$ pipenv run pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
requests==2.20.0
urllib3==1.24.1
$ pipenv uninstall --all-dev
$ pipenv run pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
urllib3==1.24.1

I.e., the requests package is gone, but its dependencies are left behind.
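
The leftover packages are what a dependency-aware cleanup would have to detect. Below is a minimal sketch of that computation, assuming the dependency graph is already known. The function name orphans and the hard-coded graph (built from the freeze output above) are illustrative only; this is not how pipenv itself works.

```python
def orphans(graph, roots):
    """Return installed packages that are not reachable from `roots`.

    `graph` maps each installed package name to the set of its direct
    dependencies; `roots` lists the packages that are still wanted.
    """
    keep = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        stack.extend(graph.get(name, ()))
    return sorted(set(graph) - keep)


# Assumed graph for this example: requests pulls in the four other packages.
GRAPH = {
    "requests": {"certifi", "chardet", "idna", "urllib3"},
    "certifi": set(),
    "chardet": set(),
    "idna": set(),
    "urllib3": set(),
}

if __name__ == "__main__":
    # While requests is still wanted, nothing is orphaned.
    print(orphans(GRAPH, roots=["requests"]))
    # After requests is uninstalled and nothing else is wanted,
    # its former dependencies are all orphans.
    remaining = {k: v for k, v in GRAPH.items() if k != "requests"}
    print(orphans(remaining, roots=[]))
```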

Venv bin directory contains other files
And I need to fish out my app's entry points from there.
[Screenshot: listing of the venv bin directory]
Again, it's doable, we are in software after all, but I think it should be easier.

Whatever we conclude in this discussion, I think there should be a PEEP or doc explaining the recommended way to use pipenv with docker multi-stage builds.
After all, pipenv install --system --deploy looks very good and almost nails it.

haizaar commented Nov 5, 2018

FOUND IT!!!

Here is what works: PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy

In action:

/tmp/foo $ grep -A 2 packages Pipfile
[packages]
requests = "*"

[dev-packages]

[requires]
/tmp/foo $ PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy
Installing dependencies from Pipfile.lock (b14837)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:00:02
/tmp/foo $ ls /pyroot/bin/
chardetect
/tmp/foo $ ls /pyroot/lib/python3.6/site-packages/
chardet                    chardet-3.0.4.dist-info    idna                       idna-2.7.dist-info         requests                   requests-2.20.0.dist-info  urllib3                    urllib3-1.24.1.dist-info

Hooray!
Thanks for the hint about "in any case we already support any pip environment variable you set, besides the user one". It made me have a look at which other vars I can override...

I'm happy now.

Member

techalchemy commented Nov 5, 2018

I mean I understand the steps you are attempting. I am trying to understand why there is a strict constraint on not including incidental dependencies and why you have to fish entry points out.

Historically we had documentation on using --system --deploy but that doesn’t accomplish the goal of preserving artifacts, you need a shared cache directory for that. Also, we don’t really recommend running pipenv as root, even in docker containers.

haizaar commented Nov 6, 2018

> I am trying to understand why there is a strict constraint on not including incidental dependencies

We'd like to have our docker images as lean as possible. One reason is that some of our systems run over mobile internet links, so when pushing an upgrade for 20 microservices, those megabytes start to add up. On alpine, pip + setuptools alone weigh 10MB. Compare that to alpine itself (5MB) + python3 (40MB), and that's over a 20% increase. Another reason is security: I want to include only what's really used by my app, to minimize the attack surface. It's not just me freaking out. It seems to be where the whole industry is going. With Go, they compile statically and bundle a docker image that does not even have a shell. Distroless is another example.
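
The "over 20%" figure can be checked with a quick calculation. The numbers are the approximate sizes quoted above, not fresh measurements, and the variable names are only for this example.

```python
# Approximate sizes in MB, as quoted in this comment.
alpine_mb = 5
python3_mb = 40
pip_setuptools_mb = 10

base_mb = alpine_mb + python3_mb
increase = pip_setuptools_mb / base_mb

# Relative growth of the image caused by pip + setuptools.
print("{:.1%}".format(increase))  # 22.2%
```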

> and why you have to fish entry points out.

My setup.py installs entry points that go to the venv's bin dir. Since that dir contains other stuff that does not belong to my app, I now need to explicitly specify which files to collect from there. When a developer adds a new entry point, they now need to remember to update the Dockerfile, which is yet another thing to remember. It also makes it harder to use one generic Dockerfile to build all my Python apps. With an "aside" installation, my Dockerfile instruction can just take all of /pyroot.
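
To illustrate what "fishing out" would involve, here is a rough sketch that lists the console scripts declared by one installed distribution. It assumes Python 3.8 or newer (for importlib.metadata, which is newer than the Python 3.6 used elsewhere in this thread), and the helper name console_scripts is hypothetical.

```python
from importlib import metadata


def console_scripts(dist_name):
    """Return the sorted console-script names declared by the installed
    distribution `dist_name`, or an empty list if it is not installed."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return []
    return sorted(
        ep.name for ep in dist.entry_points if ep.group == "console_scripts"
    )


if __name__ == "__main__":
    # "chardet" is one of the packages installed in this thread; what is
    # printed depends on what is installed in the current environment.
    print(console_scripts("chardet"))
```

Such a list would still have to be mapped onto files in the venv's bin directory and kept in sync with the Dockerfile, which is the extra bookkeeping the "aside" installation avoids.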

> Historically we had documentation on using --system --deploy but that doesn’t accomplish the goal of preserving artifacts, you need a shared cache directory for that.

I'm not sure what you mean. Can you please elaborate? Which artifacts are not preserved exactly?

> Also, we don’t really recommend running pipenv as root, even in docker containers.

Not even during the build stage?
Regardless, it works well under a non-root user too (inside the docker container):

$ mkdir /pyroot; chown appinstall:appinstall /pyroot
$ PIP_USER=1 PYTHONUSERBASE=/pyroot su-exec appinstall:appinstall pipenv install --system --deploy
$ ls -lah /pyroot/
total 16
drwxr-xr-x    4 appinsta appinsta    4.0K Nov  6 02:25 .
drwxr-xr-x   19 root     root        4.0K Nov  6 02:23 ..
drwxr-xr-x    2 appinsta appinsta    4.0K Nov  6 02:25 bin
drwxr-xr-x    3 appinsta appinsta    4.0K Nov  6 02:25 lib