
How to use pipenv with multistage docker builds? #3160

Open
haizaar opened this issue Nov 4, 2018 · 20 comments

Comments

@haizaar commented Nov 4, 2018

Good day,

I'm exploring how to use pipenv with multi-stage Docker builds. In a nutshell, the idea is to "compile" stuff in a base image and only copy the resulting artifacts to the final image.

With Python it gets tricky, since you need to copy package dependencies as well.

I've checked out several ideas, and it looks like pip install --user together with setting PYTHONUSERBASE is the simplest way to install dependencies to a side directory, e.g.:

FROM alpine AS builder
# Install your gcc, python3-dev, etc. here
RUN apk add --no-cache python3
COPY . /src/
WORKDIR /src
ENV PYROOT /pyroot
RUN PYTHONUSERBASE=$PYROOT pip3 install --user -r requirements.txt
RUN PYTHONUSERBASE=$PYROOT pip3 install --user .

# The final image
FROM alpine
RUN apk add --no-cache python3
ENV PYROOT /pyroot
COPY --from=builder $PYROOT/lib/ $PYROOT/lib/

(The full story)

The problem is that pipenv disregards PYTHONUSERBASE:

$ docker run --rm -ti python:3.6-alpine sh
/ # pip install --upgrade pip; pip install pipenv==2018.10.13  # skipped output
/ # mkdir /tmp/foo; cd /tmp/foo
/tmp/foo # pipenv install requests  # skipped output
/tmp/foo # PYTHONUSERBASE=/pyroot pipenv install --system --deploy
Installing dependencies from Pipfile.lock (b14837)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:00:01
/tmp/foo # ls /pyroot
ls: /pyroot: No such file or directory

I found a workaround by using pipenv lock -r and then installing the resulting requirements.txt as in my original idea, but I'm not sure this is the best way to go, particularly if I have custom (private) sources defined in my Pipfile - I don't want to replicate their configuration into pip.
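For the record, the workaround looks roughly like this in the builder stage (a sketch only; /pyroot is the side directory from my example above):

```dockerfile
# Export the locked dependencies to requirements.txt format,
# then install them aside with the pip --user trick.
RUN pipenv lock -r > requirements.txt
RUN PYTHONUSERBASE=/pyroot pip3 install --user -r requirements.txt
```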

Any other ideas?

@haizaar (Author) commented Nov 4, 2018

Apologies if the original intent was not clear enough.

All I want is to install my package and its dependencies into a separate directory instead of Python's site-packages, and then to copy that directory into the final Docker image.

Since the --system flag does not use a virtualenv (it seems), maybe it's worth supporting PIP* env vars in that special case?

I'll try to elaborate on the problem: I want to keep my final, "production" Docker image as small as possible. Therefore I need to build/install my app dependencies in an earlier stage.
Having dependencies installed into general site-packages (either the system's or a dedicated venv's) is problematic, since I can't reliably pick out the actual packages my app depends on later.
Wheels do not help much either because: a) I don't know which wheels to pick up from the cache (some may belong to my app's requirements, and some may belong to, e.g., a code-generation tool my app uses during install); and b) I need to copy the wheels to the final Docker image prior to installing them there, meaning they will end up in the resulting image.

@techalchemy (Member) commented Nov 4, 2018

Pipenv unsets user settings because they are incompatible with virtualenv settings. I understand the approach you’re taking, but maybe you can say more about what problem you are trying to solve? You want the wheels and sdists or whatever? If so, on Linux they’re stored in ~/.cache/pipenv/wheels

@haizaar (Author) commented Nov 4, 2018

@techalchemy My comment appeared before yours for some reason (is GitHub still recovering?). Just making sure you got a notification.

@techalchemy (Member) commented Nov 4, 2018

Ah okay, you may be able to just use PIPENV_VENV_IN_PROJECT=1 and copy the local .venv directory’s site packages
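
Something along these lines (a rough sketch only; the /src path and base image are placeholders, not tested here):

```dockerfile
# Builder stage: PIPENV_VENV_IN_PROJECT=1 forces the venv into ./.venv
FROM python:3.6-alpine AS builder
RUN pip install pipenv
WORKDIR /src
COPY Pipfile Pipfile.lock ./
RUN PIPENV_VENV_IN_PROJECT=1 pipenv install --deploy

# Final stage: copy the whole in-project venv and put its bin dir on PATH
FROM python:3.6-alpine
COPY --from=builder /src/.venv /src/.venv
ENV PATH="/src/.venv/bin:$PATH"
```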

@haizaar (Author) commented Nov 5, 2018

While it's nice to know about this option, I don't see how it helps me. Site-packages from a vanilla venv weigh 12MB (because of pip and setuptools).

What I'm trying to do is to run

pipenv install --dev
pipenv install
pipenv install .    # May require stuff installed by --dev

and easily collect the results of only the last two commands. So another venv does not help me. I need to "split" the installation paths within the venv.

Do you think a PEEP suggesting that PIP* options be obeyed when the --system flag is used would fly?

@techalchemy (Member) commented Nov 5, 2018

Why not just specify your library dependencies in your package metadata and add uninstall steps for pip/setuptools via pipenv run pip uninstall setuptools? In any case, we already support any pip environment variable you set — besides the user one. I'm struggling to understand what you are gaining by doing all this extra work.

@haizaar (Author) commented Nov 5, 2018

OK, I'll try once again to explain. Bear with me please.

Here is what I would've wanted to do (in a Dockerfile):

# Install all stuff in our system so we can install setup.py for our app later on
RUN pipenv install --deploy --system --dev
# Install our app LOCKED dependencies aside
RUN PYTHONUSERBASE=/pyroot pipenv install --deploy --system
# Install the app aside as well - all dependencies already present, so it's just the app
RUN PYTHONUSERBASE=/pyroot pip install --user .

# Take pyroot to the next docker image stage

This looks very clear and "human" (as in pipenv's motto) to me :)

Other solutions that require fiddling with the virtualenv look error-prone IMHO. I've tried the PIPENV_VENV_IN_PROJECT=1 approach + uninstall, and there are a couple of issues:

Uninstall does not remove dependencies
(Skipped pipenv output for brevity)

$ export PIPENV_VENV_IN_PROJECT=1
$ pipenv run pip freeze  # empty new vanilla
$ pipenv install --dev requests
$ pipenv run pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
requests==2.20.0
urllib3==1.24.1
$ pipenv uninstall --all-dev
$ pipenv run pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
urllib3==1.24.1

I.e. the requests package is gone, but its dependencies are left behind.

Venv bin directory contains other files
And I need to fish out my app's entrypoints from there.
[screenshot: listing of the venv's bin directory]
Again, it's doable, we are in software after all, but I think it should be easier.

Whatever we conclude in this discussion, I think there should be a PEEP/doc explaining the recommended way to use pipenv with Docker multi-stage builds.
After all, pipenv install --system --deploy looks very good and almost nails it.

@haizaar (Author) commented Nov 5, 2018

FOUND IT!!!

Here is what works: PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy

In action:

/tmp/foo $ grep -A 2 packages Pipfile
[packages]
requests = "*"

[dev-packages]

[requires]
/tmp/foo $ PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy
Installing dependencies from Pipfile.lock (b14837)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:00:02
/tmp/foo $ ls /pyroot/bin/
chardetect
/tmp/foo $ ls /pyroot/lib/python3.6/site-packages/
chardet                    chardet-3.0.4.dist-info    idna                       idna-2.7.dist-info         requests                   requests-2.20.0.dist-info  urllib3                    urllib3-1.24.1.dist-info

Hooray!
Thanks for the hint that "in any case we already support any pip environment variable you set — besides the user one" - it made me think to have a look at the other vars I can override...

I'm happy now.

@techalchemy (Member) commented Nov 5, 2018

I mean I understand the steps you are attempting. I am trying to understand why there is a strict constraint on not including incidental dependencies and why you have to fish entry points out.

Historically we had documentation on using --system --deploy but that doesn’t accomplish the goal of preserving artifacts, you need a shared cache directory for that. Also, we don’t really recommend running pipenv as root, even in docker containers.

@haizaar (Author) commented Nov 6, 2018

I am trying to understand why there is a strict constraint on not including incidental dependencies

We'd like to have our Docker images as lean as possible. One of the reasons is that we have some systems running over mobile internet lines, so when pushing an upgrade for 20 microservices, those megabytes start to add up. On Alpine, pip + setuptools alone weigh 10MB. Compare that to Alpine itself (5MB) + python3 (40MB), and that's over a 20% increase. Another reason is security - I want to include only what's really used by my app, to minimize the attack surface. It's not just me freaking out; it seems to be where the whole industry is going. With Go, they compile statically and build a Docker image that doesn't even have a shell. Distroless is another example.

and why you have to fish entry points out.

My setup.py installs entrypoints that go to the venv's bin dir. Since that dir contains other stuff that does not belong to my app, I now need to explicitly specify which files to collect from there. When a developer adds a new entrypoint, they now need to remember to update the Dockerfile - yet another thing to remember. It also makes it harder to use one generic Dockerfile to build all my Python apps. With an "aside" installation, my Dockerfile instruction can just take all of /pyroot.

Historically we had documentation on using --system --deploy but that doesn’t accomplish the goal of preserving artifacts, you need a shared cache directory for that.

I'm not sure what you mean. Can you please elaborate? Which artifacts are not preserved exactly?

Also, we don’t really recommend running pipenv as root, even in docker containers.

Even not during the build stage?
Regardless, it works well under a non-root user as well (inside a Docker container):

$ mkdir /pyroot; chown appinstall:appinstall /pyroot
$ PIP_USER=1 PYTHONUSERBASE=/pyroot su-exec appinstall:appinstall pipenv install --system --deploy
$ ls -lah /pyroot/
total 16
drwxr-xr-x    4 appinsta appinsta    4.0K Nov  6 02:25 .
drwxr-xr-x   19 root     root        4.0K Nov  6 02:23 ..
drwxr-xr-x    2 appinsta appinsta    4.0K Nov  6 02:25 bin
drwxr-xr-x    3 appinsta appinsta    4.0K Nov  6 02:25 lib

@puinenveturi commented Dec 17, 2018

Where do I copy these /bin and /lib dirs so they're reachable from another build stage?

@haizaar Can you publish a working Dockerfile? (without private code, of course)

@haizaar (Author) commented Dec 18, 2018

@derPuntigamer It's all here: https://tech.zarmory.com/2018/09/docker-multi-stage-builds-for-python-app.html (scroll down for the pipenv version). Questions are welcome.

@ekhaydarov commented May 21, 2019

@haizaar that's only for Alpine, it seems. Will try to work it out for Deb/Ubuntu; the pipenv user info should be enough to go on, though.

@haizaar (Author) commented May 21, 2019

@ekhaydarov

that's only for Alpine, it seems. Will try to work it out for Deb/Ubuntu; the pipenv user info should be enough to go on, though.

Can you elaborate on what you mean? Why should the pipenv steps for Ubuntu be any different?

@ClaytonJY commented Jul 12, 2019

Can confirm this works on the non-Alpine python:3.7-slim-stretch base image.

I only needed a simple script on top of the pipenv deps. I added an initial base stage for the env vars, and I didn't need to muck around with symlinks, console scripts, or entrypoints, so I love how simple this ended up - only 2-3 steps per stage:

FROM python:3.7-slim-stretch AS base

ENV PYROOT /pyroot
ENV PYTHONUSERBASE $PYROOT


FROM base AS builder

RUN pip install pipenv

COPY Pipfile* ./

RUN PIP_USER=1 PIP_IGNORE_INSTALLED=1 pipenv install --system --deploy --ignore-pipfile


FROM base

COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
COPY myscript.py ./

CMD ["python","myscript.py"]

many thanks for the hard work here @haizaar!

@MattFanto commented Sep 2, 2020

I would also suggest copying $PYROOT/bin, to avoid a "command not found" issue when running Python bin commands like gunicorn:

FROM base

COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
COPY --from=builder $PYROOT/bin/ $PYROOT/bin/

ENV PATH="$PYROOT/bin:$PATH"

COPY myscript.py ./

CMD ["python","myscript.py"]

@uranusjr (Member) commented Sep 3, 2020

Is there anything unresolved on this topic? It seems to me that everything in the recent comments is working.

@haizaar (Author) commented Sep 3, 2020

LGTM.

@haizaar (Author) commented Sep 10, 2020

I'll revert my last comment - PIP_IGNORE_INSTALLED is broken in the latest release: #4453

@Zerthick commented Sep 14, 2020

Since running into #4432, which is possibly related to #4453, I have changed my multistage Docker builds to use the venv method described here: https://sourcery.ai/blog/python-docker/. This does come with the drawbacks @haizaar mentioned of including unneeded additional files in the final image, but it appears to be the most stable approach; using $PYTHONUSERBASE was causing packages to be missing or not found in the final image.
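
For reference, the venv method from that post boils down to something like this (a sketch only; the base image, paths, and script name are my assumptions - see the linked article for the full version):

```dockerfile
# Builder stage: create the venv inside the project directory
FROM python:3.8-slim AS builder
RUN pip install pipenv
ENV PIPENV_VENV_IN_PROJECT=1
WORKDIR /app
COPY Pipfile Pipfile.lock ./
RUN pipenv install --deploy

# Final stage: copy the entire venv; the Python version must match the builder's
FROM python:3.8-slim
WORKDIR /app
COPY --from=builder /app/.venv ./.venv
ENV PATH="/app/.venv/bin:$PATH"
COPY myscript.py ./
CMD ["python", "myscript.py"]
```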

8 participants