Use python-builds instead of Miniconda #370

Closed
ianpittwood opened this issue Aug 9, 2022 · 3 comments · Fixed by #415
@ianpittwood
Collaborator

ianpittwood commented Aug 9, 2022

Miniconda can be around 400 MB fully installed, and we install it multiple times in most images. There are a couple of alternatives to Miniconda, but we have an internal Python build setup that we want to consider using here. Investigate using it and see how it works. We will have a follow-up ticket to roll this out more fully, which shouldn't happen yet.
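
One way to put a number on the installed size quoted above is to total the files under an install prefix. The following is a minimal sketch, not part of the original issue: the function name `installed_size_bytes` is hypothetical, and it sums apparent file sizes rather than disk blocks. It counts each hard-linked file once, since conda can hard-link files from its package cache into environments and a naive sum would count those twice.

```python
import os


def installed_size_bytes(root):
    """Sum apparent file sizes under root, counting each inode only once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are measured as links, not as their targets
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total
```

Calling it on a prefix such as `/opt/conda` (an assumed path) and dividing by `1e6` gives a figure in MB that can be compared across images.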

@ianpittwood ianpittwood added the "needs refinement" label (Marked for review at a future meeting) Aug 18, 2022
@bdeitte bdeitte changed the title from "Evaluate alternatives to Miniconda" to "Use python-builds instead of Miniconda" Aug 25, 2022
@bdeitte bdeitte removed the "needs refinement" label Aug 25, 2022
@msarahan
Contributor

I'd be happy to have a conversation about this. For context, I used to work at Anaconda on conda/package builds/miniconda, and I've also spent a fair amount of time with docker.

I don't think that changing from Miniconda to our own internal Python build will be a huge win. Miniconda may duplicate some low-level system libraries, but with Docker images being very stripped down already, I don't think there's much overlap. This is a guess, and I have not verified it with hard numbers. Our internal Python build can only be better by the amount of overlap it eliminates. However, neither Miniconda nor our internal Python build is likely to be stripped down for the sake of creating a small Docker image. Debug symbols are generally what take up the most space in these things. You can strip them as a step in creating the Docker image. The big downside of stripping debug symbols is that it's harder to make sense of things when they crash. Here's some more info: docker-library/php#297
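
The overlap guess above could be checked with a rough measurement. The sketch below is an illustration only, not something from the thread: the function names are hypothetical, and matching libraries by the part of the file name before `.so` is a crude heuristic that ignores version and ABI differences. It reports how many bytes of shared libraries in an environment prefix also exist, by name, under a system library directory.

```python
import os
import re

# Matches names such as libz.so, libz.so.1 and libz.so.1.2.11;
# group 1 is the stem before ".so" (for example "libz").
_SO_PATTERN = re.compile(r"^(.+?)\.so(\.\d+)*$")


def shared_libs(root):
    """Map library stem -> size in bytes of the largest matching regular file."""
    libs = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = _SO_PATTERN.match(name)
            path = os.path.join(dirpath, name)
            if match is None or os.path.islink(path):
                continue  # not a shared library, or only a symlink alias
            stem = match.group(1)
            libs[stem] = max(libs.get(stem, 0), os.path.getsize(path))
    return libs


def overlap_bytes(env_root, system_root):
    """Bytes used in env_root by libraries whose stems also occur in system_root."""
    env_libs = shared_libs(env_root)
    system_libs = shared_libs(system_root)
    return sum(env_libs[stem] for stem in env_libs.keys() & system_libs.keys())
```

A small result from something like `overlap_bytes("/opt/conda/lib", "/usr/lib")` (both paths are assumptions) would support the guess that there is little duplication to eliminate.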

One possible alternative to Miniconda is using micromamba to create environments: https://hub.docker.com/r/mambaorg/micromamba

This doesn't imply that the created environments have stripped binaries - they'll probably also be too big. It's a way to avoid installing Miniconda, but the packages that are being installed are the same. If you install a basic Python environment with this tool, it will probably be equivalent to the Miniconda base environment (minus conda and any dependencies outside of Python). Information around stripping in conda packages: conda-forge/conda-forge.github.io#520

There's an article at https://uwekorn.com/2021/03/01/deploying-conda-environments-in-docker-how-to-do-it-right.html about trimming content. That's pretty specialized and perhaps fragile, since it entails removing so many files based on domain knowledge.

This is also not an argument that you should keep using conda. This is a means of provisioning a python installation, and it is separate from the question of how you install other software in that installation after it has been provisioned. Using micromamba as linked above will not install conda into the created environment unless you explicitly list it for installation.

The idea of binary stripping applies just as well to any internal Python build setup, but it's probably easier to use the official Docker Python images, where extensive effort has gone into trimming them to minimal size: https://hub.docker.com/_/python. Even so, the compressed images aren't much, if any, better than the Miniconda installs (around 300 MB compressed) - https://hub.docker.com/_/python/tags. I'm not sure if those have stripped binaries.

@msarahan
Contributor

I ran an experiment. Here's a Dockerfile to build an image with micromamba that comes out to 310 MB compressed:

FROM mambaorg/micromamba:0.25.1
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yaml /tmp/env.yaml
RUN micromamba install -y -n base -f /tmp/env.yaml && \
    micromamba clean --all --yes

env.yaml (the environment spec for micromamba) has:

name: base
channels:
  - conda-forge
dependencies:
  - python>=3.10
  - pip

Here's a Dockerfile that uses micromamba and strips the binaries, which gets down to 212 MB:

FROM mambaorg/micromamba:0.25.1
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yaml /tmp/env.yaml
RUN micromamba install -y -n base -f /tmp/env.yaml && \
    find -P -O3 /usr/bin/ /usr/local/bin /opt/conda -type f \
        -not -name strip -and -not -name dbus-daemon \
        -execdir /opt/conda/bin/strip -v --strip-unneeded '{}' \; && \
    micromamba remove -y -n base binutils && \
    micromamba clean --all --yes

env.yaml has:

name: base
channels:
  - conda-forge
dependencies:
  - python>=3.10
  - pip
  - binutils

binutils is transient: it provides the strip binary and is removed again in the same RUN step.

If you build this, the stripping step prints a lot of noisy warnings, because the input file search is pretty heavy-handed and passes many non-binary files to strip. Just ignore them.
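
The warnings come from handing every regular file to strip. A narrower search could be sketched as below. This is an assumption-laden illustration rather than what the experiment used: the function names are hypothetical, it relies on Python being available at build time, and it only recognizes files that start with the ELF magic bytes, so static archives (`.a`) would be skipped. The default `skip_names` mirror the two names excluded in the `find` command.

```python
import os

ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if path is a regular, non-symlink file starting with the ELF magic."""
    if os.path.islink(path) or not os.path.isfile(path):
        return False
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        return False  # unreadable files are skipped rather than reported


def find_elf_files(roots, skip_names=("strip", "dbus-daemon")):
    """Yield paths of ELF files under each root, skipping the excluded file names."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name in skip_names:
                    continue
                path = os.path.join(dirpath, name)
                if is_elf(path):
                    yield path
```

The yielded paths could then be passed to `strip --strip-unneeded`, for example through `subprocess.run`, so that text files and scripts never reach it.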

@bdeitte
Contributor

bdeitte commented Sep 22, 2022

There is still a lot of discussion going on about this. We will make sure to do this the same way it's being done elsewhere, which is still TBD.

@ianpittwood ianpittwood self-assigned this Nov 2, 2022
@ianpittwood ianpittwood linked a pull request Nov 9, 2022 that will close this issue