-
Notifications
You must be signed in to change notification settings - Fork 330
Misc Dockerfiles updates to reduce image footprints #1363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,31 +1,51 @@ | ||
|
|
||
|
|
||
| # Copyright (C) 2024 Intel Corporation | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| FROM python:3.11-slim | ||
| ARG IMAGE_NAME=python | ||
| ARG IMAGE_TAG=3.11-slim | ||
| FROM ${IMAGE_NAME}:${IMAGE_TAG} as base | ||
|
|
||
| RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \ | ||
| ENV LANG C.UTF-8 | ||
|
|
||
| RUN apt-get update -y && apt-get install -y \ | ||
| libgl1-mesa-glx \ | ||
| libjemalloc-dev \ | ||
| git | ||
| libjemalloc-dev && \ | ||
|
Comment on lines
11
to
+12
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why these are installed? Nothing should be using Mesa / 3D libs, and jemalloc usage would need explicit e.g. LD_PRELOAD for it, which I do not see here. If those are dependencies of some pip package, pip should take care it, shouldn't it?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No unfortunately Pip won't take care of development packages.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What actually needs these specific packages / why? I mean,
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tracked from Git history through file renames/moves when these were first used. However, neither the related PRs nor files in them did not mention why these were added. See e.g: #136 There were few earlier Dockerfiles that already apt-get these, that used langchain as base, but they were running something else than ChatQnA. So it seems like these items are just blindly copy-pasted, without any justification, and maybe used just for some development activities, instead of being needed in production. @ashahba Could you find out whether there is some justification for adding these? (If yes, I should be included to base image after all, and document the reason there.) |
||
| apt-get clean autoclean && \ | ||
| apt-get autoremove -y && \ | ||
| rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN useradd -m -s /bin/bash user && \ | ||
| mkdir -p /home/user && \ | ||
| chown -R user /home/user/ | ||
|
|
||
| WORKDIR /home/user/ | ||
| RUN git clone https://github.com/opea-project/GenAIComps.git | ||
|
|
||
| WORKDIR /home/user/GenAIComps | ||
|
|
||
| FROM base AS devel | ||
|
|
||
| ARG GenAICompsRepo="https://github.com/opea-project/GenAIComps" | ||
| ARG GenAICompsBranch="main" # Branch name or Commit ID | ||
|
|
||
| RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \ | ||
| curl | ||
|
|
||
| RUN curl -sSL --retry 5 ${GenAICompsRepo}/tarball/${GenAICompsBranch} | tar --strip-components=1 -xzf - | ||
|
Comment on lines
+25
to
+31
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using base image from GenAIComps project is the final goal. These optimization are unnecessary there, and shared base image gives >10x space improvement, when all application images are present. => I would skip this and wait for the base image, so that Dockerfiles can be (greatly) simplified to reduce the resulting image size.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I compared using Both took about 6 secs (through company VPN), which was a bit disappointing. I would assume curling compressed tarball to use a bit less bandwidth than Git's "packed" data though. Extracted tarball took 24MB space. Git clone included extra 14MB in
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. However, there's a rather large difference in installing Building curl image took 9s, and git one 18s, probably due to extra dependencies / their sizes (which won't be in the final image): => I'll add note about doing this change to my PR, if I need to update it again. |
||
|
|
||
| RUN pip install --no-cache-dir --upgrade pip setuptools && \ | ||
| pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \ | ||
| pip install --no-cache-dir langchain_core | ||
|
|
||
| FROM base AS prod | ||
|
|
||
| ARG PYTHON=python3.11 | ||
|
|
||
| COPY ./chatqna.py /home/user/chatqna.py | ||
|
|
||
| ENV PYTHONPATH=$PYTHONPATH:/home/user/GenAIComps | ||
|
|
||
| COPY --from=devel /usr/local/lib/${PYTHON}/site-packages /usr/local/lib/${PYTHON}/site-packages | ||
| COPY --from=devel /usr/local/bin /usr/local/bin | ||
| COPY --from=devel /home/user/GenAIComps /home/user/GenAIComps | ||
|
Comment on lines
+45
to
+47
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When I tested using separate stage for ChatQnA Or are you seeing larger savings?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please double check your setup.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did a build with just pip install being done in separate step, or not, and image size difference reported by Docker was just few megs. How did you test it, and what different did you see? (I tested it with the comps base image, but that should not matter as it does exactly the same install step.)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When caching is disabled, Are you sure you're not confusing the size difference with disk space used for curl install? (I'm asking because if there is significant size difference, then I should consider same thing for base image.) |
||
|
|
||
| USER user | ||
|
|
||
| WORKDIR /home/user | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python base images already include this workaround for Python versions needing it: https://github.com/docker-library/python/blob/master/3.11/slim-bookworm/Dockerfile
(LANG env is removed in 3.13 version.)
So I do not think such override to be needed for the base image. Do you agree?