Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 29 additions & 9 deletions ChatQnA/Dockerfile
Original file line number Diff line number Diff line change
@@ -1,31 +1,51 @@


# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM python:3.11-slim
ARG IMAGE_NAME=python
ARG IMAGE_TAG=3.11-slim
FROM ${IMAGE_NAME}:${IMAGE_TAG} as base

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
ENV LANG C.UTF-8
Copy link
Contributor

@eero-t eero-t Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python base images already include this workaround for Python versions needing it: https://github.com/docker-library/python/blob/master/3.11/slim-bookworm/Dockerfile

(LANG env is removed in 3.13 version.)

So I do not think such override to be needed for the base image. Do you agree?


RUN apt-get update -y && apt-get install -y \
libgl1-mesa-glx \
libjemalloc-dev \
git
libjemalloc-dev && \
Comment on lines 11 to +12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these are installed? Nothing should be using Mesa / 3D libs, and jemalloc usage would need explicit e.g. LD_PRELOAD for it, which I do not see here.

If those are dependencies of some pip package, pip should take care it, shouldn't it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No unfortunately Pip won't take care of development packages.

Copy link
Contributor

@eero-t eero-t Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What actually needs these specific packages / why?

I mean, pip install in that Dockerfile works fine without either of them, application itself is Python / does not link such libs, and I do not see any problem with the app when those libs are missing...

Copy link
Contributor

@eero-t eero-t Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tracked from Git history through file renames/moves when these were first used. However, neither the related PRs nor files in them did not mention why these were added. See e.g: #136

There were few earlier Dockerfiles that already apt-get these, that used langchain as base, but they were running something else than ChatQnA. So it seems like these items are just blindly copy-pasted, without any justification, and maybe used just for some development activities, instead of being needed in production.

@ashahba Could you find out whether there is some justification for adding these?

(If yes, I should be included to base image after all, and document the reason there.)

apt-get clean autoclean && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/*

RUN useradd -m -s /bin/bash user && \
mkdir -p /home/user && \
chown -R user /home/user/

WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git

WORKDIR /home/user/GenAIComps

FROM base AS devel

ARG GenAICompsRepo="https://github.com/opea-project/GenAIComps"
ARG GenAICompsBranch="main" # Branch name or Commit ID

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
curl

RUN curl -sSL --retry 5 ${GenAICompsRepo}/tarball/${GenAICompsBranch} | tar --strip-components=1 -xzf -
Comment on lines +25 to +31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using base image from GenAIComps project is the final goal. These optimization are unnecessary there, and shared base image gives >10x space improvement, when all application images are present.

=> I would skip this and wait for the base image, so that Dockerfiles can be (greatly) simplified to reduce the resulting image size.

Copy link
Contributor

@eero-t eero-t Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared using curl against git clone --depth 1 used in my PR: #1031

Both took about 6 secs (through company VPN), which was a bit disappointing.

I would assume curling compressed tarball to use a bit less bandwidth than Git's "packed" data though.

Extracted tarball took 24MB space. Git clone included extra 14MB in .git subdir, but that would not go not go to final target, it's only on the temporary layer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, there's a rather large difference in installing curl vs. git on top of python:3.11.

Building curl image took 9s, and git one 18s, probably due to extra dependencies / their sizes (which won't be in the final image):

$ docker images |grep  -E "(git|curl)-test"
localhost/git-test                                        latest           d5dfb6922688  2 minutes ago  239 MB
localhost/curl-test                                       latest           0e44588dadfa  3 minutes ago  158 MB

=> I'll add note about doing this change to my PR, if I need to update it again.


RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
pip install --no-cache-dir langchain_core

FROM base AS prod

ARG PYTHON=python3.11

COPY ./chatqna.py /home/user/chatqna.py

ENV PYTHONPATH=$PYTHONPATH:/home/user/GenAIComps

COPY --from=devel /usr/local/lib/${PYTHON}/site-packages /usr/local/lib/${PYTHON}/site-packages
COPY --from=devel /usr/local/bin /usr/local/bin
COPY --from=devel /home/user/GenAIComps /home/user/GenAIComps
Comment on lines +45 to +47
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I tested using separate stage for ChatQnA pip install, and copying the files from there, like you do here, it saved only couple of MBs. With shared base image, that's pretty insignificant, so I would skip it for simplicity.

Or are you seeing larger savings?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double check your setup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a build with just pip install being done in separate step, or not, and image size difference reported by Docker was just few megs. How did you test it, and what different did you see?

(I tested it with the comps base image, but that should not matter as it does exactly the same install step.)

Copy link
Contributor

@eero-t eero-t Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When caching is disabled, pip should clean out any downloading artefacts after installation, so I think only size difference would come from pip & setuptools upgrades, and those are only couple of MB in size. Which corresponds to what I saw...

Are you sure you're not confusing the size difference with disk space used for curl install? apt-get update itself takes some space, and curl + its deps take ~4.5MB.

(I'm asking because if there is significant size difference, then I should consider same thing for base image.)


USER user

WORKDIR /home/user
Expand Down
Loading