Skip to content

Commit

Permalink
Optimize PROD image caching in CI (apache#35438)
Browse files Browse the repository at this point in the history
Turns out that some of the layers in our PROD image got
invalidated because AIRFLOW_CONSTRAINTS_MODE used to build the
cache for PROD image is "constraints" by default, while building
images in "build-images" workflow for regular PRs and canary
build uses "constraints-source-providers". The former is fine as
default for PROD image (as oppose to CI image we build PROD image
from released PyPI packages by default) but the latter is "proper"
for the CI cache, because there, the image is built out of local
packages prepared from sources.

Turns out that the CONSTRAINT_MODE parameter had a profound impact
on caching - because it was set before the
"install_packages_from_branch_tip" step and - in fact - even
before "install database clients" step, which caused our cache to
only work for the "base OS dependencies" - installing database
clients and installing airflow from branch tip (which works great
for CI image) had always been done in PRs because the layers in
cache with constraints env invalidated all subsequent layers.

This had no big impact before when testing usually took much longer
time - but since the testing has been vastly improved in apache#35160, now
PROD image building continues running even after test complete and
becomes the next frontier of optimization.

This PR optimizes PROD image building in two ways:

* caching is prepared with "source_providers" constraint mode, same
  as regular build

* the AIRFLOW_CONSTRAINT_MODE and related arguments are moved after
  installing database clients, so that this parameter does not
  impact their caching.
  • Loading branch information
potiuk authored and romsharon98 committed Nov 10, 2023
1 parent a6587f7 commit d776fff
Show file tree
Hide file tree
Showing 2 changed files with 40 additions and 37 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -1886,6 +1886,7 @@ jobs:
--builder airflow_cache
--install-packages-from-context
--run-in-parallel
--airflow-constraints-mode constraints-source-providers
--prepare-buildx-cache
--platform ${{ matrix.platform }}
env:
Expand Down
76 changes: 39 additions & 37 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -1223,6 +1223,44 @@ ARG INSTALL_MYSQL_CLIENT="true"
ARG INSTALL_MYSQL_CLIENT_TYPE="mysql"
ARG INSTALL_MSSQL_CLIENT="true"
ARG INSTALL_POSTGRES_CLIENT="true"
ARG AIRFLOW_PIP_VERSION

ENV INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
INSTALL_MYSQL_CLIENT_TYPE=${INSTALL_MYSQL_CLIENT_TYPE} \
INSTALL_MSSQL_CLIENT=${INSTALL_MSSQL_CLIENT} \
INSTALL_POSTGRES_CLIENT=${INSTALL_POSTGRES_CLIENT}

# Only copy mysql/mssql installation scripts for now - so that changing the other
# scripts which are needed much later will not invalidate the docker layer here
COPY --from=scripts install_mysql.sh install_mssql.sh install_postgres.sh /scripts/docker/

# THE 3 LINES ARE ONLY NEEDED IN ORDER TO MAKE PYMSSQL BUILD WORK WITH LATEST CYTHON
# AND SHOULD BE REMOVED WHEN WORKAROUND IN install_mssql.sh IS REMOVED
ARG AIRFLOW_PIP_VERSION=23.3.1
ENV AIRFLOW_PIP_VERSION=${AIRFLOW_PIP_VERSION}
COPY --from=scripts common.sh /scripts/docker/

RUN bash /scripts/docker/install_mysql.sh dev && \
bash /scripts/docker/install_mssql.sh dev && \
bash /scripts/docker/install_postgres.sh dev
ENV PATH=${PATH}:/opt/mssql-tools/bin

# By default we do not install from docker context files but if we decide to install from docker context
# files, we should override those variables to "docker-context-files"
ARG DOCKER_CONTEXT_FILES="Dockerfile"

COPY ${DOCKER_CONTEXT_FILES} /docker-context-files

ARG AIRFLOW_HOME
ARG AIRFLOW_USER_HOME_DIR
ARG AIRFLOW_UID

RUN adduser --gecos "First Last,RoomNumber,WorkPhone,HomePhone" --disabled-password \
--quiet "airflow" --uid "${AIRFLOW_UID}" --gid "0" --home "${AIRFLOW_USER_HOME_DIR}" && \
mkdir -p ${AIRFLOW_HOME} && chown -R "airflow:0" "${AIRFLOW_USER_HOME_DIR}" ${AIRFLOW_HOME}

USER airflow

ARG AIRFLOW_REPO=apache/airflow
ARG AIRFLOW_BRANCH=main
ARG AIRFLOW_EXTRAS
Expand All @@ -1233,7 +1271,7 @@ ARG AIRFLOW_CONSTRAINTS_MODE="constraints"
ARG AIRFLOW_CONSTRAINTS_REFERENCE=""
ARG AIRFLOW_CONSTRAINTS_LOCATION=""
ARG DEFAULT_CONSTRAINTS_BRANCH="constraints-main"
ARG AIRFLOW_PIP_VERSION

# By default PIP has progress bar but you can disable it.
ARG PIP_PROGRESS_BAR
# By default we do not use pre-cached packages, but in CI/Breeze environment we override this to speed up
Expand Down Expand Up @@ -1262,42 +1300,6 @@ ARG UPGRADE_TO_NEWER_DEPENDENCIES="false"
ARG AIRFLOW_SOURCES_FROM="Dockerfile"
ARG AIRFLOW_SOURCES_TO="/Dockerfile"

# By default we do not install from docker context files but if we decide to install from docker context
# files, we should override those variables to "docker-context-files"
ARG DOCKER_CONTEXT_FILES="Dockerfile"

ARG AIRFLOW_HOME
ARG AIRFLOW_USER_HOME_DIR
ARG AIRFLOW_UID

ENV INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
INSTALL_MYSQL_CLIENT_TYPE=${INSTALL_MYSQL_CLIENT_TYPE} \
INSTALL_MSSQL_CLIENT=${INSTALL_MSSQL_CLIENT} \
INSTALL_POSTGRES_CLIENT=${INSTALL_POSTGRES_CLIENT}

# Only copy mysql/mssql installation scripts for now - so that changing the other
# scripts which are needed much later will not invalidate the docker layer here
COPY --from=scripts install_mysql.sh install_mssql.sh install_postgres.sh /scripts/docker/

# THE 3 LINES ARE ONLY NEEDED IN ORDER TO MAKE PYMSSQL BUILD WORK WITH LATEST CYTHON
# AND SHOULD BE REMOVED WHEN WORKAROUND IN install_mssql.sh IS REMOVED
ARG AIRFLOW_PIP_VERSION=23.3.1
ENV AIRFLOW_PIP_VERSION=${AIRFLOW_PIP_VERSION}
COPY --from=scripts common.sh /scripts/docker/


RUN bash /scripts/docker/install_mysql.sh dev && \
bash /scripts/docker/install_mssql.sh dev && \
bash /scripts/docker/install_postgres.sh dev
ENV PATH=${PATH}:/opt/mssql-tools/bin

COPY ${DOCKER_CONTEXT_FILES} /docker-context-files

RUN adduser --gecos "First Last,RoomNumber,WorkPhone,HomePhone" --disabled-password \
--quiet "airflow" --uid "${AIRFLOW_UID}" --gid "0" --home "${AIRFLOW_USER_HOME_DIR}" && \
mkdir -p ${AIRFLOW_HOME} && chown -R "airflow:0" "${AIRFLOW_USER_HOME_DIR}" ${AIRFLOW_HOME}

USER airflow

RUN if [[ -f /docker-context-files/pip.conf ]]; then \
mkdir -p ${AIRFLOW_USER_HOME_DIR}/.config/pip; \
Expand Down

0 comments on commit d776fff

Please sign in to comment.