Try to build the Docker image if it doesn't exist (#102562)
There is a bug in the test workflow where it can fail to find the new Docker image when the image hasn't yet become available on ECR, for example https://hud.pytorch.org/pytorch/pytorch/commit/e71ab214226af1f9dbded944e939c6447e0e8f09. This is a race condition: the test job starts before the docker-build workflow has finished successfully. The fix is to give the test job the opportunity to build the image if it doesn't exist, the same as the build workflow currently does. Once the docker-build workflow finishes pushing the new image to ECR, that image can be used instead.
Pull Request resolved: #102562
Approved by: https://github.com/PaliC
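
The fallback described in the commit message can be sketched in shell. This is only an illustration under stated assumptions, not the action's actual code: `image_exists`, `build_and_push`, and `ensure_image` are hypothetical names, and probing the registry with `docker manifest inspect` is an assumption about how such a check could be done.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "use the image if it exists, otherwise build it".
# None of these function names come from the commit.

# Succeeds when the registry already serves a manifest for the image
# (assumed probe; the real workflow may check differently).
image_exists() {
  docker manifest inspect "$1" > /dev/null 2>&1
}

# Placeholder for whatever build-and-push step the workflow actually runs.
build_and_push() {
  echo "building $1"
}

# Reuse the image when it is already available; build it only when missing.
ensure_image() {
  local image="$1"
  if image_exists "${image}"; then
    echo "using existing ${image}"
  else
    build_and_push "${image}"
  fi
}
```

With this shape, a test job that starts before docker-build has pushed the image takes the build path instead of failing on the pull, and later jobs take the reuse path once the image is on ECR.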
huydhn authored and pytorchmergebot committed May 31, 2023
1 parent 9a2df0a commit 04c1c2b
Showing 4 changed files with 45 additions and 11 deletions.
29 changes: 21 additions & 8 deletions .github/actions/calculate-docker-image/action.yml
@@ -39,17 +39,30 @@ runs:
       env:
         IS_XLA: ${{ inputs.xla == 'true' && 'true' || '' }}
         XLA_IMAGE_TAG: v1.0
-        DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${{ inputs.docker-image-name }}
+        DOCKER_IMAGE_NAME: ${{ inputs.docker-image-name }}
       run: |
-        if [ -n "${IS_XLA}" ]; then
-          echo "XLA workflow uses pre-built test image at ${XLA_IMAGE_TAG}"
-          DOCKER_TAG=$(git rev-parse HEAD:.ci/docker)
+        set -x
+        DOCKER_IMAGE_ECR_PREFIX="308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch"
+        if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_IMAGE_ECR_PREFIX}"* ]]; then
+          # The docker image name already includes the ECR prefix and tag, so we can just
+          # use it as it is, but first let's extract the tag
+          DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}')
           echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
-          echo "docker-image=${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" >> "${GITHUB_OUTPUT}"
+          echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"
         else
-          DOCKER_TAG=$(git rev-parse HEAD:.ci/docker)
-          echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
-          echo "docker-image=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
+          DOCKER_IMAGE_BASE="${DOCKER_IMAGE_ECR_PREFIX}/${DOCKER_IMAGE_NAME}"
+          if [ -n "${IS_XLA}" ]; then
+            echo "XLA workflow uses pre-built test image at ${XLA_IMAGE_TAG}"
+            DOCKER_TAG=$(git rev-parse HEAD:.ci/docker)
+            echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
+            echo "docker-image=${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" >> "${GITHUB_OUTPUT}"
+          else
+            DOCKER_TAG=$(git rev-parse HEAD:.ci/docker)
+            echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
+            echo "docker-image=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
+          fi
         fi
     - name: Check if image should be built
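
The name-resolution branch in this hunk can be tried outside a workflow with a small sketch. Several things here are assumptions for illustration: `resolve_image` is a hypothetical wrapper, its second argument stands in for `$(git rev-parse HEAD:.ci/docker)` so the example runs without a checkout, `example-image` is a made-up image name, output goes to stdout rather than `${GITHUB_OUTPUT}`, and a portable `case` pattern replaces the `[[ ... == *...* ]]` test. The XLA branch is omitted.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; resolve_image and its arguments are not part of the action.
ECR_PREFIX="308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch"

resolve_image() {
  local name="$1"          # value of the docker-image-name input
  local fallback_tag="$2"  # stand-in for $(git rev-parse HEAD:.ci/docker)
  local tag
  case "${name}" in
    *"${ECR_PREFIX}"*)
      # Full name: already has prefix and tag, so extract the field
      # after the first ':' (or ',') and pass the name through unchanged.
      tag=$(echo "${name}" | awk -F '[:,]' '{print $2}')
      echo "docker-tag=${tag}"
      echo "docker-image=${name}"
      ;;
    *)
      # Short name: prepend the ECR prefix and append the computed tag.
      echo "docker-tag=${fallback_tag}"
      echo "docker-image=${ECR_PREFIX}/${name}:${fallback_tag}"
      ;;
  esac
}

# A full image name is used as-is; a short name is expanded.
resolve_image "${ECR_PREFIX}/example-image:abc123" "unused"
resolve_image "example-image" "def456"
```

Accepting both forms is what lets the test workflows pass `inputs.docker-image` straight through, whether the caller supplies a short name or a fully qualified one.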
9 changes: 8 additions & 1 deletion .github/workflows/_docs.yml
@@ -81,10 +81,17 @@ jobs:
       - name: Setup Linux
         uses: ./.github/actions/setup-linux

+      - name: Calculate docker image
+        id: calculate-docker-image
+        uses: ./.github/actions/calculate-docker-image
+        with:
+          docker-image-name: ${{ inputs.docker-image }}
+          xla: ${{ contains(inputs.build-environment, 'xla') }}
+
       - name: Pull docker image
         uses: pytorch/test-infra/.github/actions/pull-docker-image@main
         with:
-          docker-image: ${{ inputs.docker-image }}
+          docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }}

       - name: Download build artifacts
         uses: ./.github/actions/download-build-artifacts
9 changes: 8 additions & 1 deletion .github/workflows/_linux-test.yml
@@ -66,10 +66,17 @@ jobs:
       - name: Setup Linux
         uses: ./.github/actions/setup-linux

+      - name: Calculate docker image
+        id: calculate-docker-image
+        uses: ./.github/actions/calculate-docker-image
+        with:
+          docker-image-name: ${{ inputs.docker-image }}
+          xla: ${{ contains(inputs.build-environment, 'xla') }}
+
       - name: Pull docker image
         uses: pytorch/test-infra/.github/actions/pull-docker-image@main
         with:
-          docker-image: ${{ inputs.docker-image }}
+          docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }}

       - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
         id: install-nvidia-driver
9 changes: 8 additions & 1 deletion .github/workflows/_rocm-test.yml
@@ -63,10 +63,17 @@ jobs:
       - name: Setup ROCm
         uses: ./.github/actions/setup-rocm

+      - name: Calculate docker image
+        id: calculate-docker-image
+        uses: ./.github/actions/calculate-docker-image
+        with:
+          docker-image-name: ${{ inputs.docker-image }}
+          xla: ${{ contains(inputs.build-environment, 'xla') }}
+
       - name: Pull docker image
         uses: pytorch/test-infra/.github/actions/pull-docker-image@main
         with:
-          docker-image: ${{ inputs.docker-image }}
+          docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }}

       - name: Start monitoring script
         id: monitor-script
