[ci] Nightly for building+uploading manylinux baseimage #59204
Code Review
This pull request introduces a set of new shell scripts to automate the nightly build and upload of the manylinux base image. The code is well-organized into a main script and reusable utility libraries. The use of shell best practices like set -euo pipefail and proper variable quoting is commendable. My review provides a few suggestions to further improve the robustness and efficiency of the utility scripts, such as adding include guards, redirecting error output to stderr, and conditionally defining shared variables. Overall, this is a solid addition to the CI process.
```yaml
RAYCI_DISABLE_JAVA: "true"
JDK_SUFFIX: ""
```
I wanted to use a {{matrix}} operation here, but couldn't find a way to set RAYCI_DISABLE_JAVA to true and also vary the JDK_SUFFIX used in the image name...
I think this is fine. The matrix only has two elements, so it's not a big deal.
We should probably also have the manylinux aarch64 ones here, for completeness.
```yaml
steps:
  - name: manylinux-nightly-no-jdk
    label: ":crescent_moon: wanda: manylinux-nightly (no JDK)"
    wanda: ci/docker/manylinux-nightly.wanda.yaml
```
Is there a way to pass -rebuild via this interface? https://github.com/ray-project/rayci/blob/main/wanda/wanda/main.go#L42
I also found DisableCaching, which I believe would require a new wanda.yaml definition: https://github.com/ray-project/rayci/blob/main/wanda/spec.go#L30
I looked into the code, and it does not seem to have one.
I think we can add a global env var like RAYCI_WANDA_ALWAYS_REBUILD to control the behavior.
Also, I realized that force-rebuild isn't that critical. The main purpose is really to have a pre-built image, and rayci+wanda will by default invalidate the cache in around 3 or 4 days.
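If the proposed global switch were added, one way to honor it is to read it as a boolean and translate it into wanda's `-rebuild` flag asked about earlier in this thread. This is a minimal sketch that assumes the variable name RAYCI_WANDA_ALWAYS_REBUILD, assumes a truthy value should add `-rebuild`, and uses a hypothetical `wanda_args` helper; it is not rayci's actual code.

```python
import os
from typing import List, Mapping, Optional

# Values treated as "on"; anything else (including unset) keeps the cache.
_TRUTHY = {"1", "true", "yes", "on"}


def always_rebuild(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the (proposed) RAYCI_WANDA_ALWAYS_REBUILD switch is on."""
    env = os.environ if env is None else env
    return env.get("RAYCI_WANDA_ALWAYS_REBUILD", "").strip().lower() in _TRUTHY


def wanda_args(spec_file: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build a hypothetical wanda command line for one spec file."""
    args = ["wanda"]
    if always_rebuild(env):
        # Ask wanda to ignore its cache and rebuild the image.
        args.append("-rebuild")
    # Flags go before the positional spec path.
    args.append(spec_file)
    return args
```

The flag is placed before the spec path on the assumption that wanda parses flags with Go's standard `flag` package, which stops parsing at the first non-flag argument.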
acc0234 to d6e44c6
```
@@ -0,0 +1,10 @@
name: "manylinux-nightly$JDK_SUFFIX"
```
Since this is just a suffix, can we reuse the existing wanda file rather than creating a new one?
If I use the same existing wanda file with the same name, how do I make sure that the JDK and non-JDK versions are passed from each Builder to its Uploader?
I.e., today it looks something like:
```shell
WANDA_IMAGE_NAME="manylinux-nightly${JDK_SUFFIX}"
WANDA_TAG="${RAYCI_WORK_REPO}:${RAYCI_BUILD_ID}-${WANDA_IMAGE_NAME}"
# TODO: Change to `crane`
docker pull "$WANDA_TAG"
```
This relies on ${JDK_SUFFIX} to differentiate the two images. What can I do instead to get the correct image that was created and uploaded by the Builder?
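The tag scheme in the shell snippet above can be expressed as a pure function, which makes it easy to check that the JDK and non-JDK builds never collide. A Python sketch, assuming RAYCI_WORK_REPO, RAYCI_BUILD_ID and JDK_SUFFIX come from the environment as in the shell; `wanda_tag` is a hypothetical name.

```python
from typing import Mapping


def wanda_tag(env: Mapping[str, str], base_name: str = "manylinux-nightly") -> str:
    """Python mirror of the WANDA_IMAGE_NAME / WANDA_TAG shell lines."""
    # A KeyError on a missing variable plays the role of `set -u`.
    # JDK_SUFFIX is "" for the no-JDK image and "-jdk" for the JDK image.
    image_name = base_name + env["JDK_SUFFIX"]
    return f"{env['RAYCI_WORK_REPO']}:{env['RAYCI_BUILD_ID']}-{image_name}"
```

With placeholder values RAYCI_WORK_REPO=example.com/work and RAYCI_BUILD_ID=42, the two variants map to `example.com/work:42-manylinux-nightly` and `example.com/work:42-manylinux-nightly-jdk`, so builder and uploader only need to agree on JDK_SUFFIX.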
```shell
printHeader "Pulling image from Wanda cache"
printInfo "Pulling: ${WANDA_TAG}"
docker pull "$WANDA_TAG"
```
Can we use `crane cp`, so that we do not need to pull down the common layers?
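`crane copy` (alias `cp`) from go-containerregistry copies an image directly between registries, so the layers never go through the local Docker daemon the way `docker pull` followed by `docker push` does. A minimal sketch of calling it from Python, assuming `crane` is on PATH and already authenticated to both registries; `crane_copy` is a hypothetical helper, and the `run` parameter exists only so the subprocess call can be replaced in unit tests.

```python
import subprocess
from typing import Callable, List


def crane_copy(
    src: str, dst: str, run: Callable[..., object] = subprocess.run
) -> List[str]:
    """Copy an image registry-to-registry with `crane copy`."""
    cmd = ["crane", "copy", src, dst]
    # check=True makes a failed copy raise CalledProcessError.
    run(cmd, check=True)
    return cmd
```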
```shell
# Fail only if JDK_SUFFIX is unset; "" is valid for the no-JDK image.
: "${JDK_SUFFIX?Error: JDK_SUFFIX is not set}"
UPLOAD=${UPLOAD:-"false"}

COMMIT_HASH=$(git rev-parse HEAD)
```
Hate to ask, but can we rewrite this script as a bazel py_binary so that the logic is unit-testable and more maintainable? Most CI scripts and tools live in the ci/ray_ci directories.
I think this means you would also need a forge env to run the bazel py_binary.
Have you tried doing a dry run on the new cicd-cron pipeline yet?
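A Python port of the shell prelude quoted above could keep the environment handling in a small pure function, so unit tests can drive it with a plain dict. This sketch assumes the variables keep their shell semantics (JDK_SUFFIX must be set but may be empty, UPLOAD defaults to "false"); `Config` and `load_config` are hypothetical names, not the ones used in the PR. In shell, `${VAR:?msg}` fails on an empty value as well as an unset one, while `${VAR?msg}` only fails when unset; since the no-JDK variant sets JDK_SUFFIX to "", the sketch only rejects an unset variable.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    jdk_suffix: str
    upload: bool


def load_config(env: Mapping[str, str]) -> Config:
    """Validate and normalize the variables the uploader needs."""
    # Like `${JDK_SUFFIX?...}`: unset is an error, but "" is a valid value
    # because it selects the no-JDK image.
    if "JDK_SUFFIX" not in env:
        raise ValueError("Error: JDK_SUFFIX is not set")
    # Like UPLOAD=${UPLOAD:-"false"}: unset or empty means "do not upload".
    upload = (env.get("UPLOAD") or "false").lower() == "true"
    return Config(jdk_suffix=env["JDK_SUFFIX"], upload=upload)
```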
Reasonable ask! I can rewrite it as a py_binary.
> have you tried doing a dry run on the new cicd-cron pipeline yet?

I have not yet.
It has been rewritten as a py_binary: https://github.com/ray-project/ray/pull/59204/files#diff-0d54920723c7187d8f71dbd0f9b62c5f9673575ba155040b2e53a04badb8d6d2
I also refactored the crane library so we could use those helpers here (#59360).
Signed-off-by: andrew <andrew@anyscale.com>
Now installs and uses Wanda to build JDK and non-JDK versions of manylinux. Depends on the new upload flag to push to Docker Hub once that is enabled. Focused heavily on error handling and on being able to run from anywhere in the repo. Signed-off-by: andrew <andrew@anyscale.com>
Scoped to just run build-many script. Signed-off-by: andrew <andrew@anyscale.com>
We use wanda now. Signed-off-by: andrew <andrew@anyscale.com>
To be used in ray-project/ray#59204. Signed-off-by: andrew <andrew@anyscale.com>
b7aaa4a to a3a3def
```yaml
--destination-repository rayproject/manylinux2014
--tag-suffix -{{matrix}}

- label: ":docker: push: Push manylinux-cibase-jdk to Docker Hub"
```
Bug: Missing matrix variable in JDK push step label
The label for the JDK push step is missing {{matrix}} in the label string. Line 30 correctly uses Push manylinux-cibase-{{matrix}} to Docker Hub but line 45 uses Push manylinux-cibase-jdk to Docker Hub without the matrix variable. This causes both x86_64 and aarch64 variants of this step to have identical labels in the CI UI, which is inconsistent with the non-JDK push step pattern.
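Why the missing placeholder produces identical labels can be seen with a simplified model of matrix expansion, in which each matrix value yields one copy of the step with every `{{matrix}}` in its string fields replaced. This is an illustration under that assumption, not rayci's expansion code; `expand_step` and `duplicate_labels` are hypothetical helpers.

```python
from typing import Any, Dict, List


def _substitute(value: Any, matrix_value: str) -> Any:
    """Recursively replace {{matrix}} in strings, lists and dicts."""
    if isinstance(value, str):
        return value.replace("{{matrix}}", matrix_value)
    if isinstance(value, list):
        return [_substitute(v, matrix_value) for v in value]
    if isinstance(value, dict):
        return {k: _substitute(v, matrix_value) for k, v in value.items()}
    return value


def expand_step(step: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Produce one step per matrix value; steps without a matrix pass through."""
    template = {k: v for k, v in step.items() if k != "matrix"}
    if "matrix" not in step:
        return [template]
    return [_substitute(template, m) for m in step["matrix"]]


def duplicate_labels(steps: List[Dict[str, Any]]) -> List[str]:
    """Labels that appear on more than one expanded step."""
    seen, dupes = set(), []
    for step in steps:
        label = step.get("label")
        if label in seen and label not in dupes:
            dupes.append(label)
        seen.add(label)
    return dupes
```

Expanding the JDK push step over `["x86_64", "aarch64"]` yields two steps with the same label, while a label such as `Push manylinux-cibase-jdk-{{matrix}} to Docker Hub` (an example fix, not necessarily the one adopted) expands to two distinct labels.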
```yaml
RAYCI_DISABLE_JAVA: "false"
RAYCI_WANDA_ALWAYS_REBUILD: "true"
JDK_SUFFIX: "-jdk"
HOSTTYPE: "{{matrix}}"
```
Bug: HOSTTYPE uses matrix placeholder without matrix defined
The manylinux-cibase-jdk-x86_64 step sets HOSTTYPE: "{{matrix}}" but this step has no matrix: field defined. The {{matrix}} placeholder won't be substituted and will remain as a literal string, causing the wanda build to use invalid values like quay.io/pypa/manylinux2014_{{matrix}} as the base image. This contrasts with the aarch64 variant on line 45 which correctly uses HOSTTYPE: "aarch64", and the x86_64 non-JDK variant on line 14 which uses HOSTTYPE: "x86_64".
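A pre-flight check could catch this class of mistake before a build runs by flagging any step that uses `{{matrix}}` without declaring a `matrix` field. A minimal sketch over steps already loaded into Python dicts (for example with a YAML parser); `find_unbound_matrix` is a hypothetical helper, not part of rayci.

```python
from typing import Any, Dict, Iterator, List, Tuple

PLACEHOLDER = "{{matrix}}"


def _string_fields(value: Any, path: str) -> Iterator[Tuple[str, str]]:
    """Yield (dotted path, string) for every string nested in value."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from _string_fields(item, f"{path}.{key}" if path else str(key))
    elif isinstance(value, list):
        for i, item in enumerate(value):
            yield from _string_fields(item, f"{path}[{i}]")


def find_unbound_matrix(steps: List[Dict[str, Any]]) -> List[str]:
    """Return 'step: field' for each {{matrix}} use in a step with no matrix."""
    problems = []
    for step in steps:
        if "matrix" in step:
            continue  # the placeholder will be substituted for these steps
        name = step.get("name", step.get("label", "<unnamed>"))
        for path, text in _string_fields(step, ""):
            if PLACEHOLDER in text:
                problems.append(f"{name}: {path}")
    return problems
```

For a step named `manylinux-cibase-jdk-x86_64` whose `env` sets `HOSTTYPE: "{{matrix}}"` and which has no `matrix`, it reports `manylinux-cibase-jdk-x86_64: env.HOSTTYPE`.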
```yaml
- bazel run //ci/ray_ci/automation:copy_wanda_image --
  --wanda-image-name manylinux-cibase-{{matrix}}
  --destination-repository rayproject/manylinux2014
  --tag-suffix -{{matrix}}
```
Bug: Push steps run in dry-run mode without uploading images
The bazel run //ci/ray_ci/automation:copy_wanda_image commands in both push steps don't include the --upload flag. Since copy_wanda_image.py defaults to dry-run mode when --upload is not provided (the upload parameter defaults to False), the pipeline will never actually copy images to Docker Hub despite the step labels saying "Push ... to Docker Hub". The script will just log "DRY RUN: Would copy..." and exit. This may be intentional for initial testing as noted in the PR description, but the nightly cron job won't upload any images if merged in this state.
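The dry-run default described above follows from parsing `--upload` as an opt-in boolean. A minimal argparse sketch of such a CLI, reusing the flag names visible in the pipeline (`--wanda-image-name`, `--destination-repository`, `--tag-suffix`, `--upload`); the parser layout and the `build_parser`/`describe` helpers are assumptions, and the real copy_wanda_image.py may use a different CLI library.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the flags used by the push steps (sketch only)."""
    parser = argparse.ArgumentParser(description="Copy a wanda-built image.")
    parser.add_argument("--wanda-image-name", required=True)
    parser.add_argument("--destination-repository", required=True)
    parser.add_argument("--tag-suffix", default="")
    # store_true: --upload is False unless passed explicitly, so a pipeline
    # step that omits the flag only performs a dry run.
    parser.add_argument("--upload", action="store_true")
    return parser


def describe(argv: Optional[List[str]] = None) -> str:
    """Return the log line the tool would print for the given arguments."""
    args = build_parser().parse_args(argv)
    verb = "Copying" if args.upload else "DRY RUN: Would copy"
    return (
        f"{verb} {args.wanda_image_name} to "
        f"{args.destination_repository} (tag suffix {args.tag_suffix!r})"
    )
```

For example, `describe(["--wanda-image-name", "manylinux-cibase-x86_64", "--destination-repository", "rayproject/manylinux2014", "--tag-suffix=-x86_64"])` returns a `DRY RUN: Would copy ...` line until `--upload` is appended. argparse may treat a separate value that starts with `-` (such as `-x86_64`) as an option, which is why the example attaches it with `=`.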
Successfully built and uploaded as part of https://buildkite.com/ray-project/cicd-cron/builds/22/steps/canvas. Now verifying separately in PR #59463.
Adds a new Buildkite job for building and uploading a base `manylinux2014`
image. The plan is to build it in two flavors, with and without the JDK, to
help reduce image size when the JDK is not needed.
* Building is handled by a `wanda` builder step.
* Uploading to `rayproject/manylinux2014:{DATE}.{SHORT_COMMIT}-x86_64`
and `rayproject/manylinux2014:{DATE}.{SHORT_COMMIT}-jdk-x86_64` is
handled by a separate runner that calls the new
`ci/ray_ci/automation/copy_wanda_image.py`.
* By default, the runner does not upload, for initial testing. When passed
the `--upload` flag, it uploads to the specified registry.
---------
Signed-off-by: andrew <andrew@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
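The destination tags listed in the description combine a date, a short commit, an optional `-jdk`, and the architecture. A sketch of composing them, assuming a `YYMMDD` date and a 6-character short commit (the description does not specify either format); `destination_tag` is a hypothetical helper.

```python
import datetime


def destination_tag(
    repo: str,
    commit: str,
    arch: str,
    with_jdk: bool,
    today: datetime.date,
    short_len: int = 6,  # assumed short-commit length
) -> str:
    """Compose {repo}:{DATE}.{SHORT_COMMIT}[-jdk]-{arch}."""
    date = today.strftime("%y%m%d")  # assumed YYMMDD date format
    jdk = "-jdk" if with_jdk else ""
    return f"{repo}:{date}.{commit[:short_len]}{jdk}-{arch}"
```

For example, `destination_tag("rayproject/manylinux2014", "0123456789abcdef", "x86_64", True, datetime.date(2025, 12, 10))` gives `rayproject/manylinux2014:251210.012345-jdk-x86_64`. Passing the date in explicitly, rather than reading the clock inside the function, keeps the tag deterministic in tests.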
We now prebuild manylinux2014 with JDK as part of #59204. We can directly consume this, rather than rebuilding it each time we need it. --------- Signed-off-by: andrew <andrew@anyscale.com>