[ci] Nightly for building+uploading manylinux baseimage #59204

Merged
aslonnie merged 47 commits into master from andrewpollack/add-manylinux-nightly
Dec 16, 2025

@andrew-anyscale
Contributor

@andrew-anyscale andrew-anyscale commented Dec 5, 2025

Adds a new Buildkite job for building and uploading a base manylinux2014 image. The plan is to build two flavors, with and without the JDK, to reduce image size when the JDK is not needed.

  • Building is handled by a wanda builder step.
  • Uploading to rayproject/manylinux2014:{DATE}.{SHORT_COMMIT}-x86_64 and rayproject/manylinux2014:{DATE}.{SHORT_COMMIT}-jdk-x86_64 is handled by a separate runner that calls the new ci/ray_ci/automation/copy_wanda_image.py.
  • By default the job runs without uploading, for initial testing. When passed the --upload flag, it uploads to the specified registry.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a set of new shell scripts to automate the nightly build and upload of the manylinux base image. The code is well-organized into a main script and reusable utility libraries. The use of shell best practices like set -euo pipefail and proper variable quoting is commendable. My review provides a few suggestions to further improve the robustness and efficiency of the utility scripts, such as adding include guards, redirecting error output to stderr, and conditionally defining shared variables. Overall, this is a solid addition to the CI process.

@aslonnie aslonnie self-requested a review December 8, 2025 22:32
Comment on lines 7 to 8
RAYCI_DISABLE_JAVA: "true"
JDK_SUFFIX: ""
Contributor Author

I wanted to use a {{matrix}} operation here, but couldn't find a way to set RAYCI_DISABLE_JAVA to true and cover the JDK_SUFFIX for the image name...

Collaborator

I think this is fine. the matrix only has two elements, so not a big deal..

we probably should also have the manylinux aarch64 ones here. for completeness.

steps:
  - name: manylinux-nightly-no-jdk
    label: ":crescent_moon: wanda: manylinux-nightly (no JDK)"
    wanda: ci/docker/manylinux-nightly.wanda.yaml
Contributor Author

Is there a way to pass -rebuild via this interface? https://github.com/ray-project/rayci/blob/main/wanda/wanda/main.go#L42

Contributor Author

I also found DisableCaching, which I believe would require a new wanda.yaml definition-- https://github.com/ray-project/rayci/blob/main/wanda/spec.go#L30

Collaborator

I looked into the code, and it does not seem to have one.

I think we can add a global env var like RAYCI_WANDA_ALWAYS_REBUILD to control the behavior.

also, I realized that force-rebuild is not that critical. the main purpose is really to have a pre-built one, and rayci+wanda will by default invalidate the cache in around 3 or 4 days.
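
To make the proposal concrete, a global switch of this kind could gate cache reuse roughly as follows. This is an illustration only: wanda itself is written in Go, the helper names `always_rebuild` and `should_use_cache` are hypothetical, and the set of accepted truthy values is an assumption (the pipeline snippets later in this thread set the variable to "true").

```python
import os
from typing import Mapping

# Assumption: which strings count as "enabled" is not specified in the thread.
_TRUE_VALUES = {"1", "true", "yes"}


def always_rebuild(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when RAYCI_WANDA_ALWAYS_REBUILD is set to a truthy value."""
    value = env.get("RAYCI_WANDA_ALWAYS_REBUILD", "")
    return value.strip().lower() in _TRUE_VALUES


def should_use_cache(cache_hit: bool, env: Mapping[str, str] = os.environ) -> bool:
    """Reuse a cached image only on a cache hit and when rebuild is not forced."""
    return cache_hit and not always_rebuild(env)


if __name__ == "__main__":
    print(should_use_cache(True, {"RAYCI_WANDA_ALWAYS_REBUILD": "true"}))
    print(should_use_cache(True, {}))
```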

@andrew-anyscale andrew-anyscale force-pushed the andrewpollack/add-manylinux-nightly branch from acc0234 to d6e44c6 Compare December 9, 2025 18:06
@andrew-anyscale andrew-anyscale marked this pull request as ready for review December 9, 2025 18:32
@andrew-anyscale andrew-anyscale requested a review from a team as a code owner December 9, 2025 18:32
@andrew-anyscale andrew-anyscale changed the title from "[draft][ci] Nightly for building+uploading manylinux baseimage" to "[ci] Nightly for building+uploading manylinux baseimage" Dec 9, 2025
@ray-gardener ray-gardener bot added the devprod label Dec 9, 2025
@@ -0,0 +1,10 @@
name: "manylinux-nightly$JDK_SUFFIX"
Collaborator

since this is a SUFFIX, can we just reuse the existing wanda file rather than creating a new one?

Contributor Author

If I use the same existing wanda file with the same name, how do I make sure that the JDK and non-JDK versions are passed from the respective Builder -> Uploader?

I.e. today it looks something like:

WANDA_IMAGE_NAME="manylinux-nightly${JDK_SUFFIX}"
WANDA_TAG="${RAYCI_WORK_REPO}:${RAYCI_BUILD_ID}-${WANDA_IMAGE_NAME}"

# TODO: Change to `crane`
docker pull "$WANDA_TAG"

Which depends on the ${JDK_SUFFIX} for differentiating the two images. What can I do instead for getting the correct image created+uploaded by the Builder?
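
The dependency described here can be seen in a small Python rendering of the shell snippet above. The environment variable names come from that snippet; the `wanda_tag` helper and the sample values are hypothetical. The image name, and therefore the suffix, is the only input that differs between the two variants.

```python
import os
from typing import Mapping


def wanda_tag(image_name: str, env: Mapping[str, str] = os.environ) -> str:
    """Build the work-repo tag that the builder step is expected to have pushed."""
    work_repo = env["RAYCI_WORK_REPO"]
    build_id = env["RAYCI_BUILD_ID"]
    return f"{work_repo}:{build_id}-{image_name}"


if __name__ == "__main__":
    # Sample values only; real CI jobs provide these through the environment.
    sample_env = {"RAYCI_WORK_REPO": "example.com/rayci-work", "RAYCI_BUILD_ID": "abc123"}
    for jdk_suffix in ("", "-jdk"):
        print(wanda_tag(f"manylinux-nightly{jdk_suffix}", sample_env))
```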


printHeader "Pulling image from Wanda cache"
printInfo "Pulling: ${WANDA_TAG}"
docker pull "$WANDA_TAG"
Collaborator

can we use crane cp? so that we do not need to pull down the common layers.
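
A registry-to-registry copy along these lines might look like the sketch below. It assumes the `crane` binary is on PATH and already authenticated; `crane copy` is the full name of `crane cp`. The function names are hypothetical and this is not the code that was merged.

```python
import shlex
import subprocess
from typing import List


def crane_copy_command(source: str, destination: str) -> List[str]:
    """Build the crane invocation that copies an image between registries."""
    return ["crane", "copy", source, destination]


def copy_image(source: str, destination: str, upload: bool = False) -> List[str]:
    """Copy source to destination, or only report the command in dry-run mode."""
    cmd = crane_copy_command(source, destination)
    if not upload:
        print(f"DRY RUN: Would run: {shlex.join(cmd)}")
        return cmd
    # check=True raises CalledProcessError if crane exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    copy_image("example.com/rayci-work:abc123-manylinux-nightly",
               "rayproject/manylinux2014:example-tag")
```

Unlike `docker pull` followed by `docker push`, a copy of this kind does not need a local Docker daemon, and layers already present in the destination registry do not have to be transferred again.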

: "${JDK_SUFFIX:?Error: JDK_SUFFIX is not set}"
UPLOAD=${UPLOAD:-"false"}

COMMIT_HASH=$(git rev-parse HEAD)
Collaborator

hate to ask.. but can we rewrite this script in bazel py_binary so that the logic is unit-testable and more maintainable? most ci scripts / tools live in ci/ray_ci directories.

which means you would also need to get a forge env for running the bazel py_binary I think

have you tried doing a dry run on the new cicd-cron pipeline yet?

Contributor Author

Reasonable ask! I can rewrite as a py_binary

have you tried doing a dry run on the new cicd-cron pipeline yet?

I have not yet

Signed-off-by: andrew <andrew@anyscale.com>
Now installs and uses Wanda to build a JDK and non-JDK version of manylinux. Depends on new upload flag to push to Dockerhub once that is enabled

Focused heavily on error handling and being able to run from anywhere in the repo

Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Scoped to just run build-many script.

Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
We use wanda now

Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
andrew-anyscale added a commit to ray-project/rayci that referenced this pull request Dec 11, 2025
To be used in ray-project/ray#59204

Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
@andrew-anyscale andrew-anyscale force-pushed the andrewpollack/add-manylinux-nightly branch from b7aaa4a to a3a3def Compare December 15, 2025 21:56
--destination-repository rayproject/manylinux2014
--tag-suffix -{{matrix}}

- label: ":docker: push: Push manylinux-cibase-jdk to Docker Hub"

Bug: Missing matrix variable in JDK push step label

The label for the JDK push step is missing {{matrix}} in the label string. Line 30 correctly uses Push manylinux-cibase-{{matrix}} to Docker Hub but line 45 uses Push manylinux-cibase-jdk to Docker Hub without the matrix variable. This causes both x86_64 and aarch64 variants of this step to have identical labels in the CI UI, which is inconsistent with the non-JDK push step pattern.

Signed-off-by: andrew <andrew@anyscale.com>
RAYCI_DISABLE_JAVA: "false"
RAYCI_WANDA_ALWAYS_REBUILD: "true"
JDK_SUFFIX: "-jdk"
HOSTTYPE: "{{matrix}}"

Bug: HOSTTYPE uses matrix placeholder without matrix defined

The manylinux-cibase-jdk-x86_64 step sets HOSTTYPE: "{{matrix}}" but this step has no matrix: field defined. The {{matrix}} placeholder won't be substituted and will remain as a literal string, causing the wanda build to use invalid values like quay.io/pypa/manylinux2014_{{matrix}} as the base image. This contrasts with the aarch64 variant on line 45 which correctly uses HOSTTYPE: "aarch64", and the x86_64 non-JDK variant on line 14 which uses HOSTTYPE: "x86_64".
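
This class of mistake can be caught mechanically before a pipeline is uploaded. The sketch below is a hypothetical lint, not part of rayci: it walks a step that has already been parsed into Python dicts and lists, and reports every string that still contains {{matrix}} when the step defines no matrix field. The `env` key in the example is an assumption about the step layout.

```python
MATRIX_PLACEHOLDER = "{{matrix}}"


def unsubstituted_matrix_fields(step: dict) -> list:
    """Return paths of string values that use {{matrix}} in a step with no matrix."""
    if "matrix" in step:
        return []
    found = []

    def walk(value, path):
        if isinstance(value, str):
            if MATRIX_PLACEHOLDER in value:
                found.append(path)
        elif isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else str(key))
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(child, f"{path}[{index}]")

    walk(step, "")
    return found


if __name__ == "__main__":
    step = {
        "name": "manylinux-cibase-jdk-x86_64",
        "env": {"JDK_SUFFIX": "-jdk", "HOSTTYPE": "{{matrix}}"},
    }
    print(unsubstituted_matrix_fields(step))
```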

Signed-off-by: andrew <andrew@anyscale.com>
- bazel run //ci/ray_ci/automation:copy_wanda_image --
--wanda-image-name manylinux-cibase-{{matrix}}
--destination-repository rayproject/manylinux2014
--tag-suffix -{{matrix}}

Bug: Push steps run in dry-run mode without uploading images

The bazel run //ci/ray_ci/automation:copy_wanda_image commands in both push steps don't include the --upload flag. Since copy_wanda_image.py defaults to dry-run mode when --upload is not provided (the upload parameter defaults to False), the pipeline will never actually copy images to Docker Hub despite the step labels saying "Push ... to Docker Hub". The script will just log "DRY RUN: Would copy..." and exit. This may be intentional for initial testing as noted in the PR description, but the nightly cron job won't upload any images if merged in this state.

Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
Signed-off-by: andrew <andrew@anyscale.com>
@aslonnie aslonnie self-requested a review December 16, 2025 02:39
Signed-off-by: andrew <andrew@anyscale.com>
@andrew-anyscale
Contributor Author

Successfully built + uploaded as part of https://buildkite.com/ray-project/cicd-cron/builds/22/steps/canvas
See: https://hub.docker.com/r/rayproject/manylinux2014/tags

Now verifying separately in PR #59463

@aslonnie aslonnie self-requested a review December 16, 2025 03:23
@aslonnie aslonnie added the go (add ONLY when ready to merge, run all tests) label Dec 16, 2025
@aslonnie aslonnie merged commit e211eed into master Dec 16, 2025
6 checks passed
@aslonnie aslonnie deleted the andrewpollack/add-manylinux-nightly branch December 16, 2025 06:59
cszhu pushed a commit that referenced this pull request Dec 17, 2025
Adds new buildkite job for building + uploading a base `manylinux2014`
image. Planning to build in flavors of with and without JDK included to
help reduce image size when JDK is not needed.
* Building is handled by a `wanda` builder step
* Uploading to `rayproject/manylinux2014:{DATE}.{SHORT_COMMIT}-x86_64`
and `rayproject/manylinux2014:{DATE}.{SHORT_COMMIT}-jdk-x86_64` is
handled by a separate runner that calls the new
`ci/ray_ci/automation/copy_wanda_image.py`
* Runs by default without uploading for initial testing. Once passed
with flag `--upload`, it will upload to the specified registry

---------

Signed-off-by: andrew <andrew@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
aslonnie pushed a commit that referenced this pull request Dec 18, 2025
We now prebuild manylinux2014 with JDK as part of
#59204

We can directly consume this, rather than rebuilding each time we need
it

---------

Signed-off-by: andrew <andrew@anyscale.com>
zzchun pushed a commit to zzchun/ray that referenced this pull request Dec 18, 2025
zzchun pushed a commit to zzchun/ray that referenced this pull request Dec 18, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025