
[release test] chunk release_tests.json uploads under Buildkite 500-job limit #62736

Merged
aslonnie merged 1 commit into master from andrew/revup/master/chunk-release-test-uploads on Apr 17, 2026

@andrew-anyscale (Contributor)

Buildkite rejects pipeline uploads above an organization-level job limit (500 at the time of writing) with "Pipeline upload rejected: The number of jobs in this upload exceeds your organization limit of 500." The release pipeline's release_tests.json now contains more jobs than that; the previous "step dependencies not found" failure had been masking it.

custom_image_build_and_test_init now splits the computed steps into batches of at most --max-jobs-per-upload jobs (default 450, leaving headroom below the limit) and writes each batch to .buildkite/release/release_tests_<i>.json. Groups are atomic: a group is never split across batches, and a single group that exceeds the limit raises an error, matching the approach taken in rayci (ray-project/rayci#483, #484). custom-image-build-and-test-init.sh iterates over the chunks and uploads each one in order, so dependencies between steps in different chunks still resolve.
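The batching rule described above (greedy packing, groups kept whole, oversized groups rejected) can be sketched as follows. This is a minimal illustration, not the PR's code: `count_jobs` and `chunk_steps` are hypothetical names, and the counting rules are assumptions. A group's cost is the sum of its children, a step with `parallelism: N` counts as N jobs, and a bare string step such as `"wait"` counts as one job.

```python
from typing import Any, Dict, List, Union

# A step is either a dict or a bare string such as "wait".
Step = Union[Dict[str, Any], str]


def count_jobs(step: Step) -> int:
    """Return how many jobs a step, or a whole group, expands to (assumed rules)."""
    if isinstance(step, str):
        # Assumption: a bare "wait" step counts as one job.
        return 1
    if "group" in step:
        # A group's cost is the sum of its children.
        return sum(count_jobs(child) for child in step.get("steps", []))
    # Assumption: a step with `parallelism: N` expands to N jobs.
    return max(1, int(step.get("parallelism", 1)))


def chunk_steps(steps: List[Step], max_jobs: int = 450) -> List[List[Step]]:
    """Greedily pack steps into ordered batches without splitting any group."""
    batches: List[List[Step]] = []
    current: List[Step] = []
    current_jobs = 0
    for step in steps:
        jobs = count_jobs(step)
        if jobs > max_jobs:
            # Groups are atomic, so an oversized group cannot be placed anywhere.
            name = step if isinstance(step, str) else step.get("group") or step.get("label")
            raise ValueError(f"step {name!r} has {jobs} jobs; limit is {max_jobs}")
        if current and current_jobs + jobs > max_jobs:
            # Close the current batch and start a new one with this step.
            batches.append(current)
            current, current_jobs = [], 0
        current.append(step)
        current_jobs += jobs
    if current:
        batches.append(current)
    return batches
```

For example, with `max_jobs=5`, a group of 4 jobs, a single command, and a group of 2 jobs pack into two batches: the first group plus the command, then the second group. This sketch keeps steps in their original order rather than reordering them for a tighter fit. Because the chunks are uploaded sequentially, preserving order means a step is never placed in an earlier chunk than a step listed before it, which is presumably what lets cross-chunk dependencies resolve.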

Topic: chunk-release-test-uploads
Signed-off-by: andrew <andrew@anyscale.com>

@andrew-anyscale requested a review from a team as a code owner on April 17, 2026 21:40
@andrew-anyscale (Contributor, Author)

Reviews in this chain:
#62736 [release test] chunk release_tests.json uploads under Buildkite 500-job limit
 └#62737 [release test] move release_tests.json upload into custom_image_build_and_test_init

@andrew-anyscale (Contributor, Author) commented Apr 17, 2026

| # | head | base | diff | date | summary |
|---|------|------|------|------|---------|
| 0 | 4b679292 | e14b30fd | diff | Apr 17 14:40 PM | 3 files changed, 270 insertions(+), 12 deletions(-) |
| 1 | 15f109b5 | 7ea5a1aa | diff | Apr 17 15:11 PM | 2 files changed, 48 insertions(+) |

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request implements chunking for Buildkite pipeline uploads to prevent exceeding the organization's per-upload job limit. Key changes include the addition of logic to count jobs within groups (accounting for parallelism) and a greedy-packing algorithm to split steps into batches. The output now generates indexed JSON files (e.g., _0.json, _1.json), and the corresponding shell script has been updated to upload these chunks sequentially. Additionally, the PR includes new unit and integration tests to verify the batching logic and the cleanup of stale chunk files. I have no feedback to provide as there were no review comments to evaluate.
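The chunk-file handling this summary describes (indexed files, cleanup of stale chunks, sequential upload in order) can be sketched in Python. The PR itself does the upload loop in a shell script, so treat this only as an illustration under stated assumptions: `write_chunks` and `upload_commands` are hypothetical helpers, and each file is assumed to hold a `{"steps": [...]}` pipeline document. `buildkite-agent pipeline upload <file>` is the Buildkite agent's upload command; the sketch only builds the command lists and does not run them.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Any, List

# Matches only the numbered chunk files, e.g. release_tests_3.json.
CHUNK_RE = re.compile(r"release_tests_(\d+)\.json")


def write_chunks(out_dir: Path, batches: List[List[Any]]) -> List[Path]:
    """Write one pipeline file per batch after deleting stale chunk files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # A previous run may have produced more chunks; delete them so they are not uploaded.
    for old in out_dir.glob("release_tests_*.json"):
        if CHUNK_RE.fullmatch(old.name):
            old.unlink()
    paths = []
    for i, batch in enumerate(batches):
        path = out_dir / f"release_tests_{i}.json"
        path.write_text(json.dumps({"steps": batch}, indent=2))
        paths.append(path)
    return paths


def upload_commands(out_dir: Path) -> List[List[str]]:
    """Build one upload command per chunk, ordered by numeric index."""
    chunks = [p for p in out_dir.glob("release_tests_*.json") if CHUNK_RE.fullmatch(p.name)]
    # Sort numerically: a lexical sort would put release_tests_10 before release_tests_2.
    chunks.sort(key=lambda p: int(CHUNK_RE.fullmatch(p.name).group(1)))
    return [["buildkite-agent", "pipeline", "upload", str(p)] for p in chunks]


if __name__ == "__main__":
    # Demo in a throwaway directory; prints the commands instead of running them.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        write_chunks(out, [[{"command": "echo hi"}]] * 3)
        for cmd in upload_commands(out):
            print(" ".join(cmd))
```

Deleting old chunk files before writing new ones matters when a run produces fewer batches than the previous run: without the cleanup, a leftover file with a higher index would be picked up and uploaded a second time.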

@andrew-anyscale force-pushed the andrew/revup/master/chunk-release-test-uploads branch from 4b67929 to 15f109b on April 17, 2026 22:11
@aslonnie added the go label (add ONLY when ready to merge, run all tests) on Apr 17, 2026
@aslonnie enabled auto-merge (squash) on April 17, 2026 22:18
@aslonnie self-requested a review on April 17, 2026 22:18
@aslonnie merged commit 2489c88 into master on Apr 17, 2026
9 checks passed
@aslonnie deleted the andrew/revup/master/chunk-release-test-uploads branch on April 17, 2026 22:37
HLDKNotFound pushed a commit to chichic21039/ray that referenced this pull request Apr 22, 2026

Labels: go (add ONLY when ready to merge, run all tests)

2 participants