[ci] convert ml.rayci.yml matrix to array syntax#62802
andrew-anyscale wants to merge 1 commit into andrew/revup/master/convert-array-core-tests
Code Review
This pull request migrates the Buildkite configuration in .buildkite/ml.rayci.yml from using matrix to array for several build and test steps, including updates to dependency references and template variables. A critical issue was identified in the sharding logic for the train v2 tests; the current implementation still uses Buildkite environment variables for worker identification instead of the newly defined array.worker_id, which will cause redundant test execution across the array jobs.
      - bazel run //ci/ray_ci:test_in_docker -- //python/ray/train/... ml
        --workers "$${BUILDKITE_PARALLEL_JOB_COUNT}" --worker-id "$${BUILDKITE_PARALLEL_JOB}" --parallelism-per-worker 3
    -   --python-version {{matrix.python}} --build-name mlbuild-py{{matrix.python}}
    +   --python-version {{array.python}} --build-name mlbuild-py{{array.python}}
The command for the train v2 {{array.python}} tests step still uses Buildkite environment variables ($${BUILDKITE_PARALLEL_JOB_COUNT} and $${BUILDKITE_PARALLEL_JOB}) for sharding, even though it has been converted to use an array for worker_id. Since parallelism is not set for this step, these environment variables will default to 1 and 0 respectively, causing both jobs in the array to run the same tests (worker 0).
To correctly shard the tests across the array, you should use the {{array.worker_id}} variable and a hardcoded total worker count (2), matching the implementation in the GPU test step at line 148.
--workers 2 --worker-id {{array.worker_id}} --parallelism-per-worker 3
--python-version {{array.python}} --build-name mlbuild-py{{array.python}}
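To illustrate why this matters, here is a minimal Python sketch of the sharding behavior described in this comment. It assumes a simple round-robin split; the `shard` helper and the target names are hypothetical and are not taken from `ci/ray_ci`, whose actual sharding strategy may differ.

```python
def shard(targets, workers, worker_id):
    """Return the targets assigned to one worker (assumed round-robin split)."""
    if not 0 <= worker_id < workers:
        raise ValueError("worker_id must be in [0, workers)")
    ordered = sorted(targets)
    return [t for i, t in enumerate(ordered) if i % workers == worker_id]


# Hypothetical test targets, for illustration only.
targets = ["test_a", "test_b", "test_c", "test_d"]

# Current behavior per the review: with no parallelism set, both array jobs
# see a job count of 1 and a job index of 0, so each one runs everything.
job0_env = shard(targets, workers=1, worker_id=0)
job1_env = shard(targets, workers=1, worker_id=0)

# Suggested behavior: a fixed worker count of 2 and a distinct worker_id
# per array job, so the two jobs split the targets between them.
job0_fixed = shard(targets, workers=2, worker_id=0)
job1_fixed = shard(targets, workers=2, worker_id=1)
```

Under these assumptions the two env-based jobs receive identical target lists, while the two fixed jobs receive disjoint lists that together cover every target.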
Convert the three matrix build steps (minbuild-ml, mlbuild-multipy, mlgpubuild-multipy) and the three matrix-setup test labels ("train v2 {{python}} tests", "train v2 gpu {{python}} tests", "tune {{python}} tests") from matrix to array syntax. The three newly-array test steps use ($) against mlbuild-multipy and mlgpubuild-multipy since both sides share the python array key, and the build steps refine their oss-ci-base_* dependencies from (*) / (python=3.10) to ($) for the same reason. Every remaining plain depends_on to minbuild-ml / mlbuild-multipy / mlgpubuild-multipy becomes (python=3.10) since every consuming step pins py3.10. The non-array mllightning1gpubuild is unaffected.
Relative: convert-array-core-tests
Topic: convert-array-ml
Signed-off-by: andrew <andrew@anyscale.com>
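The dependency selectors named in the commit message can be read roughly as follows. This is a hedged Python sketch of one plausible interpretation, not rayci's implementation: the `resolve` function, the dict representation of array variants, and the `3.12` version are all assumptions made for illustration.

```python
def resolve(selector, consumer, producer_variants):
    """Pick which producer variants a consumer variant depends on.

    Assumed semantics (inferred from the commit message, not from rayci):
      "*"          depend on every variant of the producer
      "$"          depend on the variant matching the consumer on shared keys
      "key=value"  depend on the variant pinned to that value
    """
    if selector == "*":
        return list(producer_variants)
    if selector == "$":
        return [
            p for p in producer_variants
            if all(consumer[k] == v for k, v in p.items() if k in consumer)
        ]
    key, value = selector.split("=", 1)
    return [p for p in producer_variants if p.get(key) == value]


# Hypothetical variants of a build step; only 3.10 appears in the source.
builds = [{"python": "3.10"}, {"python": "3.12"}]

# An array test step that shares the python key with the build step.
test_variant = {"python": "3.12", "worker_id": "0"}

same_python = resolve("$", test_variant, builds)
pinned = resolve("python=3.10", {}, builds)
everything = resolve("*", {}, builds)
```

Read this way, `($)` keeps each test variant tied to the build of the same Python version, while `(python=3.10)` suits consumers that are not arrays and always use the py3.10 build.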