Reduce CI cost#19919
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19919
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
|
@claude review this code |
|
Claude finished @metascroy's task in 2m 14s —— View job Code Review: Reduce CI Cost
Overall AssessmentThe design is sound — a deterministic 25% SHA-based sampling that's centralized in Issues1.
2. Missing As flagged by CodeQL, the following workflows lack explicit
Adding permissions: {} # No token permissions needed3. Misleading comment in
4.
github.event_name != 'pull_request' ||
contains(...) || ...Meaning: always run on push, path-filtered on PRs. The new condition is: contains(...) || ... ||
needs.run-decision.outputs.is-full-run == 'true'Meaning: path-filtered on both push and PR. For PRs, Design Observations (Non-blocking)Sampling approach is solid. Using the last hex character of SHA for deterministic 25% sampling ( Single source of truth. Having Promote workflow is well-secured. SHA validation (40-char hex, must exist, must be reachable from main/release) and the idempotent tag-exists check are good defensive programming. Concurrency control on the promote workflow ( SummaryThe PR achieves its goal of ~75% macOS runner cost reduction with a clean, maintainable design. The main actionable issue is the ungated |
| # Keep in sync with the sample in viable-strict-gate.yml. | ||
| if [ "$IS_FULL" = "false" ] && [ "$EVENT_NAME" = "push" ]; then | ||
| case "$SHA" in | ||
| *0|*4|*8|*c) IS_FULL=true ;; |
There was a problem hiding this comment.
you can hit 100 consecutive sha ending it 0,4,8,c in a row :p
There was a problem hiding this comment.
very unlikely, but I take the point :)
I'll change sampling to be based on git history depth so it regularly runs every 4 commits
|
|
||
| test-coreml-bc-macos: | ||
| needs: changed-files | ||
| needs: [changed-files, run-decision] |
There was a problem hiding this comment.
also do it for windows?
There was a problem hiding this comment.
I'll focus on macOS + core infra (viable strict update changes and main PR selection job) in this PR. windows and cuda enablement can be follow-ups.
| contains(needs.changed-files.outputs.changed-files, '.github/workflows/trunk.yml') || | ||
| needs.run-decision.outputs.is-full-run == 'true' |
There was a problem hiding this comment.
why not add the sha % 4 logic here and not kick off a new job? curious
There was a problem hiding this comment.
B/c then we can use it in other jobs (like mlx.yml or cuda.yml) and keep the selection logic in one place
digantdesai
left a comment
There was a problem hiding this comment.
LGTM. stamping to unblock.
|
@claude review this code |
|
Claude finished @metascroy's task in 2m 10s —— View job Code Review: Reduce CI Cost
Overall AssessmentWell-designed approach to reducing macOS CI costs. The depth-based sampling (addressing digantdesai's review feedback) is an improvement over the original SHA-based approach since it guarantees hard bounds on gaps between full runs. The architecture is clean with Issues1.
If you want deterministic "every 4th push to main," use: DEPTH=$(git rev-list --first-parent --count "$SHA")This counts only the linear history along the default branch, giving exactly one count per push event. 2. Stale comment in
3. When a commit is pushed to main, Whether Design Observations (Non-blocking)
SummaryThe main actionable issue is using |
|
@claude review this code |
|
Claude finished @metascroy's task in 3m 7s —— View job Code Review: Reduce CI Cost
Overall AssessmentThe design is solid — depth-based deterministic 25% sampling centralized in Issues1. Misleading comment about
This is incorrect. GitHub Actions The old code worked because it had This matters because a future reader trusting the comment might remove the Suggested fix: # Emits PR diff file list; non-PR events emit '*'. Note: contains('*', 'path')
# is false (literal substring match, not glob), so path-filtered jobs on push
# rely on run-decision's is-full-run output to gate execution.2.
This means these path-filtered jobs never run on non-sampled push commits, even if the push touches their relevant paths. This is different from the PR behavior where path filtering is meaningful. The same applies to If the intent is "on push: only run on sampled commits (25%)" regardless of paths touched, then this is correct behavior — just know that path filtering has no effect on push events with the current 3.
However, verify that the 4. Promote workflow —
for branch in main $(git branch -r | grep -E 'origin/release/' | sed 's|origin/||'); do
Design Observations (Non-blocking)
SummaryThe PR is well-structured and the core mechanism is correct. The main actionable items are:
The path-filter-on-push semantic (issue #2) is a design choice that's fine as-is if the intent is "push events are either full-run or skip" — just worth documenting explicitly. |
|
@claude review this code |
|
Claude finished @metascroy's task in 1m 31s —— View job Code Review: Reduce CI Cost
Overall AssessmentThis is a well-architected CI cost reduction. The depth-based deterministic 25% sampling centralized in Issues1.
This is unlikely for typical single-commit pushes to main, but could happen for merge commits or large automated updates. If it matters, you could add a check: if the response indicates 2.
3. Race between If a force-push to main replaces commits (unusual for ExecuTorch), Design Observations (Non-blocking)
SummaryThe PR is clean and ready. The only potentially actionable item is the 300-file Compare API limit (#1), which is a known GitHub limitation and unlikely to matter in practice for this repo's merge patterns. All previously identified issues have been addressed. The design achieves its ~75% macOS runner cost savings goal with good ergonomics for maintainers. |
viable-strict-gate.yml and mlx.yml had permissions: {} but call
_ci-run-decision.yml, which needs contents: read for actions/checkout.
GitHub intersects caller permissions with callee needs ({} ∩ {contents:
read} = {}), so both workflows were rejected at registration since
#19919 landed. The gate hasn't run (so update-viablestrict has had no
signal), and mlx.yml hasn't triggered (MLX / * checks show [does not
exist] on HUD). Loosen both to permissions: contents: read. Audited all
other callers of _ci-run-decision.yml / _get-changed-files.yml; none
affected.
Currently every push to main runs all macOS jobs unconditionally. This PR path-filters macOS jobs on push as well as PR, but samples 25% of push commits (deterministic by SHA hash) to still run the full suite for HUD/bisection signal. A new viable-strict-gate workflow fails on the 75% non-sampled commits and is added to viable/strict's requires list, so viable/strict only advances on commits where every job ran. Estimated ~75% macOS runner savings.
CI behavior changes: