
Optimize CI workflow parallelization and reduce PR overhead #6270

Merged
pelikhan merged 2 commits into main from copilot/optimize-ci-workflow
Dec 12, 2025

Contributor

Copilot AI commented Dec 12, 2025

Reduces CI runtime by ~40-50% on PRs and ~25-30% on main by parallelizing independent jobs and limiting expensive scans to main branch.

Changes

Job parallelization - Remove test bottleneck:

  • integration, security, logs-token-check, security-scan: needs: [test] → needs: [lint]
  • These jobs don't consume test outputs, only need linting to pass
  • Reduces critical path from ~7.5min to ~5min
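A minimal sketch of the dependency change, assuming the job IDs match the names in the list above. The `runs-on` value and the `make` targets are hypothetical placeholders, not taken from the actual workflow file:

```yaml
# Sketch only: job bodies are hypothetical placeholders.
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint                # placeholder command

  test:
    needs: [lint]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test                # placeholder command

  integration:
    needs: [lint]                     # was: needs: [test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test-integration    # placeholder command
```

With this shape, `test` and `integration` both become eligible to start as soon as `lint` succeeds. The `security`, `logs-token-check`, and `security-scan` jobs would get the same one-line `needs` change.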

Conditional execution - Skip on PRs:

  • bench: Main-only (performance tracking, not PR validation)
  • fuzz: Main-only (10s provides minimal value, extended fuzzing needs hours)
  • security-scan: Main-only (zizmor/actionlint/poutine matrix - 3 jobs)
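The main-only gating can be sketched with a job-level `if`. On `pull_request` events `github.ref` has the form `refs/pull/<number>/merge`, so the condition is false and the job is reported as skipped. The triggers shown and the `make bench` step are assumptions for illustration, not the repository's actual configuration:

```yaml
# Sketch only: triggers and steps are assumed, not copied from the real workflow.
on:
  push:
    branches: [main]
  pull_request:

jobs:
  bench:
    # True for pushes to main; false for pull_request runs.
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make bench               # placeholder command
```

The `fuzz` job would carry the same `if` line.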

Impact

PRs: 5 fewer jobs, ~3-4min faster
Main: All jobs run, ~2-3min faster via parallelization

PRs retain full correctness coverage (unit + 6 integration groups + build + lint + js).

Original prompt

This section details the original issue you should resolve

<issue_title>[ci-coach] Optimize CI workflow parallelization and reduce PR overhead</issue_title>
<issue_description>## CI Optimization Proposal

This PR implements high-impact optimizations to reduce CI run time by 40-50% on PRs and 25-30% on main branch pushes, based on analysis of the last 100 workflow runs.

Analysis Summary

Current State (last 100 runs):

  • Success rate: 13%
  • Failure rate: 74%
  • Key bottleneck: 7 jobs unnecessarily waiting on unit tests to complete before starting

Optimizations Implemented

1. Remove Test Job Dependency Bottleneck

Type: Job Parallelization
Impact: ~2-3 minutes per run (33% reduction in critical path)
Risk: LOW

Changes:

  • integration: needs: [test] → needs: [lint]
  • security: needs: [test] → needs: [lint]
  • logs-token-check: needs: [test] → needs: [lint]

Current Job Flow:

lint (2min) → test (2.5min) → [integration + security + logs-token-check + ...] (3min)
Total critical path: ~7.5 minutes

Optimized Job Flow:

lint (2min) → [test (2.5min) + integration (3min) + security (3min) + logs-token-check (2min)] in parallel
Total critical path: ~5 minutes
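As a worked check of the two totals, using only the per-job durations quoted in the flows above and assuming every job after lint starts at the same moment with no runner queue time (an idealization the source does not state):

```latex
\begin{align*}
T_{\text{before}} &= \underbrace{2}_{\text{lint}} + \underbrace{2.5}_{\text{test}} + \underbrace{3}_{\text{slowest downstream job}} = 7.5\ \text{min} \\
T_{\text{after}}  &= \underbrace{2}_{\text{lint}} + \max(2.5,\ 3,\ 3,\ 2) = 5\ \text{min} \\
\frac{T_{\text{before}} - T_{\text{after}}}{T_{\text{before}}} &= \frac{2.5}{7.5} \approx 33\%
\end{align*}
```

After the change the critical path is set by the slowest of the parallel jobs (3 min) rather than by the sum of test plus the downstream jobs.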

Rationale: Integration tests, security regression tests, and logs token checks don't consume any outputs from unit tests. They only need the codebase to pass linting. Running them in parallel with unit tests eliminates a major bottleneck.

Safety: These are independent test suites with no shared artifacts or dependencies beyond the source code.


2. Conditional Benchmark Execution

Type: Selective Testing
Impact: Eliminates 1 job from PRs
Risk: LOW

Changes:

  • bench: Added if: github.ref == 'refs/heads/main'

Rationale: Benchmarks are for performance trend tracking, not PR validation. Running them only on main branch provides the historical data needed while reducing PR overhead.

Safety: PRs still get comprehensive correctness testing (unit + integration + build + lint + js). Performance tracking on main is sufficient.


3. Conditional Fuzz Testing

Type: Selective Testing
Impact: Eliminates 1 job from PRs
Risk: LOW

Changes:

  • fuzz: Added if: github.ref == 'refs/heads/main'

Rationale: Fuzz testing runs for only 10 seconds per target on PRs, which provides minimal coverage. Effective fuzz testing requires hours or days. Running extended fuzzing only on main branch is more valuable than token 10s runs on every PR.

Safety: PRs still have extensive test coverage. Focused fuzzing on main provides better security validation than brief PR runs.


4. Conditional Security Scans

Type: Selective Testing
Impact: Eliminates 3 matrix jobs (zizmor, actionlint, poutine) from PRs
Risk: LOW

Changes:

  • security-scan: needs: [test] → needs: [lint]
  • security-scan: Added if: github.ref == 'refs/heads/main'

Rationale: Security scans (zizmor, actionlint, poutine) are expensive checks that rarely find issues in individual PRs. These are monitoring/analysis tools best suited for main branch validation.

Safety: PRs still get comprehensive testing. Security regression tests still run. Full security scanning on main ensures the baseline remains secure.
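Both changes to `security-scan` can be combined in one job definition. The fragment below is a sketch meant to sit under an existing `jobs:` map that already defines `lint`. The matrix key name `tool`, the `fail-fast` setting, and the `make` target are assumptions for illustration:

```yaml
# Fragment only: assumes a `lint` job is defined elsewhere in the same workflow.
jobs:
  security-scan:
    needs: [lint]                          # was: needs: [test]
    if: github.ref == 'refs/heads/main'    # skipped on pull_request runs
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false                     # assumption: let each tool report independently
      matrix:
        tool: [zizmor, actionlint, poutine]
    steps:
      - uses: actions/checkout@v4
      - run: make security-scan-${{ matrix.tool }}   # placeholder command
```

Because a job-level `if` is evaluated before the matrix is expanded, a single condition is enough to skip all three scanners on PRs.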


Expected Impact

For Pull Requests:

  • Time savings: ~3-4 minutes per run (40-50% reduction)
  • Jobs eliminated: 5 (bench + fuzz + 3 security scans)
  • Jobs remaining: 7 core validation jobs (lint, test, integration×6, build, js, actions-build)
  • Critical path: Reduced from ~7.5 minutes to ~5 minutes

For Main Branch:

  • Time savings: ~2-3 minutes per run (25-30% reduction)
  • Jobs run: All 12 jobs (including bench, fuzz, security scans)
  • Critical path: Reduced from ~7.5 minutes to ~5 minutes

Validation

YAML syntax: Manually verified, all changes follow GitHub Actions syntax
Job dependencies: Reviewed to ensure no artifact dependencies broken
Risk assessment: All changes are LOW risk with high impact

Testing Plan

After merge:

  1. Monitor first PR build to verify job execution order
  2. Verify integration/security/logs-token-check start immediately after lint
  3. Confirm benchmark/fuzz/security-scan jobs skip on PRs
  4. Compare run times before/after (expect ~3-4 min savings on PRs)
  5. Validate all jobs still run on main branch pushes

Future Optimization Opportunities

Based on this analysis, additional improvements for future consideration:

  1. Rebalance integration test matrix: The "Workflow" test group likely runs more tests than the specific CLI groups (no pattern filter). Could split into more balanced groups.

  2. Path-based test filtering: Skip tests for unrelated file changes (e.g., skip integration tests for documentation-only changes).

  3. Unit test splitting: If unit tests grow beyond 3 minutes, consider splitti...



Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize CI workflow parallelization and reduce PR overhead" to "Optimize CI workflow parallelization and reduce PR overhead" Dec 12, 2025
Copilot AI requested a review from mnkiefer December 12, 2025 14:30
pelikhan marked this pull request as ready for review December 12, 2025 14:55
pelikhan merged commit b3bb271 into main Dec 12, 2025
4 checks passed
pelikhan deleted the copilot/optimize-ci-workflow branch December 12, 2025 14:59

Development

Successfully merging this pull request may close these issues.

[ci-coach] Optimize CI workflow parallelization and reduce PR overhead

3 participants