Pipeline Plan 176
Issue: #176
Branch: ci/merge-quality-gate-enforcement-with-regr-176
Template: full (devops)
Minimum viable change: A new script sw-merge-gate.sh that collects metrics from the PR branch, compares against a baseline (main branch), and blocks merges when quality degrades beyond configurable thresholds. Integrated into the existing stage_merge() function in pipeline-stages-delivery.sh.
Implicit requirements:
- Must work in both pipeline (automated) and CLI (manual) contexts
- Must respect NO_GITHUB, SKIP_GATES, and headless/CI modes
- Must be Bash 3.2 compatible (no associative arrays, no readarray, no ${var,,})
- Must use atomic file writes (tmp + mv pattern)
- Must integrate with existing event system and GitHub Checks API
Acceptance criteria (from issue):
- Compare PR metrics against baseline (main branch)
- Block if test coverage drops >5% (configurable)
- Block if cyclomatic complexity increases >20% (configurable)
- Block if new lint/static analysis violations introduced
- --force-merge override with justification comment
- Emit to events.jsonl + GitHub Checks API
- Display quality delta in PR comment
Approach A: Extend sw-regression.sh
- Pros: Reuses existing baseline/comparison infrastructure
- Cons: sw-regression.sh tracks script-level metrics (line count, function count), not code quality metrics (coverage, complexity, lint violations). These are different concerns, and mixing them creates a God script.
- Blast radius: High (changes to regression detection could break existing shipwright regression check workflows)
- Trade-offs: Less code but higher coupling, lower maintainability
Approach B: New sw-merge-gate.sh script + integration into stage_merge() (CHOSEN)
- Pros: Clean separation of concerns. Merge gate is a distinct concept from regression detection. Can be independently tested, sourced, or run standalone. Follows existing pattern (sw-oversight.sh, sw-quality.sh).
- Cons: New file, but minimal and follows established patterns
- Blast radius: Low (new file + small insertion in stage_merge() + policy additions)
- Trade-offs: More files but better modularity, easier to test and maintain
Approach C: Implement as a GitHub Actions workflow only
- Pros: Runs in CI natively, no bash needed
- Cons: Shipwright is a shell orchestration tool — all other gates are bash scripts. Would break the pattern. Can't be used in local/no-github mode.
- Blast radius: N/A — wrong paradigm for this project
- Trade-offs: Better CI integration but breaks project conventions entirely
Decision: Approach B. It follows the established pattern of sw-oversight.sh (gate script sourced/called from stage_merge()), keeps merge-gate concerns separate, and integrates cleanly with the existing policy/event/checks infrastructure.
| Risk | What Could Break | Mitigation |
|---|---|---|
| Coverage collection fails (no tool available) | Gate blocks incorrectly on missing data | Graceful degradation: skip metric if tool unavailable, only gate on metrics that were successfully collected |
| Complexity calculation is slow on large repos | Merge stage timeout, pipeline stalls | Use git diff --name-only to scope to changed files only, cap at 30s timeout |
| Baseline capture on main fails | No baseline to compare, false blocks | Fall back to "no regression detected" when baseline missing (warn, don't block) |
| Force-merge bypass abused | Quality degrades over time despite gates | Require justification comment, emit event for audit trail, track in events.jsonl |
| False positives from metric noise | Developers frustrated, override fatigue | Configurable thresholds in policy.json with sensible defaults, percentage-based not absolute |
| Breaks existing merge flow | Active pipelines stuck at merge stage | Guard behind SKIP_GATES check (existing pattern), test with existing pipeline tests |
| set -euo pipefail interaction with grep | Double output or premature exit | Use ` |
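The grep risk in the last row can be illustrated with a small sketch. `grep -c` prints `0` and also exits with status 1 when nothing matches, so a `|| echo 0` fallback inside a command substitution yields two zeros, while an unguarded call aborts the script under `set -e`. The helper name `count_matches` below is hypothetical, and `|| true` is one common way to handle this rather than necessarily the mitigation chosen for this plan.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count the lines in a file that match a pattern.
# grep -c prints "0" AND exits 1 when nothing matches, so "|| echo 0"
# would produce "0" twice; "|| true" keeps the single "0" and a zero exit.
count_matches() {
    local pattern="$1" file="$2" n
    n=$(grep -c -- "$pattern" "$file" 2>/dev/null || true)
    # A missing file leaves n empty (grep exits 2), so default to 0.
    echo "${n:-0}"
}
```

With this shape the caller always receives exactly one integer, whether the file has matches, has none, or does not exist.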
Depends on:
- scripts/lib/helpers.sh: colors, emit_event, info/warn/error
- scripts/lib/policy.sh: policy_get for threshold configuration
- scripts/sw-github-checks.sh: gh_checks_create_run, gh_checks_update_run
- scripts/lib/pipeline-github.sh: gh_comment_issue for PR comments
- config/policy.json: threshold configuration
- config/event-schema.json: event type registration
Depended on by:
- scripts/lib/pipeline-stages-delivery.sh (stage_merge()): calls the gate
- Future CI workflows may invoke sw-merge-gate.sh directly
No circular dependency risks — this is a leaf-node script called by the pipeline.
Files to create:
- scripts/sw-merge-gate.sh: Main merge quality gate script (~400 lines)
- scripts/sw-merge-gate-test.sh: Test suite (~300 lines)
Files to modify:
- scripts/lib/pipeline-stages-delivery.sh: Insert quality gate call in stage_merge() (~15 lines)
- config/policy.json: Add mergeGate configuration section (~15 lines)
- config/event-schema.json: Register new event types (~20 lines)
Add a new mergeGate section to config/policy.json:
"mergeGate": {
  "enabled": true,
  "coverage_drop_threshold_percent": 5,
  "complexity_increase_threshold_percent": 20,
  "new_violations_block": true,
  "force_merge_requires_justification": true,
  "baseline_source": "main"
}
This follows the existing pattern where all thresholds live in policy.json (see pipeline, quality sections).
Add new event types:
- merge_gate.check: Quality gate evaluation result (required: passed; optional: issue, coverage_delta, complexity_delta, violations_delta)
- merge_gate.blocked: Merge blocked due to quality regression (required: issue; optional: reason, metrics)
- merge_gate.override: Force-merge override used (required: issue; optional: justification, user)
- merge_gate.baseline_captured: Baseline metrics saved (required: []; optional: branch, coverage, complexity, violations)
Main script with these subcommands:
sw-merge-gate.sh baseline [--branch <branch>] # Capture baseline from branch
sw-merge-gate.sh check [--pr <number>] # Compare current vs baseline
sw-merge-gate.sh report [--json|--md] # Generate delta report
sw-merge-gate.sh override --justification "..." # Record force-merge override
sw-merge-gate.sh help # Usage
Metric collection functions:
- collect_coverage_metrics(): Parse coverage from:
  - npx vitest run --coverage JSON output (this project uses vitest with v8 provider)
  - .claude/pipeline-artifacts/test-results.json if available
  - Look for coverage-summary.json in standard locations
  - Falls back gracefully if no coverage tool → returns empty
- collect_complexity_metrics(): Calculate cyclomatic complexity:
  - Count decision points (if, elif, case, while, for, &&, ||) in changed files
  - Use git diff --name-only ${BASE_BRANCH}...HEAD to scope to changed files only
  - Sum complexity across all changed files for both base and current
  - Bash 3.2 safe: use grep + wc, not associative arrays
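The decision-point count described for collect_complexity_metrics() could be approximated along the following lines. This is a minimal sketch, not the planned implementation: the function names `count_decision_points` and `sum_decision_points` are hypothetical, and the count is a rough proxy because keywords inside comments and string literals are not excluded.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: approximate the decision points in one file.
# Keywords are matched as whole words (-w); && and || are matched literally.
# Only grep and wc are used, so this stays Bash 3.2 compatible.
count_decision_points() {
    local file="$1" keywords operators
    [ -f "$file" ] || { echo 0; return 0; }
    # "|| true" absorbs grep's exit status 1 (no match) under pipefail.
    keywords=$(grep -owE 'if|elif|case|while|for' "$file" | wc -l || true)
    operators=$(grep -oE '&&|[|][|]' "$file" | wc -l || true)
    echo $(( keywords + operators ))
}

# Hypothetical helper: sum the counts over the files given as arguments,
# for example the output of: git diff --name-only "${BASE_BRANCH}...HEAD"
sum_decision_points() {
    local total=0 f
    for f in "$@"; do
        total=$(( total + $(count_decision_points "$f") ))
    done
    echo "$total"
}
```

Running the same two functions once on the base branch and once on the PR branch would give the two totals that the comparison step needs.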
- collect_lint_violations(): Count lint/static analysis issues:
  - Run shellcheck on changed .sh files (primary, since this is a shell project)
  - Run npx eslint on changed .ts/.js files if available
  - Count errors + warnings
  - Return 0 if no lint tool available
- capture_baseline(): Collect baseline metrics from base branch:
  - Use git stash + git checkout in a subshell to avoid directory changes
  - Collect all three metric types on base branch
  - Store baseline in $ARTIFACTS_DIR/merge-gate-baseline.json
  - Restore original branch/state
  - Atomic write via tmp + mv
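The tmp + mv pattern mentioned for the baseline file can be sketched as follows. The helper name `atomic_write` is hypothetical and the project may already have an equivalent. The sketch assumes the temp file is created next to the target so that `mv` is a same-filesystem rename.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write stdin to a target path atomically.
# The temp file lives beside the target, so mv is a rename on the same
# filesystem and readers see either the old file or the complete new one.
atomic_write() {
    local target="$1" tmp
    tmp=$(mktemp "${target}.tmp.XXXXXX")
    if cat > "$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A baseline capture could then pipe its JSON into the helper, for example `printf '{"coverage":78.5,"complexity":142,"violations":3}\n' | atomic_write "$ARTIFACTS_DIR/merge-gate-baseline.json"`, where the field names follow the baseline test case and the values are illustrative.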
- compare_metrics(): Delta calculation:
  - Coverage: (baseline - current) → if drop > threshold → fail
  - Complexity: ((current - baseline) / max(baseline, 1) * 100) → if increase > threshold → fail
  - Violations: (current - baseline) → if any increase → fail (when new_violations_block is true)
  - Handle division-by-zero: if baseline is 0, use absolute diff
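The three comparisons can be sketched as small predicates that return 0 for pass and 1 for fail. The function names are hypothetical. One assumption is labeled in the code: when the complexity baseline is 0, the absolute difference is compared against the same numeric threshold, which is only one reading of "use absolute diff".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: pass (exit 0) when the coverage drop, in percentage points,
# does not exceed the threshold. awk handles the decimal arithmetic.
coverage_ok() {
    awk -v b="$1" -v c="$2" -v t="$3" 'BEGIN { exit ((b - c) > t) }'
}

# Hypothetical: pass when the percentage increase does not exceed the
# threshold. ASSUMPTION: with a baseline of 0 the absolute difference is
# compared against the same threshold number, avoiding the division.
complexity_ok() {
    awk -v b="$1" -v c="$2" -v t="$3" \
        'BEGIN { d = (b == 0) ? (c - b) : (c - b) / b * 100; exit (d > t) }'
}

# Hypothetical: pass when the violation count did not increase.
violations_ok() {
    [ "$2" -le "$1" ]
}
```

Using the figures from the sample report, `coverage_ok 78.5 76.2 5` passes (a 2.3 point drop) and `complexity_ok 142 185 20` fails (an increase of about 30.3%). Calling all three predicates before deciding the exit code lets the gate report every regression instead of stopping at the first.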
- generate_pr_comment(): Markdown quality delta table:

      ## 🔍 Quality Gate Results

      | Metric | Before | After | Delta | Status |
      |--------|--------|-------|-------|--------|
      | Test Coverage | 78.5% | 76.2% | -2.3% | ✅ Pass |
      | Complexity | 142 | 185 | +30.3% | ❌ Fail |
      | Lint Violations | 3 | 3 | +0 | ✅ Pass |

      **Result:** ❌ Blocked: complexity increase exceeds 20% threshold

      > Override with: `shipwright merge-gate override --justification "reason"`
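The rows of the delta table could be produced with helpers along these lines. Both function names are hypothetical and the sketch covers only row formatting, not posting the comment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print one table row.
# Arguments: metric name, before, after, delta, and "pass" or "fail".
format_row() {
    local status="✅ Pass"
    [ "$5" = "pass" ] || status="❌ Fail"
    printf '| %s | %s | %s | %s | %s |\n' "$1" "$2" "$3" "$4" "$status"
}

# Hypothetical helper: signed percentage change with one decimal place.
# Prints "n/a" for a zero baseline instead of dividing by zero.
percent_delta() {
    awk -v b="$1" -v c="$2" 'BEGIN {
        if (b == 0) { printf "n/a" }
        else        { printf "%+.1f%%", (c - b) / b * 100 }
    }'
}
```

For example, `format_row "Complexity" 142 185 "$(percent_delta 142 185)" fail` prints the complexity row with a delta of `+30.3%`.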
- post_github_check(): Create/update GitHub Check run:
  - Source sw-github-checks.sh for API functions
  - Create check run named shipwright/merge-gate
  - Update with conclusion (success/failure) and output summary
  - Respect NO_GITHUB flag
- cmd_check(): Main gate logic:
  - Load thresholds from policy.json via policy_get
  - Capture baseline (or load cached baseline)
  - Collect current metrics
  - Compare against thresholds
  - Emit merge_gate.check event
  - Post PR comment via gh issue comment
  - Create/update GitHub Check run
  - Return 0 (pass) or 1 (blocked)
- cmd_override(): Force-merge recording:
  - Require --justification argument (fail if empty)
  - Write override record to $ARTIFACTS_DIR/merge-gate-override.json
  - Emit merge_gate.override event
  - Post justification as PR comment
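The first two steps of the override flow (reject an empty justification, then write the record) might look like the sketch below. The function name `record_override` and the JSON field names `justification` and `recorded_at` are assumptions, since the plan names only the file. Event emission and the PR comment are left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch of the override record. Field names are assumptions.
# Arguments: artifacts directory, justification text.
record_override() {
    local artifacts_dir="$1" justification="${2:-}" tmp escaped
    if [ -z "$justification" ]; then
        echo "error: --justification is required" >&2
        return 1
    fi
    # Escape backslashes and double quotes so the value stays valid JSON
    # (newlines in the justification are not handled by this sketch).
    escaped=$(printf '%s' "$justification" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    # Atomic write: temp file beside the target, then rename.
    tmp=$(mktemp "$artifacts_dir/merge-gate-override.json.tmp.XXXXXX")
    printf '{"justification": "%s", "recorded_at": "%s"}\n' \
        "$escaped" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$tmp"
    mv -f "$tmp" "$artifacts_dir/merge-gate-override.json"
}
```

Because the file is written only after validation succeeds, a rejected override leaves no merge-gate-override.json behind for stage_merge() to find.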
Insert quality gate check after the oversight gate block (~line 503) and before the approval gates block (~line 505):
# ── Merge quality gate: block on quality regression ──
if [[ -x "$SCRIPT_DIR/sw-merge-gate.sh" ]] && [[ "${SKIP_GATES:-false}" != "true" ]]; then
    info "Running merge quality gate..."
    local _mg_result=0
    # Capture the exit code without tripping set -e
    bash "$SCRIPT_DIR/sw-merge-gate.sh" check \
        --artifacts "$ARTIFACTS_DIR" \
        --issue "${ISSUE_NUMBER:-0}" \
        --base "${BASE_BRANCH:-main}" \
        --pr "${PR_NUMBER:-}" 2>&1 || _mg_result=$?
    if [[ "$_mg_result" -ne 0 ]]; then
        # Check for force-merge override
        if [[ -f "$ARTIFACTS_DIR/merge-gate-override.json" ]]; then
            warn "Quality gate failed but override present, proceeding"
            emit_event "merge_gate.override" "issue=${ISSUE_NUMBER:-0}"
        else
            error "Merge quality gate blocked: quality regression detected"
            emit_event "merge_gate.blocked" "issue=${ISSUE_NUMBER:-0}"
            log_stage "merge" "BLOCKED: quality gate regression"
            return 1
        fi
    fi
fi
Following the established pattern from sw-regression-test.sh:
Test cases (15 total):
- Baseline capture — Verify baseline JSON created with expected fields (coverage, complexity, violations)
- Check passes (no regression) — Current metrics equal or better than baseline → exit 0
- Coverage drop blocks — Coverage drops >5% → exit 1
- Coverage drop within threshold — Coverage drops 3% (under 5%) → exit 0
- Complexity increase blocks — Complexity increases >20% → exit 1
- Complexity increase within threshold — Complexity increases 15% (under 20%) → exit 0
- New violations block — Violation count increases → exit 1
- Multiple regressions — Report all failures, not just first
- Force-merge override with justification — Override file present → exit 0
- Force-merge without justification — Rejected → exit 1
- No baseline available — Warn but don't block → exit 0
- SKIP_GATES=true — Gate skipped entirely
- Event emission — Verify events written to events.jsonl
- PR comment format — Verify markdown table generated correctly
- Threshold configuration — Custom thresholds from policy.json respected
- Task 1: Add mergeGate config section to config/policy.json
- Task 2: Register merge_gate.* event types in config/event-schema.json
- Task 3: Create scripts/sw-merge-gate.sh: metric collection functions (coverage, complexity, violations)
- Task 4: Create scripts/sw-merge-gate.sh: baseline capture and comparison logic
- Task 5: Create scripts/sw-merge-gate.sh: PR comment generation and GitHub Checks integration
- Task 6: Create scripts/sw-merge-gate.sh: override/force-merge recording
- Task 7: Create scripts/sw-merge-gate.sh: CLI interface (baseline/check/report/override/help)
- Task 8: Integrate quality gate call into stage_merge() in pipeline-stages-delivery.sh
- Task 9: Create scripts/sw-merge-gate-test.sh: test setup and helpers
- Task 10: Create scripts/sw-merge-gate-test.sh: baseline and comparison tests
- Task 11: Create scripts/sw-merge-gate-test.sh: blocking/override/edge-case tests
- Task 12: Run full test suite (npm test) and fix any failures
- Task 13: Verify version consistency across all new/modified files
- Tasks 1-2 have no dependencies (config changes)
- Tasks 3-7 depend on Tasks 1-2 (script needs config)
- Task 8 depends on Tasks 3-7 (integration needs the script)
- Tasks 9-11 depend on Tasks 3-7 (tests need the script)
- Task 12 depends on all previous tasks
- Task 13 depends on Task 12
- Unit tests (12 tests): Individual metric collection functions, threshold comparison, JSON parsing, report generation, override validation, all in sw-merge-gate-test.sh using mock binaries and temp directories
- Integration tests (3 tests): Full check command flow with mock git/coverage data, event emission verification, PR comment format validation
- E2E tests (0): Not applicable. This is infrastructure tooling tested via bash test harness with mocks. Real E2E would require actual GitHub API access.
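The "mock binaries and temp directories" approach could be set up with a helper like the one below. It is a sketch under assumptions: `make_mock` is a hypothetical name, and the real test suite may follow whatever sw-regression-test.sh already does.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical test helper: install a fake executable into a temp bin dir.
# Arguments: bin dir, command name, canned stdout, exit status (default 0).
make_mock() {
    local bin_dir="$1" name="$2" output="$3" status="${4:-0}"
    mkdir -p "$bin_dir"
    {
        echo '#!/usr/bin/env bash'
        # %q shell-quotes the canned output so it survives re-parsing.
        printf 'printf "%%s\\n" %q\n' "$output"
        echo "exit $status"
    } > "$bin_dir/$name"
    chmod +x "$bin_dir/$name"
}
```

A test would then prepend the temp bin directory to PATH for a single command, for example `PATH="$tmp/bin:$PATH" shellcheck file.sh`, so the gate sees canned output and a controlled exit status without needing the real tool.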
- 100% of subcommands (baseline, check, report, override, help)
- 100% of blocking conditions (coverage drop, complexity increase, new violations)
- 100% of bypass paths (SKIP_GATES, NO_GITHUB, missing baseline, override present)
- Event emission verified for every outcome path
Happy path: Baseline captured → check runs → all metrics within thresholds → exit 0, PR comment with all-pass table posted
Error cases:
- Coverage drops 8% (exceeds 5% threshold) → exit 1, blocking comment with delta table posted, merge_gate.blocked event emitted
- Complexity increases 25% (exceeds 20% threshold) → exit 1, blocking comment posted
- Both coverage and complexity regress → both reported in same table, exit 1
Edge cases:
- Baseline has 0 complexity (division by zero) → use absolute diff instead of percentage, don't divide
- No changed code files (only .md changes) → skip complexity/lint checks, pass gate
- No coverage tool available → skip coverage metric, only gate on complexity + violations
- Override file exists but gate passes → ignore override, report clean pass
- sw-merge-gate.sh baseline captures coverage, complexity, and violation metrics from a branch
- sw-merge-gate.sh check compares PR metrics against baseline and returns pass/fail
- Merge is blocked when coverage drops >5% (configurable via policy.json)
- Merge is blocked when complexity increases >20% (configurable via policy.json)
- Merge is blocked when new lint/static analysis violations are introduced
- --force-merge / override works with required justification comment
- Quality gate results emitted to events.jsonl with correct event types
- GitHub Check run created/updated with quality gate results (when GitHub available)
- Quality delta displayed as markdown table in PR comment
- Gate respects SKIP_GATES, NO_GITHUB, and headless mode flags
- All 15 test cases pass in sw-merge-gate-test.sh
- npm test passes with no regressions introduced
- VERSION variable matches package.json version in all new/modified files