Pipeline Plan 176

Implementation Plan: Merge Quality Gate Enforcement with Regression Blocking

Issue: #176
Branch: ci/merge-quality-gate-enforcement-with-regr-176
Template: full (devops)

Brainstorming & Design Decisions

Requirements Clarity

Minimum viable change: A new script sw-merge-gate.sh that collects metrics from the PR branch, compares against a baseline (main branch), and blocks merges when quality degrades beyond configurable thresholds. Integrated into the existing stage_merge() function in pipeline-stages-delivery.sh.

Implicit requirements:

Must work in both pipeline (automated) and CLI (manual) contexts
Must respect NO_GITHUB, SKIP_GATES, and headless/CI modes
Must be Bash 3.2 compatible (no associative arrays, no readarray, no ${var,,})
Must use atomic file writes (tmp + mv pattern)
Must integrate with existing event system and GitHub Checks API
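Two of these constraints have Bash 3.2-safe forms. The sketch below shows the tmp + mv atomic write and a `tr`-based stand-in for `${var,,}`. The helper names `atomic_write` and `to_lower` are hypothetical. They are not taken from scripts/lib/helpers.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating two implicit requirements (Bash 3.2 safe).

# Atomic write: stream stdin into a temp file in the SAME directory as the
# target, then rename it. A same-filesystem mv is a rename, so readers see
# either the old file or the complete new one, never a partial write.
# Note: mktemp creates the file with mode 0600, which the target inherits.
atomic_write() {
    local target="$1" tmp
    tmp=$(mktemp "$(dirname "$target")/.$(basename "$target").XXXXXX") || return 1
    if cat > "$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Bash 3.2 has no ${var,,}; tr lowercases portably instead.
to_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

For example, `printf '%s\n' "$json" | atomic_write "$ARTIFACTS_DIR/merge-gate-baseline.json"`. Creating the temp file beside the target, rather than in /tmp, is what keeps the final `mv` a rename.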

Acceptance criteria (from issue):

Compare PR metrics against baseline (main branch)
Block if test coverage drops >5% (configurable)
Block if cyclomatic complexity increases >20% (configurable)
Block if new lint/static analysis violations introduced
--force-merge override with justification comment
Emit to events.jsonl + GitHub Checks API
Display quality delta in PR comment

Alternatives Considered

Approach A: Extend sw-regression.sh

Pros: Reuses existing baseline/comparison infrastructure
Cons: sw-regression.sh tracks script-level metrics (line count, function count), not code quality metrics (coverage, complexity, lint violations). Different concern. Mixing them creates a God script.
Blast radius: High — changes to regression detection could break existing shipwright regression check workflows
Trade-offs: Less code but higher coupling, lower maintainability

Approach B: New sw-merge-gate.sh script + integration into stage_merge() (CHOSEN)

Pros: Clean separation of concerns. Merge gate is a distinct concept from regression detection. Can be independently tested, sourced, or run standalone. Follows existing pattern (sw-oversight.sh, sw-quality.sh).
Cons: New file, but minimal — follows established patterns
Blast radius: Low — new file + small insertion in stage_merge() + policy additions
Trade-offs: More files but better modularity, easier to test and maintain

Approach C: Implement as a GitHub Actions workflow only

Pros: Runs in CI natively, no bash needed
Cons: Shipwright is a shell orchestration tool — all other gates are bash scripts. Would break the pattern. Can't be used in local/no-github mode.
Blast radius: N/A — wrong paradigm for this project
Trade-offs: Better CI integration but breaks project conventions entirely

Decision: Approach B. It follows the established pattern of sw-oversight.sh (gate script sourced/called from stage_merge()), keeps merge-gate concerns separate, and integrates cleanly with the existing policy/event/checks infrastructure.

Risk Analysis

Risk	What Could Break	Mitigation
Coverage collection fails (no tool available)	Gate blocks incorrectly on missing data	Graceful degradation: skip metric if tool unavailable, only gate on metrics that were successfully collected
Complexity calculation is slow on large repos	Merge stage timeout, pipeline stalls	Use `git diff --name-only` to scope to changed files only, cap at 30s timeout
Baseline capture on main fails	No baseline to compare, false blocks	Fall back to "no regression detected" when baseline missing (warn, don't block)
Force-merge bypass abused	Quality degrades over time despite gates	Require justification comment, emit event for audit trail, track in events.jsonl
False positives from metric noise	Developers frustrated, override fatigue	Configurable thresholds in policy.json with sensible defaults, percentage-based not absolute
Breaks existing merge flow	Active pipelines stuck at merge stage	Guard behind `SKIP_GATES` check (existing pattern), test with existing pipeline tests
`set -euo pipefail` interaction with grep	Double output or premature exit	Use `
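The mitigation in the last row is cut off in the source. This sketch assumes the pitfall meant is the common `grep -c` one. With no matches, `grep -c` prints `0` and also exits 1. A bare `$(grep -c PATTERN FILE)` therefore aborts under `set -e`, and `$(grep -c PATTERN FILE || echo 0)` captures `0` twice. The sketch shows all three cases, with `|| true` as one way to avoid both problems.

```shell
#!/usr/bin/env bash
# Demonstrates grep -c under `set -euo pipefail`. Each case runs in a child
# bash process, so strict mode behaves exactly as it would inside a script
# and is not suppressed by the caller's if/&&/|| context.

demo_file=$(mktemp)
printf 'alpha\nbeta\n' > "$demo_file"

# Case 1: bare grep -c. No match means exit status 1, so the assignment
# fails and `set -e` aborts before the echo runs.
if bash -c 'set -euo pipefail; c=$(grep -c gamma "$1"); echo "$c"' _ "$demo_file" >/dev/null; then
    bare=survived
else
    bare=aborted
fi

# Case 2: `|| echo 0`. grep -c has ALREADY printed "0", so the fallback adds
# a second line and the variable holds "0<newline>0" (the double output).
double=$(bash -c 'set -euo pipefail; c=$(grep -c gamma "$1" || echo 0); printf "%s" "$c"' _ "$demo_file")

# Case 3: `|| true`. The exit status is swallowed and grep's single "0" is kept.
single=$(bash -c 'set -euo pipefail; c=$(grep -c gamma "$1" || true); printf "%s" "$c"' _ "$demo_file")

rm -f "$demo_file"
echo "bare=$bare"
echo "double lines: $(printf '%s\n' "$double" | wc -l | tr -d ' ')"
echo "single=$single"
```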

Dependency Analysis

Depends on:

scripts/lib/helpers.sh — colors, emit_event, info/warn/error
scripts/lib/policy.sh — policy_get for threshold configuration
scripts/sw-github-checks.sh — gh_checks_create_run, gh_checks_update_run
scripts/lib/pipeline-github.sh — gh_comment_issue for PR comments
config/policy.json — threshold configuration
config/event-schema.json — event type registration

Depended on by:

scripts/lib/pipeline-stages-delivery.sh (stage_merge()) — calls the gate
Future CI workflows may invoke sw-merge-gate.sh directly

No circular dependency risks — this is a leaf-node script called by the pipeline.

Files to Modify

New Files

scripts/sw-merge-gate.sh — Main merge quality gate script (~400 lines)
scripts/sw-merge-gate-test.sh — Test suite (~300 lines)

Modified Files

scripts/lib/pipeline-stages-delivery.sh — Insert quality gate call in stage_merge() (~15 lines)
config/policy.json — Add mergeGate configuration section (~15 lines)
config/event-schema.json — Register new event types (~20 lines)

Implementation Steps

Step 1: Add policy configuration (`config/policy.json`)

Add a new mergeGate section to config/policy.json:

"mergeGate": {
  "enabled": true,
  "coverage_drop_threshold_percent": 5,
  "complexity_increase_threshold_percent": 20,
  "new_violations_block": true,
  "force_merge_requires_justification": true,
  "baseline_source": "main"
}

This follows the existing pattern where all thresholds live in policy.json (see pipeline, quality sections).
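The plan reads these values through `policy_get`, and its signature is not shown here. As a standalone illustration only, a hypothetical `merge_gate_setting` reader could use jq. It falls back to the plan's defaults when jq, the file, or the key is missing.

```shell
#!/usr/bin/env bash
# Hypothetical threshold reader (NOT the real policy_get from
# scripts/lib/policy.sh). Prints .mergeGate[<key>] from the policy file,
# or the supplied default when jq, the file, or the key is missing.
merge_gate_setting() {
    local key="$1" default="$2" file="${3:-config/policy.json}" val=""
    if command -v jq >/dev/null 2>&1 && [ -f "$file" ]; then
        # has() is used instead of `.[$k] // default` because jq's `//`
        # also replaces a literal false, which would silently turn
        # "new_violations_block": false back into the default.
        val=$(jq -r --arg k "$key" \
            '(.mergeGate // {}) | if has($k) then .[$k] else empty end' \
            "$file" 2>/dev/null || true)
    fi
    printf '%s\n' "${val:-$default}"
}
```

For example, `cov_max=$(merge_gate_setting coverage_drop_threshold_percent 5)`.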

Step 2: Register event types (`config/event-schema.json`)

Add new event types:

merge_gate.check — Quality gate evaluation result (required: passed; optional: issue, coverage_delta, complexity_delta, violations_delta)
merge_gate.blocked — Merge blocked due to quality regression (required: issue; optional: reason, metrics)
merge_gate.override — Force-merge override used (required: issue; optional: justification, user)
merge_gate.baseline_captured — Baseline metrics saved (required: []; optional: branch, coverage, complexity, violations)
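The test suite may need a stand-in that writes these events the way the gate would. The sketch below is a hypothetical `mg_emit_event`. It appends one JSON object per line to `$EVENTS_FILE` and takes `key=value` pairs, like the `emit_event` calls in Step 4. The real `emit_event` output format is not shown in this plan, so the `type` and `ts` field names are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for emit_event (the real one lives in
# scripts/lib/helpers.sh). Appends one JSON object per line.

json_escape() {
    # Escape backslashes and double quotes; fold newlines into spaces.
    printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

mg_emit_event() {
    local type="$1"; shift
    local num_re='^-?(0|[1-9][0-9]*)(\.[0-9]+)?$'   # JSON-style number
    local line pair key val
    line="{\"type\":\"$(json_escape "$type")\",\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\""
    for pair in "$@"; do
        key="${pair%%=*}"
        val="${pair#*=}"
        if [[ "$val" == true || "$val" == false || "$val" =~ $num_re ]]; then
            :   # emit booleans and numbers unquoted
        else
            val="\"$(json_escape "$val")\""
        fi
        line="$line,\"$(json_escape "$key")\":$val"
    done
    printf '%s}\n' "$line" >> "${EVENTS_FILE:-events.jsonl}"
}
```

For example, `mg_emit_event merge_gate.check passed=false issue=176 coverage_delta=-2.3` appends a line with `"passed":false`, `"issue":176`, and `"coverage_delta":-2.3`.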

Step 3: Create `scripts/sw-merge-gate.sh`

Main script with these subcommands:

sw-merge-gate.sh baseline [--branch <branch>]    # Capture baseline from branch
sw-merge-gate.sh check [--pr <number>] [--issue <number>] [--base <branch>] [--artifacts <dir>]   # Compare current vs baseline
sw-merge-gate.sh report [--json|--md]             # Generate delta report
sw-merge-gate.sh override --justification "..."   # Record force-merge override
sw-merge-gate.sh help                             # Usage
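The dispatch can stay Bash 3.2-safe by using only `case`, `while`, and `shift`. The skeleton below is hypothetical. The `check` flags mirror the ones `stage_merge()` passes in Step 4 (`--artifacts`, `--issue`, `--base`, `--pr`), and the command bodies are stubs.

```shell
#!/usr/bin/env bash
# Hypothetical dispatch skeleton for sw-merge-gate.sh (Bash 3.2 safe).

usage() {
    echo "Usage: sw-merge-gate.sh <baseline|check|report|override|help> [options]"
}

cmd_check() {
    local pr="" issue="0" base="main" artifacts="${ARTIFACTS_DIR:-.}"
    while [ $# -gt 0 ]; do
        case "$1" in
            --pr|--issue|--base|--artifacts)
                # Guard before `shift 2`: in bash, a failed shift does not
                # shift at all, which would loop forever on a trailing flag.
                if [ $# -lt 2 ]; then
                    echo "check: $1 needs a value" >&2
                    return 2
                fi
                case "$1" in
                    --pr)        pr="$2" ;;
                    --issue)     issue="$2" ;;
                    --base)      base="$2" ;;
                    --artifacts) artifacts="$2" ;;
                esac
                shift 2 ;;
            *)
                echo "check: unknown option: $1" >&2
                return 2 ;;
        esac
    done
    # Stub: the real command would collect, compare, emit, and comment here.
    echo "check pr=${pr:-none} issue=$issue base=$base artifacts=$artifacts"
}

main() {
    local sub="${1:-help}"
    if [ $# -gt 0 ]; then shift; fi
    case "$sub" in
        check)          cmd_check "$@" ;;
        help|-h|--help) usage ;;
        baseline|report|override)
            echo "$sub: not implemented in this sketch" >&2
            return 1 ;;
        *)
            usage >&2
            return 2 ;;
    esac
}

# Run main only when executed directly, not when sourced by the test suite.
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
    main "$@"
fi
```

The `BASH_SOURCE` guard lets the test suite source the script and call functions directly. This matches the plan's goal of a script that can be "sourced, or run standalone".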

Metric collection functions:

collect_coverage_metrics() — Parse coverage from:
- npx vitest run --coverage JSON output (this project uses vitest with v8 provider)
- .claude/pipeline-artifacts/test-results.json if available
- Look for coverage-summary.json in standard locations
- Falls back gracefully if no coverage tool → returns empty
collect_complexity_metrics() — Calculate cyclomatic complexity:
- Count decision points (if, elif, case, while, for, &&, ||) in changed files
- Use git diff --name-only ${BASE_BRANCH}...HEAD to scope to changed files only
- Sum complexity across all changed files for both base and current
- Bash 3.2 safe: use grep + wc, not associative arrays
collect_lint_violations() — Count lint/static analysis issues:
- Run shellcheck on changed .sh files (primary — this is a shell project)
- Run npx eslint on changed .ts/.js files if available
- Count errors + warnings
- Return 0 if no lint tool available
capture_baseline() — Collect baseline metrics from base branch:
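- Use git stash + git checkout of the base branch, guarded by an EXIT trap that switches back and pops the stash only if one was actually created (a subshell isolates the current directory, not the shared git working tree)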
- Collect all three metric types on base branch
- Store baseline in $ARTIFACTS_DIR/merge-gate-baseline.json
- Restore original branch/state
- Atomic write via tmp + mv
compare_metrics() — Delta calculation:
- Coverage: (baseline - current), in percentage points → if drop > threshold → fail
- Complexity: ((current - baseline) / baseline * 100) → if increase > threshold → fail
- Violations: (current - baseline) → if any increase → fail (when new_violations_block is true)
- Handle division-by-zero: if baseline is 0, use absolute diff
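Bash has no floating-point arithmetic, so these comparisons can be delegated to awk. The function names below are hypothetical, and each prints `<delta> <pass|fail>`. The zero-baseline branch assumes the absolute increase is compared against the same threshold number, which the plan does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical delta checks for compare_metrics(). awk does the float maths.
# Comparisons are parenthesised so awk does not read `>` as a redirect.

check_coverage() {  # args: baseline current threshold (percent values)
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "n/a skip"   # metric not collected: skip rather than block
        return 0
    fi
    awk -v b="$1" -v c="$2" -v t="$3" 'BEGIN {
        # Drop is measured in percentage points.
        printf "%+.1f %s\n", c - b, ((b - c) > t + 0 ? "fail" : "pass")
    }'
}

check_complexity() {  # args: baseline current threshold_percent
    awk -v b="$1" -v c="$2" -v t="$3" 'BEGIN {
        if (b + 0 == 0) {
            # Zero baseline: no division; absolute increase vs threshold
            # (ASSUMPTION: same threshold number is reused).
            d = c - b
            printf "%+d %s\n", d, (d > t + 0 ? "fail" : "pass")
        } else {
            d = (c - b) / b * 100
            printf "%+.1f%% %s\n", d, (d > t + 0 ? "fail" : "pass")
        }
    }'
}

check_violations() {  # args: baseline current block(true|false)
    local d=$(( $2 - $1 ))
    if [ "$3" = "true" ] && [ "$d" -gt 0 ]; then
        printf '%+d fail\n' "$d"
    else
        printf '%+d pass\n' "$d"
    fi
}
```

With the numbers from the example comment below, `check_coverage 78.5 76.2 5` gives `-2.3 pass`. `check_complexity 142 185 20` gives `+30.3% fail`, and `check_violations 3 3 true` gives `+0 pass`.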

generate_pr_comment() — Markdown quality delta table:

## 🔍 Quality Gate Results

| Metric | Before | After | Delta | Status |
|--------|--------|-------|-------|--------|
| Test Coverage | 78.5% | 76.2% | -2.3% | ✅ Pass |
| Complexity | 142 | 185 | +30.3% | ❌ Fail |
| Lint Violations | 3 | 3 | +0 | ✅ Pass |

**Result:** ❌ Blocked — complexity increase exceeds 20% threshold

> Override with: `shipwright merge-gate override --justification "reason"`
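The sketch below is a hypothetical `generate_pr_comment()`. It renders a table in the format above from `Metric|Before|After|Delta|pass-or-fail` rows on stdin. The row format and the pass-line wording are assumptions. The blocked line uses a colon where the template above uses a dash.

```shell
#!/usr/bin/env bash
# Hypothetical generate_pr_comment(): rows on stdin, block reason as $1.
generate_pr_comment() {
    local reason="$1" metric before after delta status icon any_fail=false
    printf '## 🔍 Quality Gate Results\n\n'
    printf '| Metric | Before | After | Delta | Status |\n'
    printf '|--------|--------|-------|-------|--------|\n'
    # `|| [ -n "$metric" ]` keeps a final row that lacks a trailing newline.
    while IFS='|' read -r metric before after delta status || [ -n "$metric" ]; do
        [ -n "$metric" ] || continue   # skip blank lines
        if [ "$status" = "fail" ]; then
            icon='❌ Fail'; any_fail=true
        else
            icon='✅ Pass'
        fi
        # Values go in as %s arguments, so "%" inside them is printed as-is.
        printf '| %s | %s | %s | %s | %s |\n' "$metric" "$before" "$after" "$delta" "$icon"
    done
    printf '\n'
    if [ "$any_fail" = true ]; then
        printf '**Result:** ❌ Blocked: %s\n\n' "$reason"
        printf '> Override with: `shipwright merge-gate override --justification "reason"`\n'
    else
        printf '**Result:** ✅ Passed\n'
    fi
}
```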

post_github_check() — Create/update GitHub Check run:
- Source sw-github-checks.sh for API functions
- Create check run named shipwright/merge-gate
- Update with conclusion (success/failure) and output summary
- Respect NO_GITHUB flag
cmd_check() — Main gate logic:
- Load thresholds from policy.json via policy_get
- Capture baseline (or load cached baseline)
- Collect current metrics
- Compare against thresholds
- Emit merge_gate.check event
- Post PR comment via gh issue comment
- Create/update GitHub Check run
- Return 0 (pass) or 1 (blocked)
cmd_override() — Force-merge recording:
- Require --justification argument (fail if empty)
- Write override record to $ARTIFACTS_DIR/merge-gate-override.json
- Emit merge_gate.override event
- Post justification as PR comment
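The validation and atomic write in `cmd_override()` could be sketched as below. This hypothetical version rejects a missing or whitespace-only justification and writes `merge-gate-override.json` via tmp + mv. The JSON field names are assumptions, and event emission and the PR comment are left out.

```shell
#!/usr/bin/env bash
# Hypothetical cmd_override(): validate, then write the override record.
cmd_override() {
    local justification="" dir="${ARTIFACTS_DIR:-.}"
    while [ $# -gt 0 ]; do
        case "$1" in
            --justification)
                [ $# -ge 2 ] || { echo "override: --justification needs a value" >&2; return 1; }
                justification="$2"; shift 2 ;;
            *)
                echo "override: unknown option: $1" >&2
                return 1 ;;
        esac
    done
    # Reject missing or whitespace-only justifications.
    if [ -z "$(printf '%s' "$justification" | tr -d '[:space:]')" ]; then
        echo "override: --justification is required" >&2
        return 1
    fi
    local esc tmp
    # Minimal JSON escaping: backslashes and quotes; newlines become spaces.
    esc=$(printf '%s' "$justification" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    tmp=$(mktemp "$dir/.merge-gate-override.XXXXXX") || return 1
    printf '{"justification":"%s","user":"%s","ts":"%s"}\n' \
        "$esc" "${USER:-unknown}" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$tmp"
    mv -f "$tmp" "$dir/merge-gate-override.json"
}
```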

Step 4: Integrate into `stage_merge()` (`pipeline-stages-delivery.sh`)

Insert quality gate check after the oversight gate block (~line 503) and before the approval gates block (~line 505):

# ── Merge quality gate: block on quality regression ──
if [[ -x "$SCRIPT_DIR/sw-merge-gate.sh" ]] && [[ "${SKIP_GATES:-false}" != "true" ]]; then
    info "Running merge quality gate..."
    local _mg_result=0
    bash "$SCRIPT_DIR/sw-merge-gate.sh" check \
        --artifacts "$ARTIFACTS_DIR" \
        --issue "${ISSUE_NUMBER:-0}" \
        --base "${BASE_BRANCH:-main}" \
        --pr "${PR_NUMBER:-}" 2>&1 || _mg_result=$?

    if [[ "$_mg_result" -ne 0 ]]; then
        # Check for force-merge override
        if [[ -f "$ARTIFACTS_DIR/merge-gate-override.json" ]]; then
            warn "Quality gate failed but override present — proceeding"
            emit_event "merge_gate.override" "issue=${ISSUE_NUMBER:-0}"
        else
            error "Merge quality gate blocked — quality regression detected"
            emit_event "merge_gate.blocked" "issue=${ISSUE_NUMBER:-0}"
            log_stage "merge" "BLOCKED: quality gate regression"
            return 1
        fi
    fi
fi

Step 5: Create test suite (`scripts/sw-merge-gate-test.sh`)

Following the established pattern from sw-regression-test.sh:

Test cases (15 total):

Baseline capture — Verify baseline JSON created with expected fields (coverage, complexity, violations)
Check passes (no regression) — Current metrics equal or better than baseline → exit 0
Coverage drop blocks — Coverage drops >5% → exit 1
Coverage drop within threshold — Coverage drops 3% (under 5%) → exit 0
Complexity increase blocks — Complexity increases >20% → exit 1
Complexity increase within threshold — Complexity increases 15% (under 20%) → exit 0
New violations block — Violation count increases → exit 1
Multiple regressions — Report all failures, not just first
Force-merge override with justification — Override file present → exit 0
Force-merge without justification — Rejected → exit 1
No baseline available — Warn but don't block → exit 0
SKIP_GATES=true — Gate skipped entirely
Event emission — Verify events written to events.jsonl
PR comment format — Verify markdown table generated correctly
Threshold configuration — Custom thresholds from policy.json respected

Task Checklist

Task Dependencies

Tasks 1-2 have no dependencies (config changes)
Tasks 3-7 depend on Tasks 1-2 (script needs config)
Task 8 depends on Tasks 3-7 (integration needs the script)
Tasks 9-11 depend on Tasks 3-7 (tests need the script)
Task 12 depends on all previous tasks
Task 13 depends on Task 12

Testing Approach

Test Pyramid Breakdown

Unit tests (12 tests): Individual metric collection functions, threshold comparison, JSON parsing, report generation, override validation — all in sw-merge-gate-test.sh using mock binaries and temp directories
Integration tests (3 tests): Full check command flow with mock git/coverage data, event emission verification, PR comment format validation
E2E tests (0): Not applicable — this is infrastructure tooling tested via bash test harness with mocks. Real E2E would require actual GitHub API access.
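The mock-binary pattern can be sketched as follows, assuming lint findings are counted from shellcheck's `-f gcc` output (one `file:line:col: level: message` line per finding). A fake `shellcheck` placed first on `PATH` lets the counter be tested without the real tool. The fake's output lines and the `count_lint_violations` name are hypothetical.

```shell
#!/usr/bin/env bash
# Code under test (sketch): count error/warning lines in shellcheck's gcc
# format; prints 0 when shellcheck is not installed.
count_lint_violations() {
    command -v shellcheck >/dev/null 2>&1 || { echo 0; return 0; }
    # shellcheck exits non-zero when it finds issues; the inner `|| true`
    # keeps that from failing the pipeline, the outer one covers grep -c
    # exiting 1 when nothing matched.
    { shellcheck -f gcc "$@" 2>/dev/null || true; } | grep -cE ': (error|warning):' || true
}

# Test harness: a temp dir holding a fake shellcheck with canned output.
TMP_ROOT=$(mktemp -d)
mkdir -p "$TMP_ROOT/bin"
cat > "$TMP_ROOT/bin/shellcheck" <<'EOF'
#!/bin/sh
# Fake shellcheck: two findings that count, one note that does not.
echo "a.sh:3:1: warning: foo is referenced but not assigned. [SC2154]"
echo "a.sh:7:5: error: Couldn't parse this test expression. [SC1073]"
echo "a.sh:9:1: note: Double quote to prevent globbing. [SC2086]"
exit 1
EOF
chmod +x "$TMP_ROOT/bin/shellcheck"

# Prefixing PATH confines the mock to a single call (inside $(...)).
mock_count=$(PATH="$TMP_ROOT/bin:$PATH" count_lint_violations a.sh)
none_count=$(PATH="$TMP_ROOT/empty" count_lint_violations a.sh)
rm -rf "$TMP_ROOT"
echo "mock=$mock_count none=$none_count"
```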

Coverage Targets

100% of subcommands (baseline, check, report, override, help)
100% of blocking conditions (coverage drop, complexity increase, new violations)
100% of bypass paths (SKIP_GATES, NO_GITHUB, missing baseline, override present)
Event emission verified for every outcome path

Critical Paths to Test

Happy path: Baseline captured → check runs → all metrics within thresholds → exit 0, PR comment with all-pass table posted

Error cases:

Coverage drops 8% (exceeds 5% threshold) → exit 1, blocking comment with delta table posted, merge_gate.blocked event emitted
Complexity increases 25% (exceeds 20% threshold) → exit 1, blocking comment posted
Both coverage and complexity regress → both reported in same table, exit 1

Edge cases:

Baseline has 0 complexity (division by zero) → use absolute diff instead of percentage, don't divide
No changed code files (only .md changes) → skip complexity/lint checks, pass gate
No coverage tool available → skip coverage metric, only gate on complexity + violations
Override file exists but gate passes → ignore override, report clean pass
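The docs-only edge case comes down to file scoping plus a decision-point count, sketched below. The hypothetical `changed_code_complexity` reads a file list on stdin. In the script, that list would come from `git diff --name-only "${BASE_BRANCH}...HEAD"`. The function keeps `.sh`, `.ts`, and `.js` files and prints `skip` when none changed. It is a grep-based approximation, so keywords inside comments or strings are counted too. The base-branch total would come from running the same function against the base checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summed decision points (if, elif, case, while, for,
# &&, ||) over changed code files, or "skip" when no code file changed.
changed_code_complexity() {
    local f total=0 n found=false
    while IFS= read -r f || [ -n "$f" ]; do
        case "$f" in
            *.sh|*.ts|*.js) ;;
            *) continue ;;
        esac
        [ -f "$f" ] || continue   # the PR may have deleted the file
        found=true
        # Keywords match as whole words (so "elif" is not also an "if" and
        # "format" is not a "for"); operators are matched literally.
        n=$( { grep -owE 'if|elif|case|while|for' "$f" || true; } | wc -l )
        total=$(( total + n ))
        n=$( { grep -oE '&&|\|\|' "$f" || true; } | wc -l )
        total=$(( total + n ))
    done
    if [ "$found" = true ]; then echo "$total"; else echo "skip"; fi
}
```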
