
Pipeline Plan 176

ezigus edited this page Mar 17, 2026 · 1 revision

Implementation Plan: Merge Quality Gate Enforcement with Regression Blocking

Issue: #176
Branch: ci/merge-quality-gate-enforcement-with-regr-176
Template: full (devops)


Brainstorming & Design Decisions

Requirements Clarity

Minimum viable change: A new script, sw-merge-gate.sh, that collects metrics from the PR branch, compares them against a baseline (main branch), and blocks merges when quality degrades beyond configurable thresholds. The script is called from the existing stage_merge() function in pipeline-stages-delivery.sh.

Implicit requirements:

  • Must work in both pipeline (automated) and CLI (manual) contexts
  • Must respect NO_GITHUB, SKIP_GATES, and headless/CI modes
  • Must be Bash 3.2 compatible (no associative arrays, no readarray, no ${var,,})
  • Must use atomic file writes (tmp + mv pattern)
  • Must integrate with existing event system and GitHub Checks API
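
Two of the constraints above, atomic writes and Bash 3.2 compatibility, can be illustrated with a minimal sketch. The helper names atomic_write and to_lower are hypothetical and are not taken from the existing scripts.

```shell
# Sketch only: atomic_write and to_lower are hypothetical helper names.

# Write stdin to the target path via a temp file in the same directory,
# so readers never observe a half-written file.
atomic_write() {
    local target="$1"
    local tmp="${target}.tmp.$$"
    if cat > "$tmp"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Bash 3.2 lacks ${var,,}; tr gives the same lowercase result.
to_lower() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}
```

A caller might use it as `printf '%s\n' "$json" | atomic_write "$ARTIFACTS_DIR/merge-gate-baseline.json"`. Keeping the temp file next to the target means mv is a rename on the same filesystem rather than a copy.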

Acceptance criteria (from issue):

  1. Compare PR metrics against baseline (main branch)
  2. Block if test coverage drops >5% (configurable)
  3. Block if cyclomatic complexity increases >20% (configurable)
  4. Block if new lint/static analysis violations introduced
  5. --force-merge override with justification comment
  6. Emit to events.jsonl + GitHub Checks API
  7. Display quality delta in PR comment

Alternatives Considered

Approach A: Extend sw-regression.sh

  • Pros: Reuses existing baseline/comparison infrastructure
  • Cons: sw-regression.sh tracks script-level metrics (line count, function count), not code quality metrics (coverage, complexity, lint violations). Different concern. Mixing them creates a God script.
  • Blast radius: High — changes to regression detection could break existing shipwright regression check workflows
  • Trade-offs: Less code but higher coupling, lower maintainability

Approach B: New sw-merge-gate.sh script + integration into stage_merge() (CHOSEN)

  • Pros: Clean separation of concerns. Merge gate is a distinct concept from regression detection. Can be independently tested, sourced, or run standalone. Follows existing pattern (sw-oversight.sh, sw-quality.sh).
  • Cons: New file, but minimal — follows established patterns
  • Blast radius: Low — new file + small insertion in stage_merge() + policy additions
  • Trade-offs: More files but better modularity, easier to test and maintain

Approach C: Implement as a GitHub Actions workflow only

  • Pros: Runs in CI natively, no bash needed
  • Cons: Shipwright is a shell orchestration tool — all other gates are bash scripts. Would break the pattern. Can't be used in local/no-github mode.
  • Blast radius: N/A — wrong paradigm for this project
  • Trade-offs: Better CI integration but breaks project conventions entirely

Decision: Approach B. It follows the established pattern of sw-oversight.sh (gate script sourced/called from stage_merge()), keeps merge-gate concerns separate, and integrates cleanly with the existing policy/event/checks infrastructure.

Risk Analysis

| Risk | What Could Break | Mitigation |
|------|------------------|------------|
| Coverage collection fails (no tool available) | Gate blocks incorrectly on missing data | Graceful degradation: skip metric if tool unavailable, only gate on metrics that were successfully collected |
| Complexity calculation is slow on large repos | Merge stage timeout, pipeline stalls | Use git diff --name-only to scope to changed files only, cap at 30s timeout |
| Baseline capture on main fails | No baseline to compare, false blocks | Fall back to "no regression detected" when baseline missing (warn, don't block) |
| Force-merge bypass abused | Quality degrades over time despite gates | Require justification comment, emit event for audit trail, track in events.jsonl |
| False positives from metric noise | Developers frustrated, override fatigue | Configurable thresholds in policy.json with sensible defaults, percentage-based not absolute |
| Breaks existing merge flow | Active pipelines stuck at merge stage | Guard behind SKIP_GATES check (existing pattern), test with existing pipeline tests |
| set -euo pipefail interaction with grep | Double output or premature exit | Use ` |
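
The graceful-degradation mitigations (skip a metric when its tool is missing, gate only on metrics that were collected) could be sketched as follows. The names collect_if_available and should_gate are hypothetical, and the real collectors may be structured differently.

```shell
# Sketch only: collect_if_available and should_gate are hypothetical names.

# Run a collector only when its tool is on PATH.
# Prints the collector's output, or nothing when the tool is missing
# or the collector fails; always returns 0 so callers are not aborted.
collect_if_available() {
    local tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        "$@" 2>/dev/null || true
    fi
    return 0
}

# A metric takes part in gating only when both the baseline value and
# the current value were actually collected (non-empty).
should_gate() {
    [ -n "$1" ] && [ -n "$2" ]
}
```

With this shape, an empty value means "not collected" and is distinct from a collected value of 0, so a missing tool cannot be mistaken for a regression.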

Dependency Analysis

Depends on:

  • scripts/lib/helpers.sh: colors, emit_event, info/warn/error
  • scripts/lib/policy.sh: policy_get for threshold configuration
  • scripts/sw-github-checks.sh: gh_checks_create_run, gh_checks_update_run
  • scripts/lib/pipeline-github.sh: gh_comment_issue for PR comments
  • config/policy.json: threshold configuration
  • config/event-schema.json: event type registration

Depended on by:

  • scripts/lib/pipeline-stages-delivery.sh (stage_merge()) — calls the gate
  • Future CI workflows may invoke sw-merge-gate.sh directly

No circular dependency risks — this is a leaf-node script called by the pipeline.


Files to Modify

New Files

  1. scripts/sw-merge-gate.sh — Main merge quality gate script (~400 lines)
  2. scripts/sw-merge-gate-test.sh — Test suite (~300 lines)

Modified Files

  1. scripts/lib/pipeline-stages-delivery.sh — Insert quality gate call in stage_merge() (~15 lines)
  2. config/policy.json — Add mergeGate configuration section (~15 lines)
  3. config/event-schema.json — Register new event types (~20 lines)

Implementation Steps

Step 1: Add policy configuration (config/policy.json)

Add a new mergeGate section to config/policy.json:

"mergeGate": {
  "enabled": true,
  "coverage_drop_threshold_percent": 5,
  "complexity_increase_threshold_percent": 20,
  "new_violations_block": true,
  "force_merge_requires_justification": true,
  "baseline_source": "main"
}

This follows the existing pattern where all thresholds live in policy.json (see pipeline, quality sections).
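
As an illustration of how these thresholds might be read with defaults, here is a minimal sketch that calls jq directly. The plan itself uses policy_get from scripts/lib/policy.sh, whose signature is not shown here, so read_threshold is a hypothetical stand-in.

```shell
# Sketch only: read_threshold is a hypothetical helper, not policy_get.

# Print a numeric mergeGate setting, or the default when the file, jq,
# or the key is missing. Intended for numbers only: jq's "//" operator
# also treats false as missing, so booleans need a different lookup.
read_threshold() {
    local file="$1" key="$2" default="$3" value=""
    if [ -f "$file" ] && command -v jq >/dev/null 2>&1; then
        value=$(jq -r --arg k "$key" '.mergeGate[$k] // empty' "$file" 2>/dev/null) || value=""
    fi
    printf '%s\n' "${value:-$default}"
}
```

For example, `read_threshold config/policy.json coverage_drop_threshold_percent 5` would print the configured value, or 5 when the section is absent.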

Step 2: Register event types (config/event-schema.json)

Add new event types:

  • merge_gate.check — Quality gate evaluation result (required: passed; optional: issue, coverage_delta, complexity_delta, violations_delta)
  • merge_gate.blocked — Merge blocked due to quality regression (required: issue; optional: reason, metrics)
  • merge_gate.override — Force-merge override used (required: issue; optional: justification, user)
  • merge_gate.baseline_captured — Baseline metrics saved (required: []; optional: branch, coverage, complexity, violations)
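
To make the event shape concrete, the sketch below appends one JSON line per event using the same "key=value" argument style seen in the emit_event calls in Step 4. It is a hypothetical stand-in: the real emit_event lives in scripts/lib/helpers.sh, and the "ts" and "type" field names are assumptions, not the project's schema.

```shell
# Sketch only: emit_event_sketch is a hypothetical stand-in for emit_event.
# The "ts" and "type" field names are assumptions.
emit_event_sketch() {
    local events_file="$1" type="$2"
    shift 2
    local fields="" pair key val
    for pair in "$@"; do
        key="${pair%%=*}"
        val="${pair#*=}"
        # Values are emitted as JSON strings; no escaping is attempted here.
        fields="${fields},\"${key}\":\"${val}\""
    done
    printf '{"ts":"%s","type":"%s"%s}\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$type" "$fields" >> "$events_file"
}
```

A call such as `emit_event_sketch events.jsonl merge_gate.check passed=false issue=176` would append a single line carrying the required passed field and the optional issue field.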

Step 3: Create scripts/sw-merge-gate.sh

Main script with these subcommands:

sw-merge-gate.sh baseline [--branch <branch>]    # Capture baseline from branch
sw-merge-gate.sh check [--pr <number>] [--base <branch>] [--issue <number>] [--artifacts <dir>]   # Compare current vs baseline
sw-merge-gate.sh report [--json|--md]             # Generate delta report
sw-merge-gate.sh override --justification "..."   # Record force-merge override
sw-merge-gate.sh help                             # Usage
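
The subcommand interface above might be routed with a plain case statement, which is Bash 3.2 safe. This is a sketch under assumptions: merge_gate_main is a hypothetical name and the cmd_* bodies are placeholders, not the planned logic.

```shell
# Sketch only: the cmd_* bodies are placeholders, not the planned logic.
cmd_baseline() { echo "baseline: $*"; }
cmd_check()    { echo "check: $*"; }
cmd_report()   { echo "report: $*"; }
cmd_override() { echo "override: $*"; }
cmd_help() {
    echo "Usage: sw-merge-gate.sh <baseline|check|report|override|help> [options]"
}

# Route the first argument to its subcommand; unknown commands fail.
merge_gate_main() {
    local cmd="${1:-help}"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    case "$cmd" in
        baseline)        cmd_baseline "$@" ;;
        check)           cmd_check "$@" ;;
        report)          cmd_report "$@" ;;
        override)        cmd_override "$@" ;;
        help|-h|--help)  cmd_help ;;
        *)
            echo "Unknown command: $cmd" >&2
            cmd_help >&2
            return 1
            ;;
    esac
}
```

Returning non-zero for unknown commands lets a caller such as stage_merge() tell a usage error from a clean pass.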

Metric collection functions:

  1. collect_coverage_metrics() — Parse coverage from:

    • npx vitest run --coverage JSON output (this project uses vitest with v8 provider)
    • .claude/pipeline-artifacts/test-results.json if available
    • Look for coverage-summary.json in standard locations
    • Falls back gracefully if no coverage tool → returns empty
  2. collect_complexity_metrics() — Calculate cyclomatic complexity:

    • Count decision points (if, elif, case, while, for, &&, ||) in changed files
    • Use git diff --name-only ${BASE_BRANCH}...HEAD to scope to changed files only
    • Sum complexity across all changed files for both base and current
    • Bash 3.2 safe: use grep + wc, not associative arrays
  3. collect_lint_violations() — Count lint/static analysis issues:

    • Run shellcheck on changed .sh files (primary — this is a shell project)
    • Run npx eslint on changed .ts/.js files if available
    • Count errors + warnings
    • Return 0 if no lint tool available
  4. capture_baseline() — Collect baseline metrics from base branch:

    • Use git stash + git checkout in a subshell to avoid directory changes
    • Collect all three metric types on base branch
    • Store baseline in $ARTIFACTS_DIR/merge-gate-baseline.json
    • Restore original branch/state
    • Atomic write via tmp + mv
  5. compare_metrics() — Delta calculation:

    • Coverage: drop = (baseline - current), in percentage points → if drop > threshold → fail
    • Complexity: increase = ((current - baseline) / baseline * 100) → if increase > threshold → fail
    • Violations: (current - baseline) → if any increase → fail (when new_violations_block is true)
    • Handle division-by-zero: if the complexity baseline is 0, skip the division and use the absolute diff (current - baseline)
  6. generate_pr_comment() — Markdown quality delta table:

    ## 🔍 Quality Gate Results
    
    | Metric | Before | After | Delta | Status |
    |--------|--------|-------|-------|--------|
    | Test Coverage | 78.5% | 76.2% | -2.3% | ✅ Pass |
    | Complexity | 142 | 185 | +30.3% | ❌ Fail |
    | Lint Violations | 3 | 3 | +0 | ✅ Pass |
    
    **Result:** ❌ Blocked — complexity increase exceeds 20% threshold
    
    > Override with: `shipwright merge-gate override --justification "reason"`
  7. post_github_check() — Create/update GitHub Check run:

    • Source sw-github-checks.sh for API functions
    • Create check run named shipwright/merge-gate
    • Update with conclusion (success/failure) and output summary
    • Respect NO_GITHUB flag
  8. cmd_check() — Main gate logic:

    • Load thresholds from policy.json via policy_get
    • Capture baseline (or load cached baseline)
    • Collect current metrics
    • Compare against thresholds
    • Emit merge_gate.check event
    • Post PR comment via gh issue comment
    • Create/update GitHub Check run
    • Return 0 (pass) or 1 (blocked)
  9. cmd_override() — Force-merge recording:

    • Require --justification argument (fail if empty)
    • Write override record to $ARTIFACTS_DIR/merge-gate-override.json
    • Emit merge_gate.override event
    • Post justification as PR comment
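
As one illustration of the collectors listed above, the decision-point count in collect_complexity_metrics() might look like the sketch below. The function names are hypothetical. The count is a rough proxy: keywords inside comments and strings are counted too.

```shell
# Sketch only: count_decision_points and sum_complexity are hypothetical.

# Count decision points (if, elif, case, while, for, &&, ||) in one file.
# Missing files (for example, deleted in the PR) count as 0.
count_decision_points() {
    local file="$1" words ops
    [ -f "$file" ] || { echo 0; return 0; }
    # Split into one word per line, then count exact keyword matches.
    words=$(tr -cs '[:alnum:]_' '\n' < "$file" \
        | grep -cxE 'if|elif|case|while|for' || true)
    # grep -o prints one operator per line, so wc -l counts them;
    # "|| true" keeps a no-match grep from failing under set -e.
    ops=$({ grep -oE '&&|\|\|' "$file" || true; } | wc -l | tr -d ' ')
    echo $((words + ops))
}

# Sum the count over file paths read from stdin, one per line.
sum_complexity() {
    local total=0 f n
    while IFS= read -r f; do
        n=$(count_decision_points "$f")
        total=$((total + n))
    done
    echo "$total"
}
```

Scoped to changed files, this could be driven by `git diff --name-only "${BASE_BRANCH}...HEAD" | sum_complexity`, using only grep, tr, and wc as the Bash 3.2 constraint requires.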

Step 4: Integrate into stage_merge() (pipeline-stages-delivery.sh)

Insert quality gate check after the oversight gate block (~line 503) and before the approval gates block (~line 505):

# ── Merge quality gate: block on quality regression ──
if [[ -x "$SCRIPT_DIR/sw-merge-gate.sh" ]] && [[ "${SKIP_GATES:-false}" != "true" ]]; then
    info "Running merge quality gate..."
    local _mg_result=0
    bash "$SCRIPT_DIR/sw-merge-gate.sh" check \
        --artifacts "$ARTIFACTS_DIR" \
        --issue "${ISSUE_NUMBER:-0}" \
        --base "${BASE_BRANCH:-main}" \
        --pr "${PR_NUMBER:-}" 2>&1 || _mg_result=$?

    if [[ "$_mg_result" -ne 0 ]]; then
        # Check for force-merge override
        if [[ -f "$ARTIFACTS_DIR/merge-gate-override.json" ]]; then
            warn "Quality gate failed but override present — proceeding"
            emit_event "merge_gate.override" "issue=${ISSUE_NUMBER:-0}"
        else
            error "Merge quality gate blocked — quality regression detected"
            emit_event "merge_gate.blocked" "issue=${ISSUE_NUMBER:-0}"
            log_stage "merge" "BLOCKED: quality gate regression"
            return 1
        fi
    fi
fi
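
The integration above only checks whether merge-gate-override.json exists. A sketch of how cmd_override() could produce that file, rejecting an empty justification, is shown below. record_override is a hypothetical name and the JSON field names are assumptions.

```shell
# Sketch only: record_override is hypothetical; field names are assumptions.
record_override() {
    local artifacts_dir="$1" justification="$2"
    if [ -z "$justification" ]; then
        echo "override requires a non-empty --justification" >&2
        return 1
    fi
    # Escape backslashes and double quotes so the value stays valid JSON.
    # Newlines in the justification are not handled by this sketch.
    local escaped
    escaped=$(printf '%s' "$justification" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    local target="$artifacts_dir/merge-gate-override.json"
    local tmp="${target}.tmp.$$"
    # Atomic write: temp file first, then rename into place.
    printf '{"justification":"%s","recorded_at":"%s"}\n' \
        "$escaped" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$tmp" && mv "$tmp" "$target"
}
```

Because the file is only written when a justification is supplied, the `-f` test in stage_merge() cannot succeed for an unjustified override.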

Step 5: Create test suite (scripts/sw-merge-gate-test.sh)

Following the established pattern from sw-regression-test.sh:

Test cases (15 total):

  1. Baseline capture — Verify baseline JSON created with expected fields (coverage, complexity, violations)
  2. Check passes (no regression) — Current metrics equal or better than baseline → exit 0
  3. Coverage drop blocks — Coverage drops >5% → exit 1
  4. Coverage drop within threshold — Coverage drops 3% (under 5%) → exit 0
  5. Complexity increase blocks — Complexity increases >20% → exit 1
  6. Complexity increase within threshold — Complexity increases 15% (under 20%) → exit 0
  7. New violations block — Violation count increases → exit 1
  8. Multiple regressions — Report all failures, not just first
  9. Force-merge override with justification — Override file present → exit 0
  10. Force-merge without justification — Rejected → exit 1
  11. No baseline available — Warn but don't block → exit 0
  12. SKIP_GATES=true — Gate skipped entirely
  13. Event emission — Verify events written to events.jsonl
  14. PR comment format — Verify markdown table generated correctly
  15. Threshold configuration — Custom thresholds from policy.json respected
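
Most of the cases above assert on an exit code, so a small helper covers them. The sketch below is an assumption about shape only: assert_exit, the counters, and fake_gate are hypothetical, and the helpers in sw-regression-test.sh may differ.

```shell
# Sketch only: assert_exit, PASS/FAIL, and fake_gate are hypothetical.
PASS=0
FAIL=0

# Run a command and compare its exit status with the expected one.
assert_exit() {
    local name="$1" expected="$2" actual=0
    shift 2
    "$@" >/dev/null 2>&1 || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        PASS=$((PASS + 1))
    else
        FAIL=$((FAIL + 1))
        echo "FAIL: $name (expected exit $expected, got $actual)" >&2
    fi
}

# Stand-in for the gate: passes when the coverage drop is at most 5 points.
fake_gate() { [ "$1" -le 5 ]; }

assert_exit "coverage drop within threshold" 0 fake_gate 3
assert_exit "coverage drop blocks" 1 fake_gate 8
echo "passed=$PASS failed=$FAIL"
```

In the real suite the command under test would be sw-merge-gate.sh check, run against mock metrics in a temp directory.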

Task Checklist

  • Task 1: Add mergeGate config section to config/policy.json
  • Task 2: Register merge_gate.* event types in config/event-schema.json
  • Task 3: Create scripts/sw-merge-gate.sh — metric collection functions (coverage, complexity, violations)
  • Task 4: Create scripts/sw-merge-gate.sh — baseline capture and comparison logic
  • Task 5: Create scripts/sw-merge-gate.sh — PR comment generation and GitHub Checks integration
  • Task 6: Create scripts/sw-merge-gate.sh — override/force-merge recording
  • Task 7: Create scripts/sw-merge-gate.sh — CLI interface (baseline/check/report/override/help)
  • Task 8: Integrate quality gate call into stage_merge() in pipeline-stages-delivery.sh
  • Task 9: Create scripts/sw-merge-gate-test.sh — test setup and helpers
  • Task 10: Create scripts/sw-merge-gate-test.sh — baseline and comparison tests
  • Task 11: Create scripts/sw-merge-gate-test.sh — blocking/override/edge-case tests
  • Task 12: Run full test suite (npm test) and fix any failures
  • Task 13: Verify version consistency across all new/modified files

Task Dependencies

  • Tasks 1-2 have no dependencies (config changes)
  • Tasks 3-7 depend on Tasks 1-2 (script needs config)
  • Task 8 depends on Tasks 3-7 (integration needs the script)
  • Tasks 9-11 depend on Tasks 3-7 (tests need the script)
  • Task 12 depends on all previous tasks
  • Task 13 depends on Task 12

Testing Approach

Test Pyramid Breakdown

  • Unit tests (12 tests): Individual metric collection functions, threshold comparison, JSON parsing, report generation, override validation — all in sw-merge-gate-test.sh using mock binaries and temp directories
  • Integration tests (3 tests): Full check command flow with mock git/coverage data, event emission verification, PR comment format validation
  • E2E tests (0): Not applicable — this is infrastructure tooling tested via bash test harness with mocks. Real E2E would require actual GitHub API access.

Coverage Targets

  • 100% of subcommands (baseline, check, report, override, help)
  • 100% of blocking conditions (coverage drop, complexity increase, new violations)
  • 100% of bypass paths (SKIP_GATES, NO_GITHUB, missing baseline, override present)
  • Event emission verified for every outcome path

Critical Paths to Test

Happy path: Baseline captured → check runs → all metrics within thresholds → exit 0, PR comment with all-pass table posted

Error cases:

  1. Coverage drops 8% (exceeds 5% threshold) → exit 1, blocking comment with delta table posted, merge_gate.blocked event emitted
  2. Complexity increases 25% (exceeds 20% threshold) → exit 1, blocking comment posted
  3. Both coverage and complexity regress → both reported in same table, exit 1
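
Reporting every regression in one table, as the third case requires, could work by rendering one row per metric regardless of status. In this sketch render_delta_table is a hypothetical name, and the pipe-separated input format is an assumption made for illustration.

```shell
# Sketch only: render_delta_table is a hypothetical name. Each input line
# is "metric|before|after|delta|status" with status "pass" or "fail".
render_delta_table() {
    local metric before after delta status label
    echo "| Metric | Before | After | Delta | Status |"
    echo "|--------|--------|-------|-------|--------|"
    while IFS='|' read -r metric before after delta status; do
        [ -n "$metric" ] || continue
        if [ "$status" = "pass" ]; then
            label="✅ Pass"
        else
            label="❌ Fail"
        fi
        printf '| %s | %s | %s | %s | %s |\n' \
            "$metric" "$before" "$after" "$delta" "$label"
    done
}
```

Feeding it two failing rows, for example `printf '%s\n' 'Test Coverage|78.5%|70.1%|-8.4%|fail' 'Complexity|142|185|+30.3%|fail' | render_delta_table`, produces a table in which both failures are visible.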

Edge cases:

  1. Baseline has 0 complexity (division by zero) → use absolute diff instead of percentage, don't divide
  2. No changed code files (only .md changes) → skip complexity/lint checks, pass gate
  3. No coverage tool available → skip coverage metric, only gate on complexity + violations
  4. Override file exists but gate passes → ignore override, report clean pass
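
The first two edge cases can be sketched with awk for the decimal arithmetic, since Bash arithmetic is integer-only. All function names here are hypothetical, and the file-extension filter is an assumption about what counts as a code file.

```shell
# Sketch only: all function names below are hypothetical.

# Print the complexity increase: a percentage normally, or the absolute
# difference when the baseline is 0 (no division is performed).
complexity_increase() {
    awk -v b="$1" -v c="$2" 'BEGIN {
        if (b == 0) printf "%d\n", c - b
        else printf "%.1f\n", (c - b) / b * 100
    }'
}

# Print the coverage drop in percentage points (baseline - current).
coverage_drop() {
    awk -v b="$1" -v c="$2" 'BEGIN { printf "%.1f\n", b - c }'
}

# Exit 0 when value <= threshold, 1 otherwise.
within_threshold() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit (v <= t) ? 0 : 1 }'
}

# Keep only code files from a list of paths on stdin; an empty result
# means complexity and lint checks can be skipped.
code_files_only() {
    grep -E '\.(sh|ts|js)$' || true
}
```

Using the figures from the sample PR comment, `complexity_increase 142 185` prints 30.3, which fails `within_threshold 30.3 20`, while `coverage_drop 78.5 76.2` prints 2.3 and passes against a threshold of 5.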

Definition of Done

  • sw-merge-gate.sh baseline captures coverage, complexity, and violation metrics from a branch
  • sw-merge-gate.sh check compares PR metrics against baseline and returns pass/fail
  • Merge is blocked when coverage drops >5% (configurable via policy.json)
  • Merge is blocked when complexity increases >20% (configurable via policy.json)
  • Merge is blocked when new lint/static analysis violations are introduced
  • --force-merge / override works with required justification comment
  • Quality gate results emitted to events.jsonl with correct event types
  • GitHub Check run created/updated with quality gate results (when GitHub available)
  • Quality delta displayed as markdown table in PR comment
  • Gate respects SKIP_GATES, NO_GITHUB, and headless mode flags
  • All 15 test cases pass in sw-merge-gate-test.sh
  • npm test passes with no regressions introduced
  • VERSION variable matches package.json version in all new/modified files
