
[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1109

📊 Current CI/CD Pipeline Status

The repository has a well-structured, multi-layered CI/CD pipeline covering builds, linting, type checking, unit tests, integration tests, security scanning, and agentic smoke tests. The overall health is good, but there are meaningful gaps that affect PR quality measurement, particularly around coverage thresholds and enforcement of required status checks.


✅ Existing Quality Gates

On Every PR (Automated, Blocking by Default)

| Workflow | File | What It Checks |
| --- | --- | --- |
| PR Title Check | pr-title.yml | Conventional Commits format, allowed scopes, lowercase subject |
| Build Verification | build.yml | TypeScript build on Node 20 + 22, ESLint, API proxy unit tests |
| Lint | lint.yml | ESLint on src/** TypeScript |
| TypeScript Type Check | test-integration.yml | tsc --noEmit strict type checking |
| Test Coverage | test-coverage.yml | Unit tests with coverage comparison; posts PR comment with delta |
| Integration Tests | test-integration-suite.yml | 4 parallel jobs: domain/network, protocol/security, container/ops, API proxy (~265 tests) |
| Chroot Integration Tests | test-chroot.yml | 4 parallel jobs: languages, package managers, procfs, edge cases (~70 tests) |
| Examples Test | test-examples.yml | Runs all example shell scripts end-to-end |
| CodeQL | codeql.yml | SAST for JavaScript/TypeScript + GitHub Actions |
| Dependency Vulnerability Audit | dependency-audit.yml | npm audit --audit-level=high for main + docs packages |
| Container Security Scan | container-scan.yml | Trivy scan for agent + squid containers (only on container file changes) |
| Security Guard | security-guard.lock.yml | AI-powered Claude security review of PR diff |

On PRs (Opt-in / Reaction-triggered)

| Workflow | Trigger | Purpose |
| --- | --- | --- |
| Smoke Claude | :heart: reaction | Full Claude agent run through AWF sandbox |
| Smoke Codex | :hooray: reaction | Full Codex agent run through AWF sandbox |
| Smoke Copilot | :eyes: reaction | Full Copilot CLI run through AWF sandbox |
| Smoke Chroot | :rocket: reaction, path filter | Chroot mode smoke test |
| Build-Test (8 languages) | PR opened/sync | Real builds (Bun, C++, Deno, .NET, Go, Java, Node, Rust) through firewall proxy |

Scheduled / Background Quality Checks

  • Secret Digger (Claude, Codex, Copilot) — runs hourly scanning for secrets
  • Dependency Security Monitor — daily dependency vulnerability monitoring
  • Security Review — daily security review
  • Test Coverage Improver — weekly AI-assisted test coverage improvements
  • Doc Maintainer — daily documentation maintenance
  • CI Doctor — monitors CI health on workflow completions

🔍 Identified Gaps

🔴 High Priority

1. Critically Low Unit Test Coverage on Core Files

Current state: Overall coverage is only 38% with severe gaps in the most critical files:

| File | Statements | Functions | Lines |
| --- | --- | --- | --- |
| cli.ts | 0% | 0% | 0% |
| docker-manager.ts | 18% | 4% | 17% |
| host-iptables.ts | 83% | 100% | 83% |

cli.ts and docker-manager.ts together represent the core orchestration logic but are effectively untested at the unit level. PRs changing these files can ship broken behavior undetected by unit tests.

Recommendation: Add targeted unit tests for cli.ts (command parsing, signal handling, workflow orchestration) and docker-manager.ts (container lifecycle, config generation, cleanup logic). Raise coverage thresholds incrementally from the current 38% floor to at least 60%.

Complexity: Medium | Impact: High
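The incremental threshold increase recommended above could be enforced with a small gate along these lines. This is a minimal sketch, not the repository's actual coverage tooling: the `CoverageSummary` shape, the function names, and the 5-point step are assumptions made for illustration.

```typescript
// Hypothetical shape of a coverage summary (percentages, 0-100).
interface CoverageSummary {
  statements: number;
  functions: number;
  lines: number;
}

// Returns one message per metric that falls below its threshold.
// Metrics without a threshold are not checked.
function coverageFailures(
  actual: CoverageSummary,
  threshold: Partial<CoverageSummary>
): string[] {
  const failures: string[] = [];
  for (const key of Object.keys(threshold) as (keyof CoverageSummary)[]) {
    const min = threshold[key];
    if (min !== undefined && actual[key] < min) {
      failures.push(`${key}: ${actual[key]}% < ${min}%`);
    }
  }
  return failures;
}

// Raises a threshold by `step` points without overshooting the target.
function nextThreshold(current: number, target: number, step: number): number {
  return Math.min(target, current + step);
}

// docker-manager.ts figures from the table above, checked against the
// thresholds reported in the metrics summary (38% statements, 35% functions).
const dockerManagerFailures = coverageFailures(
  { statements: 18, functions: 4, lines: 17 },
  { statements: 38, functions: 35 }
);

// Assumed step size of 5 points per increment toward the 60% goal.
const nextStatementsFloor = nextThreshold(38, 60, 5);
```

Applying the gate per file rather than only globally would surface files like docker-manager.ts that a passing overall number hides.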


2. Container Security Scan Misses Most PRs

Current state: container-scan.yml only triggers when containers/** or .github/workflows/container-scan.yml changes. PRs that modify src/** (e.g., changes to docker-manager.ts that alter container configuration, capabilities, or mounts) bypass the Trivy scan entirely.

Recommendation: Add src/** to the scan trigger paths (containers/** is already covered), or run the scan unconditionally on all PRs. The build cost is under 15 minutes.

Complexity: Low | Impact: High
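To illustrate the gap, the sketch below models the path filter as a simple matcher and shows that a change to src/docker-manager.ts does not trigger the scan under the current paths but does once src/** is added. The matcher is a deliberate simplification (it handles only `dir/**` prefixes and exact paths, not full GitHub Actions glob syntax), and the function names are hypothetical.

```typescript
// Simplified matcher: supports "dir/**" prefixes and exact file paths only.
function matchesPattern(file: string, pattern: string): boolean {
  if (pattern.endsWith("/**")) {
    // "containers/**" -> prefix "containers/"
    return file.startsWith(pattern.slice(0, -2));
  }
  return file === pattern;
}

// True if any changed file matches any trigger path.
function shouldScan(changedFiles: string[], triggerPaths: string[]): boolean {
  return changedFiles.some((file) =>
    triggerPaths.some((pattern) => matchesPattern(file, pattern))
  );
}

// Trigger paths as described in the current state above.
const currentPaths = ["containers/**", ".github/workflows/container-scan.yml"];
// Proposed paths with src/** added.
const proposedPaths = [...currentPaths, "src/**"];

const prFiles = ["src/docker-manager.ts"];
const scannedToday = shouldScan(prFiles, currentPaths);
const scannedAfterChange = shouldScan(prFiles, proposedPaths);
```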


3. Smoke Tests Are Not Required Checks

Current state: Smoke tests for Claude, Codex, Copilot require manual reactions (:heart:, :hooray:, :eyes:) to trigger. They are not required status checks and can be skipped entirely. A PR that breaks real-world agent execution through the firewall can merge without any end-to-end validation.

Recommendation: Run smoke tests automatically on PRs rather than only on reaction (the opened/synchronize/reopened triggers are already configured), and add them as required status checks in branch protection rules. Alternatively, create a single "gateway" composite check that summarizes the smoke test results, and require only that check.

Complexity: Low | Impact: High
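The "gateway" idea can be sketched as a pure function that collapses individual smoke test conclusions into one pass/fail result. This is an illustrative sketch only: the `Conclusion` type lists a subset of GitHub check conclusions, and the choice to treat skipped or missing runs as blocking is an assumption that follows from the gap described above.

```typescript
// Subset of GitHub check conclusions, enough for this sketch.
type Conclusion = "success" | "failure" | "cancelled" | "skipped" | "timed_out";

interface GatewayResult {
  passed: boolean;
  blocking: string[]; // checks that prevent the gateway from passing
}

// A smoke test blocks the gateway unless it explicitly succeeded.
// Skipped or missing runs therefore count as blocking.
function smokeGateway(
  required: string[],
  results: { [name: string]: Conclusion | undefined }
): GatewayResult {
  const blocking = required.filter((name) => results[name] !== "success");
  return { passed: blocking.length === 0, blocking };
}

const requiredSmokeTests = ["Smoke Claude", "Smoke Codex", "Smoke Copilot"];

// Example: one test passed, one was skipped, one never ran.
const gateway = smokeGateway(requiredSmokeTests, {
  "Smoke Claude": "success",
  "Smoke Codex": "skipped",
});
```

Requiring a single gateway check keeps the branch protection configuration stable even when individual smoke workflows are renamed or added.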


4. No Enforcement of Required Status Checks in Branch Protection

Current state: Several workflows are configured to run on PRs but it's unclear whether they're enforced as required checks. The recent PR analysis shows that Dependency Vulnerability Audit had 2 failures, and multiple build-test workflows failed simultaneously, suggesting these failures don't block merging.

Recommendation: Ensure the following are required status checks blocking merge:

  • Build Verification
  • Lint
  • TypeScript Type Check
  • Integration Tests (all 4 jobs)
  • CodeQL
  • Dependency Vulnerability Audit
  • Test Coverage (with regression detection)

Complexity: Low | Impact: High


🟡 Medium Priority

5. No Secret Scanning on PR Push

Current state: Secret scanning runs hourly via scheduled secret-digger-* workflows. A secret accidentally committed in a PR will not be detected until the next hourly run — potentially after it's visible in the PR diff on GitHub.

Recommendation: Add secret scanning (e.g., gitleaks or GitHub's built-in secret scanning) triggered on PR push events. This provides immediate feedback before reviewers see the diff.

Complexity: Low | Impact: Medium


6. Dependency Audit Failing on Recent PRs

Current state: The recent PR run shows Dependency Vulnerability Audit failing 2 out of 2 runs. This suggests that npm audit is reporting unfixed high/critical vulnerabilities, causing consistent CI failures. If these vulnerabilities are known and accepted, they should be allowlisted; if not, they need fixing, because a consistently failing audit blocks every PR once the check is required.

Recommendation: Investigate the current audit failures, apply fixes, and get the baseline green. For false positives or accepted risks, add an allowlist (an .nsprc-style exceptions file or equivalent); npm audit --omit=... can additionally narrow which dependency types are audited.

Complexity: Low | Impact: Medium


7. No SBOM Generation or Supply Chain Attestation

Current state: Container images are published via release.yml but there is no Software Bill of Materials (SBOM) generated or signed attestation for container images.

Recommendation: Add Syft SBOM generation and Cosign attestation during the release workflow (signing infrastructure via id-token: write is already in place in release.yml).

Complexity: Medium | Impact: Medium


8. CI Doctor Frequently Skipping

Current state: The recent 30-run sample shows CI Doctor was "skipped" on every observed run. This monitoring workflow may not be triggering correctly or its conditions may be too restrictive, reducing visibility into CI health trends.

Recommendation: Investigate why CI Doctor consistently skips. Review the workflow_run trigger list — if workflows are missing from the monitored list or running under different names, CI Doctor will never fire.

Complexity: Low | Impact: Medium
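Because a workflow_run trigger matches workflows by name, one way to investigate is to diff the monitored names against the names that actually exist. The sketch below shows that comparison; the function name and the example lists (including the misnamed "Linting" entry) are hypothetical and not taken from the repository's configuration.

```typescript
interface MonitorDrift {
  stale: string[]; // monitored names that match no existing workflow
  unmonitored: string[]; // existing workflows missing from the monitored list
}

// Compares the names listed under a workflow_run trigger with the
// names of workflows that actually exist.
function monitorDrift(monitored: string[], existing: string[]): MonitorDrift {
  const monitoredSet = new Set(monitored);
  const existingSet = new Set(existing);
  return {
    stale: monitored.filter((name) => !existingSet.has(name)),
    unmonitored: existing.filter((name) => !monitoredSet.has(name)),
  };
}

// Hypothetical example: "Linting" was renamed to "Lint", and
// "Integration Tests" was never added to the monitored list.
const drift = monitorDrift(
  ["Build Verification", "Linting"],
  ["Build Verification", "Lint", "Integration Tests"]
);
```

A non-empty `stale` list would explain a monitor that never fires for those workflows, while `unmonitored` shows coverage that was never wired up.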


9. No Dist/Bundle Size Monitoring

Current state: The TypeScript build produces a dist/ directory, but there is no tracking of bundle size across PRs. A PR could significantly increase the installed footprint without any visibility.

Recommendation: Add a step to build.yml that reports dist/ size and optionally fails if it exceeds a threshold or grows by more than X% vs. the base branch.

Complexity: Low | Impact: Medium
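The size check described above reduces to comparing two byte counts. The following is a minimal sketch under stated assumptions: the byte counts would come from measuring dist/ on the base and PR branches (not shown here), and the function name, 5% growth limit, and example sizes are hypothetical.

```typescript
interface SizeReport {
  growthPercent: number; // relative growth of head vs. base
  ok: boolean; // false if either limit is exceeded
}

// Compares the dist/ size of the PR (head) against the base branch.
// If the base size is 0, growth is reported as 0 and only the
// absolute limit applies.
function checkDistSize(
  baseBytes: number,
  headBytes: number,
  maxGrowthPercent: number,
  maxBytes: number = Infinity
): SizeReport {
  const growthPercent =
    baseBytes > 0 ? ((headBytes - baseBytes) * 100) / baseBytes : 0;
  const ok = growthPercent <= maxGrowthPercent && headBytes <= maxBytes;
  return { growthPercent, ok };
}

// Hypothetical sizes: dist/ grows from 1,000,000 to 1,100,000 bytes
// with an assumed 5% growth limit.
const sizeReport = checkDistSize(1_000_000, 1_100_000, 5);
```

Reporting `growthPercent` in a PR comment even when the check passes gives reviewers visibility without blocking small, legitimate increases.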


10. Build-Test Workflows Show High Failure Rate

Current state: The recent PR run shows that all 8 build-test language workflows failed simultaneously. This pattern suggests these workflows are flaky or sensitive to external network dependencies (fetching packages through the firewall) rather than code defects.

Recommendation: Investigate root causes of build-test failures. Add retry logic or pinned package versions to reduce flakiness. Consider caching package registries used in tests to reduce external dependency.

Complexity: Medium | Impact: Medium
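The retry logic mentioned above might look like the following exponential backoff wrapper. It is a sketch rather than the workflows' actual mechanism: the function names, default attempt count, and delays are assumptions, and `maxAttempts` is assumed to be at least 1.

```typescript
// Delays between attempts: base, 2x base, 4x base, ...
// There is one fewer delay than attempts, since no wait follows the last try.
function backoffDelays(maxAttempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < maxAttempts - 1; i++) {
    delays.push(baseDelayMs * 2 ** i);
  }
  return delays;
}

// Runs `task` up to `maxAttempts` times, waiting with exponential
// backoff between failures. Rethrows the last error if all attempts fail.
// `sleep` is injectable so the wrapper can be tested without real waits.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}

// Delay schedule for 4 attempts starting at 500 ms.
const schedule = backoffDelays(4, 500);
```

Retries mask flakiness rather than remove it, so logging each failed attempt alongside the retry would keep the underlying failure rate measurable.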


🟢 Low Priority

11. No Mutation Testing

Current state: Coverage numbers (38%) measure which lines are executed, not whether tests would catch actual bugs. With docker-manager.ts at 18% statement coverage and 4% function coverage, even the existing tests may not verify correctness.

Recommendation: Add [Stryker Mutator](strykermutator.io/redacted) for TypeScript mutation testing, initially scoped to the most critical modules (squid-config.ts, host-iptables.ts).

Complexity: Medium | Impact: Low-Medium


12. No Docs Quality Check on PRs

Current state: doc-maintainer.md runs on a daily schedule but does not run on PRs. Documentation drift can accumulate undetected.

Recommendation: Add a lightweight check (e.g., link checker, markdownlint) triggered on PRs that touch *.md, docs/**, or docs-site/**.

Complexity: Low | Impact: Low


13. update-release-notes Workflow Not Compiled

Current state: agenticworkflows-status reports update-release-notes as compiled: "No". An uncompiled workflow will not execute correctly.

Recommendation: Run gh aw compile .github/workflows/update-release-notes.md and commit the resulting .lock.yml.

Complexity: Low | Impact: Low


📋 Actionable Recommendations Summary

| # | Recommendation | Priority | Complexity | Impact |
| --- | --- | --- | --- | --- |
| 1 | Increase unit test coverage for cli.ts (0%) and docker-manager.ts (18%) | 🔴 High | Medium | High |
| 2 | Expand container scan trigger to include src/** | 🔴 High | Low | High |
| 3 | Make smoke tests automatic (not reaction-only) and add as required checks | 🔴 High | Low | High |
| 4 | Enforce required status checks in branch protection | 🔴 High | Low | High |
| 5 | Add secret scanning on PR push events | 🟡 Medium | Low | Medium |
| 6 | Fix/allowlist current dependency audit failures | 🟡 Medium | Low | Medium |
| 7 | Add SBOM generation + Cosign attestation to release workflow | 🟡 Medium | Medium | Medium |
| 8 | Investigate and fix CI Doctor always skipping | 🟡 Medium | Low | Medium |
| 9 | Add dist/bundle size monitoring to build workflow | 🟡 Medium | Low | Medium |
| 10 | Investigate build-test workflow flakiness | 🟡 Medium | Medium | Medium |
| 11 | Add mutation testing for core modules | 🟢 Low | Medium | Low-Medium |
| 12 | Add markdown/docs linting on PR for doc changes | 🟢 Low | Low | Low |
| 13 | Compile update-release-notes.md workflow | 🟢 Low | Low | Low |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflow files | 43 .yml + 28 .md agentic workflows |
| Agentic workflows compiled | 27/28 (96%) |
| Workflows triggered on PRs | 13 standard + 4 reaction-gated smoke + 8 build-test |
| Unit test coverage (statements) | 38% (threshold: 38%) |
| Unit test coverage (functions) | 37% (threshold: 35%) |
| Integration test files | 26 files, ~265 tests |
| Recent PR workflow failure rate | ~60% of workflows failed on the most recent PR run |
| Secret digger runs | Hourly (3 parallel: Claude, Codex, Copilot) |
| Security scans | CodeQL (weekly + PRs), Trivy (weekly + container PRs), npm audit (weekly + PRs) |

Note on 60% failure rate: The high failure rate on the most recent observed PR is likely not representative of steady-state — it may reflect a specific PR that touched many systems simultaneously or a transient CI environment issue. The dependency audit failures (2/2) are the most concerning signal as they represent a persistent infrastructure problem.


Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.

Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.

Generated by CI/CD Pipelines and Integration Tests Gap Assessment

  • expires on Mar 7, 2026, 10:18 PM UTC
