📊 Current CI/CD Pipeline Status
The repository has a well-structured, multi-layered CI/CD pipeline covering builds, linting, type checking, unit tests, integration tests, security scanning, and agentic smoke tests. The overall health is good, but there are meaningful gaps that affect PR quality measurement, particularly around coverage thresholds and enforcement of required status checks.
✅ Existing Quality Gates
On Every PR (Automated, Blocking by Default)
| Workflow | File | What It Checks |
|---|---|---|
| PR Title Check | pr-title.yml | Conventional Commits format, allowed scopes, lowercase subject |
| Build Verification | build.yml | TypeScript build on Node 20 + 22, ESLint, API proxy unit tests |
| Lint | lint.yml | ESLint on src/** TypeScript |
| TypeScript Type Check | test-integration.yml | tsc --noEmit strict type checking |
| Test Coverage | test-coverage.yml | Unit tests with coverage comparison; posts PR comment with delta |
| Integration Tests | test-integration-suite.yml | 4 parallel jobs: domain/network, protocol/security, container/ops, API proxy (~265 tests) |
| Chroot Integration Tests | test-chroot.yml | 4 parallel jobs: languages, package managers, procfs, edge cases (~70 tests) |
| Examples Test | test-examples.yml | Runs all example shell scripts end-to-end |
| CodeQL | codeql.yml | SAST for JavaScript/TypeScript + GitHub Actions |
| Dependency Vulnerability Audit | dependency-audit.yml | npm audit --audit-level=high for main + docs packages |
| Container Security Scan | container-scan.yml | Trivy scan for agent + squid containers (only on container file changes) |
| Security Guard | security-guard.lock.yml | AI-powered Claude security review of PR diff |
On PRs (Opt-in / Reaction-triggered)
| Workflow | Trigger | Purpose |
|---|---|---|
| Smoke Claude | :heart: reaction | Full Claude agent run through AWF sandbox |
| Smoke Codex | :hooray: reaction | Full Codex agent run through AWF sandbox |
| Smoke Copilot | :eyes: reaction | Full Copilot CLI run through AWF sandbox |
| Smoke Chroot | :rocket: reaction, path filter | Chroot mode smoke test |
| Build-Test (8 languages) | PR opened/sync | Real builds (Bun, C++, Deno, .NET, Go, Java, Node, Rust) through firewall proxy |
Scheduled / Background Quality Checks
- Secret Digger (Claude, Codex, Copilot) — runs hourly scanning for secrets
- Dependency Security Monitor — daily dependency vulnerability monitoring
- Security Review — daily security review
- Test Coverage Improver — weekly AI-assisted test coverage improvements
- Doc Maintainer — daily documentation maintenance
- CI Doctor — monitors CI health on workflow completions
🔍 Identified Gaps
🔴 High Priority
1. Critically Low Unit Test Coverage on Core Files
Current state: Overall coverage is only 38% with severe gaps in the most critical files:
| File | Statements | Functions | Lines |
|---|---|---|---|
| cli.ts | 0% | 0% | 0% |
| docker-manager.ts | 18% | 4% | 17% |
| host-iptables.ts | 83% | 100% | 83% |
cli.ts and docker-manager.ts together represent the core orchestration logic but are effectively untested at the unit level. PRs changing these files can ship broken behavior undetected by unit tests.
Recommendation: Add targeted unit tests for cli.ts (command parsing, signal handling, workflow orchestration) and docker-manager.ts (container lifecycle, config generation, cleanup logic). Raise coverage thresholds incrementally from the current 38% floor to at least 60%.
Complexity: Medium | Impact: High
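One way to make "raise thresholds incrementally" concrete is to plan the intermediate thresholds up front. The sketch below is illustrative only: `planThresholds` is a hypothetical helper, the 38% floor and 60% target come from this issue, and the 5-point step size is an assumption.

```typescript
// Hypothetical helper: plans stepwise coverage-threshold increases.
// `current` and `target` are percentages; `step` is an assumed increment.
function planThresholds(current: number, target: number, step: number): number[] {
  if (step <= 0) throw new Error("step must be positive");
  const plan: number[] = [];
  // Add each intermediate threshold strictly below the target.
  for (let t = current + step; t < target; t += step) {
    plan.push(t);
  }
  // Finish exactly on the target if there is anything left to raise.
  if (current < target) plan.push(target);
  return plan;
}

// Example: from the current 38% floor to 60% in 5-point steps.
console.log(planThresholds(38, 60, 5));
```

Each value in the plan could become the configured threshold once the tests added for cli.ts and docker-manager.ts push actual coverage above it.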
2. Container Security Scan Misses Most PRs
Current state: container-scan.yml only triggers when containers/** or .github/workflows/container-scan.yml changes. PRs that modify src/** (e.g., changes to docker-manager.ts that alter container configuration, capabilities, or mounts) bypass the Trivy scan entirely.
Recommendation: Add src/** to the scan trigger paths alongside the existing containers/** filter, or run the scan unconditionally on all PRs. The build cost is under 15 minutes.
Complexity: Low | Impact: High
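The effect of widening the trigger can be illustrated with a small model of the path filter. This is a simplified prefix-based stand-in for the workflow's `paths` filter, not the actual glob matching GitHub Actions performs; `shouldRunContainerScan` and the constant names are hypothetical.

```typescript
// Assumed trigger set after the recommended change: containers/** plus src/**.
const SCAN_PATH_PREFIXES = ["containers/", "src/"];
const SCAN_EXACT_PATHS = [".github/workflows/container-scan.yml"];

// Returns true when at least one changed file falls under a monitored path.
function shouldRunContainerScan(changedFiles: string[]): boolean {
  return changedFiles.some(
    (file) =>
      SCAN_EXACT_PATHS.includes(file) ||
      SCAN_PATH_PREFIXES.some((prefix) => file.startsWith(prefix)),
  );
}

// A PR touching only src/docker-manager.ts would now trigger the scan.
console.log(shouldRunContainerScan(["src/docker-manager.ts"]));
```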
3. Smoke Tests Are Not Required Checks
Current state: Smoke tests for Claude, Codex, Copilot require manual reactions (:heart:, :hooray:, :eyes:) to trigger. They are not required status checks and can be skipped entirely. A PR that breaks real-world agent execution through the firewall can merge without any end-to-end validation.
Recommendation: Run smoke tests automatically on PRs (the workflows are already configured for opened/synchronize/reopened events) and add them as required status checks in branch protection rules. Alternatively, create a "gateway" composite check that summarizes smoke test results.
Complexity: Low | Impact: High
4. No Enforcement of Required Status Checks in Branch Protection
Current state: Several workflows are configured to run on PRs but it's unclear whether they're enforced as required checks. The recent PR analysis shows that Dependency Vulnerability Audit had 2 failures, and multiple build-test workflows failed simultaneously, suggesting these failures don't block merging.
Recommendation: Ensure the following are required status checks blocking merge:
- Build Verification
- Lint
- TypeScript Type Check
- Integration Tests (all 4 jobs)
- CodeQL
- Dependency Vulnerability Audit
- Test Coverage (with regression detection)
Complexity: Low | Impact: High
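Auditing the branch protection settings against the list above amounts to a set difference. A minimal sketch, assuming the configured check names have already been fetched from the repository settings: `missingRequiredChecks` is a hypothetical helper, and the real status check contexts may be job names that differ from the workflow titles used here.

```typescript
// Checks this issue recommends as required. Actual context names may differ.
const EXPECTED_REQUIRED_CHECKS = [
  "Build Verification",
  "Lint",
  "TypeScript Type Check",
  "Integration Tests",
  "CodeQL",
  "Dependency Vulnerability Audit",
  "Test Coverage",
];

// Returns the expected checks that are absent from the configured list.
// Comparison ignores case and surrounding whitespace.
function missingRequiredChecks(
  configured: string[],
  expected: string[] = EXPECTED_REQUIRED_CHECKS,
): string[] {
  const have = new Set(configured.map((name) => name.trim().toLowerCase()));
  return expected.filter((name) => !have.has(name.trim().toLowerCase()));
}

// Example with a made-up configuration that only requires two checks.
console.log(missingRequiredChecks(["Lint", "CodeQL"]));
```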
🟡 Medium Priority
5. No Secret Scanning on PR Push
Current state: Secret scanning runs hourly via scheduled secret-digger-* workflows. A secret accidentally committed in a PR will not be detected until the next hourly run — potentially after it's visible in the PR diff on GitHub.
Recommendation: Add secret scanning (e.g., gitleaks or GitHub's built-in secret scanning) triggered on PR push events. This provides immediate feedback before reviewers see the diff.
Complexity: Low | Impact: Medium
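To show the kind of check that would run on each PR push, here is a deliberately tiny pattern scan over the added lines of a unified diff. It is a sketch of the idea only, not a substitute for gitleaks or GitHub's built-in secret scanning; `findSecretsInDiff` and the two patterns are illustrative assumptions.

```typescript
// Illustrative patterns only; real scanners ship far larger rule sets.
const SECRET_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "AWS access key ID", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: "Private key header", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
];

interface Finding {
  line: number; // 1-based line number within the diff text
  name: string; // which pattern matched
}

function findSecretsInDiff(diff: string): Finding[] {
  const findings: Finding[] = [];
  diff.split("\n").forEach((text, index) => {
    // Only inspect added lines, skipping the "+++ b/file" header.
    if (!text.startsWith("+") || text.startsWith("+++")) return;
    for (const { name, pattern } of SECRET_PATTERNS) {
      if (pattern.test(text)) findings.push({ line: index + 1, name });
    }
  });
  return findings;
}

// Example diff using AWS's documented placeholder key.
const sampleDiff = [
  "+++ b/config.ts",
  '+const key = "AKIAIOSFODNN7EXAMPLE";',
  "-const old = 1;",
].join("\n");
console.log(findSecretsInDiff(sampleDiff));
```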
6. Dependency Audit Failing on Recent PRs
Current state: The recent PR run shows Dependency Vulnerability Audit failing 2 out of 2 runs. This suggests there are currently unfixed high/critical vulnerabilities in npm audit that are causing consistent CI failures. If these are known/accepted vulnerabilities, they should be allowlisted; if not, they block all PRs.
Recommendation: Investigate the current audit failures, apply fixes or allowlist entries, and ensure the baseline is green. Add an allowlist (npm audit --omit=... or .nsprc equivalent) for false positives.
Complexity: Low | Impact: Medium
7. No SBOM Generation or Supply Chain Attestation
Current state: Container images are published via release.yml but there is no Software Bill of Materials (SBOM) generated or signed attestation for container images.
Recommendation: Add Syft SBOM generation and Cosign attestation during the release workflow (signing infrastructure via id-token: write is already in place in release.yml).
Complexity: Medium | Impact: Medium
8. CI Doctor Frequently Skipping
Current state: The recent 30-run sample shows CI Doctor was "skipped" on every observed run. This monitoring workflow may not be triggering correctly or its conditions may be too restrictive, reducing visibility into CI health trends.
Recommendation: Investigate why CI Doctor consistently skips. Review the workflow_run trigger list — if workflows are missing from the monitored list or running under different names, CI Doctor will never fire.
Complexity: Low | Impact: Medium
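A quick way to test the name-mismatch hypothesis is to compare the names listed under CI Doctor's workflow_run trigger with the names the workflows actually declare. The sketch assumes both lists have been extracted by hand or by a script; `unmatchedMonitoredWorkflows` and the sample names are hypothetical.

```typescript
// Returns monitored names that do not exactly match any declared workflow name.
// Exact, case-sensitive comparison is assumed here.
function unmatchedMonitoredWorkflows(monitored: string[], actual: string[]): string[] {
  const actualNames = new Set(actual);
  return monitored.filter((name) => !actualNames.has(name));
}

// Made-up example: one monitored entry uses an outdated name.
const monitoredNames = ["Build Verification", "Integration Test", "Lint"];
const declaredNames = ["Build Verification", "Integration Tests", "Lint"];
console.log(unmatchedMonitoredWorkflows(monitoredNames, declaredNames));
```

Any name this returns would be a candidate explanation for a monitor that never fires on that workflow.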
9. No Dist/Bundle Size Monitoring
Current state: The TypeScript build produces a dist/ directory, but there is no tracking of bundle size across PRs. A PR could significantly increase the installed footprint without any visibility.
Recommendation: Add a step to build.yml that reports dist/ size and optionally fails if it exceeds a threshold or grows by more than X% vs. the base branch.
Complexity: Low | Impact: Medium
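The growth check itself is simple arithmetic once the two sizes are known. A minimal sketch, assuming the base-branch and PR sizes of dist/ (in bytes) are measured in earlier workflow steps: `checkSizeGrowth` is a hypothetical helper and the 5% limit is a placeholder for the unspecified X%.

```typescript
interface SizeCheck {
  growthPercent: number; // negative when the bundle shrank
  ok: boolean; // true when growth is within the allowed limit
}

// Compares the PR's dist/ size against the base branch.
function checkSizeGrowth(
  baseBytes: number,
  headBytes: number,
  maxGrowthPercent: number,
): SizeCheck {
  if (baseBytes <= 0) throw new Error("baseBytes must be positive");
  const growthPercent = ((headBytes - baseBytes) * 100) / baseBytes;
  return { growthPercent, ok: growthPercent <= maxGrowthPercent };
}

// Example: 1,000,000 bytes on base, 1,100,000 on the PR, 5% allowed.
console.log(checkSizeGrowth(1_000_000, 1_100_000, 5));
```

A build step could report `growthPercent` in the job summary and fail only when `ok` is false.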
10. Build-Test Workflows Show High Failure Rate
Current state: The recent PR run shows that all 8 build-test language workflows failed simultaneously. This pattern suggests these workflows are flaky or sensitive to external network dependencies (fetching packages through the firewall) rather than code defects.
Recommendation: Investigate root causes of build-test failures. Add retry logic or pinned package versions to reduce flakiness. Consider caching package registries used in tests to reduce external dependency.
Complexity: Medium | Impact: Medium
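If the failures turn out to be transient network errors, retry logic with capped exponential backoff is one option. The following is a generic sketch, not code from this repository: `backoffDelaysMs` and `withRetry` are hypothetical names, and the delay values are assumptions.

```typescript
// Builds a capped exponential backoff schedule: base, 2x base, 4x base, ...
function backoffDelaysMs(maxRetries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

// Runs `task`, retrying once per entry in `delays` before giving up.
async function withRetry<T>(task: () => Promise<T>, delays: number[]): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < delays.length) {
        // Wait before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
      }
    }
  }
  throw lastError;
}

// Example schedule: four retries starting at 500 ms, capped at 3 s.
console.log(backoffDelaysMs(4, 500, 3000));
```

Retries hide flakiness rather than remove it, so they work best alongside the root-cause investigation and the pinning and caching suggested above.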
🟢 Low Priority
11. No Mutation Testing
Current state: The coverage figure (38%) measures which lines are executed, not whether the tests would catch actual bugs. With docker-manager.ts at 18% statement coverage and only 4% function coverage, even the existing tests may not verify correctness.
Recommendation: Add [Stryker Mutator](strykermutator.io/redacted) for TypeScript mutation testing, initially scoped to the most critical modules (squid-config.ts, host-iptables.ts).
Complexity: Medium | Impact: Low-Medium
12. No Docs Quality Check on PRs
Current state: doc-maintainer.md runs on a daily schedule but does not run on PRs. Documentation drift can accumulate undetected.
Recommendation: Add a lightweight check (e.g., link checker, markdownlint) triggered on PRs that touch *.md, docs/**, or docs-site/**.
Complexity: Low | Impact: Low
13. update-release-notes Workflow Not Compiled
Current state: agenticworkflows-status reports update-release-notes as compiled: "No". An uncompiled workflow will not execute correctly.
Recommendation: Run gh aw compile .github/workflows/update-release-notes.md and commit the resulting .lock.yml.
Complexity: Low | Impact: Low
📋 Actionable Recommendations Summary
| # | Recommendation | Priority | Complexity | Impact |
|---|---|---|---|---|
| 1 | Increase unit test coverage for cli.ts (0%) and docker-manager.ts (18%) | 🔴 High | Medium | High |
| 2 | Expand container scan trigger to include src/** | 🔴 High | Low | High |
| 3 | Make smoke tests automatic (not reaction-only) and add as required checks | 🔴 High | Low | High |
| 4 | Enforce required status checks in branch protection | 🔴 High | Low | High |
| 5 | Add secret scanning on PR push events | 🟡 Medium | Low | Medium |
| 6 | Fix/allowlist current dependency audit failures | 🟡 Medium | Low | Medium |
| 7 | Add SBOM generation + Cosign attestation to release workflow | 🟡 Medium | Medium | Medium |
| 8 | Investigate and fix CI Doctor always skipping | 🟡 Medium | Low | Medium |
| 9 | Add dist/bundle size monitoring to build workflow | 🟡 Medium | Low | Medium |
| 10 | Investigate build-test workflow flakiness | 🟡 Medium | Medium | Medium |
| 11 | Add mutation testing for core modules | 🟢 Low | Medium | Low-Medium |
| 12 | Add markdown/docs linting on PR for doc changes | 🟢 Low | Low | Low |
| 13 | Compile update-release-notes.md workflow | 🟢 Low | Low | Low |
📈 Metrics Summary
| Metric | Value |
|---|---|
| Total workflow files | 43 .yml + 28 .md agentic workflows |
| Agentic workflows compiled | 27/28 (96%) |
| Workflows triggered on PRs | 13 standard + 4 reaction-gated smoke + 8 build-test |
| Unit test coverage (statements) | 38% (threshold: 38%) |
| Unit test coverage (functions) | 37% (threshold: 35%) |
| Integration test files | 26 files, ~265 tests |
| Recent PR workflow failure rate | ~60% of workflows failed on the most recent PR run |
| Secret digger runs | Hourly (3 parallel: Claude, Codex, Copilot) |
| Security scans | CodeQL (weekly + PRs), Trivy (weekly + container PRs), npm audit (weekly + PRs) |
Note on 60% failure rate: The high failure rate on the most recent observed PR is likely not representative of steady-state — it may reflect a specific PR that touched many systems simultaneously or a transient CI environment issue. The dependency audit failures (2/2) are the most concerning signal as they represent a persistent infrastructure problem.
Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.
Generated by CI/CD Pipelines and Integration Tests Gap Assessment
- expires on Mar 7, 2026, 10:18 PM UTC