[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment - March 2026 #1113

@github-actions

📊 Current CI/CD Pipeline Status

The repository has a mature and extensive CI/CD setup with 56 total GitHub Actions workflows (28 standard .yml + 28 compiled agentic .lock.yml workflows). The pipeline covers build, lint, type-check, unit tests, integration tests, security scanning, container scanning, and AI-assisted smoke tests.

Workflow Inventory

| Category | Count | Trigger |
| --- | --- | --- |
| Standard CI/CD workflows (.yml) | 20 | PR / push to main / schedule |
| Agentic workflows (.md / .lock.yml) | 28 | PR / schedule / reaction |
| Total | 48 active | - |

Health at a Glance

Recent scheduled run sample (last 30 runs from the agentic scheduler):

| Workflow | Status |
| --- | --- |
| Secret Digger (Claude) | ✅ Success |
| Secret Digger (Codex) | ✅ Success |
| Secret Digger (Copilot) | ⚠️ Mixed (3 failures / 5 runs) |
| Issue Monster | ✅ Success |
| Agentic Maintenance | ✅ Success |
| CI Doctor | ⚠️ All skipped (7/7 runs); monitoring may not be triggering correctly |

✅ Existing Quality Gates

On Every PR (pull_request trigger to main)

| Check | Workflow | What It Verifies |
| --- | --- | --- |
| Build Verification | build.yml | TypeScript compile on Node 20 & 22, dist output exists, API proxy unit tests |
| ESLint | lint.yml | Code style / static analysis of src/ |
| TypeScript Type Check | test-integration.yml | tsc --noEmit strict mode check |
| Test Coverage | test-coverage.yml | Jest unit tests + coverage delta vs base branch, PR comment |
| Integration Tests | test-integration-suite.yml | 4 parallel Docker-based test jobs (domain/network, protocol/security, container/ops, API proxy) |
| Chroot Integration Tests | test-chroot.yml | 4 parallel jobs: language runtimes, package managers, /proc FS, edge cases |
| Examples Test | test-examples.yml | End-to-end execution of examples/*.sh scripts |
| Test Setup Action | test-action.yml | action.yml self-test (latest version, specific version, image pull, invalid version) |
| CodeQL | codeql.yml | Static security analysis (JavaScript/TypeScript + GitHub Actions) |
| Container Security Scan | container-scan.yml | Trivy CRITICAL/HIGH CVE scan of agent and squid images (path-filtered to containers/**) |
| Dependency Audit | dependency-audit.yml | npm audit --audit-level=high for main and docs-site packages |
| PR Title Check | pr-title.yml | Conventional Commits format enforcement |
| AI Security Guard | security-guard.lock.yml | Claude reviews PR diff for security regressions |
| Build-Test Workflows | 8 agentic workflows | Real-world project builds (Go, Rust, Java, Node, Bun, C++, Deno, .NET) through the firewall |
| Smoke Tests | 4 agentic workflows | Claude/Codex/Copilot/Chroot end-to-end agent execution (reaction or PR triggered) |

Recurring (Scheduled, not PR-blocking)

  • Weekly: dependency-audit.yml, container-scan.yml, CodeQL
  • Daily: security-review.md, dependency-security-monitor.md, doc-maintainer.md, ci-cd-gaps-assessment.md
  • Hourly: issue-monster.md, secret-digger-*.md

🔍 Identified Gaps

🔴 High Priority

1. Critically low unit test coverage on core modules

docker-manager.ts (the most complex file: ~250 statements, 25 functions) has only 18% statement coverage and 4% function coverage. cli.ts (the main entry point, ~69 statements) has 0% coverage. These files contain the container lifecycle logic, cleanup handlers, signal processing, and exit code propagation, all of which are critical paths.

Current overall coverage: 38% statements / 31% branches with very low enforcement thresholds (38% / 30%).
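One way to keep the core modules from sliding further is a per-file floor in addition to the global threshold. The sketch below is illustrative only: it assumes coverage is available in the Istanbul "json-summary" shape (per-file entries with `statements.pct` and `functions.pct`), and the function name `findCoverageViolations` and the 60% floors are hypothetical, not taken from the repository. The cli.ts function figure is assumed to be 0, consistent with its 0% coverage.

```typescript
// Subset of one entry in an Istanbul "json-summary" report (assumed shape).
interface CoverageEntry {
  statements: { pct: number };
  functions: { pct: number };
}

type CoverageSummary = Record<string, CoverageEntry>;

// Hypothetical per-file floor; the numbers used below are illustrative.
interface Floor {
  statements: number;
  functions: number;
}

// Returns one message per watched file and metric that is below its floor.
function findCoverageViolations(
  summary: CoverageSummary,
  floors: Record<string, Floor>,
): string[] {
  const violations: string[] = [];
  for (const [suffix, floor] of Object.entries(floors)) {
    // Summary keys are usually absolute paths, so match on the path suffix.
    const key = Object.keys(summary).find((k) => k.endsWith(suffix));
    if (key === undefined) {
      violations.push(`${suffix}: no coverage data`);
      continue;
    }
    const entry = summary[key];
    if (entry.statements.pct < floor.statements) {
      violations.push(`${suffix}: statements ${entry.statements.pct}% < ${floor.statements}%`);
    }
    if (entry.functions.pct < floor.functions) {
      violations.push(`${suffix}: functions ${entry.functions.pct}% < ${floor.functions}%`);
    }
  }
  return violations;
}

// Sample data built from the figures reported in this assessment.
const sampleSummary: CoverageSummary = {
  "/repo/src/docker-manager.ts": { statements: { pct: 18 }, functions: { pct: 4 } },
  "/repo/src/cli.ts": { statements: { pct: 0 }, functions: { pct: 0 } },
};

const violations = findCoverageViolations(sampleSummary, {
  "src/docker-manager.ts": { statements: 60, functions: 60 },
  "src/cli.ts": { statements: 60, functions: 60 },
});
console.log(violations.join("\n"));
```

A CI step could fail the job whenever the returned list is non-empty. Jest's own `coverageThreshold` setting also accepts per-path keys, which may be the simpler option if no custom reporting is needed.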

2. chroot-copilot-home.test.ts not wired to any CI workflow

The file tests/integration/chroot-copilot-home.test.ts exists but is not included in any --testPathPatterns in test-integration-suite.yml or test-chroot.yml. These tests never run in CI, meaning regressions in Copilot home directory handling will go undetected.
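A small guard could catch this class of gap automatically. The following is a rough sketch rather than the project's tooling: `findUnwiredTests` is a hypothetical helper, the second test file name and the workflow snippet are made up for illustration, and the check is a plain substring match, so regex-style patterns in --testPathPatterns would not be understood.

```typescript
// Returns the test files whose base name appears in none of the workflow texts.
// Heuristic only: a substring match, not an evaluation of Jest path patterns.
function findUnwiredTests(testFiles: string[], workflowTexts: string[]): string[] {
  return testFiles.filter((file) => {
    // "tests/integration/foo.test.ts" -> "foo"
    const baseName = file.replace(/^.*\//, "").replace(/\.test\.ts$/, "");
    return !workflowTexts.some((text) => text.includes(baseName));
  });
}

// Illustrative inputs; only the first file name comes from this assessment.
const integrationTests = [
  "tests/integration/chroot-copilot-home.test.ts",
  "tests/integration/chroot-languages.test.ts",
];
const workflowSnippets = ['run: npx jest --testPathPatterns="chroot-languages"'];

const unwired = findUnwiredTests(integrationTests, workflowSnippets);
console.log(unwired);
```

In practice the inputs would come from listing tests/integration/ and reading the workflow files; a non-empty result would fail the check.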

3. build-test-node.md is uncompiled

agenticworkflows-status reports build-test-node with compiled: "No". This means the Node.js build-test workflow (which exercises real npm projects through the firewall) does not execute in CI. Node.js is the primary ecosystem for this project.

4. API Proxy container is not included in container-scan.yml

The Trivy scan in container-scan.yml covers the agent and squid images but not the api-proxy container. The API proxy is a Node.js HTTP server that handles authentication token injection β€” a high-value security target. Its base image and npm dependencies should be scanned for CVEs on every change to containers/api-proxy/**.


🟑 Medium Priority

5. Duplicate ESLint execution

Both build.yml and lint.yml run npm run lint on every PR. This is redundant and wastes ~1-2 minutes of CI time per PR. One of these should be removed or consolidated.

6. No code formatting check (Prettier not enforced)

The project has ESLint but no Prettier or --fix enforcement. Code style inconsistencies can accumulate silently. There is no formatting gate preventing unformatted code from merging.

7. No shell script linting (shellcheck)

The repository contains multiple shell scripts in containers/agent/ (setup-iptables.sh, entrypoint.sh), containers/squid/, and scripts/ci/. These scripts contain security-critical logic (iptables setup, capability drops) but are not validated by shellcheck in CI. Shell bugs in these scripts could silently weaken the firewall.

8. container-scan.yml only triggers on containers/** path changes

The container scan is path-filtered to containers/**, meaning PRs that change the container base images or packages indirectly (e.g., through apt calls in scripts referenced by Dockerfiles) won't trigger a scan. The weekly schedule catches this eventually, but a window exists.

9. No binary artifact size monitoring

The release pipeline builds standalone binaries (awf-linux-x64, awf-darwin-arm64, etc.). There is no check to detect unexpected size increases (which could indicate accidental large dependency inclusion). A simple size threshold check in the release workflow or a separate PR check would catch this.
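The threshold check could compare each binary against a recorded baseline instead of a fixed limit. This is a minimal sketch under stated assumptions: `findSizeRegressions`, the 10% tolerance, and all byte counts are hypothetical; only the binary names come from the release pipeline described above.

```typescript
// Flags binaries that grew by more than `maxGrowth` (a fraction) over baseline.
function findSizeRegressions(
  baseline: Record<string, number>,
  current: Record<string, number>,
  maxGrowth = 0.1,
): string[] {
  const regressions: string[] = [];
  for (const [name, size] of Object.entries(current)) {
    const previous: number | undefined = baseline[name];
    if (previous === undefined) continue; // new binary: nothing to compare against
    const growth = (size - previous) / previous;
    if (growth > maxGrowth) {
      regressions.push(`${name}: ${(growth * 100).toFixed(1)}% larger than baseline`);
    }
  }
  return regressions;
}

// Illustrative sizes in bytes; these are not measurements of the real binaries.
const baselineSizes = { "awf-linux-x64": 50_000_000, "awf-darwin-arm64": 48_000_000 };
const currentSizes = { "awf-linux-x64": 51_000_000, "awf-darwin-arm64": 60_000_000 };

const sizeRegressions = findSizeRegressions(baselineSizes, currentSizes);
console.log(sizeRegressions);
```

A relative tolerance avoids having to update hard-coded limits on every legitimate change, at the cost of storing the baseline somewhere (for example as a release asset or a checked-in file).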

10. CI Doctor shows all-skipped runs

All 7 recent CI Doctor runs have conclusion skipped. The CI Doctor workflow monitors the health of other workflows via workflow_run trigger, but if the triggering workflows aren't completing as expected (or the name list is stale), the doctor never fires. This monitoring gap means workflow regressions (broken workflows that stop running entirely) may go unnoticed.
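If the stale-name hypothesis is the cause, it can be checked mechanically. The sketch below assumes the workflow_run trigger lists workflows by their display names; `findStaleWorkflowNames` and every workflow name in the sample data are hypothetical.

```typescript
// Returns monitored names that match no existing workflow name.
function findStaleWorkflowNames(monitored: string[], existing: string[]): string[] {
  const known = new Set(existing);
  return monitored.filter((name) => !known.has(name));
}

// Illustrative names only; the real lists would be read from the workflow files.
const monitoredNames = ["Build Verification", "Integration Test Suite"];
const existingNames = ["Build Verification", "Integration Tests"];

const staleNames = findStaleWorkflowNames(monitoredNames, existingNames);
console.log(staleNames);
```

An empty result would point away from a stale list and toward the other explanation: the triggering workflows are not completing in the way the CI Doctor expects.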

11. No integration test coverage for docs-site

The docs-site/ Astro/Starlight documentation site has its own package.json, and its dependencies are audited, but there is no build test for it in CI. The only workflow that builds it is deploy-docs.yml, which deploys the site and does not run on PRs that don't change docs. A docs build broken by a non-doc PR would only be caught at deploy time.


🟒 Low Priority

12. No dependency license compliance check

There is no check for license compatibility of new npm dependencies. A contributor could introduce a GPL-licensed dependency that conflicts with the project's MIT license without CI catching it. Tools like license-checker or licensee could be added.

13. No performance regression benchmarks

The firewall's container startup time and proxy latency are important UX metrics. There are no benchmarks tracking these across PRs. While this is complex to implement correctly, even a simple "time to first byte" check in the integration tests would surface major regressions.

14. No test flakiness tracking or retry mechanism

Integration tests using Docker containers can have intermittent failures (network timing, container startup races). There's no flakiness tracking or automatic retry configured in the integration test workflows. This leads to manual re-runs and reduces developer confidence in the CI signal.
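Retries and tracking work best together: once tests are retried (Jest exposes `jest.retryTimes()` for this), a test that fails and then passes within the same run is a flakiness signal worth recording. The sketch below shows one way to derive that signal; `findFlakyTests`, the `AttemptRecord` shape, and the test names are hypothetical and would need to be fed from whatever reporter output the project uses.

```typescript
// One recorded attempt of a test within a single CI run (assumed shape).
interface AttemptRecord {
  test: string;
  passed: boolean;
}

// A test counts as flaky if it both failed and passed within the same run.
function findFlakyTests(attempts: AttemptRecord[]): string[] {
  const outcomes = new Map<string, { passed: boolean; failed: boolean }>();
  for (const { test, passed } of attempts) {
    const seen = outcomes.get(test) ?? { passed: false, failed: false };
    if (passed) {
      seen.passed = true;
    } else {
      seen.failed = true;
    }
    outcomes.set(test, seen);
  }
  const flaky: string[] = [];
  outcomes.forEach((outcome, test) => {
    if (outcome.passed && outcome.failed) flaky.push(test);
  });
  return flaky;
}

// Illustrative run: one flaky test, one stable pass, one consistent failure.
const runAttempts: AttemptRecord[] = [
  { test: "blocks disallowed domain", passed: false },
  { test: "blocks disallowed domain", passed: true },
  { test: "starts squid container", passed: true },
  { test: "propagates exit code", passed: false },
  { test: "propagates exit code", passed: false },
];

const flakyTests = findFlakyTests(runAttempts);
console.log(flakyTests);
```

Consistent failures are deliberately excluded, since those indicate a real regression rather than flakiness.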

15. SECRET_DIGGER_COPILOT has a 60% failure rate

The scheduled Secret Digger (Copilot) workflow shows 3 failures out of 5 recent runs. This recurring failure should be investigated to determine if it's a token/quota issue or a workflow bug.


📋 Actionable Recommendations

| # | Gap | Recommended Solution | Complexity | Impact |
| --- | --- | --- | --- | --- |
| 1 | Low coverage on docker-manager.ts/cli.ts | Add unit tests using Jest mocks for execa and file system; target 60%+ coverage | High | High |
| 2 | chroot-copilot-home.test.ts not in CI | Add chroot-copilot-home to a --testPathPatterns in test-chroot.yml | Low | High |
| 3 | build-test-node.md uncompiled | Run gh aw compile .github/workflows/build-test-node.md && npx tsx scripts/ci/postprocess-smoke-workflows.ts | Low | High |
| 4 | No API proxy container scan | Add a scan-api-proxy job to container-scan.yml mirroring the existing scan-agent job | Low | High |
| 5 | Duplicate ESLint | Remove lint.yml (keep ESLint in build.yml); or remove lint from build.yml and keep lint.yml | Low | Medium |
| 6 | No Prettier enforcement | Add prettier --check step to build.yml or create a dedicated formatting workflow | Low | Medium |
| 7 | No shellcheck | Add shellcheck containers/**/*.sh scripts/ci/*.sh step to build.yml | Low | High |
| 8 | Container scan path filter too narrow | Add 'containers/api-proxy/**' to container-scan.yml paths | Low | Medium |
| 9 | No binary size monitoring | Add a step in release.yml to assert each binary is within expected size bounds | Low | Low |
| 10 | CI Doctor skipping | Audit the workflow_run trigger list in ci-doctor.md and recompile | Low | Medium |
| 11 | Docs site not built on PRs | Add a docs build step (npm run docs:build) to build.yml or a dedicated docs-check workflow | Low | Medium |
| 12 | No license check | Add npx license-checker --onlyAllow 'MIT;ISC;Apache-2.0;BSD-2-Clause;BSD-3-Clause;CC0-1.0' to dependency-audit | Low | Low |
| 14 | No test retry | Call jest.retryTimes(2) in the Docker-dependent integration tests (Jest has no --retries CLI flag) | Low | Medium |
| 15 | Secret Digger failures | Investigate Copilot token/quota issues in secret-digger-copilot.md | Medium | Medium |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflows | 56 (48 active) |
| Workflows triggering on PR | ~28 |
| Unit test statement coverage | 38.39% |
| Unit test branch coverage | 31.78% |
| Integration test files | 27 |
| Integration tests not wired to CI | 1 (chroot-copilot-home.test.ts) |
| Agentic workflows uncompiled | 1 (build-test-node.md) |
| Container images scanned | 2 of 3 (api-proxy missing) |
| Recent Secret Digger Copilot failure rate | 60% (3/5 runs) |
| CI Doctor effectiveness | ⚠️ All recent runs skipped |

Assessment generated by ci-cd-gaps-assessment workflow on 2026-03-01. Workflow run: #22553999520


Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.

Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.

Generated by CI/CD Pipelines and Integration Tests Gap Assessment

  • expires on Mar 8, 2026, 10:20 PM UTC
