[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1109

## 📊 Current CI/CD Pipeline Status

The repository has a **well-structured, multi-layered CI/CD pipeline** covering builds, linting, type checking, unit tests, integration tests, security scanning, and agentic smoke tests. The overall health is good, but there are meaningful gaps that affect PR quality measurement, particularly around coverage thresholds and enforcement of required status checks.

---

## ✅ Existing Quality Gates

### On Every PR (Automated, Blocking by Default)

| Workflow | File | What It Checks |
|---|---|---|
| **PR Title Check** | `pr-title.yml` | Conventional Commits format, allowed scopes, lowercase subject |
| **Build Verification** | `build.yml` | TypeScript build on Node 20 + 22, ESLint, API proxy unit tests |
| **Lint** | `lint.yml` | ESLint on `src/**` TypeScript |
| **TypeScript Type Check** | `test-integration.yml` | `tsc --noEmit` strict type checking |
| **Test Coverage** | `test-coverage.yml` | Unit tests with coverage comparison; posts PR comment with delta |
| **Integration Tests** | `test-integration-suite.yml` | 4 parallel jobs: domain/network, protocol/security, container/ops, API proxy (~265 tests) |
| **Chroot Integration Tests** | `test-chroot.yml` | 4 parallel jobs: languages, package managers, procfs, edge cases (~70 tests) |
| **Examples Test** | `test-examples.yml` | Runs all example shell scripts end-to-end |
| **CodeQL** | `codeql.yml` | SAST for JavaScript/TypeScript + GitHub Actions |
| **Dependency Vulnerability Audit** | `dependency-audit.yml` | `npm audit --audit-level=high` for main + docs packages |
| **Container Security Scan** | `container-scan.yml` | Trivy scan for agent + squid containers *(only on container file changes)* |
| **Security Guard** | `security-guard.lock.yml` | AI-powered Claude security review of PR diff |

### On PRs (Opt-in / Reaction-triggered)

| Workflow | Trigger | Purpose |
|---|---|---|
| **Smoke Claude** | `:heart:` reaction | Full Claude agent run through AWF sandbox |
| **Smoke Codex** | `:hooray:` reaction | Full Codex agent run through AWF sandbox |
| **Smoke Copilot** | `:eyes:` reaction | Full Copilot CLI run through AWF sandbox |
| **Smoke Chroot** | `:rocket:` reaction, path filter | Chroot mode smoke test |
| **Build-Test (8 languages)** | PR opened/sync | Real builds (Bun, C++, Deno, .NET, Go, Java, Node, Rust) through firewall proxy |

### Scheduled / Background Quality Checks

- **Secret Digger** (Claude, Codex, Copilot) — runs hourly scanning for secrets
- **Dependency Security Monitor** — daily dependency vulnerability monitoring
- **Security Review** — daily security review
- **Test Coverage Improver** — weekly AI-assisted test coverage improvements
- **Doc Maintainer** — daily documentation maintenance
- **CI Doctor** — monitors CI health on workflow completions

---

## 🔍 Identified Gaps

### 🔴 High Priority

#### 1. Critically Low Unit Test Coverage on Core Files

**Current state:** Overall coverage is only **38%** with severe gaps in the most critical files:

| File | Statements | Functions | Lines |
|---|---|---|---|
| `cli.ts` | **0%** | **0%** | **0%** |
| `docker-manager.ts` | **18%** | **4%** | **17%** |
| `host-iptables.ts` | 83% | 100% | 83% |

`cli.ts` and `docker-manager.ts` together represent the core orchestration logic but are effectively **untested at the unit level**. PRs changing these files can ship broken behavior undetected by unit tests.

**Recommendation:** Add targeted unit tests for `cli.ts` (command parsing, signal handling, workflow orchestration) and `docker-manager.ts` (container lifecycle, config generation, cleanup logic). Raise coverage thresholds incrementally from the current 38% floor to at least 60%.

**Complexity:** Medium | **Impact:** High
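
One way to raise the threshold incrementally is a ratchet: the enforced floor follows measured coverage upward, never moves down, and stops at the target. The sketch below is only an illustration. `ratchetThreshold` and the 2-point step cap are hypothetical and not part of the repository's tooling; only the 38% floor and 60% goal come from this assessment.

```typescript
// Hypothetical helper: compute the next enforced coverage floor.
// The floor never decreases, rises at most `maxStep` points per update,
// and never exceeds `target`.
function ratchetThreshold(
  currentFloor: number, // currently enforced threshold, in percent
  measured: number,     // coverage measured on the base branch, in percent
  target: number,       // long-term goal, in percent
  maxStep = 2,          // assumed cap on how fast the floor may rise
): number {
  const candidate = Math.min(Math.floor(measured), currentFloor + maxStep, target);
  return Math.max(currentFloor, candidate);
}

// Figures from this assessment: floor 38%, goal 60%.
console.log(ratchetThreshold(38, 38.4, 60)); // 38: no whole-point gain yet
console.log(ratchetThreshold(38, 41.7, 60)); // 40: capped by maxStep
console.log(ratchetThreshold(58, 75, 60));   // 60: capped by target
```

Capping the step keeps a single unusually well-tested PR from setting a floor that the next PR cannot realistically meet.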

---

#### 2. Container Security Scan Misses Most PRs

**Current state:** `container-scan.yml` only triggers when `containers/**` or `.github/workflows/container-scan.yml` changes. PRs that modify `src/**` (e.g., changes to `docker-manager.ts` that alter container configuration, capabilities, or mounts) bypass the Trivy scan entirely.

**Recommendation:** Add `src/**` to the scan trigger paths alongside the existing `containers/**`, or run the scan unconditionally on all PRs. The build cost is under 15 minutes.

**Complexity:** Low | **Impact:** High
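
The effect of the path filter can be sketched with a deliberately simplified matcher. It supports only exact paths and `dir/**` prefixes, which is far less than GitHub's real path-filter syntax; the function names are hypothetical. The two pattern lists mirror the current trigger described above and the proposed one.

```typescript
// Simplified matcher: exact paths and "dir/**" prefixes only.
// GitHub's actual path-filter syntax is richer; this is for illustration.
function matchesPattern(file: string, pattern: string): boolean {
  if (pattern.slice(-3) === "/**") {
    const prefix = pattern.slice(0, -2); // keeps the trailing "/"
    return file.slice(0, prefix.length) === prefix;
  }
  return file === pattern;
}

// True if at least one changed file matches at least one trigger pattern.
function scanWouldRun(changedFiles: string[], patterns: string[]): boolean {
  return changedFiles.some((f) => patterns.some((p) => matchesPattern(f, p)));
}

const currentPaths = ["containers/**", ".github/workflows/container-scan.yml"];
const proposedPaths = [...currentPaths, "src/**"];
const changed = ["src/docker-manager.ts"]; // a PR touching only orchestration code

console.log(scanWouldRun(changed, currentPaths));  // false: the scan is skipped
console.log(scanWouldRun(changed, proposedPaths)); // true: the scan runs
```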

---

#### 3. Smoke Tests Are Not Required Checks

**Current state:** Smoke tests for Claude, Codex, Copilot require manual reactions (`:heart:`, `:hooray:`, `:eyes:`) to trigger. They are not required status checks and can be skipped entirely. A PR that breaks real-world agent execution through the firewall can merge without any end-to-end validation.

**Recommendation:** Trigger the smoke tests automatically on PR events instead of by reaction (the workflows are already configured for `opened/synchronize/reopened`), and **add them as required status checks in branch protection rules**. Alternatively, create a "gateway" composite check that summarizes smoke test results.

**Complexity:** Low | **Impact:** High

---

#### 4. No Enforcement of Required Status Checks in Branch Protection

**Current state:** Several workflows are configured to run on PRs but it's unclear whether they're enforced as required checks. The recent PR analysis shows that **Dependency Vulnerability Audit had 2 failures**, and multiple build-test workflows failed simultaneously, suggesting these failures don't block merging.

**Recommendation:** Ensure the following are **required status checks** blocking merge:
- Build Verification
- Lint
- TypeScript Type Check
- Integration Tests (all 4 jobs)
- CodeQL
- Dependency Vulnerability Audit
- Test Coverage (with regression detection)

**Complexity:** Low | **Impact:** High
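
Auditing this is a comparison between the recommended list and whatever the branch protection rule currently requires. The sketch below assumes the configured names have already been fetched, for example from the `required_status_checks.contexts` field of the GitHub REST branch-protection response. The `configuredContexts` value is invented for illustration, and real context strings are job-level names that may differ from the workflow names used here.

```typescript
// Check names recommended above (Integration Tests covers all 4 jobs).
const recommendedChecks = [
  "Build Verification",
  "Lint",
  "TypeScript Type Check",
  "Integration Tests",
  "CodeQL",
  "Dependency Vulnerability Audit",
  "Test Coverage",
];

// Returns the recommended checks that are absent from the configured list.
function missingRequiredChecks(recommended: string[], configured: string[]): string[] {
  return recommended.filter((name) => configured.indexOf(name) === -1);
}

// Hypothetical current configuration, not taken from the repository.
const configuredContexts = ["Build Verification", "Lint", "CodeQL"];

console.log(missingRequiredChecks(recommendedChecks, configuredContexts));
```

An empty result would mean every recommended check is already enforced.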

---

### 🟡 Medium Priority

#### 5. No Secret Scanning on PR Push

**Current state:** Secret scanning runs hourly via scheduled `secret-digger-*` workflows. A secret accidentally committed in a PR will not be detected until the next hourly run — potentially after it's visible in the PR diff on GitHub.

**Recommendation:** Add secret scanning (e.g., `gitleaks` or GitHub's built-in secret scanning) triggered on PR push events. This provides immediate feedback before reviewers see the diff.

**Complexity:** Low | **Impact:** Medium
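
To show what a PR-time check looks at, the sketch below scans only the lines a diff adds. It is not a substitute for `gitleaks` or GitHub's built-in scanning: the two patterns are a tiny illustrative subset, `scanDiff` is a hypothetical name, and the key in the sample is AWS's published documentation example rather than a real credential.

```typescript
// Illustrative patterns only; real scanners ship far larger rule sets.
const secretPatterns: { name: string; regex: RegExp }[] = [
  { name: "AWS access key ID", regex: /AKIA[0-9A-Z]{16}/ },
  { name: "Private key header", regex: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
];

interface Finding {
  line: number; // 1-based line number within the diff text, not the source file
  rule: string;
}

// Scan lines added by a unified diff ("+" prefix, excluding "+++" file headers).
function scanDiff(diff: string): Finding[] {
  const findings: Finding[] = [];
  diff.split("\n").forEach((text, index) => {
    const isAdded = text.charAt(0) === "+" && text.slice(0, 3) !== "+++";
    if (!isAdded) return;
    for (const { name, regex } of secretPatterns) {
      if (regex.test(text)) findings.push({ line: index + 1, rule: name });
    }
  });
  return findings;
}

const sampleDiff = [
  "+++ b/config.ts",
  '+const region = "us-east-1";',
  '+const key = "AKIAIOSFODNN7EXAMPLE";',
  '-const old = "AKIAIOSFODNN7EXAMPLE";',
].join("\n");

console.log(scanDiff(sampleDiff)); // one finding, on diff line 3
```

Removed lines are ignored on purpose: the goal is to flag what the PR introduces.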

---

#### 6. Dependency Audit Failing on Recent PRs

**Current state:** The recent PR run shows **Dependency Vulnerability Audit failing 2 out of 2 runs**. This suggests there are currently unfixed high/critical vulnerabilities in `npm audit` that are causing consistent CI failures. If these are known/accepted vulnerabilities, they should be allowlisted; if not, they block all PRs.

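**Recommendation:** Investigate the current audit failures, apply fixes, and ensure the baseline is green. For known or accepted findings and false positives, add an explicit allowlist (for example an `.nsprc`-style exceptions file, which requires a third-party audit wrapper rather than plain `npm audit`). Note that `npm audit --omit=...` excludes whole dependency types, such as dev dependencies, and is not a per-advisory allowlist.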

**Complexity:** Low | **Impact:** Medium

---

#### 7. No SBOM Generation or Supply Chain Attestation

**Current state:** Container images are published via `release.yml` but there is no Software Bill of Materials (SBOM) generated or signed attestation for container images.

**Recommendation:** Add Syft SBOM generation and Cosign attestation during the release workflow (signing infrastructure via `id-token: write` is already in place in `release.yml`).

**Complexity:** Medium | **Impact:** Medium

---

#### 8. CI Doctor Frequently Skipping

**Current state:** The recent 30-run sample shows CI Doctor was "skipped" on every observed run. This monitoring workflow may not be triggering correctly or its conditions may be too restrictive, reducing visibility into CI health trends.

**Recommendation:** Investigate why CI Doctor consistently skips. Review the `workflow_run` trigger list — if workflows are missing from the monitored list or running under different names, CI Doctor will never fire.

**Complexity:** Low | **Impact:** Medium

---

#### 9. No Dist/Bundle Size Monitoring

**Current state:** The TypeScript build produces a `dist/` directory, but there is no tracking of bundle size across PRs. A PR could significantly increase the installed footprint without any visibility.

**Recommendation:** Add a step to `build.yml` that reports `dist/` size and optionally fails if it exceeds a threshold or grows by more than X% vs. the base branch.

**Complexity:** Low | **Impact:** Medium
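
The comparison itself is small. The sketch below assumes the two byte counts have already been measured for the base branch and the PR head; `checkDistSize` and the 5% limit are hypothetical, since the assessment leaves the threshold open.

```typescript
interface SizeCheck {
  growthPercent: number; // positive means the PR made dist/ larger
  ok: boolean;           // true if growth is within the allowed limit
}

// Compare dist/ size on the PR head against the base branch.
function checkDistSize(
  baseBytes: number,
  headBytes: number,
  maxGrowthPercent: number,
): SizeCheck {
  if (baseBytes <= 0) throw new Error("baseBytes must be positive");
  // Multiply before dividing to limit floating-point rounding noise.
  const growthPercent = ((headBytes - baseBytes) * 100) / baseBytes;
  return { growthPercent, ok: growthPercent <= maxGrowthPercent };
}

console.log(checkDistSize(1_000_000, 1_040_000, 5)); // growth 4, ok true
console.log(checkDistSize(1_000_000, 1_200_000, 5)); // growth 20, ok false
```

A relative limit scales with the project, whereas an absolute byte ceiling is easier to reason about for a fixed install budget; either could back the optional failure condition.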

---

#### 10. Build-Test Workflows Show High Failure Rate

**Current state:** The recent PR run shows that **all 8 build-test language workflows failed** simultaneously. This pattern suggests these workflows are flaky or sensitive to external network dependencies (fetching packages through the firewall) rather than code defects.

**Recommendation:** Investigate root causes of build-test failures. Add retry logic or pinned package versions to reduce flakiness. Consider caching package registries used in tests to reduce external dependency.

**Complexity:** Medium | **Impact:** Medium

---

### 🟢 Low Priority

#### 11. No Mutation Testing

**Current state:** Coverage numbers (38%) measure which lines are executed, not whether tests would catch actual bugs. With `docker-manager.ts` at 18% statement coverage and only 4% function coverage, even the existing tests may not verify correctness.

**Recommendation:** Add Stryker Mutator for TypeScript mutation testing, initially scoped to the most critical modules (`squid-config.ts`, `host-iptables.ts`).

**Complexity:** Medium | **Impact:** Low-Medium
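
The gap between coverage and bug detection can be shown in miniature. In the sketch below, every name is hypothetical and none of the code comes from the repository. A mutant changes `>=` to `>`, the kind of edit a mutation tool generates; a test that merely executes the function lets the mutant survive, while a boundary test kills it.

```typescript
// Hypothetical function under test.
function isValidPort(port: number): boolean {
  return port >= 1 && port <= 65535;
}

// Mutant: ">=" changed to ">", so port 1 is wrongly rejected.
function isValidPortMutant(port: number): boolean {
  return port > 1 && port <= 65535;
}

type PortCheck = (port: number) => boolean;

// Weak test: executes every line of the function but checks no boundary.
function weakTest(fn: PortCheck): boolean {
  return fn(443) === true && fn(70000) === false;
}

// Stronger test: also pins down the lower boundary.
function boundaryTest(fn: PortCheck): boolean {
  return weakTest(fn) && fn(1) === true && fn(0) === false;
}

console.log(weakTest(isValidPort), weakTest(isValidPortMutant));         // true true: mutant survives
console.log(boundaryTest(isValidPort), boundaryTest(isValidPortMutant)); // true false: mutant killed
```

Both tests yield the same line coverage for `isValidPort`; only the mutation result tells them apart.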

---

#### 12. No Docs Quality Check on PRs

**Current state:** `doc-maintainer.md` runs on a daily schedule but does not run on PRs. Documentation drift can accumulate undetected.

**Recommendation:** Add a lightweight check (e.g., link checker, markdownlint) triggered on PRs that touch `*.md`, `docs/**`, or `docs-site/**`.

**Complexity:** Low | **Impact:** Low

---

#### 13. `update-release-notes` Workflow Not Compiled

**Current state:** `agenticworkflows-status` reports `update-release-notes` as `compiled: "No"`. An uncompiled workflow will not execute correctly.

**Recommendation:** Run `gh aw compile .github/workflows/update-release-notes.md` and commit the resulting `.lock.yml`.

**Complexity:** Low | **Impact:** Low

---

## 📋 Actionable Recommendations Summary

| # | Recommendation | Priority | Complexity | Impact |
|---|---|---|---|---|
| 1 | Increase unit test coverage for `cli.ts` (0%) and `docker-manager.ts` (18%) | 🔴 High | Medium | High |
| 2 | Expand container scan trigger to include `src/**` | 🔴 High | Low | High |
| 3 | Make smoke tests automatic (not reaction-only) and add as required checks | 🔴 High | Low | High |
| 4 | Enforce required status checks in branch protection | 🔴 High | Low | High |
| 5 | Add secret scanning on PR push events | 🟡 Medium | Low | Medium |
| 6 | Fix/allowlist current dependency audit failures | 🟡 Medium | Low | Medium |
| 7 | Add SBOM generation + Cosign attestation to release workflow | 🟡 Medium | Medium | Medium |
| 8 | Investigate and fix CI Doctor always skipping | 🟡 Medium | Low | Medium |
| 9 | Add dist/bundle size monitoring to build workflow | 🟡 Medium | Low | Medium |
| 10 | Investigate build-test workflow flakiness | 🟡 Medium | Medium | Medium |
| 11 | Add mutation testing for core modules | 🟢 Low | Medium | Low-Medium |
| 12 | Add markdown/docs linting on PR for doc changes | 🟢 Low | Low | Low |
| 13 | Compile `update-release-notes.md` workflow | 🟢 Low | Low | Low |

---

## 📈 Metrics Summary

| Metric | Value |
|---|---|
| Total workflow files | 43 `.yml` + 28 `.md` agentic workflows |
| Agentic workflows compiled | 27/28 (96%) |
| Workflows triggered on PRs | 13 standard + 4 reaction-gated smoke + 8 build-test |
| Unit test coverage (statements) | 38% (threshold: 38%) |
| Unit test coverage (functions) | 37% (threshold: 35%) |
| Integration test files | 26 files, ~265 tests |
| Recent PR workflow failure rate | ~60% of workflows failed on the most recent PR run |
| Secret digger runs | Hourly (3 parallel: Claude, Codex, Copilot) |
| Security scans | CodeQL (weekly + PRs), Trivy (weekly + container PRs), npm audit (weekly + PRs) |

> **Note on 60% failure rate:** The high failure rate on the most recent observed PR is likely not representative of steady-state — it may reflect a specific PR that touched many systems simultaneously or a transient CI environment issue. The dependency audit failures (2/2) are the most concerning signal as they represent a persistent infrastructure problem.

---

> **Note:** This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
>
> **Tip:** Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.




> Generated by [CI/CD Pipelines and Integration Tests Gap Assessment](https://github.com/github/gh-aw-firewall/actions/runs/22530141433)
> - [x] expires on Mar 7, 2026, 10:18 PM UTC
