feat: /cso v2 — infrastructure-first security audit (v0.11.6.0)#384

Merged: garrytan merged 9 commits into main from garrytan/cso-v2-dep-surface on Mar 23, 2026

@garrytan

Summary

  • /cso v2 rewrites the security audit to start where breaches actually happen: leaked secrets in git history, the dependency supply chain, CI/CD pipeline misconfigurations, unverified webhooks, and Dockerfile security. Application code is audited only after those.
  • 15 phases covering secrets archaeology, supply chain, CI/CD, LLM/AI security, skill supply chain, OWASP Top 10, STRIDE, and active verification
  • Two modes: --daily (8/10 confidence gate, zero noise) and --comprehensive (2/10 bar, deep scan)
  • Active verification: every finding is independently verified by a subagent before it is reported. When one vulnerability is confirmed, variant analysis searches the rest of the codebase for the same pattern
  • Stack detection: v1 ran all-language patterns without checking the stack. v2 detects your framework first
  • Proper tooling: v1 used raw grep in Bash with | head -20 truncation. v2 uses Claude Code's native Grep tool
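
To make the two modes concrete, here is a minimal sketch of a confidence gate. Only the thresholds (8/10 for --daily, 2/10 for --comprehensive) come from this PR. The `Finding` type, the `gate_findings` helper, and the assumption that the gate means "confidence at or above the threshold" are illustrative and are not taken from the skill itself.

```python
from dataclasses import dataclass

# Thresholds from the PR summary; everything else in this block is hypothetical.
CONFIDENCE_GATES = {"daily": 8, "comprehensive": 2}


@dataclass(frozen=True)
class Finding:
    title: str
    confidence: int  # assumed to be scored on a 1-10 scale


def gate_findings(findings, mode):
    """Keep only findings at or above the confidence threshold for the mode."""
    if mode not in CONFIDENCE_GATES:
        raise ValueError(f"unknown mode: {mode!r}")
    threshold = CONFIDENCE_GATES[mode]
    return [f for f in findings if f.confidence >= threshold]


findings = [
    Finding("Hardcoded API key in config", 9),
    Finding("Possible missing rate limit", 4),
    Finding("Speculative timing side channel", 1),
]

daily = gate_findings(findings, "daily")                   # only the 9/10 finding
comprehensive = gate_findings(findings, "comprehensive")   # the 9/10 and 4/10 findings
```

Under these assumptions the daily run reports one finding and the comprehensive run reports two; the 1/10 finding is dropped in both.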

Test Coverage

  • 3 E2E tests with planted vulnerabilities — all passing
    • cso-full-audit: hardcoded API key + .env tracked by git ($0.40, 16 turns, 159s)
    • cso-diff-mode: webhook without signature verification ($0.31, 12 turns, 142s)
    • cso-infra-scope: unpinned GitHub Action + Dockerfile without USER ($0.37, 14 turns, 169s)
  • Total E2E cost: $1.08 per run
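
For readers unfamiliar with the two issues planted in cso-infra-scope, the sketch below shows one way such checks could be expressed. It is not the skill's implementation (the skill is a prompt template, not a scanner); the function names and regexes are assumptions made for illustration. "Unpinned" is taken here to mean a `uses:` reference that is not a full 40-character commit SHA.

```python
import re

# Matches "uses: owner/repo@ref" (optionally "owner/repo/path@ref") in a workflow file.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([\w.-]+/[\w./-]+)@(\S+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return action references pinned to a tag or branch instead of a commit SHA."""
    return [
        f"{name}@{ref}"
        for name, ref in USES_RE.findall(workflow_text)
        if not FULL_SHA_RE.match(ref)
    ]


def dockerfile_lacks_user(dockerfile_text):
    """True when no USER instruction appears, so the container runs as root by default."""
    return not any(
        re.match(r"\s*USER\s+\S+", line, re.IGNORECASE)
        for line in dockerfile_text.splitlines()
    )


workflow = """
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@8f152de45cc393bb48ce5d89d36b731f54556e65
"""
dockerfile = 'FROM node:20\nCOPY . /app\nCMD ["node", "server.js"]\n'

flagged = unpinned_actions(workflow)        # ["actions/checkout@v4"]
runs_as_root = dockerfile_lacks_user(dockerfile)  # True
```

The Dockerfile check is deliberately naive: it does not notice an explicit `USER root`, and it ignores multi-stage builds.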

Pre-Landing Review

No issues found. Diff is skill templates (prompts) and E2E tests — no application code that handles user data.

Design Review

No frontend files changed — design review skipped.

Eval Results

No prompt-related application files changed — evals skipped. CSO-specific E2E tests ran and passed 3/3.

TODOS

No TODO items completed in this PR.

Acknowledgements

Credits to 10 projects that informed the design: Sentry, Trail of Bits, Shannon/Keygraph, afiqiqmal, Snyk ToxicSkills, Miessler PAI, McGo, Claude Code Security Pack, Anthropic CCS, and @gus_argon (v1 blind spot identification).

Test plan

  • All free tests pass (bun test, exit 0)
  • CSO E2E tests pass (3/3, $1.08)

🤖 Generated with Claude Code

garrytan and others added 9 commits March 22, 2026 23:15
Rewrite /cso from code-centric OWASP scanning to infrastructure-first
attack surface analysis. 15 phases covering secrets archaeology, dependency
supply chain, CI/CD pipeline security, webhook verification, LLM/AI
security, skill supply chain scanning, plus OWASP Top 10, STRIDE, and
data classification.

Key design decisions from eng review + Codex adversarial review:
- Soft gate stack detection (prioritize, don't skip)
- Error on conflicting scope flags (never silently ignore)
- Permission gate before scanning ~/.claude/skills/
- Graceful degradation when audit tools aren't installed
- Finding fingerprints for cross-run trend tracking
- Variant analysis: one verified vuln triggers codebase-wide search
- Dual confidence modes: daily (8/10 gate) vs comprehensive (2/10)
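
As an illustration of the fingerprint idea above, here is a minimal sketch. The commit does not say how fingerprints are computed, so the choice of fields (category, file path, whitespace-normalized evidence), the hash, and the `diff_runs` helper are all assumptions. Line numbers are left out on purpose so that a finding keeps its fingerprint when unrelated edits shift it within the file.

```python
import hashlib


def fingerprint(category, path, evidence):
    """Stable ID for a finding: hash of category, path, and normalized evidence."""
    normalized = " ".join(evidence.split())  # collapse whitespace differences
    payload = "\x00".join([category, path, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def diff_runs(previous, current):
    """Classify fingerprints from two audit runs for trend tracking."""
    previous, current = set(previous), set(current)
    return {
        "new": current - previous,
        "resolved": previous - current,
        "persistent": previous & current,
    }


# Hypothetical findings from two consecutive runs.
key_leak = fingerprint("secrets", "config.py", "API_KEY = 'abc'")
env_file = fingerprint("secrets", ".env", "tracked by git")
webhook = fingerprint("webhook", "routes/hooks.py", "no signature check")

trend = diff_runs(previous=[key_leak, env_file], current=[key_leak, webhook])
```

Here `trend["persistent"]` contains the key leak, `trend["resolved"]` the tracked `.env`, and `trend["new"]` the webhook finding.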

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Credits: Sentry (confidence gating), Trail of Bits (mental model + variant
analysis), Shannon/Keygraph (active verification validation), afiqiqmal
(framework detection + LLM security), Snyk ToxicSkills (skill supply chain),
Miessler PAI (incident playbooks), McGo (report format), Claude Code
Security Pack (modular validation), Anthropic CCS (500+ zero-days), and
@gus_argon (v1 blind spot identification).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three E2E test cases with planted vulnerabilities:
- cso-full-audit: hardcoded API key + .env tracked by git
- cso-diff-mode: webhook without signature verification on feature branch
- cso-infra-scope: unpinned GitHub Action + Dockerfile without USER
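
For context on the cso-diff-mode case, this is roughly what the missing check looks like. It is a sketch of one common scheme (HMAC-SHA256 over the raw request body, compared in constant time), not code from this PR. The `sha256=<hex>` header format and the function name are assumptions; real providers differ in header name and encoding.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a webhook body against an HMAC-SHA256 signature header."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


# Made-up secret and payload for illustration only.
secret = b"example-webhook-secret"
body = b'{"event": "payment.succeeded"}'
good_header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()

accepted = verify_signature(secret, body, good_header)
rejected = verify_signature(secret, body + b" ", good_header)  # tampered body
```

A handler that processes the payload without a check like this is the kind of finding the test plants on the feature branch.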

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
logCost requires (label, result), recordE2E requires (collector, name,
suite, result). Fixed all 3 test cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The infra scope test runs Agent sub-tasks for parallel finding
verification which can take longer than 240s. Increased maxTurns
from 25 to 60 and timeout from 240s to 360s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent was burning 30+ turns exploring a 3-file repo (18 Glob calls,
Explore subagent, 4 SKILL.md reads) before starting the audit. Two Agent
verification subagents then ate ~100s, causing the 240s timeout.

Fix: tell the agent the repo is tiny, list the exact files, skip the
preamble, remove Agent from allowed tools, reduce maxTurns 60→30.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…urface

# Conflicts:
#	.agents/skills/gstack-cso/SKILL.md
#	cso/SKILL.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Six fixes from Codex adversarial review:

1. Phase 2: Use `git log -G` (regex) instead of `-S` (literal) for
   patterns with alternation (ghp_|gho_|github_pat_, etc.)

2. Phase 12 exclusion #5: Add exception so CI/CD pipeline findings
   from Phase 4 are never auto-discarded when --infra is active

3. Phase 12 exclusion #6: Add exception that unpinned actions and
   missing CODEOWNERS are concrete risks, not "missing hardening"

4. Phase 12 exclusion #15: Add exception that SKILL.md files are
   executable prompt code, not documentation — Phase 8 findings
   in SKILL.md must not be excluded

5. Phase 12 exclusion #1: Add exception that LLM cost/spend
   amplification from Phase 7 is financial risk, not DoS

6. E2E tests: Add exitReason === 'success' assertion to all 3 tests;
   move finalizeEvalCollector to file-level afterAll
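
Fix 1 is easy to see in isolation. `git log -S` searches for a string (literal by default), while `-G` treats its argument as a regex matched against added and removed lines, so an alternation pattern only works with `-G`. The sketch below imitates both interpretations in Python; the sample diff line and the `git_log_regex_command` helper are made up for illustration.

```python
import re

PATTERN = "ghp_|gho_|github_pat_"  # alternation quoted in the commit message
added_line = 'TOKEN = "ghp_exampleexampleexample"'  # made-up diff line

# Literal interpretation (like -S): looks for the whole string, pipes included.
literal_hit = PATTERN in added_line

# Regex interpretation (like -G): any one of the three prefixes matches.
regex_hit = re.search(PATTERN, added_line) is not None


def git_log_regex_command(pattern):
    """Build (but do not run) a git command that searches history by regex."""
    return ["git", "log", "--all", "-p", f"-G{pattern}"]


command = git_log_regex_command(PATTERN)
```

With the literal interpretation the token line is missed entirely, which is the bug the review caught.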

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan merged commit 3d1e8e0 into main on Mar 23, 2026