fix(benchmark): mount adversarial_dojo into AWF container and pass API keys #4187
Mount the adversarial_dojo project dir (with its uv-managed standalone Python venv), uv binary, benchmark config files, and output dir into the AWF container via --mount so the benchmark tooling is available inside the minimal container image.
Add explicit --env flags for ANTHROPIC_API_KEY and OPENAI_API_KEY so adversarial_dojo agents can authenticate inside the container.
Set --container-workdir /tmp/adversarial_dojo so uv finds the project venv.
Remove the now-redundant host-side cd.
Update tests to verify the new mount and env flags.
Fixes: red-team benchmark run 26792842801 (adversarial-dojo not found, API keys not forwarded to AWF container)
✅ Coverage Check Passed
📁 Per-file Coverage Changes (1 file)
Smoke Test: Claude Engine
Result: PASS
🔬 Smoke Test Results
Overall: FAIL — pre-computed smoke data ( PR by
🔥 Smoke Test: Copilot BYOK (Offline) Mode
Running in BYOK offline mode ( Author: Overall: PARTIAL PASS (BYOK + MCP ✅; pre-step template vars not expanded)
Pull request overview
Fixes the weekly red-team benchmark AWF-protected run by ensuring the adversarial_dojo tooling (including its uv-managed environment) is available inside the minimal AWF container, and by explicitly forwarding required API keys into the containerized run.
Changes:
- Mount /tmp/adversarial_dojo, uv binary, benchmark config inputs, and the AWF output directory into the AWF container; set --container-workdir so uv locates the venv.
- Forward ANTHROPIC_API_KEY and OPENAI_API_KEY to the containerized benchmark run via awf --env ....
- Extend CI tests to assert presence of the new workflow flags in both the source workflow and compiled lock file.
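As a rough illustration of the mount and workdir flags listed above, the following sketch assembles them from a list of host/container path pairs. The `Mount` interface and the `mountFlag` and `buildContainerArgs` helpers are hypothetical names invented for this example and are not part of the PR; only the flag names and the three paths shown come from the PR itself. The uv binary mount is left out because its path is not stated here.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface Mount {
  hostPath: string;
  containerPath: string;
  readOnly?: boolean;
}

// Render one mount as "--mount host:container[:ro]".
function mountFlag(m: Mount): string[] {
  const spec = `${m.hostPath}:${m.containerPath}${m.readOnly ? ':ro' : ''}`;
  return ['--mount', spec];
}

// Collect all mount flags, then set the container working directory.
function buildContainerArgs(mounts: Mount[], workdir: string): string[] {
  const args: string[] = [];
  for (const m of mounts) {
    args.push(...mountFlag(m));
  }
  args.push('--container-workdir', workdir);
  return args;
}

const containerArgs = buildContainerArgs(
  [
    { hostPath: '/tmp/adversarial_dojo', containerPath: '/tmp/adversarial_dojo' },
    { hostPath: '/tmp/awf-benchmark.toml', containerPath: '/tmp/awf-benchmark.toml', readOnly: true },
    { hostPath: '/tmp/gh-aw/agent/awf', containerPath: '/tmp/gh-aw/agent/awf' },
  ],
  '/tmp/adversarial_dojo',
);
console.log(containerArgs.join(' '));
```

Mounting each path at the same location inside the container keeps paths recorded in the venv and the config valid without rewriting, which is presumably why the PR uses identical host and container paths.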
Show a summary per file
| File | Description |
|---|---|
| scripts/ci/red-team-benchmark-workflow.test.ts | Adds assertions that the workflow/lock include the new mounts, workdir, and env forwarding flags. |
| .github/workflows/red-team-benchmark.md | Updates the AWF-protected benchmark step to mount required paths/binaries and forward API keys into the AWF container. |
| .github/workflows/red-team-benchmark.lock.yml | Regenerates the compiled lock workflow to reflect the new AWF invocation flags. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 3
    // adversarial_dojo tooling is mounted into the AWF container
    expect(source).toContain('--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo');
    expect(source).toContain('--mount /tmp/awf-benchmark.toml:/tmp/awf-benchmark.toml:ro');
    expect(source).toContain('--mount /tmp/gh-aw/agent/awf:/tmp/gh-aw/agent/awf');
    expect(source).toContain('--container-workdir /tmp/adversarial_dojo');

    // adversarial_dojo mounts compiled into lock
    expect(lock).toContain('--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo');
    expect(lock).toContain('--container-workdir /tmp/adversarial_dojo');
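The assertions above check the same flags in both the source workflow and the compiled lock file. One way to keep the two lists from drifting apart is a shared flag list and a small helper that reports which flags are absent; a minimal sketch follows. The `missingFlags` helper and the sample strings are hypothetical stand-ins, not the PR's test code; a real test would read the .md and .lock.yml files instead.

```typescript
// Hypothetical helper: returns the flags that do not appear in the text.
function missingFlags(text: string, flags: string[]): string[] {
  return flags.filter((flag) => text.indexOf(flag) === -1);
}

// Flags taken from the assertions in this PR.
const requiredFlags = [
  '--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo',
  '--container-workdir /tmp/adversarial_dojo',
];

// Stand-in strings for the workflow source and a stale lock file.
const sampleSource =
  'awf --mount /tmp/adversarial_dojo:/tmp/adversarial_dojo --container-workdir /tmp/adversarial_dojo';
const sampleStaleLock = 'awf --mount /tmp/adversarial_dojo:/tmp/adversarial_dojo';

console.log(missingFlags(sampleSource, requiredFlags)); // empty: nothing missing
console.log(missingFlags(sampleStaleLock, requiredFlags)); // reports the workdir flag
```

Reporting the missing flags by name, rather than failing on the first one, makes it easier to see at a glance that a lock file was not recompiled after the source changed.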
    --container-workdir /tmp/adversarial_dojo \
    --env "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" \
    --env "OPENAI_API_KEY=$OPENAI_API_KEY" \
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
Gemini Smoke Test Results
Overall status: FAIL
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "localhost"
See Network Configuration for more information.
Smoke Test Results: ❌ FAIL
The lock file was stale after PR #4187 changed the .md source without a full recompile (frontmatter hash mismatch).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The weekly red-team benchmark was crashing before any attack attempts completed: the AWF-protected run failed with adversarial-dojo / No such file or directory because the minimal AWF container has no uv/Python/venv, and API keys set in the step env: block were silently dropped by sudo.
Changes
AWF-protected benchmark run (red-team-benchmark.md / .lock.yml)
- Added --mount flags so adversarial_dojo tooling is available inside the container.
- Replaced sudo + host-side cd with explicit --env forwarding; sudo drops env vars by default, so API keys never reached the container agents.
- The venv uses a uv-managed standalone Python (python-build-standalone), so no system Python is required in the minimal container image.
Tests (red-team-benchmark-workflow.test.ts)
- Added assertions for the new --mount flags, --container-workdir, and explicit --env key forwarding.
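Because the original failure was silent (the keys simply never arrived), it may help to fail loudly when a key is unset before building the --env flags. The sketch below shows one way to do that; `envFlags` is a hypothetical helper introduced for illustration, and the placeholder values are not real credentials. The PR itself forwards the keys directly in the shell command.

```typescript
// Hypothetical helper: builds "--env NAME=value" pairs and throws if any
// required variable is unset or empty, instead of forwarding an empty value.
function envFlags(names: string[], env: Record<string, string | undefined>): string[] {
  const missing = names.filter((n) => !env[n]);
  if (missing.length > 0) {
    throw new Error(`missing required env vars: ${missing.join(', ')}`);
  }
  const flags: string[] = [];
  for (const n of names) {
    flags.push('--env', `${n}=${env[n]}`);
  }
  return flags;
}

// Placeholder values standing in for the real secrets.
const sampleEnv: Record<string, string | undefined> = {
  ANTHROPIC_API_KEY: 'placeholder-a',
  OPENAI_API_KEY: 'placeholder-b',
};
console.log(envFlags(['ANTHROPIC_API_KEY', 'OPENAI_API_KEY'], sampleEnv).join(' '));
```

With a check like this, a dropped key would surface as an error naming the variable at the point where the command is built, rather than as an authentication failure inside the container.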