fix(benchmark): mount adversarial_dojo into AWF container and pass API keys by Copilot · Pull Request #4187 · github/gh-aw-firewall

Copilot · 2026-06-02T02:56:52Z

The weekly red-team benchmark was crashing before any attack attempts completed. The AWF-protected run failed with "adversarial-dojo: No such file or directory" because the minimal AWF container has no uv, Python, or venv. In addition, the API keys set in the step's env: block were silently dropped by sudo.

Changes

AWF-protected benchmark run (red-team-benchmark.md / .lock.yml)

Add --mount flags so adversarial_dojo tooling is available inside the container:

# Mounted into the container:
#   /tmp/adversarial_dojo                             project + uv-managed standalone-Python venv
#   $HOME/.local/bin/uv (ro)                          uv binary
#   /tmp/awf-benchmark.toml, /tmp/awf-benchmark (ro)  benchmark config
#   /tmp/gh-aw/agent/awf                              writable output dir
# --container-workdir points at the project so uv locates the venv.
sudo awf \
  --mount /tmp/adversarial_dojo:/tmp/adversarial_dojo \
  --mount "$HOME/.local/bin/uv:$HOME/.local/bin/uv:ro" \
  --mount /tmp/awf-benchmark.toml:/tmp/awf-benchmark.toml:ro \
  --mount /tmp/awf-benchmark:/tmp/awf-benchmark:ro \
  --mount /tmp/gh-aw/agent/awf:/tmp/gh-aw/agent/awf \
  --container-workdir /tmp/adversarial_dojo \
  --env "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" \
  --env "OPENAI_API_KEY=$OPENAI_API_KEY" \
  -- "$HOME/.local/bin/uv" run adversarial-dojo search-attacks ...
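Each --mount value above has the form host:container[:ro]. As a minimal sketch, assuming Docker-style bind-mount syntax (not taken from awf's source), a hypothetical parseMountSpec helper could validate the specs used in this step. $HOME is replaced with an example value, /home/runner.

```typescript
// Hypothetical helper: parse an AWF --mount spec "hostPath:containerPath[:ro]".
// Assumes Docker-style bind-mount syntax; this is not awf's own parser.
interface MountSpec {
  hostPath: string;
  containerPath: string;
  readOnly: boolean;
}

function parseMountSpec(spec: string): MountSpec {
  const parts = spec.split(":");
  if (parts.length < 2 || parts.length > 3) {
    throw new Error(`expected host:container[:ro], got "${spec}"`);
  }
  const [hostPath, containerPath, mode] = parts;
  // Both sides must be absolute paths for a bind mount.
  if (!hostPath.startsWith("/") || !containerPath.startsWith("/")) {
    throw new Error(`mount paths must be absolute: "${spec}"`);
  }
  // Only "ro" and "rw" are accepted in this sketch; omitted means read-write.
  if (mode !== undefined && mode !== "ro" && mode !== "rw") {
    throw new Error(`unknown mount mode "${mode}" in "${spec}"`);
  }
  return { hostPath, containerPath, readOnly: mode === "ro" };
}

// The mounts from the benchmark step, with $HOME expanded to an example value.
const home = "/home/runner";
const mounts = [
  "/tmp/adversarial_dojo:/tmp/adversarial_dojo",
  `${home}/.local/bin/uv:${home}/.local/bin/uv:ro`,
  "/tmp/awf-benchmark.toml:/tmp/awf-benchmark.toml:ro",
  "/tmp/awf-benchmark:/tmp/awf-benchmark:ro",
  "/tmp/gh-aw/agent/awf:/tmp/gh-aw/agent/awf",
].map(parseMountSpec);

for (const m of mounts) {
  console.log(`${m.containerPath} ${m.readOnly ? "(ro)" : "(rw)"}`);
}
```

Only the output directory and the project directory stay writable; the uv binary and config inputs are mounted read-only.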

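Keep sudo, but forward the API keys explicitly with --env and drop the host-side cd. sudo resets the environment by default, so keys set in the step's env: block never reached the container agents; with --env "KEY=$KEY", the invoking shell expands the value before sudo runs.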
uv's venv contains a self-contained Python binary (from python-build-standalone), so no system Python is required in the minimal container image.
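The sudo behaviour described above can be modeled with a few pure functions. This is a simplified sketch, not sudo's implementation: sudoEnvReset uses an assumed keep-list, and expandEnvFlags and collectEnvFlags are hypothetical names that show why a value passed as --env "KEY=$KEY" survives while an exported variable does not.

```typescript
type Env = Record<string, string | undefined>;

// Assumed, simplified keep-list; real behaviour depends on sudoers
// settings such as env_reset, env_keep, and secure_path.
const SUDO_KEEP = ["PATH", "TERM"];

// Model of env_reset: the target command gets a fresh environment.
function sudoEnvReset(env: Env): Env {
  const out: Env = {};
  for (const key of SUDO_KEEP) {
    if (env[key] !== undefined) out[key] = env[key];
  }
  return out;
}

// The invoking shell expands "$NAME" before sudo runs, so the value is
// already part of argv, which env_reset does not touch.
function expandEnvFlags(names: string[], env: Env): string[] {
  return names.flatMap((name) => ["--env", `${name}=${env[name] ?? ""}`]);
}

// Recover KEY=VALUE pairs from --env flags, as a container launcher might.
function collectEnvFlags(argv: string[]): Env {
  const out: Env = {};
  for (let i = 0; i < argv.length - 1; i++) {
    if (argv[i] !== "--env") continue;
    const pair = argv[i + 1];
    const eq = pair.indexOf("=");
    if (eq > 0) out[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return out;
}

const stepEnv: Env = { PATH: "/usr/bin", ANTHROPIC_API_KEY: "sk-example", OPENAI_API_KEY: "sk-other" };
const argv = expandEnvFlags(["ANTHROPIC_API_KEY", "OPENAI_API_KEY"], stepEnv);

console.log(sudoEnvReset(stepEnv).ANTHROPIC_API_KEY); // undefined (dropped by env_reset)
console.log(collectEnvFlags(argv).ANTHROPIC_API_KEY); // sk-example (carried in argv)
```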

Tests (red-team-benchmark-workflow.test.ts)

Assert that the source workflow and compiled lock file contain the new --mount flags, --container-workdir, and explicit --env key forwarding.
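These assertions amount to substring checks against the file text. A hypothetical missingFlags helper (not part of red-team-benchmark-workflow.test.ts) makes that explicit by reporting which required flags are absent; the lock-file fragment below is illustrative, not the real compiled file.

```typescript
// Hypothetical helper mirroring the toContain() assertions: return the
// required flag strings that do not appear anywhere in the text.
function missingFlags(text: string, required: readonly string[]): string[] {
  return required.filter((flag) => !text.includes(flag));
}

const REQUIRED = [
  "--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo",
  "--container-workdir /tmp/adversarial_dojo",
  '--env "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY"',
  '--env "OPENAI_API_KEY=$OPENAI_API_KEY"',
] as const;

// Illustrative fragment that is missing the OPENAI_API_KEY forwarding.
const lockFragment = [
  "sudo awf \\",
  "  --mount /tmp/adversarial_dojo:/tmp/adversarial_dojo \\",
  "  --container-workdir /tmp/adversarial_dojo \\",
  '  --env "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" \\',
].join("\n");

console.log(missingFlags(lockFragment, REQUIRED));
```

Reporting every missing flag at once gives a clearer failure than stopping at the first failed toContain().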

Mount the adversarial_dojo project dir (with its uv-managed standalone Python venv), uv binary, benchmark config files, and output dir into the AWF container via --mount so the benchmark tooling is available inside the minimal container image. Add explicit --env flags for ANTHROPIC_API_KEY and OPENAI_API_KEY so adversarial_dojo agents can authenticate inside the container. Set --container-workdir /tmp/adversarial_dojo so uv finds the project venv. Remove the now-redundant host-side cd. Update tests to verify the new mount and env flags.

Fixes: red-team benchmark run 26792842801 (adversarial-dojo not found, API keys not forwarded to AWF container)

github-actions · 2026-06-02T04:28:41Z

✅ Coverage Check Passed

Overall Coverage

Metric	Base	PR	Delta
Lines	96.35%	96.40%	📈 +0.05%
Statements	96.21%	96.25%	📈 +0.04%
Functions	98.27%	98.27%	➡️ +0.00%
Branches	90.42%	90.46%	📈 +0.04%

📁 Per-file Coverage Changes (1 file)

File	Lines (Before → After)	Statements (Before → After)
`src/config-writer.ts`	89.3% → 90.9% (+1.65%)	89.3% → 90.9% (+1.65%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

github-actions · 2026-06-02T04:30:08Z

Smoke Test: Claude Engine

✅ GitHub API: 2 PR entries found
✅ GitHub check: playwright_check=PASS
✅ File verify: smoke-test-claude-26795946958.txt exists

Result: PASS

💥 [THE END] — Illustrated by Smoke Claude

github-actions · 2026-06-02T04:30:39Z

🔬 Smoke Test Results

Test	Status
GitHub MCP connectivity	✅
GitHub.com HTTP connectivity	⚠️ unresolved (pre-step data not injected)
File write/read	⚠️ unresolved (pre-step data not injected)

Overall: FAIL — pre-computed smoke data (steps.smoke-data.outputs.*) was not resolved at runtime.

PR by @Copilot · Assignees: @lpcox, @Copilot

📰 BREAKING: Report filed by Smoke Copilot

github-actions · 2026-06-02T04:30:43Z

🔥 Smoke Test: Copilot BYOK (Offline) Mode

Test	Result
GitHub MCP (list PRs)	✅ PR #4176 returned
GitHub.com connectivity	⚠️ Template var unexpanded
File write/read	⚠️ Template var unexpanded
BYOK inference (this response)	✅

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com

Author: @Copilot | Assignees: @lpcox, @Copilot

Overall: PARTIAL PASS (BYOK + MCP ✅; pre-step template vars not expanded)

🔑 BYOK report filed by Smoke Copilot BYOK

Copilot

Pull request overview

Fixes the weekly red-team benchmark AWF-protected run by ensuring the adversarial_dojo tooling (including its uv-managed environment) is available inside the minimal AWF container, and by explicitly forwarding required API keys into the containerized run.

Changes:

Mount /tmp/adversarial_dojo, uv binary, benchmark config inputs, and the AWF output directory into the AWF container; set --container-workdir so uv locates the venv.
Forward ANTHROPIC_API_KEY and OPENAI_API_KEY to the containerized benchmark run via awf --env ....
Extend CI tests to assert presence of the new workflow flags in both the source workflow and compiled lock file.

File	Description
scripts/ci/red-team-benchmark-workflow.test.ts	Adds assertions that the workflow/lock include the new mounts, workdir, and env forwarding flags.
.github/workflows/red-team-benchmark.md	Updates the AWF-protected benchmark step to mount required paths/binaries and forward API keys into the AWF container.
.github/workflows/red-team-benchmark.lock.yml	Regenerates the compiled lock workflow to reflect the new AWF invocation flags.

Copilot's findings

Files reviewed: 3/3 changed files
Comments generated: 3

+    // adversarial_dojo tooling is mounted into the AWF container
+    expect(source).toContain('--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo');
+    expect(source).toContain('--mount /tmp/awf-benchmark.toml:/tmp/awf-benchmark.toml:ro');
+    expect(source).toContain('--mount /tmp/gh-aw/agent/awf:/tmp/gh-aw/agent/awf');
+    expect(source).toContain('--container-workdir /tmp/adversarial_dojo');


+    // adversarial_dojo mounts compiled into lock
+    expect(lock).toContain('--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo');
+    expect(lock).toContain('--container-workdir /tmp/adversarial_dojo');
+


+          --container-workdir /tmp/adversarial_dojo \
+          --env "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" \
+          --env "OPENAI_API_KEY=$OPENAI_API_KEY" \


github-actions · 2026-06-02T04:32:11Z

🏗️ Build Test Suite Results

Ecosystem	Project	Build/Install	Tests	Status
Bun	elysia	✅	1/1 passed	✅ PASS
Bun	hono	✅	1/1 passed	✅ PASS
C++	fmt	✅	N/A	✅ PASS
C++	json	✅	N/A	✅ PASS
Deno	oak	N/A	1/1 passed	✅ PASS
Deno	std	N/A	1/1 passed	✅ PASS
.NET	hello-world	✅	N/A	✅ PASS
.NET	json-parse	✅	N/A	✅ PASS
Go	color	✅	1/1 passed	✅ PASS
Go	env	✅	1/1 passed	✅ PASS
Go	uuid	✅	1/1 passed	✅ PASS
Java	gson	✅	1/1 passed	✅ PASS
Java	caffeine	✅	1/1 passed	✅ PASS
Node.js	clsx	✅	All passed	✅ PASS
Node.js	execa	✅	All passed	✅ PASS
Node.js	p-limit	✅	All passed	✅ PASS
Rust	fd	✅	1/1 passed	✅ PASS
Rust	zoxide	✅	1/1 passed	✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #4187 · sonnet46 1M

github-actions · 2026-06-02T04:32:20Z

Gemini Smoke Test Results

GitHub MCP Testing: ✅
GitHub.com Connectivity: ❌
File Writing Testing: ✅
Bash Tool Testing: ✅

Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

github-actions · 2026-06-02T04:33:08Z

Smoke Test Results: ❌ FAIL

Check	Result
Redis PING	❌ Connection timed out / not reachable
PostgreSQL pg_isready	❌ No response on port 5432
PostgreSQL SELECT 1	❌ Connection timed out

host.docker.internal is not resolvable in this environment (agent is not running inside a Docker container), and 127.0.0.1 shows no services bound on ports 6379 or 5432. Service containers are not accessible from this runner context.

🔌 Service connectivity validated by Smoke Services

The lock file was stale after PR #4187 changed the .md source without a full recompile (frontmatter hash mismatch).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
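A frontmatter hash mismatch means the hash recorded in the compiled lock file no longer matches the .md source. The sketch below assumes SHA-256 over the text between the first two --- lines; the real gh-aw compiler may hash differently, and frontmatter and isLockStale are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Extract the text between the first two "---" lines (assumed layout).
function frontmatter(markdown: string): string {
  const lines = markdown.split("\n");
  if (lines[0]?.trim() !== "---") return "";
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  return end === -1 ? "" : lines.slice(1, end).join("\n");
}

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Stale when the hash recorded at compile time no longer matches the source.
function isLockStale(sourceMd: string, recordedHash: string): boolean {
  return sha256(frontmatter(sourceMd)) !== recordedHash;
}

const before = "---\non: schedule\n---\nRun the benchmark.";
const recorded = sha256(frontmatter(before));
const after = "---\non: schedule\ntimeout-minutes: 60\n---\nRun the benchmark.";

console.log(isLockStale(before, recorded)); // false
console.log(isLockStale(after, recorded)); // true
```

Under this scheme, editing only the markdown body would leave the hash unchanged, which is why a frontmatter change has to be followed by a recompile.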

Initial plan

a85fe92

Copilot AI assigned Copilot and lpcox Jun 2, 2026

Copilot started work on behalf of lpcox June 2, 2026 02:56

Copilot AI linked an issue Jun 2, 2026 that may be closed by this pull request

[Red-Team Benchmark] AWF Red-Team Benchmark — 2026-06-02 — ⏭️ SKIPPED #4186

Closed

Copilot AI changed the title ~~[WIP] Fix adversarial_dojo benchmark execution for AWF~~ fix(benchmark): mount adversarial_dojo into AWF container and pass API keys Jun 2, 2026

Copilot finished work on behalf of lpcox June 2, 2026 03:11

Copilot AI requested a review from lpcox June 2, 2026 03:11

lpcox marked this pull request as ready for review June 2, 2026 04:27

Copilot AI review requested due to automatic review settings June 2, 2026 04:27

Copilot started reviewing on behalf of lpcox June 2, 2026 04:27

github-actions Bot mentioned this pull request Jun 2, 2026

[aw] Smoke Codex failed #4188

Open

github-actions Bot added the smoke-claude label Jun 2, 2026

Copilot AI reviewed Jun 2, 2026

github-actions Bot added the build-test label Jun 2, 2026

lpcox merged commit b092478 into main Jun 2, 2026
59 of 64 checks passed

lpcox deleted the copilot/red-team-benchmark-failure-resolution branch June 2, 2026 04:49

lpcox mentioned this pull request Jun 2, 2026

fix: recompile red-team-benchmark lock file #4191

Merged

lpcox merged 2 commits into main from copilot/red-team-benchmark-failure-resolution


3 participants
