fix(benchmark): mount adversarial_dojo into AWF container and pass API keys #4187

Merged: lpcox merged 2 commits into main from copilot/red-team-benchmark-failure-resolution on Jun 2, 2026

Copilot AI commented Jun 2, 2026

The weekly red-team benchmark was crashing before any attack attempts completed. The AWF-protected run failed with adversarial-dojo / No such file or directory because the minimal AWF container has no uv, Python, or venv. In addition, API keys set in the step's env: block were silently dropped by sudo.

Changes

AWF-protected benchmark run (red-team-benchmark.md / .lock.yml)

  • Add --mount flags so adversarial_dojo tooling is available inside the container:
    # Mounts, in order: project + uv-managed standalone-Python venv, uv binary (ro),
    # benchmark config (ro), and the writable output dir.
    # --container-workdir is set so uv locates the venv.
    # (Comments are kept out of the command: text after a trailing backslash breaks the continuation.)
    sudo awf \
      --mount /tmp/adversarial_dojo:/tmp/adversarial_dojo \
      --mount "$HOME/.local/bin/uv:$HOME/.local/bin/uv:ro" \
      --mount /tmp/awf-benchmark.toml:/tmp/awf-benchmark.toml:ro \
      --mount /tmp/awf-benchmark:/tmp/awf-benchmark:ro \
      --mount /tmp/gh-aw/agent/awf:/tmp/gh-aw/agent/awf \
      --container-workdir /tmp/adversarial_dojo \
      --env "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" \
      --env "OPENAI_API_KEY=$OPENAI_API_KEY" \
      -- "$HOME/.local/bin/uv" run adversarial-dojo search-attacks ...
  • Forward API keys with explicit --env flags and replace the host-side cd with --container-workdir; sudo drops env vars by default, so keys set only in the step env: block never reached the container agents.
  • uv's venv contains a self-contained Python binary (from python-build-standalone), so no system Python is required in the minimal container image.
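
As a rough illustration of the explicit forwarding described above, the sketch below assembles an awf argument list from a set of mounts, a working directory, and a list of variable names to forward. Only the flag names (--mount, --container-workdir, --env) come from this PR; the AwfRunOptions type and the buildAwfArgs function are hypothetical and are not part of the workflow or of awf itself.

```typescript
// Hypothetical helper: only the flag names are taken from the PR description.
interface AwfRunOptions {
  mounts: string[];      // "host:container" or "host:container:ro"
  workdir: string;       // value passed to --container-workdir
  forwardEnv: string[];  // names of variables to forward explicitly
  command: string[];     // command to run after the "--" separator
}

function buildAwfArgs(
  opts: AwfRunOptions,
  env: Record<string, string | undefined>,
): string[] {
  const args: string[] = [];
  for (const mount of opts.mounts) {
    args.push("--mount", mount);
  }
  args.push("--container-workdir", opts.workdir);
  for (const name of opts.forwardEnv) {
    const value = env[name];
    // Fail loudly rather than forwarding a missing or empty key.
    if (value === undefined || value === "") {
      throw new Error(`required environment variable ${name} is not set`);
    }
    args.push("--env", `${name}=${value}`);
  }
  return [...args, "--", ...opts.command];
}
```

Throwing on a missing variable is a design choice for this sketch: it surfaces a dropped key at launch time instead of as an authentication failure inside the container.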

Tests (red-team-benchmark-workflow.test.ts)

  • Assert that the source workflow and compiled lock file contain the new --mount flags, --container-workdir, and explicit --env key forwarding.
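
A minimal sketch of this kind of check, assuming the workflow and lock file contents have already been read into strings. The missingFlags helper and the REQUIRED_FLAGS list are hypothetical names for illustration; the actual test file uses individual expect(...).toContain(...) assertions.

```typescript
// Hypothetical helper: returns the required snippets that do not appear in the text.
function missingFlags(workflowText: string, required: readonly string[]): string[] {
  return required.filter((flag) => !workflowText.includes(flag));
}

// Flag strings copied from the awf invocation in this PR.
const REQUIRED_FLAGS: readonly string[] = [
  "--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo",
  "--container-workdir /tmp/adversarial_dojo",
  '--env "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY"',
  '--env "OPENAI_API_KEY=$OPENAI_API_KEY"',
];
```

Running the same list against both the source workflow and the compiled lock file would report which flags are absent from either one, which is a quick way to spot a lock file that was not regenerated.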

Mount the adversarial_dojo project dir (with its uv-managed standalone
Python venv), uv binary, benchmark config files, and output dir into the
AWF container via --mount so the benchmark tooling is available inside
the minimal container image.

Add explicit --env flags for ANTHROPIC_API_KEY and OPENAI_API_KEY so
adversarial_dojo agents can authenticate inside the container.

Set --container-workdir /tmp/adversarial_dojo so uv finds the project
venv. Remove the now-redundant host-side cd.

Update tests to verify the new mount and env flags.

Fixes: red-team benchmark run 26792842801 (adversarial-dojo not found,
API keys not forwarded to AWF container)
Copilot AI changed the title from "[WIP] Fix adversarial_dojo benchmark execution for AWF" to "fix(benchmark): mount adversarial_dojo into AWF container and pass API keys" Jun 2, 2026
Copilot finished work on behalf of lpcox June 2, 2026 03:11
Copilot AI requested a review from lpcox June 2, 2026 03:11
@lpcox lpcox marked this pull request as ready for review June 2, 2026 04:27
Copilot AI review requested due to automatic review settings June 2, 2026 04:27

github-actions Bot commented Jun 2, 2026

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 96.35% | 96.40% | 📈 +0.05% |
| Statements | 96.21% | 96.25% | 📈 +0.04% |
| Functions | 98.27% | 98.27% | ➡️ +0.00% |
| Branches | 90.42% | 90.46% | 📈 +0.04% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/config-writer.ts | 89.3% → 90.9% (+1.65%) | 89.3% → 90.9% (+1.65%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts


github-actions Bot commented Jun 2, 2026

Smoke Test: Claude Engine

  • ✅ GitHub API: 2 PR entries found
  • ✅ GitHub check: playwright_check=PASS
  • ✅ File verify: smoke-test-claude-26795946958.txt exists

Result: PASS

💥 [THE END] — Illustrated by Smoke Claude


github-actions Bot commented Jun 2, 2026

🔬 Smoke Test Results

| Test | Status |
| --- | --- |
| GitHub MCP connectivity | |
| GitHub.com HTTP connectivity | ⚠️ unresolved (pre-step data not injected) |
| File write/read | ⚠️ unresolved (pre-step data not injected) |

Overall: FAIL — pre-computed smoke data (steps.smoke-data.outputs.*) was not resolved at runtime.

PR by @Copilot · Assignees: @lpcox, @Copilot

📰 BREAKING: Report filed by Smoke Copilot


github-actions Bot commented Jun 2, 2026

🔥 Smoke Test: Copilot BYOK (Offline) Mode

| Test | Result |
| --- | --- |
| GitHub MCP (list PRs) | ✅ PR #4176 returned |
| GitHub.com connectivity | ⚠️ Template var unexpanded |
| File write/read | ⚠️ Template var unexpanded |
| BYOK inference (this response) | |

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com

Author: @Copilot | Assignees: @lpcox, @Copilot

Overall: PARTIAL PASS (BYOK + MCP ✅; pre-step template vars not expanded)

🔑 BYOK report filed by Smoke Copilot BYOK


Copilot AI left a comment

Pull request overview

Fixes the weekly red-team benchmark AWF-protected run by ensuring the adversarial_dojo tooling (including its uv-managed environment) is available inside the minimal AWF container, and by explicitly forwarding required API keys into the containerized run.

Changes:

  • Mount /tmp/adversarial_dojo, uv binary, benchmark config inputs, and the AWF output directory into the AWF container; set --container-workdir so uv locates the venv.
  • Forward ANTHROPIC_API_KEY and OPENAI_API_KEY to the containerized benchmark run via awf --env ....
  • Extend CI tests to assert presence of the new workflow flags in both the source workflow and compiled lock file.
Show a summary per file
| File | Description |
| --- | --- |
| scripts/ci/red-team-benchmark-workflow.test.ts | Adds assertions that the workflow/lock include the new mounts, workdir, and env forwarding flags. |
| .github/workflows/red-team-benchmark.md | Updates the AWF-protected benchmark step to mount required paths/binaries and forward API keys into the AWF container. |
| .github/workflows/red-team-benchmark.lock.yml | Regenerates the compiled lock workflow to reflect the new AWF invocation flags. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Comment on lines +72 to +76
// adversarial_dojo tooling is mounted into the AWF container
expect(source).toContain('--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo');
expect(source).toContain('--mount /tmp/awf-benchmark.toml:/tmp/awf-benchmark.toml:ro');
expect(source).toContain('--mount /tmp/gh-aw/agent/awf:/tmp/gh-aw/agent/awf');
expect(source).toContain('--container-workdir /tmp/adversarial_dojo');
Comment on lines +154 to +157
// adversarial_dojo mounts compiled into lock
expect(lock).toContain('--mount /tmp/adversarial_dojo:/tmp/adversarial_dojo');
expect(lock).toContain('--container-workdir /tmp/adversarial_dojo');

Comment on lines +193 to +195
--container-workdir /tmp/adversarial_dojo \
--env "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" \
--env "OPENAI_API_KEY=$OPENAI_API_KEY" \

github-actions Bot commented Jun 2, 2026

🏗️ Build Test Suite Results

| Ecosystem | Project | Build/Install | Tests | Status |
| --- | --- | --- | --- | --- |
| Bun | elysia | | 1/1 passed | ✅ PASS |
| Bun | hono | | 1/1 passed | ✅ PASS |
| C++ | fmt | | N/A | ✅ PASS |
| C++ | json | | N/A | ✅ PASS |
| Deno | oak | N/A | 1/1 passed | ✅ PASS |
| Deno | std | N/A | 1/1 passed | ✅ PASS |
| .NET | hello-world | | N/A | ✅ PASS |
| .NET | json-parse | | N/A | ✅ PASS |
| Go | color | | 1/1 passed | ✅ PASS |
| Go | env | | 1/1 passed | ✅ PASS |
| Go | uuid | | 1/1 passed | ✅ PASS |
| Java | gson | | 1/1 passed | ✅ PASS |
| Java | caffeine | | 1/1 passed | ✅ PASS |
| Node.js | clsx | | All passed | ✅ PASS |
| Node.js | execa | | All passed | ✅ PASS |
| Node.js | p-limit | | All passed | ✅ PASS |
| Rust | fd | | 1/1 passed | ✅ PASS |
| Rust | zoxide | | 1/1 passed | ✅ PASS |

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #4187 · sonnet46 1M


github-actions Bot commented Jun 2, 2026

Gemini Smoke Test Results

  • GitHub MCP Testing: ✅
  • GitHub.com Connectivity: ❌
  • File Writing Testing: ✅
  • Bash Tool Testing: ✅

Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini


github-actions Bot commented Jun 2, 2026

Smoke Test Results: ❌ FAIL

| Check | Result |
| --- | --- |
| Redis PING | ❌ Connection timed out / not reachable |
| PostgreSQL pg_isready | ❌ No response on port 5432 |
| PostgreSQL SELECT 1 | ❌ Connection timed out |

host.docker.internal is not resolvable in this environment (agent is not running inside a Docker container), and 127.0.0.1 shows no services bound on ports 6379 or 5432. Service containers are not accessible from this runner context.

🔌 Service connectivity validated by Smoke Services

@lpcox lpcox merged commit b092478 into main Jun 2, 2026
59 of 64 checks passed
@lpcox lpcox deleted the copilot/red-team-benchmark-failure-resolution branch June 2, 2026 04:49
lpcox added a commit that referenced this pull request Jun 2, 2026
The lock file was stale after PR #4187 changed the .md source without
a full recompile (frontmatter hash mismatch).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

[Red-Team Benchmark] AWF Red-Team Benchmark — 2026-06-02 — ⏭️ SKIPPED

3 participants