EACCES on /tmp/gh-aw/mcp-logs — no ownership repair between workflow runs #19018

@samuelkahessay

What happens

When a workflow run leaves /tmp/gh-aw/mcp-logs/ owned by a different UID (e.g., root from a container execution), subsequent runs fail because mkdir -p /tmp/gh-aw/mcp-logs/safeoutputs or file writes into that directory hit EACCES. The failure can manifest in two ways depending on which step hits the bad permissions first:

  1. Subdirectory creation fails at the mkdir -p step, blocking the workflow
  2. File logging is silently disabled — mcp_server_core.cjs:75 and :104 swallow write failures, so MCP logs and artifacts are lost without any error

The first case blocks the run. The second case degrades it — the workflow completes but MCP telemetry and artifact collection are silently skipped.

What should happen

The startup scripts should repair ownership/permissions on /tmp/gh-aw/mcp-logs/ for the runtime user before use, the same way install_copilot_cli.sh already does for /home/runner/.copilot. File write failures in the MCP server should be surfaced, not swallowed.

Where in the code

All references are to main at 2d91393f3.

Directory creation (no ownership repair):

  • start_mcp_gateway.sh:31-32 runs mkdir -p /tmp/gh-aw/mcp-logs and mkdir -p /tmp/gh-aw/mcp-config with no chmod/chown
  • mcp_setup_generator.go:196 — generates mkdir -p /tmp/gh-aw/mcp-logs/safeoutputs in workflow YAML, no ownership fix
  • mcp_setup_generator.go:472 — same for /tmp/gh-aw/mcp-logs/playwright

File writes that fail silently on bad permissions:

  • safe_outputs_mcp_server.cjs:34 — safe-outputs server writes logs
  • mcp_server_core.cjs:75 — swallows file-write failures
  • mcp_server_core.cjs:104 — same pattern, swallows write failures

Environment wiring:

  • compiler_activation_jobs.go:916 — sets GH_AW_MCP_LOG_DIR=/tmp/gh-aw/mcp-logs/safeoutputs

The fix pattern that already exists:

  • install_copilot_cli.sh:25 repairs ownership of /home/runner/.copilot for the runtime user

Evidence

Production:

  • EACCES on /tmp/gh-aw/mcp-logs/rpc-messages.jsonl blocked workflow run (v0.50.7)
  • Run 22497663869: workflow failed at MCP log access

Source-level verification (2026-03-01, main at 2d91393f3):

  • Confirmed no chmod, chown, or cleanup logic exists for /tmp/gh-aw/mcp-logs/ in any startup script
  • Confirmed mcp_server_core.cjs:75 and :104 swallow write failures
  • Confirmed the ownership repair pattern is already applied for Copilot at install_copilot_cli.sh:25

Proposed fix

Repair ownership/permissions for the runtime user on /tmp/gh-aw/mcp-logs/ after mkdir -p in start_mcp_gateway.sh, following the same pattern as install_copilot_cli.sh:25 (which does this for /home/runner/.copilot).
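As a rough illustration of the repair step, here is a minimal bash sketch. It assumes the runtime user is the user running the script and that non-interactive sudo is available on the runner. The function name repair_log_dir is hypothetical, and this is not a copy of what install_copilot_cli.sh:25 does.

```shell
# Hypothetical helper; the name and the sudo fallback are assumptions,
# not code taken from start_mcp_gateway.sh or install_copilot_cli.sh.
repair_log_dir() {
  local dir="$1"
  local uid gid
  uid="$(id -u)"
  gid="$(id -g)"

  # Create the directory as the current user; if a parent directory is
  # owned by another UID, retry with non-interactive sudo.
  mkdir -p "$dir" 2>/dev/null || sudo -n mkdir -p "$dir" || return 1

  # Re-own the tree only when something in it belongs to another UID,
  # so the common case needs no sudo at all.
  if [ -n "$(find "$dir" ! -uid "$uid" -print -quit 2>/dev/null)" ]; then
    sudo -n chown -R "$uid:$gid" "$dir" || return 1
  fi

  # Make sure the owner can read, write, and traverse everything below.
  chmod -R u+rwX "$dir"
}
```

Under these assumptions, start_mcp_gateway.sh could call repair_log_dir /tmp/gh-aw/mcp-logs in place of the bare mkdir -p, so the later mkdir -p for safeoutputs and playwright runs against a directory the runtime user owns.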

Additionally, mcp_server_core.cjs:75 and :104 should surface file-write failures rather than swallowing them — at minimum log an error so operators can diagnose permission issues.

Impact

Frequency: Intermittent — depends on whether prior runs left stale directories with mismatched ownership. More likely in repos with high workflow concurrency or container-based engines.
Cost: High when it hits — either blocks the run at subdirectory creation, or silently disables MCP logging/artifact collection. Debugging requires manual inspection of Actions runner state because the swallowed write failures produce no error output.
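Until the write failures are surfaced, a small diagnostic step can shorten the manual inspection. The bash sketch below is one possible shape, not existing code. The function name describe_dir_access and its output format are hypothetical, and it only reports state without changing anything.

```shell
# Hypothetical diagnostic helper; name and output format are assumptions.
describe_dir_access() {
  local dir="$1"
  local owner_uid

  if [ ! -e "$dir" ]; then
    echo "missing $dir"
    return 0
  fi

  # GNU stat uses -c for format strings; BSD/macOS stat uses -f.
  owner_uid="$(stat -c '%u' "$dir" 2>/dev/null || stat -f '%u' "$dir")"

  # -w tests whether the current user can write to the directory.
  if [ -w "$dir" ]; then
    echo "writable $dir owner_uid=$owner_uid current_uid=$(id -u)"
  else
    echo "not-writable $dir owner_uid=$owner_uid current_uid=$(id -u)"
  fi
}
```

Running describe_dir_access /tmp/gh-aw/mcp-logs in a step before the MCP gateway starts would put the owning UID in the job log, which makes a stale root-owned directory visible even when the later writes fail silently.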
