EACCES on /tmp/gh-aw/mcp-logs — no ownership repair between workflow runs

## What happens

When a workflow run leaves `/tmp/gh-aw/mcp-logs/` owned by a UID other than the runtime user (e.g., root from a container execution), later runs hit EACCES when `mkdir -p /tmp/gh-aw/mcp-logs/safeoutputs` or a file write targets that directory. The failure shows up in one of two ways, depending on which step hits the bad permissions first:

1. Subdirectory creation fails at the `mkdir -p` step, blocking the workflow
2. File logging is silently disabled — `mcp_server_core.cjs:75` and `:104` swallow write failures, so MCP logs and artifacts are lost without any error

The first case blocks the run. The second case degrades it — the workflow completes but MCP telemetry and artifact collection are silently skipped.

## What should happen

The startup scripts should repair ownership/permissions on `/tmp/gh-aw/mcp-logs/` for the runtime user before use, the same way `install_copilot_cli.sh` already does for `/home/runner/.copilot`. File write failures in the MCP server should be surfaced, not swallowed.

## Where in the code

All references are to `main` at `2d91393f3`.

**Directory creation (no ownership repair):**
- `start_mcp_gateway.sh:31-32` — `mkdir -p /tmp/gh-aw/mcp-logs` and `mkdir -p /tmp/gh-aw/mcp-config` with no `chmod`/`chown`
- `mcp_setup_generator.go:196` — generates `mkdir -p /tmp/gh-aw/mcp-logs/safeoutputs` in workflow YAML, no ownership fix
- `mcp_setup_generator.go:472` — same for `/tmp/gh-aw/mcp-logs/playwright`

**File writes that fail silently on bad permissions:**
- `safe_outputs_mcp_server.cjs:34` — safe-outputs server writes logs
- `mcp_server_core.cjs:75` — swallows file-write failures
- `mcp_server_core.cjs:104` — same pattern, swallows write failures

**Environment wiring:**
- `compiler_activation_jobs.go:916` — sets `GH_AW_MCP_LOG_DIR=/tmp/gh-aw/mcp-logs/safeoutputs`

**The fix pattern that already exists:**
- `install_copilot_cli.sh:25` — repairs ownership before Copilot CLI install (commit `ada84f04f`, PR #13980)

## Evidence

**Production:**
- EACCES on `/tmp/gh-aw/mcp-logs/rpc-messages.jsonl` blocked a workflow run (v0.50.7)
- Run 22497663869: workflow failed at MCP log access

**Source-level verification (2026-03-01, main at `2d91393f3`):**
- Confirmed no `chmod`, `chown`, or cleanup logic exists for `/tmp/gh-aw/mcp-logs/` in any startup script
- Confirmed `mcp_server_core.cjs:75` and `:104` swallow write failures
- Confirmed the ownership repair pattern is already applied for Copilot at `install_copilot_cli.sh:25`

## Proposed fix

Repair ownership/permissions for the runtime user on `/tmp/gh-aw/mcp-logs/` after `mkdir -p` in `start_mcp_gateway.sh`, following the same pattern as `install_copilot_cli.sh:25` (which does this for `/home/runner/.copilot`).
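
The same repair is also absent from the `mkdir -p` lines that `mcp_setup_generator.go:196` and `:472` emit into the workflow YAML. A minimal Go sketch of what a generator-side repair could look like follows. `logDirSetupScript` is a hypothetical helper, not the real generator API, and the emitted commands are an assumption rather than a copy of `install_copilot_cli.sh:25`. They assume passwordless `sudo` (as on GitHub-hosted Linux runners) and that the step's shell runs as the same UID as the processes that later write the logs.

```go
package main

import (
	"fmt"
	"strings"
)

// mcpLogsRoot is the shared MCP log directory named in this issue.
const mcpLogsRoot = "/tmp/gh-aw/mcp-logs"

// logDirSetupScript is a hypothetical helper, not the real generator API.
// It returns shell lines for a workflow step that creates each MCP log
// subdirectory and then hands ownership of the whole tree to the step's user.
func logDirSetupScript(subdirs ...string) string {
	var b strings.Builder
	for _, sub := range subdirs {
		// sudo: if an earlier run left mcpLogsRoot owned by root, a plain
		// `mkdir -p` of a subdirectory fails with EACCES.
		fmt.Fprintf(&b, "sudo mkdir -p %s/%s\n", mcpLogsRoot, sub)
	}
	// $(id -u) and $(id -g) are expanded by the unprivileged calling shell
	// before sudo runs, so ownership goes to the step's user, not root.
	fmt.Fprintf(&b, "sudo chown -R \"$(id -u):$(id -g)\" %s\n", mcpLogsRoot)
	return b.String()
}

func main() {
	fmt.Print(logDirSetupScript("safeoutputs", "playwright"))
}
```

Creating the subdirectories with `sudo` first and only then running `chown -R` on the root means the step succeeds whether or not a previous run left the tree owned by root.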

Additionally, `mcp_server_core.cjs:75` and `:104` should surface file-write failures rather than swallowing them — at minimum log an error so operators can diagnose permission issues.

## Impact

**Frequency:** Intermittent — depends on whether prior runs left stale directories with mismatched ownership. More likely in repos with high workflow concurrency or container-based engines.
**Cost:** High when it hits — either blocks the run at subdirectory creation, or silently disables MCP logging/artifact collection. Debugging requires manual inspection of Actions runner state because the swallowed write failures produce no error output.

EACCES on /tmp/gh-aw/mcp-logs — no ownership repair between workflow runs #19018