[Security] Error messages leak filesystem paths / stack frames in user-facing responses

## Severity / Size
- Severity: MEDIUM
- Size: M
- Threat model: HTTP layer faces the public internet. An attacker triggers an error path (malformed input, missing resource, etc.) and reads internal file paths, function names, or third-party library identifiers from the response body.

## Affected files
- `src/http/backend/FastifyBackend.ts:188-194` — `writeError` puts `err.message` directly in the response body for non-`HttpError` exceptions; the 500 branch even names `'Internal Server Error'` alongside the raw `message`.
- `src/http/backend/HonoBackend.ts:121-139` — same shape.
- `src/http/backend/ExpressBackend.ts:214-233` — same shape.
- `src/cluster/ClusterClientReceptionist.ts:138-141` — `err.message` flows over the wire as the ask-reply body when a remote ask fails.
- `src/persistence/migration/SchemaRegistry.ts:161` — `err.message` flows into log lines that may include source-tree paths from `Error.stack`.

## Background

Every HTTP backend has the same exception path: catch any thrown error, send `{ error, message: err.message }` as JSON to the client. That `err.message` very often contains:

- File paths: `Error: ENOENT: no such file or directory, open '/srv/app/data/users.db'`
- Library identifiers: `TypeError: cassandra-driver: query timeout, marker=foo`
- Internal symbol names: `Invariant: ShardCoordinator.handle: expected 'allocated' but got 'rebalancing'`
- Stack frames if the user code passes `.stack` instead of `.message` (some libraries do `throw new Error(stack)`)
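The first bullet is easy to reproduce, assuming Node.js: a failed `fs` call puts the absolute path it tried to open straight into `err.message`, which is exactly the string the current `writeError` forwards. The helper name `messageForMissingFile` is illustrative only.

```typescript
import { readFileSync } from 'node:fs';

// Returns the `message` a failed read produces, i.e. the string that the
// current writeError would forward to the client verbatim.
export function messageForMissingFile(path: string): string {
  try {
    readFileSync(path);
    return ''; // file exists: nothing leaked
  } catch (err) {
    // Node system errors embed the syscall and the full path in `message`,
    // e.g. "ENOENT: no such file or directory, open '/srv/app/data/users.db'".
    return err instanceof Error ? err.message : String(err);
  }
}
```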

Besides leaking framework internals to attackers (which helps them fingerprint and target specific code paths), this can also leak **secrets**: error messages from misconfigured DB drivers sometimes include connection strings (`Error: getaddrinfo ENOTFOUND postgres://prod-user:hunter2@db.internal:5432`). Anything sensitive that ends up in `err.message` flows straight to the client.

The relevant OWASP Top 10 (2021) categories are A04 (Insecure Design), A05 (Security Misconfiguration) and A09 (Security Logging and Monitoring Failures). The defensive default is: **log the full error server-side, return a generic message to the client, optionally include a correlation ID so ops can pull the full error from logs.**

## Exploit walkthrough

**Step 1 — Attacker probes a known endpoint pattern:**
```
GET /api/users/non-existent-id
```

**Step 2 — Application code paths through a backend call that throws:**
```typescript
// inside the handler:
const user = await db.users.find(id);
if (!user) throw new Error(`User ${id} not found in shard ${shardOf(id)} (backend=cassandra, ks=app_prod)`);
```

**Step 3 — Without sanitisation:** the backend's `writeError` emits to the client:
```json
{ "error": "Internal Server Error", "message": "User non-existent-id not found in shard 7 (backend=cassandra, ks=app_prod)" }
```
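For regression tests it helps to pin down the vulnerable shape in isolation. The function below is a hypothetical, framework-free reduction of what the three `writeError` implementations currently do for non-`HttpError` exceptions (`renderErrorBodyCurrent` is an illustrative name, not code from the repository); feeding it the Step 2 error reproduces the Step 3 body.

```typescript
// Hypothetical reduction of today's writeError body for unhandled exceptions:
// the raw message is echoed next to the generic label.
export function renderErrorBodyCurrent(err: unknown): { error: string; message: string } {
  const message = err instanceof Error ? err.message : String(err);
  return { error: 'Internal Server Error', message };
}

// Step 2's thrown error, with the shard number fixed at 7 for illustration.
const leaked = renderErrorBodyCurrent(
  new Error('User non-existent-id not found in shard 7 (backend=cassandra, ks=app_prod)'),
);
// JSON.stringify(leaked) reproduces the Step 3 body, including the keyspace.
```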

**Step 4 — Attacker now knows:** there's a Cassandra backend, the keyspace is `app_prod`, there are at least 8 shards, and the shard-mapping function is deterministic on user-id. That is enough of a fingerprint to design a targeted DoS (collide on a single shard) or to know which DB driver to search for CVEs.

**Step 5 — Escalation: malformed JWT** triggers `jsonwebtoken: invalid signature, expected RS256 got HS256` → attacker learns which JWT library is in use and which signing algorithm it expects, and can pivot to attacks specific to that library.

**Realistic worst case:** stack frames flow over the wire (some libraries literally throw `Error(stack)`; some user code does `catch (e) { throw new Error(e.stack); }`). The attacker reads `at /srv/app/node_modules/cassandra-driver/lib/connection.js:432:23` → learns the exact deploy path, the library in use and, from the line number, a strong hint at its version → targets `node_modules/cassandra-driver/` with a known CVE.
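The `throw new Error(e.stack)` anti-pattern is simple to reproduce, and doing so is a quick check that the stack-frame pattern proposed in Track 1 (`at\s+.+:\d+:\d+`) catches what ends up in `message`. A minimal sketch, assuming a V8-based runtime such as Node.js, where `Error.prototype.stack` lists frames as `at ...:line:column`; the function names are illustrative.

```typescript
// Stack-frame pattern proposed in Track 1.
const STACK_FRAME_RE = /at\s+.+:\d+:\d+/;

function failDeep(): void {
  throw new Error('query timeout');
}

// The anti-pattern: catching an error and wrapping it with its *stack* as the
// new message. The wrapped error is returned here instead of thrown.
export function wrapWithStack(): Error {
  try {
    failDeep();
  } catch (e) {
    return new Error((e as Error).stack);
  }
  return new Error('unreachable: failDeep always throws');
}

const wrapped = wrapWithStack();
// On V8, wrapped.message starts with "Error: query timeout" followed by frames
// like "at failDeep (/path/to/file.ts:5:9)", so the pattern matches.
const leaksFrames = STACK_FRAME_RE.test(wrapped.message);
```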

## How the 8 already-landed security fixes inform this

- **Wire-frame DoS cap** — the rejection message (`wire frame claims length X > maxFrameBytes Y`) is generic and contains no deployment-specific information. Same shape here: error messages flowing to the client should be reduced to generic text.
- **Hello-handshake hijack defence** — rejected with a clear error to *logs* but only a generic 'connection terminated' to the peer. Same separation: detailed server-side, generic client-side.
- **Idempotency body-fingerprint** — verified the body matched, but didn't leak *how* it differed (just rejected the request). Same shape: don't tell attackers what they got wrong, beyond what they need to know to fix it legitimately.

The shape across all three: **error messages are part of the security surface; treat them like any other output.**

## Fix design

**Track 1 — Sanitiser at backend write-error boundary (primary).** Each backend's `writeError` runs the error through a sanitiser that:
1. For `HttpError`: keep `err.message` (it's caller-authored — caller is responsible for not leaking).
2. For any other `Error`: replace with `'Internal Server Error'`. Attach a `correlationId` (short ULID/UUID) that's also logged server-side with the full error.
3. Strip any field from `err.extra` that looks like a path (`/^([A-Z]:[\\/]|\.\.?\/|\/)/`) or a stack frame (`at\s+.+:\d+:\d+`).

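```typescript
// `HttpError` (with `status`, `message` and optional `extra`) and
// `generateShortUlid()` are assumed to exist elsewhere in the codebase.

// Patterns from rule 3 above.
const PATH_RE = /^([A-Z]:[\\/]|\.\.?\/|\/)/; // C:\ or C:/, ./ or ../, or /
const STACK_FRAME_RE = /at\s+.+:\d+:\d+/;    // e.g. "at fn (file.js:12:34)"

const looksLikePath = (s: string): boolean => PATH_RE.test(s);
const looksLikeStackFrame = (s: string): boolean => STACK_FRAME_RE.test(s);

function sanitiseErrorForClient(err: unknown): { status: number; body: object; correlationId: string } {
  const correlationId = generateShortUlid();
  if (err instanceof HttpError) {
    // Caller-authored message is kept. Extras are spread first so a stray
    // `error` key in `extra` cannot overwrite it.
    return { status: err.status, body: { ...stripDangerousExtras(err.extra), error: err.message }, correlationId };
  }
  // Any other thrown value: never echo its message to the client.
  return {
    status: 500,
    body: { error: 'Internal Server Error', correlationId },
    correlationId,
  };
}

function stripDangerousExtras(extra?: Readonly<Record<string, unknown>>): Record<string, unknown> {
  if (!extra) return {};
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(extra)) {
    // Drop string values that look like filesystem paths or stack frames.
    if (typeof v === 'string' && (looksLikePath(v) || looksLikeStackFrame(v))) continue;
    out[k] = v;
  }
  return out;
}
```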
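The sanitiser calls `generateShortUlid()` without defining it. One possible implementation, sketched under the assumption that a standard 26-character ULID is short enough: 10 Crockford-base32 characters of millisecond timestamp followed by 16 characters of randomness. The use of `node:crypto` and the optional `now` parameter are assumptions, not existing code.

```typescript
import { randomBytes } from 'node:crypto';

// Crockford base32 alphabet used by the ULID spec (omits I, L, O, U).
const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

/**
 * Hypothetical implementation of `generateShortUlid()`: a 26-character ULID
 * made of a 48-bit millisecond timestamp (10 chars) and 80 random bits (16 chars).
 */
export function generateShortUlid(now: number = Date.now()): string {
  // Encode the timestamp, most-significant character first.
  let time = '';
  let t = now;
  for (let i = 0; i < 10; i++) {
    time = CROCKFORD.charAt(t % 32) + time;
    t = Math.floor(t / 32);
  }

  // Encode 10 random bytes (80 bits) as 16 five-bit characters.
  let random = '';
  let acc = 0;
  let bits = 0;
  for (const byte of randomBytes(10)) {
    acc = (acc << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      random += CROCKFORD.charAt((acc >> (bits - 5)) & 31);
      bits -= 5;
    }
    acc &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  return time + random;
}
```

Because the timestamp prefix sorts lexicographically, IDs from one node roughly order by time, which helps when searching logs. If a shorter ID is preferred, truncating the random suffix trades collision resistance for length.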

**Track 2 — Strict-mode opt-out.** Some apps (internal tools, dev environments) WANT verbose errors. Provide `new FastifyBackend({ errorVerbosity: 'verbose' | 'sanitised' })`. Default: `'sanitised'`.

The opt-in for verbose mode also logs a `console.warn('errorVerbosity=verbose — do not enable in production-facing deployments')` once at construction.
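Tracks 1 and 2 together reduce to a small piece of per-backend state. The class below is a hypothetical sketch (`ErrorRenderer` and its injectable `warn` parameter are not from the codebase): it resolves the default to `'sanitised'`, warns exactly once at construction when `'verbose'` is chosen, and renders the two body shapes from Test plan cases 1 and 4. Passing `console.warn` (the default) gives the Track 2 behaviour; a test can pass a collector instead.

```typescript
export type ErrorVerbosity = 'sanitised' | 'verbose';

export class ErrorRenderer {
  readonly verbosity: ErrorVerbosity;

  constructor(
    opts: { readonly errorVerbosity?: ErrorVerbosity } = {},
    warn: (msg: string) => void = console.warn,
  ) {
    this.verbosity = opts.errorVerbosity ?? 'sanitised'; // safe default
    if (this.verbosity === 'verbose') {
      // Emitted once per construction, never per request.
      warn('errorVerbosity=verbose: do not enable in production-facing deployments');
    }
  }

  /** Response body for an unhandled (non-HttpError) exception. */
  renderUnhandled(err: unknown, correlationId: string): Record<string, string> {
    if (this.verbosity === 'verbose') {
      const message = err instanceof Error ? err.message : String(err);
      return { error: 'Internal Server Error', message };
    }
    return { error: 'Internal Server Error', correlationId };
  }
}
```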

**Track 3 — Server-side log emission of full error.** Every `writeError` invocation logs the error's `message`, `stack` and `code` together with the `correlationId`, at `error` level via the system logger. Because the same correlationId is returned to the client, ops can take a support ticket that quotes "correlationId abc-123" and pull up the full server-side log line.

```typescript
private writeError(reply: FastifyReply, err: unknown): void {
  const { status, body, correlationId } = sanitiseErrorForClient(err);
  this.systemLog.error({ correlationId, err }, 'HTTP handler threw');
  reply.status(status).send(body);
}
```

(Each backend receives a `systemLog: Logger` from the `HttpExtension` setup, so errors are recorded with the extension's logging context rather than through an ad-hoc logger.)
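Handlers can throw anything, including non-`Error` values, so the Track 3 log call benefits from normalising `err` into a record with fixed keys first. A minimal sketch; `toErrorLogRecord` and `ErrorLogRecord` are hypothetical names, and reading `code` assumes Node-style system errors that carry a string code such as `'ENOENT'`.

```typescript
export interface ErrorLogRecord {
  correlationId: string;
  message: string;
  stack?: string;
  code?: string;
}

// Builds the server-side record. This is never sent to the client.
export function toErrorLogRecord(err: unknown, correlationId: string): ErrorLogRecord {
  if (err instanceof Error) {
    const code = (err as Error & { code?: unknown }).code;
    return {
      correlationId,
      message: err.message,
      stack: err.stack,
      // Node system errors carry a string code such as 'ENOENT'.
      code: typeof code === 'string' ? code : undefined,
    };
  }
  // Thrown non-Error values (strings, plain objects) are still logged.
  return { correlationId, message: String(err) };
}
```

With a logger that takes a merge object as its first argument (as pino's does), the Track 3 call could pass `toErrorLogRecord(err, correlationId)` in place of `{ correlationId, err }`.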

**Track 4 — Cluster-internal: same sanitiser at `ClusterClientReceptionist:138-141`.** Remote ask failures should not flow exception messages back over the wire. Reply with `{ error: 'internal error', correlationId }` and log full detail at the receiving cluster node.
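The reply-side change for Track 4 can stay independent of the transport. The sketch below assumes the ask-reply body is a plain object and that the receiving node has some logger; `sanitiseAskFailure`, `AskFailureReply` and the `log` callback are hypothetical names, and the ID generator is passed in (for example `generateShortUlid()` from Track 1).

```typescript
export interface AskFailureReply {
  error: 'internal error';
  correlationId: string;
}

/**
 * Builds the wire reply for a failed remote ask. Full detail stays on the
 * receiving node; only a generic label and a correlation ID cross the wire.
 */
export function sanitiseAskFailure(
  err: unknown,
  newCorrelationId: () => string,
  log: (entry: { correlationId: string; err: unknown }, msg: string) => void,
): AskFailureReply {
  const correlationId = newCorrelationId();
  log({ correlationId, err }, 'remote ask failed');
  return { error: 'internal error', correlationId };
}
```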

## API surface

```typescript
export interface FastifyBackendOptions {
  // existing fields...
  /**
   * Controls how unhandled-exception responses are rendered to the client.
   *   - 'sanitised' (default): generic 'Internal Server Error' + correlationId.
   *     Full error logged server-side at error level.
   *   - 'verbose': raw err.message + (in development) stack. Server-side log
   *     unchanged. WARNING: do not enable in deployments that face untrusted
   *     clients.
   */
  readonly errorVerbosity?: 'sanitised' | 'verbose';
}
```

Same shape on `HonoBackendOptions` and `ExpressBackendOptions`.

## Backward compatibility

**Breaking** for clients that parse the full `err.message` out of 500 responses (rare, but happens with internal-tool front-ends). Mitigation: `errorVerbosity: 'verbose'` opt-in.

Document in CHANGELOG under "Security defaults: HTTP error sanitisation". Note that `HttpError(...)` thrown explicitly continues to flow through unchanged — only *unhandled* exceptions get sanitised.

## Test plan

1. **Sanitisation default** — throw `new Error('User /etc/passwd not found')` from a handler, verify the response body is `{ error: 'Internal Server Error', correlationId: '...' }` (no path, no original message).
2. **HttpError pass-through** — `throw new HttpError(404, 'User abc not found')` → response body is `{ error: 'User abc not found' }` (caller-authored, kept).
3. **Server-side log emission** — same scenario as Test 1, verify the system logger received an `error`-level entry with the full err.message + stack + matching correlationId.
4. **Verbose mode opt-in** — `new FastifyBackend({ errorVerbosity: 'verbose' })`, throw `new Error('foo')`, response body is `{ error: 'Internal Server Error', message: 'foo' }`.
5. **Path-stripping in HttpError.extra** — `throw new HttpError(400, 'bad', { path: '/etc/passwd', userField: 'alice' })`, verify response body has `userField: 'alice'` but no `path`.
6. **Cross-backend parity** — same exception through Fastify / Hono / Express produces the same sanitised body (same correlationId format, same fields).
7. **ClusterClientReceptionist** — remote ask throws, reply body to caller is `{ error: 'internal error', correlationId }`, not the original `err.message`.
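Case 6 needs a precise notion of "the same sanitised body", since the correlationId differs on every request. One way to express it, as a hypothetical test helper (`findParityProblems` is not an existing utility): compare every field except `correlationId` exactly, and check `correlationId` only against an expected format.

```typescript
export type ErrorBody = Readonly<Record<string, unknown>>;

/**
 * Returns a list of parity problems (empty when the backends agree).
 * `correlationId` differs per request, so only its format is checked; all other
 * fields must match exactly. Assumes flat bodies, because a replacer array
 * passed to JSON.stringify also filters keys of nested objects.
 */
export function findParityProblems(
  bodies: Readonly<Record<string, ErrorBody>>,
  correlationIdFormat: RegExp,
): string[] {
  const problems: string[] = [];
  let reference: string | undefined;
  for (const [backend, body] of Object.entries(bodies)) {
    const id = body['correlationId'];
    if (typeof id !== 'string' || !correlationIdFormat.test(id)) {
      problems.push(`${backend}: correlationId missing or malformed`);
    }
    // Canonical form: every key except correlationId, in sorted order.
    const keys = Object.keys(body).filter((k) => k !== 'correlationId').sort();
    const shape = JSON.stringify(body, keys);
    if (reference === undefined) {
      reference = shape;
    } else if (shape !== reference) {
      problems.push(`${backend}: body differs (${shape} vs ${reference})`);
    }
  }
  return problems;
}
```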

## Acceptance criteria

- [ ] All three HTTP backends sanitise non-HttpError exceptions by default.
- [ ] Server-side log emission with full error + correlationId works.
- [ ] `errorVerbosity: 'verbose'` opts out and triggers a one-shot warning at construction.
- [ ] `HttpError` continues to pass through unchanged (caller-authored messages).
- [ ] `ClusterClientReceptionist` ask-reply sanitised on failure.
- [ ] CHANGELOG entry + README "Known security caveats" updated.
- [ ] Test suite covers the sanitiser with at least 7 cases.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Security] Error messages leak filesystem paths / stack frames in user-facing responses #130