[policy] Add policy.fetch_failure: warn|block schema knob for fail-closed enforcement (follow-up to #827)

## Context

In #827 we shipped install-time policy enforcement for APM. When the policy fetch **fails** — cache miss combined with network failure, malformed YAML response, or garbage HTTP response — the current behavior is **fail-open**: the install proceeds and a warning is logged.

The three `PolicyFetchResult` outcomes that currently fail-open are:

- `cache_miss_fetch_fail` — no cached policy and the remote fetch failed (network error, timeout, DNS).
- `malformed` — a response was received but could not be parsed as valid policy YAML.
- `garbage_response` — the HTTP response was non-YAML (HTML error page, empty body, etc.).

This was a deliberate v1 design decision to avoid bricking developer environments when an org policy server is flaky or unreachable. For enterprises that require **fail-closed** behavior (no install proceeds without a verified policy verdict), we need a configurable knob.

## Proposal

Add a `fetch_failure` key under the `policy` section of `apm-policy.yml`:

```yaml
policy:
  fetch_failure: warn  # default — current behavior (fail-open)
  # fetch_failure: block  # fail-closed — install exits non-zero on fetch failure
```

| Value | Behavior |
|-------|----------|
| `warn` (default) | Log a warning and proceed with install. Current behavior, unchanged. |
| `block` | Exit non-zero with a clear error message. No packages are installed. |

**Important:** `cache_stale` (a cached policy within `MAX_STALE_TTL`) is **unaffected** by this knob — a stale-but-valid cache is always used regardless of the `fetch_failure` setting. This knob only governs the three hard-failure outcomes listed above.
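
The rule above can be sketched as a small decision function. This is only an illustration: `FetchOutcome`, `HARD_FAILURES`, and `resolve_action` are hypothetical names, not the existing APM API, and the enum members simply mirror the outcome names used in this issue.

```python
from enum import Enum


class FetchOutcome(Enum):
    # Hypothetical enum; members mirror the outcome names in this issue.
    CACHE_STALE = "cache_stale"
    CACHE_MISS_FETCH_FAIL = "cache_miss_fetch_fail"
    MALFORMED = "malformed"
    GARBAGE_RESPONSE = "garbage_response"


# The three hard-failure outcomes governed by policy.fetch_failure.
HARD_FAILURES = frozenset({
    FetchOutcome.CACHE_MISS_FETCH_FAIL,
    FetchOutcome.MALFORMED,
    FetchOutcome.GARBAGE_RESPONSE,
})


def resolve_action(outcome: FetchOutcome, fetch_failure: str = "warn") -> str:
    """Return "enforce", "warn", or "block" for a fetch outcome."""
    if outcome not in HARD_FAILURES:
        # A stale-but-valid cached policy is used as-is; the knob does not apply.
        return "enforce"
    # Hard failure: fail-closed only when explicitly configured.
    return "block" if fetch_failure == "block" else "warn"
```

Under this sketch, `resolve_action(FetchOutcome.CACHE_STALE, "block")` still returns `"enforce"`, which is the behavior the note above requires.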

## Acceptance criteria

- [ ] Schema validation accepts both `warn` and `block` as valid values for `policy.fetch_failure`; rejects anything else.
- [ ] Default value is `warn` — no behavioral change for existing users.
- [ ] When set to `block`, any of the three failure outcomes (`cache_miss_fetch_fail`, `malformed`, `garbage_response`) causes install to exit non-zero with a clear, actionable error message (e.g., "Policy fetch failed and fetch_failure is set to block. Cannot proceed.").
- [ ] `cache_stale` (within `MAX_STALE_TTL`) is unaffected — always uses the cached policy regardless of this setting.
- [ ] Documentation page updated to describe the knob, default, and enterprise use case.
- [ ] Unit tests cover both `warn` and `block` values across all three failure outcomes.
- [ ] Integration test confirms `block` mode exits non-zero and `warn` mode logs warning but exits zero.
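
A minimal sketch of the first two criteria (accepted values and the default), assuming the parsed `apm-policy.yml` is available as a plain dict. `parse_fetch_failure` and `VALID_FETCH_FAILURE` are hypothetical names used for illustration; the real schema validator may be structured differently.

```python
VALID_FETCH_FAILURE = ("warn", "block")


def parse_fetch_failure(policy_doc: dict) -> str:
    """Read policy.fetch_failure, defaulting to "warn" when the key is absent."""
    section = policy_doc.get("policy") or {}
    value = section.get("fetch_failure", "warn")
    if value not in VALID_FETCH_FAILURE:
        # Anything other than the two documented values is a schema error.
        raise ValueError(
            f"policy.fetch_failure must be one of {VALID_FETCH_FAILURE}, got {value!r}"
        )
    return value
```

A unit test could then assert that an empty document yields `"warn"`, that `"block"` round-trips, and that any other value raises.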

## Open questions

1. **`apm install --dry-run`** — Should `block` mode also prevent dry-run from completing? Or should dry-run always succeed (since it does not actually install anything)?
2. **`--no-policy` bypass** — Should `--no-policy` still bypass enforcement even in `block` mode? **CEO recommendation: yes.** The escape hatch is the escape hatch — if an engineer explicitly opts out with `--no-policy`, that intent should be respected even in strict mode. The audit trail (CLI logs) already records the bypass.
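
If the recommendation on question 2 is adopted, the precedence could look like the sketch below. `should_block` and its parameters are hypothetical, and dry-run handling is deliberately left out because question 1 is still open.

```python
def should_block(no_policy: bool, fetch_failure: str, hard_failure: bool) -> bool:
    """Decide whether install must abort on a policy fetch failure."""
    if no_policy:
        # Explicit opt-out wins, even when fetch_failure is "block".
        return False
    return hard_failure and fetch_failure == "block"
```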

## Out of scope

- Non-GitHub VCS policy hosting (e.g., GitLab, Bitbucket policy sources) — separate feature.
- Signing or integrity verification of the policy file itself — separate security issue.
- Granular per-rule failure modes (e.g., some rules fail-open, others fail-closed) — future iteration if demand materializes.

---

_Follow-up to #827. Filed per CEO mandate during panel review._
