Phase 4: Policy versioning + audit replay #123

## Context

Policies change. A vendor tightens their default spend cap; a parent loosens family-memory access; a regulation forces a new constraint. Without versioning, every change is destructive. The old policy is overwritten, so there is no record of which rules produced a past decision, old audit events can't be re-evaluated against a candidate policy, and there is no "what would have happened" sandbox.

This issue ships **policy versioning** (every policy update creates an immutable version with a timestamp) plus **audit replay** (given a time window + target policy version, recompute what the decisions WOULD have been).

Per [`milestones-roadmap.md` §5](https://github.com/litentry/agentKeys/blob/main/docs/spec/plans/milestones-roadmap.md), this is M4-depth work: regulator-grade reproducibility plus safe policy iteration for vendors.

## Scope (M4)

### Policy versioning

- Every policy update (per-vendor template, per-actor override, system-default) creates a new version
- Version metadata: `policy_id`, `version_number`, `timestamp`, `actor_who_changed_it`, `change_summary`
- Old versions retained immutably; can be referenced by version number
- Every audit row records the version of the policy that applied to it
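A minimal sketch of an append-only version store, assuming policies serialize to JSON and that `version_id` is a SHA-256 content hash chained to the parent version (git-commit style, as the pickup notes suggest). `PolicyStore`, `PolicyVersion`, and the rule shape are hypothetical names for illustration, not an existing agentKeys API.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class PolicyVersion:
    policy_id: str
    version_number: int        # human-readable, sequential per policy
    version_id: str            # content hash, git-commit style
    timestamp: datetime
    actor_who_changed_it: str
    change_summary: str
    body: str                  # canonical JSON text blob of the rules


class PolicyStore:
    """Append-only: an update always creates a new version; nothing is edited in place."""

    def __init__(self) -> None:
        self._history: dict[str, list[PolicyVersion]] = {}

    def update(self, policy_id: str, rules: dict[str, Any],
               actor: str, summary: str) -> PolicyVersion:
        history = self._history.setdefault(policy_id, [])
        # Canonical serialization so equal rules always serialize identically.
        body = json.dumps(rules, sort_keys=True, separators=(",", ":"))
        parent = history[-1].version_id if history else ""
        # Chaining the parent id means re-applying old rules later still yields a new id.
        digest = hashlib.sha256(f"{policy_id}\n{parent}\n{body}".encode()).hexdigest()
        version = PolicyVersion(policy_id, len(history) + 1, digest,
                                datetime.now(timezone.utc), actor, summary, body)
        history.append(version)
        return version

    def get(self, policy_id: str, version_number: int) -> PolicyVersion:
        if version_number < 1:
            raise KeyError(version_number)
        return self._history[policy_id][version_number - 1]

    def rules(self, policy_id: str, version_number: int) -> dict:
        return json.loads(self.get(policy_id, version_number).body)
```

Under this design a "rollback" is itself a new version whose body matches an older one, so history is never rewritten and every audit stamp keeps pointing at exactly what was enforced.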

### Audit replay endpoint

- Endpoint: `POST /v1/audit/replay { time_window, target_policy_version, actor_omni? }`
- For each event in the window: re-evaluate it under the target policy version and report what the decision would have been
- Returns, per event: `{ event_id, original_decision, simulated_decision, divergence: bool, simulated_reason? }`
- Aggregated view: how many events would have flipped under the new policy
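The per-event loop behind the endpoint can be sketched as below, assuming `time_window` is a half-open `[start, end)` range and that the target version's rules have already been loaded. `AuditEvent`, `ReplayResult`, `evaluate_payment_cap`, the `{"payment_cap": int}` rule shape, and the decision labels are all hypothetical; in a real implementation the evaluator would presumably be the same deterministic engine behind #107's `permission.check`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional

# Hypothetical decision labels; "approval_required" stands in for the #122 approval outcome.
APPROVED, APPROVAL_REQUIRED = "approved", "approval_required"


@dataclass(frozen=True)
class AuditEvent:
    event_id: str
    actor_omni: str
    timestamp: datetime
    request: dict
    decision: str          # decision actually enforced
    policy_version: int    # version stamp recorded at enforcement time


@dataclass(frozen=True)
class ReplayResult:
    event_id: str
    original_decision: str
    simulated_decision: str
    divergence: bool
    simulated_reason: Optional[str] = None


Evaluator = Callable[[dict, dict], tuple[str, str]]


def replay(events: Iterable[AuditEvent], start: datetime, end: datetime,
           target_rules: dict, evaluate: Evaluator,
           actor_omni: Optional[str] = None) -> list[ReplayResult]:
    """Re-evaluate every event in [start, end) under target_rules only."""
    results = []
    for ev in events:
        if not (start <= ev.timestamp < end):
            continue
        if actor_omni is not None and ev.actor_omni != actor_omni:
            continue
        # The target version is the SOLE policy for the run; ev.policy_version is ignored.
        simulated, reason = evaluate(target_rules, ev.request)
        results.append(ReplayResult(ev.event_id, ev.decision, simulated,
                                    simulated != ev.decision, reason))
    return results


def evaluate_payment_cap(rules: dict, request: dict) -> tuple[str, str]:
    """Toy deterministic evaluator for a hypothetical {"payment_cap": int} rule."""
    amount, cap = request["amount"], rules["payment_cap"]
    if amount <= cap:
        return APPROVED, f"amount {amount} within cap {cap}"
    return APPROVAL_REQUIRED, f"amount {amount} exceeds cap {cap}"
```

Ignoring each event's recorded `policy_version` during the run is what keeps replay free of the half-and-half states listed under Risks.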

### Use cases

- **Vendor evaluating a stricter policy before deploying it**: "If I drop the default payment cap from ¥500 to ¥300, how many devices would have hit approval-required last month?"
- **Parent reviewing a new limit**: "If I had set this limit yesterday, how many requests would have been denied?"
- **Regulator export**: every event carries its policy version stamp, supporting compliance reconstruction
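The vendor's ¥500 → ¥300 question reduces to a count over distinct devices. A minimal sketch, assuming each replayed payment is a `(device_id, amount_yuan)` pair and that any payment above the cap requires approval; `cap_what_if` and the result keys are hypothetical.

```python
from typing import Iterable


def cap_what_if(payments: Iterable[tuple[str, int]], old_cap: int, new_cap: int) -> dict[str, int]:
    """Count devices that would hit approval-required under new_cap.

    Assumes a payment strictly above the cap requires approval.
    """
    payments = list(payments)  # iterated twice below
    over_old = {device for device, amount in payments if amount > old_cap}
    over_new = {device for device, amount in payments if amount > new_cap}
    return {
        "devices_hitting_approval": len(over_new),
        "devices_newly_affected": len(over_new - over_old),
    }
```

With payments `[("dev-1", 120), ("dev-2", 350), ("dev-2", 80), ("dev-3", 480), ("dev-4", 620)]`, dev-4 already exceeded ¥500, so tightening to ¥300 puts three devices over the cap, of which two (dev-2 and dev-3) are newly affected.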

### Diff report

For a replay run, a diff report shows:
- Total events
- Events whose decision flipped (e.g., approved → denied, approved → approval-required, or the reverse)
- Per-event detail accessible from the audit dashboard (#115)
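The aggregation can be sketched as a single pass over per-event replay results, assuming each result is reduced to an `(event_id, original_decision, simulated_decision)` triple. `DiffReport` and `build_diff_report` are hypothetical names; the flipped event ids are kept so the dashboard can drill into each one.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DiffReport:
    total_events: int
    flipped: int
    transitions: dict[tuple[str, str], int]  # (original, simulated) -> count, flips only
    flipped_event_ids: list[str]             # drill-down targets for the dashboard


def build_diff_report(results: Iterable[tuple[str, str, str]]) -> DiffReport:
    total = 0
    transitions: Counter = Counter()
    flipped_ids: list[str] = []
    for event_id, original, simulated in results:
        total += 1
        if original != simulated:
            transitions[(original, simulated)] += 1
            flipped_ids.append(event_id)
    return DiffReport(total, len(flipped_ids), dict(transitions), flipped_ids)
```

Grouping by `(original, simulated)` pair lets the report show direction (stricter vs. looser) rather than only a single flip count.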

## Out of scope (defer)

- Auto-rollout of new policy versions (M5 — for M4, vendor manually triggers deployment after reviewing the diff)
- ML-suggested policy changes ("you might want to tighten X based on patterns") — M5
- Cross-vendor policy comparison (M5 — privacy-bound)
- Policy version branching ("test policy version 7.2 on actor X only") — M5

## Acceptance criteria

- [ ] Vendor can update a policy via the vendor portal (#113); a new version is created; the version metadata is auditable
- [ ] Parent / vendor can replay the last 7 days of events under a new candidate policy and see the diff report
- [ ] Replay correctly handles delegation chains (#121) — re-evaluating a delegated cap re-applies the chain's policy at the target version
- [ ] Replay is **read-only** — never mutates the audit chain or any cap-tokens (cannot be confused with actual policy enforcement)
- [ ] Audit dashboard (#115) renders the policy version stamp on every event; clicking a stamp shows that version's policy definition
- [ ] Regulator export (PDF from #115) includes policy version stamps + footer reference to the policy archive
- [ ] Performance: replay 10,000 events under a new policy in < 30 seconds
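For the delegation-chain criterion, one possible shape is to evaluate every link of the chain with its rules resolved at the target version and keep the most restrictive outcome. The "most restrictive wins" rule, the strictness ordering, `replay_delegated`, and `cap_evaluator` are all assumptions for illustration; #121 defines the actual chain semantics.

```python
from typing import Callable, Mapping, Sequence

# Assumed ordering: a more restrictive decision always wins along the chain.
STRICTNESS = {"approved": 0, "approval_required": 1, "denied": 2}


def replay_delegated(chain: Sequence[str], request: Mapping,
                     rules_at_target: Mapping[str, Mapping],
                     evaluate: Callable[[Mapping, Mapping], str]) -> str:
    """chain: policy ids from the root grantor down to the acting delegate.
    rules_at_target: policy id -> rules resolved at the target version."""
    if not chain:
        raise ValueError("delegation chain must contain at least the root grant")
    decisions = [evaluate(rules_at_target[pid], request) for pid in chain]
    return max(decisions, key=STRICTNESS.__getitem__)


def cap_evaluator(rules: Mapping, request: Mapping) -> str:
    """Toy evaluator: hypothetical optional hard_limit plus payment_cap."""
    amount = request["amount"]
    if amount > rules.get("hard_limit", float("inf")):
        return "denied"
    if amount > rules["payment_cap"]:
        return "approval_required"
    return "approved"
```

A delegate can then never end up looser than any grantor above it: with a parent cap of 500 and a child-agent cap of 100, a 300 payment replays as `approval_required` even though the parent alone would approve it.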

## Risks

| Risk | Mitigation |
|---|---|
| Replay computation is too slow under load | Background job for large replays; UI shows "computing…" with ETA; result cached for re-fetch |
| Policy version storage grows unbounded | Versions are immutable text blobs; cheap to store; retention policy: keep all for 3 years (regulator alignment) |
| Replay's simulated decision differs from actual decision in non-obvious ways (cross-version race conditions) | Replay treats the target policy as the SOLE policy for that run — no half-and-half states; documented |
| Vendor accidentally deploys a policy version that breaks all their devices | Vendor portal requires explicit "Deploy" action after reviewing the diff; rollback to previous version is one-click |
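The "result cached for re-fetch" mitigation is safe only because replay over a closed window is a pure function of its inputs. A minimal sketch, assuming policy versions are immutable, audit rows are append-only, and no rows arrive back-dated into an already-closed window; `ReplayCache` is hypothetical and leaves out the background-job side.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Optional


class ReplayCache:
    """Memoizes replay runs keyed by (version_id, start, end, actor_omni)."""

    def __init__(self) -> None:
        self._results: dict[tuple, Any] = {}

    def get_or_compute(self, version_id: str, start: datetime, end: datetime,
                       actor_omni: Optional[str], compute: Callable[[], Any],
                       now: Optional[datetime] = None) -> Any:
        now = now or datetime.now(timezone.utc)
        if end > now:
            # Open window: new events may still land in it, so never cache.
            return compute()
        key = (version_id, start, end, actor_omni)
        if key not in self._results:
            self._results[key] = compute()
        return self._results[key]
```

Keying on the content-hash `version_id` rather than a mutable "current" pointer means a cached diff can never be silently attributed to the wrong policy.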

## References

- [`docs/spec/plans/milestones-roadmap.md`](https://github.com/litentry/agentKeys/blob/main/docs/spec/plans/milestones-roadmap.md) §5 (M4 scope)
- [`docs/research/agent-iam-strategy.md`](https://github.com/litentry/agentKeys/blob/main/docs/research/agent-iam-strategy.md) §5 Phase 4 — audit replay as regulator surface
- [`docs/arch.md`](https://github.com/litentry/agentKeys/blob/main/docs/arch.md) §15 (audit framing)
- #107 (MCP server — policy engine surface)
- #115 (audit dashboard — UI consumer of policy versions + diff reports)
- #113 (vendor portal — policy editor)
- #121 (delegation chains — replay must traverse them)
- #122 (approval workflows — replay tracks would-have-been-approval events)

## Effort

~1-2 weeks. Sequencing:

1. (Days 1-3) Policy versioning storage + version metadata + audit-row version stamp
2. (Days 3-6) Audit replay endpoint + decision re-evaluation engine
3. (Days 6-9) Diff report aggregation + UI rendering (extends #115)
4. (Days 9-11) Performance pass + caching + background-job pattern
5. (Days 11-14) Regulator export updates + acceptance tests

## Pickup notes for the next agent / developer

- Read [`milestones-roadmap.md` §5](https://github.com/litentry/agentKeys/blob/main/docs/spec/plans/milestones-roadmap.md) for M4 framing
- Policy versions are IMMUTABLE. Treat them like git commits: each version's `version_id` is a content hash (alongside the sequential `version_number`), never edited; updates create a new version.
- Replay is **read-only**. Make misuse structurally impossible: expose replay through separate API endpoints that have no access to mint/revoke, so it cannot accidentally call them.
- The replay engine must NOT use an LLM. It's deterministic policy evaluation — same engine as #107's `permission.check`, just applied historically.
- Cross-cutting with #121 (delegation): policy versions affect delegated caps too. Test the delegation-chain replay scenario explicitly.
- **Watch for**: regulator-export format requirements vary by region (PIPL vs GDPR vs CCPA). Same as #115 — M4 covers PIPL; GDPR/CCPA when EU/US pilots demand.
- Use the `/agentkeys-issue-create` skill for follow-up issues (e.g., ML-suggested policy changes for M5)
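One way to make the read-only rule structural rather than a convention: the replay service is constructed from a read-only view and never receives the writable log or any cap-token client. `AuditReader`, `ReadOnlyAuditView`, `AuditLog`, and `ReplayService` are hypothetical names; the injected `evaluate` stands in for the deterministic engine and contains no LLM call.

```python
import copy
from typing import Callable, Iterator, Protocol


class AuditReader(Protocol):
    def events(self) -> Iterator[dict]: ...


class ReadOnlyAuditView:
    """Snapshot of audit rows; exposes iteration only, no append/mint/revoke."""

    def __init__(self, rows: tuple) -> None:
        self._rows = rows

    def events(self) -> Iterator[dict]:
        for row in self._rows:
            yield copy.deepcopy(row)  # callers get copies, never the stored rows


class AuditLog:
    """Enforcement-side log; only the live policy engine should hold this object."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def append(self, row: dict) -> None:
        self._rows.append(copy.deepcopy(row))

    def reader(self) -> ReadOnlyAuditView:
        return ReadOnlyAuditView(tuple(copy.deepcopy(r) for r in self._rows))


class ReplayService:
    """Built from an AuditReader only: no handle on AuditLog or cap-token APIs."""

    def __init__(self, audit: AuditReader, evaluate: Callable[[dict, dict], str]) -> None:
        self._audit = audit
        self._evaluate = evaluate  # deterministic rule evaluation; no LLM in this path

    def run(self, target_rules: dict) -> list[tuple[str, str, str]]:
        return [(e["event_id"], e["decision"], self._evaluate(target_rules, e["request"]))
                for e in self._audit.events()]
```

Because the service's only dependency is `events()`, a test can assert both that repeated runs return identical results and that the underlying log is unchanged afterwards.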
