Commit a6417ca

peteromallet and claude committed
chore(hardening-epic): checkpoint m0/m1/m3a completed work + m2 WIP
The chain driver was not committing milestone work, leaving all prior milestones' changes uncommitted in the worktree. m2's verifiability audit then flagged the 60 inherited uncommitted files as "unclaimed scope violations" and hard-blocked the milestone. Committing the completed work gives m2's audit a clean tree so its rework can converge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7de679d commit a6417ca

64 files changed

Lines changed: 18141 additions & 586 deletions

docs/characterization-gate.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
# Characterization gate — M0

This gate freezes the observable behaviour of several core subsystems so that
later refactors (e.g. the M5b CLI split, store backend changes, pipeline
reworks) can detect regressions before they ship.

---

## Downstream milestone policy

1. **Keep the characterization suite green.**
   Every PR that touches the CLI, store, chain, evaluation, or planning pipeline
   must run the characterization tests listed below. A red characterization
   test is a regression unless the change is an *intentional, explained*
   behaviour change (see rule 2).

2. **Golden updates require a PR explanation.**
   CLI parser snapshots, pipeline golden fixtures, and store contract assertions
   may need updating when the team deliberately changes behaviour. Every golden
   update must be accompanied by a PR comment or commit message that explains
   *what* changed and *why* it is not a regression. Examples of acceptable
   reasons:

   - A new subcommand or CLI flag was added.
   - A deprecated option was removed.
   - The pipeline state machine gained or lost a state.
   - A store backend intentionally changed error semantics.

   Examples of *unacceptable* reasons (these are regressions):

   - An option was renamed outside the scope of the intentional change.
   - A positional argument was silently dropped.
   - A pipeline step no longer produces an expected artifact.

3. **Regenerate goldens with `--write-fixture`.**
   When an intentional golden update is needed, regenerate the fixture by
   running the relevant fixture-generation test with the `--write-fixture` flag:

   ```bash
   # CLI parser snapshot
   python -m pytest tests/characterization/test_cli_parser_snapshot.py \
     -k test_generate_fixture --write-fixture

   # Pipeline golden fixtures
   python -m pytest tests/characterization/test_pipeline_golden.py \
     -k test_generate_fixtures --write-fixture
   ```

   Commit both the changed test code and the regenerated fixture in the same
   PR.
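
The write-or-compare cycle behind `--write-fixture` can be sketched with the standard library. This is a minimal sketch under stated assumptions, not the project's test code: `check_golden` and its `write` parameter are hypothetical, and fixtures are assumed to be JSON serialized with sorted keys so that regenerated files diff cleanly in review.

```python
import json
import tempfile
from pathlib import Path


def check_golden(actual: dict, fixture_path: Path, write: bool = False) -> None:
    """Compare `actual` with a JSON golden file, or regenerate the file.

    Hypothetical helper: `write=True` stands in for the `--write-fixture` option.
    """
    # Deterministic serialization: key order in `actual` does not matter.
    rendered = json.dumps(actual, indent=2, sort_keys=True) + "\n"
    if write:
        fixture_path.write_text(rendered, encoding="utf-8")
        return
    expected = fixture_path.read_text(encoding="utf-8")
    if rendered != expected:
        raise AssertionError(
            f"{fixture_path.name} is stale; if the change is intentional, "
            "regenerate it with --write-fixture and explain why in the PR"
        )


with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "golden.json"
    check_golden({"state": "done", "steps": 4}, fixture, write=True)  # regenerate
    check_golden({"steps": 4, "state": "done"}, fixture)  # passes: same content
```

Failing loudly instead of silently rewriting the fixture is what forces the explicit regeneration step, and with it the PR explanation that rule 2 asks for.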

---

## Focused pytest targets added by this milestone

Run the full characterization suite to verify the gate:

```bash
python -m pytest tests/characterization/ -v
```

You can also target individual test files or classes. The full list of
targets introduced by this milestone:

### Import-surface tests (`tests/characterization/test_import_surface.py`)

| Test | Purpose |
|------|---------|
| `TestStoreImportSurface::test_all_symbols_resolve` | Every `megaplan.store.__all__` symbol resolves |
| `TestWorkersImportSurface::test_all_symbols_resolve` | Every `megaplan.workers.__all__` symbol resolves |
| `TestCliImportSurface::test_surveyed_symbols_resolve` | De-facto public surface for `megaplan.cli` (test-imported symbols) |
| `TestChainImportSurface::test_surveyed_symbols_resolve` | De-facto public surface for `megaplan.chain` (including private helpers used by tests) |
| `TestChainImportSurface::test_remote_exec_guard_callable_or_class` | Remote-exec guard symbols `_capture_sync_state`, `ChainState`, `save_chain_state`, `load_chain_state` are callables or classes, as expected |
| `TestEvaluationImportSurface::test_surveyed_symbols_resolve` | De-facto public surface for `megaplan.orchestration.evaluation` |
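
The `__all__` checks in the first two rows reduce to one attribute lookup per exported name. A minimal sketch with `importlib`, demonstrated on the standard-library `json` module rather than `megaplan.store`; `unresolved_exports` is a hypothetical helper, not part of the test suite.

```python
import importlib


def unresolved_exports(module_name: str) -> list:
    """Return names listed in a module's __all__ that the module does not define."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)
    if exported is None:
        raise ValueError(f"{module_name} does not declare __all__")
    # A symbol "resolves" when attribute lookup on the imported module succeeds.
    return [name for name in exported if not hasattr(module, name)]


print(unresolved_exports("json"))  # an empty list means every export resolves
```

For the surveyed surfaces (`megaplan.cli`, `megaplan.chain`, evaluation), the names to check would presumably come from a hand-maintained survey rather than from `__all__`.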

### CLI parser snapshot (`tests/characterization/test_cli_parser_snapshot.py`)

| Test | Purpose |
|------|---------|
| `TestCliParserSnapshot::test_snapshot_matches_fixture` | `build_parser()` output matches committed JSON fixture |
| `TestCliParserSnapshot::test_lazy_subcommands_are_passthrough_only` | Cloud/resident/bakeoff are `REMAINDER` passthrough entries (documented limitation) |
| `TestCliParserSnapshot::test_root_parser_has_expected_top_level_options` | Sanity: `--actor`, `--backend`, command subparser |
| `TestCliParserSnapshot::test_at_least_expected_subcommands_exist` | Key subcommands are present in the tree |
| `TestCliParserSnapshot::test_nested_subcommands_present` | Nested trees (e.g. `epic snapshot`, `config profiles list`) exist |
| `TestCliParserSnapshot::test_all_option_strings_are_sorted` | Deterministic JSON output invariant |
| `TestCliParserSnapshot::test_fixture_is_readable_json` | Fixture parses as valid JSON |
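
One way to get a deterministic, JSON-serializable snapshot of an `argparse` tree is to walk it recursively and sort option strings, which is the invariant `test_all_option_strings_are_sorted` guards. The sketch below is an assumption about how such a snapshot could be built, not the project's serializer: `snapshot` and the demo parser are hypothetical, and the walk reads `parser._actions` and `argparse._SubParsersAction`, which are private CPython details.

```python
import argparse
import json


def snapshot(parser: argparse.ArgumentParser) -> dict:
    """Reduce a parser to a JSON-friendly tree of options and subcommands."""
    node = {"options": [], "positionals": [], "subcommands": {}}
    # parser._actions and argparse._SubParsersAction are CPython internals.
    for action in parser._actions:
        if isinstance(action, argparse._SubParsersAction):
            # Recurse into each subcommand, visiting them in name order.
            for name, sub in sorted(action.choices.items()):
                node["subcommands"][name] = snapshot(sub)
        elif action.option_strings:
            # Sort the spellings of each option, then the options themselves.
            node["options"].append(sorted(action.option_strings))
        else:
            node["positionals"].append({"dest": action.dest, "nargs": action.nargs})
    node["options"].sort()
    return node


# Hypothetical stand-in for the project's build_parser().
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--backend")
parser.add_argument("--actor")
commands = parser.add_subparsers(dest="command")
commands.add_parser("init").add_argument("name")
cloud = commands.add_parser("cloud")
cloud.add_argument("rest", nargs=argparse.REMAINDER)  # lazy passthrough

print(json.dumps(snapshot(parser), indent=2, sort_keys=True))
```

The `cloud` entry, declared with `nargs=argparse.REMAINDER`, collapses to a single passthrough positional: any flags it parses later never appear in the tree, which is the kind of limitation `test_lazy_subcommands_are_passthrough_only` records.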

### Pipeline golden tests (`tests/characterization/test_pipeline_golden.py`)

| Test | Purpose |
|------|---------|
| `TestPipelineGolden::test_fresh_run_matches_fixture` | Full `init → plan/finalize → execute → review/done` pipeline |
| `TestPipelineGolden::test_resume_after_finalize_matches_fixture` | Halt after `finalize`, reread `state.json`, resume to `done` |
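
The idea behind the resume test, that halting after `finalize` and resuming from `state.json` must reach the same end state as an uninterrupted run, can be illustrated with a toy linear pipeline. Everything here is hypothetical: `STEPS`, `run`, and the `{"completed": [...]}` state layout are stand-ins, not megaplan's actual state machine or file format.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical linear pipeline; the real one has branches and richer state.
STEPS = ["init", "plan", "finalize", "execute", "review", "done"]


def run(state_path: Path, stop_after: Optional[str] = None) -> list:
    """Advance from whatever state.json records, saving after every step."""
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    else:
        state = {"completed": []}
    for step in STEPS[len(state["completed"]):]:
        state["completed"].append(step)
        # Persist before moving on so an interruption can resume from here.
        state_path.write_text(json.dumps(state), encoding="utf-8")
        if step == stop_after:
            break
    return state["completed"]


with tempfile.TemporaryDirectory() as tmp:
    fresh_path = Path(tmp) / "fresh-state.json"
    resumed_path = Path(tmp) / "resumed-state.json"
    fresh = run(fresh_path)
    halted = run(resumed_path, stop_after="finalize")  # simulate a stop
    resumed = run(resumed_path)  # reread the saved state and continue
    same_file = fresh_path.read_text() == resumed_path.read_text()

print(fresh == resumed, same_file)
```

Because the saved state records every completed step, a resume path that skipped or repeated a step would produce a different file even if it still ended at `done`.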

### Store contract tests (shared, run across backends)

| Test | Purpose |
|------|---------|
| `tests/test_file_store.py::test_file_store_contract` | `FileStore` fulfils the shared store contract |
| `tests/test_multi_store.py::test_multi_store_contract` | `MultiStore` (two `FileStore` backends) fulfils the shared contract with `home_backend='db'` routing |
| `tests/test_db_store.py::test_db_store_contract` | `DBStore` fulfils the shared contract (skipped unless `--backend db` + `SUPABASE_DB_URL`) |
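
A shared contract is one set of assertions written once and run against a factory for each backend. A minimal sketch under assumed semantics: the two-method `Store` protocol, `KeyError` on missing keys, and the `DictStore`/`RoutingStore` classes are all hypothetical stand-ins for `FileStore` and `MultiStore`, whose real interfaces are richer.

```python
from typing import Callable, Dict, Protocol


class Store(Protocol):
    """Hypothetical two-method interface; the real store contract is broader."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> str: ...


def check_store_contract(make_store: Callable[[], Store]) -> None:
    """Assertions every backend must satisfy, written once and reused."""
    store = make_store()
    store.put("plan", "v1")
    assert store.get("plan") == "v1"
    store.put("plan", "v2")
    assert store.get("plan") == "v2", "the latest write must win"
    try:
        store.get("missing")
    except KeyError:
        pass  # assumed error semantics for an absent key
    else:
        raise AssertionError("reading a missing key must raise KeyError")


class DictStore:
    """In-memory backend standing in for FileStore."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


class RoutingStore:
    """Toy analogue of MultiStore: routes every call to the home backend."""

    def __init__(self, backends: Dict[str, Store], home_backend: str) -> None:
        self._home = backends[home_backend]

    def put(self, key: str, value: str) -> None:
        self._home.put(key, value)

    def get(self, key: str) -> str:
        return self._home.get(key)


def make_routing_store() -> RoutingStore:
    return RoutingStore({"file": DictStore(), "db": DictStore()}, home_backend="db")


for factory in (DictStore, make_routing_store):
    check_store_contract(factory)  # raises AssertionError on any violation
```

In pytest, the same function would typically be called from each backend's test module or driven by `@pytest.mark.parametrize` over the factories; a backend that needs external resources (as `DBStore` needs `SUPABASE_DB_URL`) can skip before the contract runs.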

---

## Quick smoke test

To confirm the gate is healthy after any change:

```bash
python -m pytest tests/characterization/ -v
```

Expected: all active tests pass, and the two fixture-generation tests are
skipped unless `--write-fixture` is given. If any test fails, determine whether
the change was intentional (see rule 2 above) or a regression that must be
fixed before merge.
