|
| 1 | +# Characterization gate — M0 |
| 2 | + |
| 3 | +This gate freezes the observable behaviour of several core subsystems so that |
| 4 | +later refactors (e.g. the M5b CLI split, store backend changes, pipeline |
| 5 | +reworks) can detect regressions before they ship. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Downstream milestone policy |
| 10 | + |
| 11 | +1. **Keep the characterization suite green.** |
| 12 | + Every PR that touches the CLI, store, chain, evaluation, or planning pipeline |
| 13 | + must run the characterization tests listed below. A red characterization |
| 14 | + test is a regression unless the change is an *intentional, explained* |
| 15 | + behaviour change (see rule 2). |
| 16 | + |
| 17 | +2. **Golden updates require a PR explanation.** |
| 18 | + CLI parser snapshots, pipeline golden fixtures, and store contract assertions |
| 19 | + may need updating when the team deliberately changes behaviour. Every golden |
| 20 | + update must be accompanied by a PR comment or commit message that explains |
| 21 | + *what* changed and *why* it is not a regression. Examples of acceptable |
| 22 | + reasons: |
| 23 | + |
| 24 | + - A new subcommand or CLI flag was added. |
| 25 | + - A deprecated option was removed. |
| 26 | + - The pipeline state machine gained or lost a state. |
| 27 | + - A store backend intentionally changed error semantics. |
| 28 | + |
| 29 | + Examples of *unacceptable* reasons (these are regressions): |
| 30 | + |
| 31 | + - A renamed option that was not part of the intentional change. |
| 32 | + - A silently dropped positional argument. |
| 33 | + - A pipeline step that no longer produces an expected artifact. |
| 34 | + |
| 35 | +3. **Regenerate goldens with `--write-fixture`.** |
| 36 | + When an intentional golden update is needed, regenerate the fixture by |
| 37 | + running the relevant test with the `--write-fixture` flag: |
| 38 | + |
| 39 | + ```bash |
| 40 | + # CLI parser snapshot |
| 41 | + python -m pytest tests/characterization/test_cli_parser_snapshot.py \ |
| 42 | + -k test_generate_fixture --write-fixture |
| 43 | + |
| 44 | + # Pipeline golden fixtures |
| 45 | + python -m pytest tests/characterization/test_pipeline_golden.py \ |
| 46 | + -k test_generate_fixtures --write-fixture |
| 47 | + ``` |
| 48 | + |
| 49 | + Commit both the changed test code and the regenerated fixture in the same |
| 50 | + PR. |
| 51 | + |
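Rule 3 treats a golden update as an explicit, opt-in write rather than a side effect of a failing test. A minimal sketch of that compare-or-rewrite pattern follows. The helper name `check_golden` and its `write` parameter are hypothetical, and how the real suite wires `--write-fixture` into its tests is not shown in this document.

```python
import json
from pathlib import Path


def check_golden(path: Path, actual: dict, write: bool = False) -> None:
    """Compare `actual` with the committed golden, or rewrite it when `write` is set.

    Hypothetical helper: `write` stands in for whatever the suite derives
    from the `--write-fixture` flag.
    """
    # Sorted keys and a trailing newline keep the fixture diff-friendly.
    rendered = json.dumps(actual, indent=2, sort_keys=True) + "\n"
    if write:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(rendered, encoding="utf-8")
        return
    expected = path.read_text(encoding="utf-8")
    assert rendered == expected, f"golden mismatch: {path}"
```

Because the write path never asserts, a regenerated fixture always "passes", which is presumably why rule 2 asks for a written explanation alongside every golden update.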
| 52 | +--- |
| 53 | + |
| 54 | +## Focused pytest targets added by this milestone |
| 55 | + |
| 56 | +Run the full characterization module to verify the gate: |
| 57 | + |
| 58 | +```bash |
| 59 | +python -m pytest tests/characterization/ -v |
| 60 | +``` |
| 61 | + |
| 62 | +You can also target individual test classes or files. The full list of |
| 63 | +targets introduced by this milestone follows: |
| 64 | + |
| 65 | +### Import-surface tests (`tests/characterization/test_import_surface.py`) |
| 66 | + |
| 67 | +| Test | Purpose | |
| 68 | +|------|---------| |
| 69 | +| `TestStoreImportSurface::test_all_symbols_resolve` | Every `megaplan.store.__all__` symbol resolves | |
| 70 | +| `TestWorkersImportSurface::test_all_symbols_resolve` | Every `megaplan.workers.__all__` symbol resolves | |
| 71 | +| `TestCliImportSurface::test_surveyed_symbols_resolve` | De-facto public surface for `megaplan.cli` (test-imported symbols) | |
| 72 | +| `TestChainImportSurface::test_surveyed_symbols_resolve` | De-facto public surface for `megaplan.chain` (including private helpers used by tests) | |
| 73 | +| `TestChainImportSurface::test_remote_exec_guard_callable_or_class` | Remote-exec guard symbols `_capture_sync_state`, `ChainState`, `save_chain_state`, `load_chain_state` have expected types | |
| 74 | +| `TestEvaluationImportSurface::test_surveyed_symbols_resolve` | De-facto public surface for `megaplan.orchestration.evaluation` | |
| 75 | + |
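The `test_all_symbols_resolve` rows above check that every name in a module's `__all__` is actually importable. A rough sketch of such a check is below. The function name `unresolved_symbols` is hypothetical, and it is demonstrated on the standard-library `json` module so it runs without `megaplan` installed.

```python
import importlib


def unresolved_symbols(module_name: str) -> list[str]:
    """Return names listed in a module's __all__ that are not attributes of it."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Stand-in target; the real tests would point at megaplan.store or megaplan.workers.
missing = unresolved_symbols("json")
print(missing)  # prints []
```

Returning the list of missing names, rather than asserting inside the loop, lets a failing test report every broken export at once.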
| 76 | +### CLI parser snapshot (`tests/characterization/test_cli_parser_snapshot.py`) |
| 77 | + |
| 78 | +| Test | Purpose | |
| 79 | +|------|---------| |
| 80 | +| `TestCliParserSnapshot::test_snapshot_matches_fixture` | `build_parser()` output matches committed JSON fixture | |
| 81 | +| `TestCliParserSnapshot::test_lazy_subcommands_are_passthrough_only` | Cloud/resident/bakeoff are `REMAINDER` passthrough entries (documented limitation) | |
| 82 | +| `TestCliParserSnapshot::test_root_parser_has_expected_top_level_options` | Sanity: `--actor`, `--backend`, command subparser | |
| 83 | +| `TestCliParserSnapshot::test_at_least_expected_subcommands_exist` | Key subcommands are present in the tree | |
| 84 | +| `TestCliParserSnapshot::test_nested_subcommands_present` | Nested trees (e.g. `epic snapshot`, `config profiles list`) exist | |
| 85 | +| `TestCliParserSnapshot::test_all_option_strings_are_sorted` | Deterministic JSON output invariant | |
| 86 | +| `TestCliParserSnapshot::test_fixture_is_readable_json` | Fixture parses as valid JSON | |
| 87 | + |
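The snapshot tests above depend on the parser tree serializing to deterministic JSON. One way this could be done with plain `argparse` is sketched below. `snapshot_parser` and `demo_parser` are hypothetical names, `demo_parser` is only a stand-in for `build_parser()`, and the sketch reads the private `parser._actions` attribute, which the real implementation may or may not do.

```python
import argparse
import json


def snapshot_parser(parser: argparse.ArgumentParser) -> dict:
    """Serialize a parser's options and subcommands into a JSON-friendly dict."""
    options = []
    subcommands = {}
    for action in parser._actions:  # private attribute; acceptable for a sketch
        if isinstance(action, argparse._SubParsersAction):
            # Recurse into each subcommand in name order.
            for name, sub in sorted(action.choices.items()):
                subcommands[name] = snapshot_parser(sub)
        else:
            options.append(
                {
                    "dest": action.dest,
                    "option_strings": sorted(action.option_strings),
                    "nargs": action.nargs,
                }
            )
    # Sort so the output does not depend on add_argument() call order.
    options.sort(key=lambda o: (o["dest"], o["option_strings"]))
    return {"options": options, "subcommands": subcommands}


def demo_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for build_parser(); not the real megaplan CLI.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--backend", "-b")
    sub = parser.add_subparsers(dest="command")
    plan = sub.add_parser("plan")
    plan.add_argument("idea")
    return parser


print(json.dumps(snapshot_parser(demo_parser()), indent=2, sort_keys=True))
```

Sorting both the option strings and the option list is what makes an invariant like `test_all_option_strings_are_sorted` cheap to assert against the committed fixture.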
| 88 | +### Pipeline golden tests (`tests/characterization/test_pipeline_golden.py`) |
| 89 | + |
| 90 | +| Test | Purpose | |
| 91 | +|------|---------| |
| 92 | +| `TestPipelineGolden::test_fresh_run_matches_fixture` | Full `init → plan/finalize → execute → review/done` pipeline | |
| 93 | +| `TestPipelineGolden::test_resume_after_finalize_matches_fixture` | Halt after `finalize`, reread `state.json`, resume to `done` | |
| 94 | + |
| 95 | +### Store contract tests (shared, run across backends) |
| 96 | + |
| 97 | +| Test | Purpose | |
| 98 | +|------|---------| |
| 99 | +| `tests/test_file_store.py::test_file_store_contract` | `FileStore` fulfils the shared store contract | |
| 100 | +| `tests/test_multi_store.py::test_multi_store_contract` | `MultiStore` (two `FileStore` backends) fulfils the shared contract with `home_backend='db'` routing | |
| 101 | +| `tests/test_db_store.py::test_db_store_contract` | `DBStore` fulfils the shared contract (skipped unless `--backend db` + `SUPABASE_DB_URL`) | |
| 102 | + |
| 103 | +--- |
| 104 | + |
| 105 | +## Quick smoke test |
| 106 | + |
| 107 | +To confirm the gate is healthy after any change: |
| 108 | + |
| 109 | +```bash |
| 110 | +python -m pytest tests/characterization/ -v |
| 111 | +``` |
| 112 | + |
| 113 | +Expected: all active tests pass; the two fixture-generation tests are skipped |
| 114 | +unless `--write-fixture` is passed. If any test fails, determine whether the change was |
| 115 | +intentional (see rule 2 above) or a regression that must be fixed before |
| 116 | +merge. |