Sprint 03 Planner and capabilities
This page was migrated from the paxman repository's `docs/sprints/` folder as part of the Sprint 11 repo spring-clean. The original git history is preserved in the paxman repo (commit 3121eb2 and earlier).

Duration: 2 weeks

Goal: Implement the field-centric planner (deterministic, rule-based) and the first 3 capabilities (`text_extraction`, `regex_extraction`, `validation`). The Executor is not yet wired: this sprint ends with a planner that produces an `ExecutionPlan` but does not execute it.

Status: Sprint 3 closes PRD §10 V1 Acceptance Criteria items §1.2 (partially: 3 of 5 V1 capabilities) and §1.3 (partially: the `planner/` subsystem).

`planner/` modules:

- `planner/field_plan.py`: `FieldPlan`, `FieldPlanStep`, `ExecutionPlan` data models
- `planner/input_profile.py`: `InputProfile` data model + `make_profile(input) -> InputProfile` (per the Sprint 0 spec)
- `planner/scoring.py`: candidate cost/confidence/coverage scoring (using `CostHint` from Sprint 0)
- `planner/heuristics.py`: 7-step heuristic ordering rules (per `ARCHITECTURE.md` §4.2)
- `planner/policies.py`: budget/accuracy/fallback policy application
- `planner/_registry.py`: internal `CapabilityRegistry` handle
- `planner/planner.py`: top-level `plan(canonical, profile, budget, policy, registry) -> ExecutionPlan`

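
A minimal sketch of the three `planner/field_plan.py` data models, assuming frozen dataclasses. The field names (`capability_id`, `heuristic_step`, `field_name`, `steps`, `field_plans`) and the `to_json` helper are hypothetical, not the paxman API. What the sketch illustrates is the serialization choice: dumping with sorted keys and fixed separators gives one canonical byte sequence per plan, which is what the byte-equal JSON determinism test compares.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FieldPlanStep:
    capability_id: str   # which capability to invoke (hypothetical field)
    heuristic_step: int  # position in the 7-step ordering, 1..7 (hypothetical)


@dataclass(frozen=True)
class FieldPlan:
    field_name: str
    steps: tuple[FieldPlanStep, ...] = ()


@dataclass(frozen=True)
class ExecutionPlan:
    field_plans: tuple[FieldPlan, ...] = ()

    def to_json(self) -> str:
        # Sorted keys and fixed separators give one canonical byte sequence
        # per plan, so equal plans always serialize to byte-equal JSON.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


plan = ExecutionPlan(
    field_plans=(
        FieldPlan("invoice_number", (FieldPlanStep("regex_extraction", 2),)),
    )
)
print(plan.to_json())
```

Tuples rather than lists keep the models immutable and hashable; `asdict` preserves them and `json.dumps` writes them as JSON arrays.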

`capabilities/` modules:

- `capabilities/base.py`: `Capability` Protocol
- `capabilities/spec.py`: `CapabilitySpec` data model (with `CostHint` from Sprint 0)
- `capabilities/result.py`: `CapabilityResult`, `Candidate`, `EvidenceRef`, `Diagnostic` (no `confidence` field, per ADR-0005)
- `capabilities/registry.py`: capability lookup, version management
- `capabilities/v1/text_extraction.py`: `text/plain` and `text/html` only in V1; provider SPI for OCR
- `capabilities/v1/regex_extraction.py`: ECMAScript regex with named groups
- `capabilities/v1/validation.py`: type/range/regex/enum constraint checks (reference constraints are post-V1)
- Stub `InferenceProvider` (in `capabilities/v1/inference.py`): interface only, no real provider in this sprint

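
The shape of the capability contract can be sketched with `typing.Protocol`. Everything below is an assumption for illustration: the `run(field_name, data)` method, the `CostHint` fields, and a simplified `CapabilityResult` that omits `EvidenceRef` and models `Diagnostic` as a plain string. The one property taken from this plan is that neither `Candidate` nor `CapabilityResult` carries a `confidence` field (ADR-0005).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class CostHint:
    # Hypothetical fields; the real values are the Sprint 0 CostHint decision.
    relative_cost: float
    remote: bool = False


@dataclass(frozen=True)
class CapabilitySpec:
    id: str
    version: str
    cost: CostHint


@dataclass(frozen=True)
class Candidate:
    value: Any  # deliberately no confidence attribute (ADR-0005)


@dataclass(frozen=True)
class CapabilityResult:
    candidates: tuple[Candidate, ...] = ()
    diagnostics: tuple[str, ...] = ()  # simplified stand-in for Diagnostic


@runtime_checkable
class Capability(Protocol):
    spec: CapabilitySpec

    def run(self, field_name: str, data: str) -> CapabilityResult: ...


class EchoCapability:
    """Toy capability that proposes the whole input as the field value."""

    spec = CapabilitySpec("echo", "1.0.0", CostHint(relative_cost=0.0))

    def run(self, field_name: str, data: str) -> CapabilityResult:
        return CapabilityResult(candidates=(Candidate(value=data),))
```

Because `Capability` is a structural Protocol, `EchoCapability` satisfies it without inheriting from it; `isinstance(EchoCapability(), Capability)` only checks that `spec` and `run` exist, while `mypy --strict` checks the signatures.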
Tests and contracts:

- Unit tests for every module above
- Property tests: planner determinism (Hypothesis, `derandomize=True`)
- Capability tests: known-input tests for each of the 3 capabilities
- `test_capability_result_has_no_confidence()`: static check that `CapabilityResult` has no `confidence` field (per ADR-0005)
- `import-linter` contract: `planner/` and `capabilities/` may NOT import from `executor/`, `reconciler/`, `artifact/`, or `api/`

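
The `test_capability_result_has_no_confidence()` check can be very small. The sketch below uses a local stand-in for `CapabilityResult` so it runs on its own; in the repository the test would import the real class instead. Inspecting `dataclasses.fields` assumes `CapabilityResult` is a dataclass.

```python
from __future__ import annotations

import dataclasses
from typing import Any


@dataclasses.dataclass(frozen=True)
class CapabilityResult:
    # Local stand-in for the real CapabilityResult; fields are assumed.
    candidates: tuple[Any, ...] = ()
    diagnostics: tuple[str, ...] = ()


def test_capability_result_has_no_confidence() -> None:
    # Inspect the declared fields as well as attributes, so both a dataclass
    # field and a plain class attribute named `confidence` would fail the test.
    field_names = {f.name for f in dataclasses.fields(CapabilityResult)}
    assert "confidence" not in field_names
    assert not hasattr(CapabilityResult, "confidence")
    assert not hasattr(CapabilityResult(), "confidence")
```

Checking the declared fields matters because a dataclass field without a default does not appear as a class attribute, so `hasattr` on the class alone would miss it.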
Out of scope for this sprint:

- Executor (Sprint 4).
- Reconciler (Sprint 5).
- Artifact + API (Sprint 6).
- `lookup` and `inference` capabilities (Sprint 4: `lookup` is a data-source concern; `inference` requires Executor context).
- No real inference provider, only the SPI and a stub. Real providers (OpenAI, Anthropic) are V2 per `EXTENDING.md` §3.4.


| ID | Deliverable | Effort (id-ed) |
|---|---|---|
| D3.1 | `planner/field_plan.py`: data models | 2.0 |
| D3.2 | `planner/input_profile.py`: per Sprint 0 spec | 2.0 |
| D3.3 | `planner/scoring.py`: uses Sprint 0 `CostHint` values | 2.0 |
| D3.4 | `planner/heuristics.py`: 7-step ordering | 3.0 |
| D3.5 | `planner/policies.py`: budget/policy application | 2.0 |
| D3.6 | `planner/_registry.py`: registry handle | 1.0 |
| D3.7 | `planner/planner.py`: top-level `plan()` | 3.0 |
| D3.8 | `capabilities/base.py`: `Capability` Protocol | 1.0 |
| D3.9 | `capabilities/spec.py`: `CapabilitySpec` | 1.0 |
| D3.10 | `capabilities/result.py`: `CapabilityResult` (no `confidence` field) | 1.0 |
| D3.11 | `capabilities/registry.py`: versioned registry | 2.0 |
| D3.12 | `capabilities/v1/text_extraction.py` | 3.0 |
| D3.13 | `capabilities/v1/regex_extraction.py` | 2.0 |
| D3.14 | `capabilities/v1/validation.py` | 2.0 |
| D3.15 | `capabilities/v1/inference.py` (SPI + stub provider only) | 1.0 |
| D3.16 | Unit tests for all planner modules (≥1 per public function) | 3.0 |
| D3.17 | Unit tests for all 3 capabilities (known-input, determinism, spec) | 3.0 |
| D3.18 | Property tests: planner determinism (Hypothesis, 100 examples) | 1.5 |
| D3.19 | Static test: `CapabilityResult` has no `confidence` attribute | 0.2 |
| D3.20 | `import-linter` contracts for `planner/` and `capabilities/` | 0.5 |
| D3.21 | `docs/concepts/planning.md` (skeleton) | 1.0 |

Total: ~37.2 id-ed (sum of D3.1 to D3.21). Sized for 4 engineers × 2 weeks with parallel work (2 on the planner, 2 capability authors, with one of the four also acting as test lead).
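
Summing the Effort column of the deliverables table is a quick consistency check on the stated total. The capacity figure assumes 10 working days per engineer in a 2-week sprint, which this page does not state.

```python
# Effort per deliverable in id-ed, copied from the deliverables table.
EFFORT = {
    "D3.1": 2.0, "D3.2": 2.0, "D3.3": 2.0, "D3.4": 3.0, "D3.5": 2.0,
    "D3.6": 1.0, "D3.7": 3.0, "D3.8": 1.0, "D3.9": 1.0, "D3.10": 1.0,
    "D3.11": 2.0, "D3.12": 3.0, "D3.13": 2.0, "D3.14": 2.0, "D3.15": 1.0,
    "D3.16": 3.0, "D3.17": 3.0, "D3.18": 1.5, "D3.19": 0.2, "D3.20": 0.5,
    "D3.21": 1.0,
}

total = round(sum(EFFORT.values()), 1)  # round away float noise from 0.2, 1.5
capacity = 4 * 10  # assumption: 4 engineers x 10 working days each

print(f"total={total} id-ed, capacity={capacity}, slack={capacity - total:.1f}")
# total=37.2 id-ed, capacity=40, slack=2.8
```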

| Type | Item | Notes |
|---|---|---|
| People | 4 engineers (1 senior, 3 mid-level) | 2 on planner, 2 on capabilities |
| Tools | Hypothesis (dev dep), all Sprint 1 + 2 deps | Standard Python dev env |
| Tests | Sprint 1 + 2 test infrastructure; fixture contracts (Pydantic + Dict DSL) from Sprint 2 | Done |
| Decisions | Sprint 0 `CostHint` values; `InputProfile` spec | Both from Sprint 0 |
| Docs | ADR-0001 (field-centric), ADR-0002 (rule-based planner), ADR-0005 (confidence ownership), `ARCHITECTURE.md` §4.2, §4.3 | Read by all engineers |


| Tool | Version | Purpose | Notes |
|---|---|---|---|
| hypothesis | ≥ 6.0 | Planner determinism property tests | Already a dev dep |
| `re` (stdlib) | n/a | ECMAScript-style regex for `regex_extraction` | Python's `re` is close enough to ECMAScript for V1; note that `re` spells named groups `(?P<name>...)`, not `(?<name>...)` |
| `html.parser` (stdlib) | n/a | `text/html` parsing for `text_extraction` | Stdlib only |
| BeautifulSoup4 | latest | Optional, for richer `text/html` extraction | Defer to V2; use stdlib for V1 |

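
A sketch of named-group extraction on top of stdlib `re`; the `regex_extract` name and its return shape are assumptions, not the `regex_extraction` API. Two behaviours of `re` matter for the documented flavor: named groups must be written `(?P<name>...)` (the ECMAScript `(?<name>...)` form fails to compile), and a repeated group name is rejected at compile time, which already enforces the one-name-once rule.

```python
from __future__ import annotations

import re


def regex_extract(pattern: str, text: str) -> dict[str, str | None]:
    """Return the named groups of the first match, or {} if nothing matches."""
    # re.ASCII makes \d and \w ASCII-only, as they are in ECMAScript.
    # re.compile raises re.error for a duplicate group name.
    compiled = re.compile(pattern, re.ASCII)
    match = compiled.search(text)
    return match.groupdict() if match else {}


pattern = r"Invoice (?P<number>\d+) dated (?P<date>\d{4}-\d{2}-\d{2})"
print(regex_extract(pattern, "Invoice 1042 dated 2024-03-01"))
# {'number': '1042', 'date': '2024-03-01'}

try:
    re.compile(r"(?P<id>\d+)-(?P<id>\d+)")
except re.error as exc:
    print("rejected:", exc)
```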
External services: none. The stub inference provider returns hard-coded completions; no real provider is wired.
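
A sketch of such a stub, consistent with the risk mitigation for this sprint (one class, one method, fixed output). The `complete` method name, the single-field `Usage`, and the `InferenceProvider` Protocol shape are assumptions; the real SPI lives in `capabilities/v1/inference.py`. The helper also shows one way to write the "never makes network calls" test: patch `socket.socket` so any attempt to open a socket raises.

```python
from __future__ import annotations

import socket
from dataclasses import dataclass
from typing import Protocol
from unittest import mock


@dataclass(frozen=True)
class Usage:
    tokens: int  # simplified; any other usage fields are elided in the plan


@dataclass(frozen=True)
class Completion:
    text: str
    model: str
    usage: Usage


class InferenceProvider(Protocol):
    def complete(self, prompt: str) -> Completion: ...


class StubInferenceProvider:
    """Returns a hard-coded completion; performs no I/O of any kind."""

    def __init__(self, text: str = "stub completion") -> None:
        self._text = text

    def complete(self, prompt: str) -> Completion:
        return Completion(text=self._text, model="stub", usage=Usage(tokens=0))


def check_stub_makes_no_network_calls() -> Completion:
    provider: InferenceProvider = StubInferenceProvider()
    # Any code path that constructs a socket now raises instead of connecting.
    with mock.patch.object(socket, "socket", side_effect=AssertionError("network call")):
        return provider.complete("What is the invoice total?")


print(check_stub_makes_no_network_calls())
```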

Acceptance criteria:

- `planner.plan(canonical_contract, profile, budget, policy, registry) -> ExecutionPlan` is a pure function (no clock, no randomness, no I/O).
- Property test: for 100 random `(canonical, profile, budget, policy, registry)` tuples, the same tuple produces byte-equal `ExecutionPlan` JSON across two calls.
- The 7-step heuristic ordering is implemented: explicit evidence → local deterministic → structured lookup → derived computation → local inference → remote inference → `UNRESOLVED`. Note (per Oracle M7): "explicit evidence" in step 1 is a planner rule that decides whether to skip capability invocation because the input already contains the value; it does not require a `text_extraction` capability to detect it. The planner checks the `InputProfile` (from the Sprint 0 spec) for pre-extracted evidence, and `planner/scoring.py` is the module that implements this rule.
- The planner excludes remote inference when `Policy.allow_remote_inference=False` (heuristic step 6 dropped).
- The planner excludes local inference when `Policy.allow_local_inference=False` (heuristic step 5 dropped).
- The `text_extraction` capability handles `text/plain` and `text/html` inputs (≥1 unit test each).
- The `regex_extraction` capability extracts with named groups (≥1 unit test, including a multi-group pattern).
- The `validation` capability checks type, range, regex, and enum constraints (≥1 unit test each).
- `CapabilityResult` does NOT have a `confidence` field (static test using `hasattr`/`getattr`).
- `CapabilityResult.candidates` are returned with a `value` only (no `confidence` yet).
- Test coverage is ≥ 90% on `planner/` and ≥ 85% on `capabilities/v1/{text_extraction,regex_extraction,validation}.py` (V1 acceptance §2.2).
- `mypy --strict src/paxman/{planner,capabilities}` is clean.
- `import-linter` is clean for both subsystems.
- `make ci` is green.
- `docs/concepts/planning.md` exists as a skeleton (to be filled in during Sprint 8).

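
The ordering and policy rules in these criteria can be sketched as follows. This is an illustration under assumptions, not the contents of `heuristics.py`: the `Step` enum, the `Policy` dataclass, the mapping from capability ids to steps, and the `order_candidates` helper are all hypothetical. It shows two properties from the plan: steps 5 and 6 are dropped when the corresponding `Policy` flag is false, and ties are broken by capability id, so the result never depends on dict or set iteration order.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Step(IntEnum):
    EXPLICIT_EVIDENCE = 1
    LOCAL_DETERMINISTIC = 2
    STRUCTURED_LOOKUP = 3
    DERIVED_COMPUTATION = 4
    LOCAL_INFERENCE = 5
    REMOTE_INFERENCE = 6
    UNRESOLVED = 7


@dataclass(frozen=True)
class Policy:
    allow_local_inference: bool = True
    allow_remote_inference: bool = True


def allowed_steps(policy: Policy) -> list[Step]:
    """The 7-step order with policy-disabled steps removed."""
    steps = list(Step)  # definition order, 1 through 7
    if not policy.allow_local_inference:
        steps.remove(Step.LOCAL_INFERENCE)
    if not policy.allow_remote_inference:
        steps.remove(Step.REMOTE_INFERENCE)
    return steps


def order_candidates(step_of: dict[str, Step], policy: Policy) -> list[str]:
    """Capability ids in heuristic order; ties broken by id for determinism."""
    allowed = set(allowed_steps(policy))  # used for membership tests only
    eligible = [cap_id for cap_id, step in step_of.items() if step in allowed]
    return sorted(eligible, key=lambda cap_id: (step_of[cap_id], cap_id))


# Hypothetical step assignment for a few capabilities.
step_of = {
    "remote_inference": Step.REMOTE_INFERENCE,
    "text_extraction": Step.LOCAL_DETERMINISTIC,
    "local_inference": Step.LOCAL_INFERENCE,
    "regex_extraction": Step.LOCAL_DETERMINISTIC,
}
print(order_candidates(step_of, Policy(allow_remote_inference=False)))
# ['regex_extraction', 'text_extraction', 'local_inference']
```

Sorting on `(step, id)` is the same idea as the risk-table mitigation of sorting `registry.capabilities` by id: the output depends only on the inputs' values, not on their insertion order.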
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Planner determinism is broken by a hidden source of non-determinism (e.g., set iteration order) | Medium | High | The property test catches it. Prevent it by explicitly sorting `registry.capabilities` by id before iteration and pinning `PYTHONHASHSEED=0` in CI. |
| `InputProfile` is misclassified (e.g., a 10 MB HTML blob is misidentified as `text/plain`) | Medium | Medium | The Sprint 0 spec defines the classification algorithm; cover it with ≥5 test cases (empty, plain, HTML, base64, binary). |
| The 7-step heuristic is too rigid for the V1 use cases (e.g., always picking regex over inference even when the field is poorly suited to regex) | Medium | Medium | Add a test that, for the "invoice from a well-known vendor" test case, the planner picks `regex_extraction` first. Document the heuristic as a default; a per-field `ResolutionPolicy` override exists in `CanonicalField` (Sprint 2 already created the field). |
| The `text_extraction` provider SPI is over-engineered (no real provider yet) | Medium | Low | Keep the provider interface minimal: `def extract(input) -> str`. The stub provider returns the input unchanged for `text/plain` and uses `html.parser` for `text/html`. |
| `regex_extraction` named-group support is incomplete (e.g., duplicate group names are not handled) | Low | Low | Document the regex flavor: ECMAScript, with each group name used at most once per pattern in V1. Reject patterns such as `(?P<name>...)(?P<name>...)`. |
| The `validation` capability's reference-constraint check (e.g., `total == sum(line_items)`) is not in scope but might be misread as in scope | Low | Low | Reference constraints are post-V1 (per `EXTENDING.md`). Validate only type, range, regex, and enum in V1. |
| The stub inference provider accidentally drifts toward real-provider behavior | Low | Low | The stub is one class with one method that returns `Completion(text="...", model="stub", usage=Usage(tokens=0, ...))`. Add a test that the stub never makes network calls. |

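
A minimal sketch of the stdlib-only `text_extraction` provider described in the mitigation column. The table gives the interface as `extract(input) -> str`; passing the media type as a second argument is an assumption made here because the provider has to choose between `text/plain` and `text/html` somehow (in paxman it might come from the `InputProfile` instead). `StdlibTextProvider` is a hypothetical name, and skipping `<script>`/`<style>` contents and collapsing whitespace are choices of this sketch, not documented behaviour.

```python
from __future__ import annotations

from html.parser import HTMLParser

_SKIPPED_TAGS = {"script", "style"}


class _TextCollector(HTMLParser):
    """Collects character data, ignoring the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into text
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag in _SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in _SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data: str) -> None:
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse all runs of whitespace (including newlines) to single spaces.
        return " ".join(" ".join(self._chunks).split())


class StdlibTextProvider:
    """Hypothetical V1 provider: stdlib only, text/plain and text/html."""

    def extract(self, data: str, media_type: str) -> str:
        if media_type == "text/plain":
            return data
        if media_type == "text/html":
            collector = _TextCollector()
            collector.feed(data)
            collector.close()  # flush any buffered trailing text
            return collector.text()
        raise ValueError(f"unsupported media type in V1: {media_type}")


page = "<p>Hello <b>world</b></p><script>var x = 1;</script>"
print(StdlibTextProvider().extract(page, "text/html"))  # Hello world
```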
References:

- `../V1_ACCEPTANCE_CRITERIA.md` §1.2, §1.3
- `../PACKAGE_STRUCTURE.md` §4: `planner/` module spec
- `../PACKAGE_STRUCTURE.md` §5: `capabilities/` module spec
- `../docs/adr/0001-field-centric-planning.md`
- `../docs/adr/0002-rule-based-planner-v1.md`
- `../docs/adr/0005-confidence-ownership.md`
- `../ARCHITECTURE.md` §4.2, §4.3
- `../EXTENDING.md` §2: Capability SPI
- `../TESTING_STRATEGY.md` §7: capability tests