-
Notifications
You must be signed in to change notification settings - Fork 0
Sprint 04 Executor and capabilities
This page was migrated from the paxman repositorys docs/sprints/ folder as part of the Sprint 11 repo springclean. The original git history is preserved in the paxman repo (commit 3121eb2 and earlier).
Duration: 2 weeks Goal: Implement the Executor (sequential, deterministic runner) and finish the remaining 2 V1 capabilities (
lookup,inferencewith a real-enough stub provider). End of sprint:paxman.normalize(text, contract)produces anExecutionArtifactwith candidates + evidence but no confidence (Reconciler is Sprint 5). Status: This is the sprint where the end-to-end pipeline becomes functional — input → plan → execute → candidates. Sprint 5 adds the final layer (confidence).
-
execution_state.py— transient in-flight state -
context.py—CapabilityContextbuilder -
evidence.py— evidence + diagnostics collection -
budget_tracker.py— cost/latency/invocations tracking -
early_stop.py— short-circuit when capability chain is exhausted (V1: no real early-stop based on confidence — that requires Reconciler) -
field_runner.py— executes oneFieldPlanend-to-end -
executor.py— top-levelrun(plan, contract, registry, input) -> CandidateResult[]
-
lookup.py— structured retrieval (deterministic in-memory dict backend in V1) -
inference.py— model-backed extraction, with a functional stub provider (not a real network call) that simulates non-determinism for testing
-
src/paxman/contract/adapters/openapi.py— OpenAPI 3.x (best-effort per ADR-0007). Delegates schema parsing to the JSON Schema adapter for schema objects; handles OpenAPI-specific constructs (requestBody,parameters,components/schemas).
- Unit tests for all Executor modules
- Integration test:
executor.run(plan, contract, registry, input) -> CandidateResult[]for a 3-field plan - Property test: Executor determinism (given the same plan and capabilities, same
CandidateResult[]byte-equal) -
lookupandinferencecapability unit tests - OpenAPI adapter: ≥1 test using
petstore_3_0.yaml(vendored intests/fixtures/contracts/openapi/) - Budget tests:
executor.runshort-circuits whenBudget.max_total_cost_usdis exceeded
-
import-lintercontract:executor/may NOT import fromreconciler/,artifact/, orapi/
- Reconciler (Sprint 5) — this is where confidence gets assigned.
-
Artifact (the final object) (Sprint 6) — Sprint 4 produces
CandidateResult[], notExecutionArtifact. -
paxman.normalize()API (Sprint 6) — Sprint 4 wires the Executor but the public API orchestration is in Sprint 6. - OpenAPI 3.1 full coverage (deferred to V2; ADR-0007 best-effort).
-
Real inference providers (OpenAI, Anthropic) — V2 per
EXTENDING.md§3.4. The stub provider simulates non-determinism by returning a random string from a fixed list.
| ID | Deliverable | Effort (id-ed) |
|---|---|---|
| D4.1 | executor/execution_state.py |
1.0 |
| D4.2 | executor/context.py |
1.0 |
| D4.3 | executor/evidence.py |
1.0 |
| D4.4 | executor/budget_tracker.py |
2.0 |
| D4.5 |
executor/early_stop.py (Sprint 4: just "chain exhausted" logic; confidence-based early stop is post-V1) |
1.0 |
| D4.6 | executor/field_runner.py |
2.0 |
| D4.7 | executor/executor.py |
2.0 |
| D4.8 |
capabilities/v1/lookup.py (deterministic in-memory dict backend) |
2.0 |
| D4.9 |
capabilities/v1/inference.py (with StubInferenceProvider) |
3.0 |
| D4.10 |
contract/adapters/openapi.py (best-effort) |
3.0 |
| D4.11 |
tests/fixtures/contracts/openapi/petstore_3_0.yaml (vendored or hand-written excerpt) |
0.5 |
| D4.12 | Unit tests for all Executor modules | 3.0 |
| D4.13 | Integration test: 3-field plan end-to-end through Executor | 1.5 |
| D4.14 | Property tests: Executor determinism | 1.0 |
| D4.15 | Budget tests: short-circuit on cost | 1.0 |
| D4.16 |
lookup capability tests |
1.0 |
| D4.17 |
inference capability tests (including stub provider) |
1.0 |
| D4.18 | OpenAPI adapter tests (petstore 3.0 only) | 1.0 |
| D4.19 |
import-linter contract for executor/
|
0.5 |
Total: ~27.5 id-ed. Sized for 4 engineers × 2 weeks (2 on executor, 1 on capabilities, 1 on OpenAPI + tests).
| Type | Item | Notes |
|---|---|---|
| People | 4 engineers (2 senior, 2 mid-level) | Executor is hard; needs experienced engineers |
| Tools | All Sprint 1-3 deps; OpenAPI test fixture | Standard Python dev env |
| Tests | Sprint 3 capabilities (text_extraction, regex_extraction, validation) and planner | Done |
| Docs |
ARCHITECTURE.md §4.4, EXTENDING.md §3.3, EXTENDING.md §3.5 (provider MUST/MUST NOT) |
Read carefully — the provider SPI has subtle rules |
| Tool | Version | Purpose | Notes |
|---|---|---|---|
| openapi-spec-validator | ≥ 0.6 | OpenAPI 3.x schema validation | pip install paxman[openapi] |
| PyYAML | ≥ 6.0 | YAML parsing for OpenAPI fixtures | Stdlib has no YAML |
| hypothesis | ≥ 6.0 | Property tests (already dev dep) |
None. The stub inference provider returns hard-coded completions; no real provider is wired.
-
executor.run(plan, contract, registry, input) -> CandidateResult[]works end-to-end. - Sequential execution is verified: capabilities are invoked in plan order, one field at a time.
- The Executor walks field plans in declaration order (not in dict-iteration order).
- The Executor short-circuits when the
Budget.max_total_cost_usdis exceeded (returns what was resolved with aBUDGET_EXCEEDEDdiagnostic). - The Executor collects evidence for every capability invocation.
- The Executor returns explicit
UNRESOLVEDcandidates when a field's capability chain is exhausted without producing a candidate. - The Executor never assigns confidence (static test:
Candidatehas noconfidencefield). -
lookupcapability: deterministic in-memory dict backend works; same input → same output. -
inferencecapability: stub provider returns aCompletionwith text + model + usage; the artifact records the model id and version in evidence. - OpenAPI adapter: at least the
petstore_3_0.yamlsmoke test produces aCanonicalContract. - Test coverage on
executor/≥ 90%, oncapabilities/v1/{lookup,inference}.py≥ 85%. -
mypy --strict src/paxman/{executor,capabilities/v1,contract/adapters/openapi}is clean. -
import-linteris clean. -
make ciis green.
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Executor non-determinism from dict iteration order | Medium | High | Sort all plan/registry iterations by id or field_id. Property test catches regressions. |
Decimal for cost. Property test that cost tracking is exact for round numbers. |
|||
| The stub inference provider accidentally returns the same completion on every call (defeats the test for non-determinism) | Medium | Medium | Stub provider cycles through a fixed list of 3 strings, or accepts a --seed parameter for tests. Document the stub as a test-only utility. |
| OpenAPI adapter scope creep (trying to support all of OpenAPI 3.1) | High | Medium | Hard cap: only support what petstore_3_0.yaml exercises. Defer oneOf/anyOf/$ref resolution to V2. Document limitations in EXTENDING.md. |
| OpenAPI adapter delegates to JSON Schema adapter, creating an import-linter violation | High | High | The OpenAPI adapter is in contract/, which may import from contract/adapters/. The JSON Schema adapter is also in contract/adapters/. No DAG violation — they're siblings. Verify with import-linter in CI. |
| Sequential execution is "too slow" for large contracts (e.g., 1000-field plan takes >30s) | Low | Low | Performance is a Sprint 9 concern. Sprint 4 only needs correctness. |
The InferenceProvider Protocol is too rigid (real providers can't implement it without hacks) |
Medium | Medium | Keep the protocol minimal: def complete(request) -> Completion. Provider-specific features (e.g., streaming, function calling) are V2. |
-
../V1_ACCEPTANCE_CRITERIA.md§1.1 (OpenAPI), §1.2 (lookup, inference), §1.3 (executor). -
../PACKAGE_STRUCTURE.md§6 —executor/module spec. -
../docs/adr/0006-sequential-execution-v1.md— V1 is sequential. -
../ARCHITECTURE.md§4.4 — Executor responsibilities. -
../EXTENDING.md§2 (capability), §3 (inference provider). -
../TESTING_STRATEGY.md§12.2 — Executor test pattern. -
../tests/fixtures/contracts/README.md— petstore_3_0.yaml planned.