Sprint 00 Design closure
This page was migrated from the paxman repository's docs/sprints/ folder as part of the Sprint 11 repo spring-clean. The original git history is preserved in the paxman repo (commit 3121eb2 and earlier).
Duration: 1 week

Goal: Close the 3 design gaps that block Sprint 1, plus resolve the license decision (currently TBD).

Status: This sprint produces specification documents only (no code, no CI yet).
Three concrete gaps were identified by auditing the existing documentation (see CHANGES_LOG.md and the audit summary at the end of this document). These gaps block the implementation of the first sprint:
- Dict DSL syntax is unspecified: it is referenced ~20 times in the docs as the "escape hatch" and "test source of truth" (per `EXTENDING.md` §1 and `PRD.md` §5.1), but the concrete syntax of the DSL is not defined anywhere.
- `InputProfile` has no module spec: `planner/input_profile.py` is listed in `PACKAGE_STRUCTURE.md` §4.2 with the description "lightweight input classifier (no capability invocation)", but the data model and the construction logic are not described.
- `CostHint` values are undefined: `CapabilitySpec.cost_estimate` per `ARCHITECTURE.md` §4.3 takes a `CostHint(tokens, ms, usd)`, but the docs do not specify the cost model that the planner's `scoring.py` will use to compare capabilities.
Plus:
- License decision is TBD: `README.md` §License says "MIT (or Apache-2.0 — final TBD by the team)." This blocks `LICENSE` file creation in Sprint 1.
- 4 specification documents created in `docs/specs/` (per Oracle review M2; consistent with the project-wide `docs/` taxonomy, not nested under `docs/sprints/`):
  - `dict-dsl-spec.md`
  - `input-profile-spec.md`
  - `capability-cost-model.md`
  - `license-decision.md`
- An ADR (`docs/adr/0008-license-decision.md`) recording the license choice.
- A new ADR template if Dict DSL or `InputProfile` are deemed ADR-worthy by the team (the project owner decides).
- No source code. No `src/paxman/` yet. No `pyproject.toml` yet. This sprint is design-closure only.
- No GitHub Actions setup yet. Sprint 1 introduces CI.
- No fixture contracts yet. Sprint 2 introduces Pydantic + Dict DSL fixture contracts.
| ID | Deliverable | Location | Format |
|---|---|---|---|
| D0.1 | Dict DSL syntax specification | `docs/specs/dict-dsl-spec.md` | Markdown; BNF-like grammar + 3 worked examples + 5 edge cases |
| D0.2 | Input Profile module spec | `docs/specs/input-profile-spec.md` | Markdown; `InputProfile` data model + `make_profile(input) -> InputProfile` algorithm |
| D0.3 | Capability cost model | `docs/specs/capability-cost-model.md` | Markdown; `CostHint` values for all 5 V1 capabilities + scoring rubric |
| D0.4 | License decision document | `docs/specs/license-decision.md` | Markdown; MIT vs Apache-2.0 trade-off analysis + recommendation |
| D0.5 | License ADR | `docs/adr/0008-license-decision.md` | MADR 4.0 template |
| D0.6 | (Optional) ADR for Dict DSL if deemed architectural | `docs/adr/0009-dict-dsl-v1.md` | MADR 4.0 (optional; project owner decides) |
| Type | Item | Notes |
|---|---|---|
| People | 1 senior engineer (or 1 lead + 1 reviewer) | 1 week, full-time |
| Decisions | Project owner available for license decision | Cannot be deferred |
| Docs | `GLOSSARY.md`, `EXTENDING.md`, `PACKAGE_STRUCTURE.md` | Already in repo |
| Tools | None (this sprint produces only Markdown) | Markdown editor |
None required. This sprint is documentation only. No Python environment, no CI, no tooling beyond a Markdown editor.
None.
- `docs/specs/dict-dsl-spec.md` exists, has BNF grammar + ≥3 worked examples, and is reviewed by ≥1 other engineer.
- `docs/specs/input-profile-spec.md` exists, has the `InputProfile` data model and `make_profile()` algorithm, and is reviewed.
- `docs/specs/capability-cost-model.md` exists with explicit `CostHint(tokens, ms, usd)` for all 5 V1 capabilities (`text_extraction`, `regex_extraction`, `lookup`, `inference`, `validation`), reviewed.
- `docs/adr/0008-license-decision.md` is in the MADR format and has Status: Accepted.
- `docs/specs/license-decision.md` records the rationale.
- (Optional, if project owner decides) Dict DSL ADR exists.
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| License decision takes longer than 1 week | Low | High | Pre-brief the project owner on the trade-off; have a recommended default in the decision document. |
| Dict DSL spec becomes too complex (second-system effect) | Medium | High | Keep V1 Dict DSL ≤ 5 concepts; reject references and inheritance (YAGNI). |
| `InputProfile` spec becomes a research project | Medium | Medium | Cap at 5 fields. Defer content classification (e.g., "is this a financial document?") to V2. |
| `CostHint` values become arbitrary | High | Medium | Cost model is for planner scoring, not accounting. Round numbers are fine. Document that the values are heuristics, not measurements. |
| Spec reviews cause scope creep | Medium | Medium | Reviewers can suggest, not require. Spec is owned by the author; changes go through an ADR. |
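
To make the `InputProfile` mitigation above concrete, here is a minimal sketch of what a five-field profile and `make_profile(input) -> InputProfile` could look like. Only the names `InputProfile` and `make_profile` come from this document; every field name and every classification rule below is a hypothetical placeholder, and the real data model is whatever D0.2 ends up specifying. The sketch sticks to cheap structural checks, in keeping with the "no capability invocation" description and the decision to defer content classification to V2.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InputProfile:
    """Hypothetical five-field profile; field names are illustrative only."""

    kind: str         # "text", "bytes", or "mapping"
    size_bytes: int   # size of the input once encoded as UTF-8
    line_count: int   # number of lines in the text view of the input
    is_ascii: bool    # True if the text view contains only ASCII characters
    has_digits: bool  # True if any character is a decimal digit


def make_profile(input: object) -> InputProfile:
    """Build a profile from structural checks only (no capability calls).

    The parameter is named `input` to match the signature quoted in D0.2,
    which shadows the builtin of the same name inside this function.
    """
    if isinstance(input, bytes):
        kind = "bytes"
        text = input.decode("utf-8", errors="replace")
        size = len(input)
    elif isinstance(input, dict):
        kind = "mapping"
        text = repr(input)
        size = len(text.encode("utf-8"))
    else:
        kind = "text"
        text = str(input)
        size = len(text.encode("utf-8"))

    return InputProfile(
        kind=kind,
        size_bytes=size,
        line_count=(text.count("\n") + 1) if text else 0,
        is_ascii=text.isascii(),
        has_digits=any(ch.isdigit() for ch in text),
    )


profile = make_profile("Invoice 42\nTotal: 10 EUR")
empty_profile = make_profile(b"")
```

Nothing here asks a semantic question such as "is this a financial document?"; each field can be computed in a single pass over the input, which is one way to keep the spec from growing into a research project.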
The full audit is recorded in CHANGES_LOG.md §"Documentation gaps identified". Briefly:
- Dict DSL syntax: `EXTENDING.md` §1.3 example uses a generic `to_canonical_field()` placeholder, not real Dict DSL syntax. The Dict DSL is the only internal escape hatch and the test source of truth; Sprint 2 cannot write fixture contracts without it.
- Input Profile: `PACKAGE_STRUCTURE.md` §4.2 lists `planner/input_profile.py` with one-line description; no data model. The planner depends on it.
- `CostHint` values: `ARCHITECTURE.md` §4.3 defines the type but no values. The planner's heuristic ordering needs to compare `regex_extraction` (fast, free) vs `inference` (slow, expensive); without numbers, the planner cannot rank them.
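
The ranking problem in the last item can be illustrated with a short sketch. The `CostHint(tokens, ms, usd)` shape and the two capability names come from this document; the numeric values, the weights, and the `score` function are hypothetical placeholders chosen only to show that any explicit numbers make the comparison possible. They are not measurements and not the values D0.3 will define.

```python
from typing import NamedTuple


class CostHint(NamedTuple):
    tokens: int  # estimated LLM tokens consumed
    ms: int      # estimated wall-clock latency in milliseconds
    usd: float   # estimated monetary cost in US dollars


# Placeholder heuristics for illustration only, NOT values from the spec.
HINTS = {
    "inference": CostHint(tokens=1000, ms=2000, usd=0.01),
    "regex_extraction": CostHint(tokens=0, ms=1, usd=0.0),
}


def score(
    hint: CostHint,
    *,
    token_weight: float = 0.01,
    ms_weight: float = 1.0,
    usd_weight: float = 1000.0,
) -> float:
    """Collapse a CostHint into one number; lower means cheaper.

    The weighted sum and its default weights are assumptions of this sketch.
    """
    return (
        hint.tokens * token_weight
        + hint.ms * ms_weight
        + hint.usd * usd_weight
    )


# Order capabilities from cheapest to most expensive.
ranked = sorted(HINTS, key=lambda name: score(HINTS[name]))
print(ranked)  # ['regex_extraction', 'inference']
```

A weighted sum is only one possible rubric. Because the cost model exists for planner scoring rather than accounting, round placeholder numbers like these are enough to produce a stable ordering.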
- `CHANGES_LOG.md`: full record of documentation changes from this planning exercise.
- `../docs/adr/README.md`: MADR 4.0 template for new ADRs.
- `../EXTENDING.md` §1: adapter SPI, which Dict DSL implements.
- `../ARCHITECTURE.md` §4.3: `CapabilitySpec` with `CostHint`.
- `../PACKAGE_STRUCTURE.md` §4.2: `planner/input_profile.py` mention.