Sprint 06 Artifact and API
Azahari Zaman edited this page Jun 27, 2026 · 1 revision
This page was migrated from the paxman repository's docs/sprints/ folder as part of the Sprint 11 repo spring clean. The original git history is preserved in the paxman repo (commit 3121eb2 and earlier).
Duration: 2 weeks

Goal: Implement the Artifact subsystem (the product + replay source) and the public API (`paxman.normalize()` and `paxman.replay()`). End of sprint: the full pipeline works end-to-end, with a real `paxman.normalize()` returning a real `ExecutionArtifact` that can be replayed.

Status: This is the sprint that produces the first internally usable build, v0.1.0-alpha. Still pre-release (no PyPI publish), but callable from a script.
- `artifact.py`: `ExecutionArtifact`, `FieldResult` data models (the final output bundle)
- `confidence.py`: confidence band mapping (float ↔ `CERTAIN`/`HIGH`/`MEDIUM`/`LOW`/`UNTRUSTED`)
- `evidence.py`: evidence references + provenance
- `diagnostics.py`: structured diagnostics
- `statistics.py`: execution statistics
- `serializer.py`: stable JSON encoding for the artifact (delegates to `cross-cutting/serialization.py`)
- `_hash.py`: replay hash internals (SHA-256)
- `replay.py`: replay hash computation + rehydration + version checks
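
The float ↔ band mapping in `confidence.py` could be sketched roughly as below. The five band names come from the list above; the numeric cut-offs, the `band_for` and `floor_for` helper names, and the table layout are all hypothetical, since this page does not state the real thresholds.

```python
from enum import Enum


class ConfidenceBand(Enum):
    CERTAIN = "CERTAIN"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    UNTRUSTED = "UNTRUSTED"


# Hypothetical lower bounds, highest first. The real cut-offs are not
# given on this page; these values only make the sketch runnable.
_LOWER_BOUNDS = (
    (0.99, ConfidenceBand.CERTAIN),
    (0.85, ConfidenceBand.HIGH),
    (0.60, ConfidenceBand.MEDIUM),
    (0.30, ConfidenceBand.LOW),
    (0.00, ConfidenceBand.UNTRUSTED),
)


def band_for(score: float) -> ConfidenceBand:
    """Map a confidence score in [0.0, 1.0] to its band."""
    # The chained comparison is False for NaN, so NaN is rejected too.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be within [0, 1], got {score!r}")
    for lower, band in _LOWER_BOUNDS:
        if score >= lower:
            return band
    raise AssertionError("unreachable: 0.0 is the lowest bound")


def floor_for(band: ConfidenceBand) -> float:
    """Reverse direction: the lowest score that still maps to `band`."""
    for lower, candidate in _LOWER_BOUNDS:
        if candidate is band:
            return lower
    raise ValueError(f"unknown band: {band!r}")
```

Keeping the bounds in one ordered table means both directions of the mapping read from the same data, so `band_for(floor_for(band))` returns `band` for every band.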
- `types.py`: re-exports `CanonicalContract`, `CanonicalField`, `FieldType`, `Status`, `ConfidenceBand`, `ResolutionPolicy`, `Budget`, `Policy`, `ExecutionArtifact`, `CurrencyPolicy`
- `errors.py`: public error re-exports (all 13 classes from `cross-cutting/errors.py`)
- `protocols.py`: public SPIs `ContractAdapter`, `Capability`
- `registry.py`: public `register_adapter()`, `register_capability()`
- `version.py`: `__version__` string
- `normalize.py`: top-level orchestration, `paxman.normalize(input_data, contract, budget=None, policy=None) -> ExecutionArtifact`
- `replay.py`: top-level `paxman.replay(artifact, contract) -> ExecutionArtifact`
- Re-export public surface from `api/*`
- Re-export `__version__`
- Tiny: ≤ 30 lines
- Unit tests for all Artifact modules
- Unit tests for all `api/` modules
- First end-to-end smoke test: `paxman.normalize(text, InvoiceContract)` returns an `ExecutionArtifact`
- Replay equality: `paxman.replay(artifact, contract) == artifact` (byte-equal JSON)
- Replay tamper detection: modified artifact raises `HashMismatchError`
- Replay version mismatch: wrong Paxman version raises `VersionMismatchError`
- Public API snapshot test: `tests/public_api/test_public_api.py` fails if anything new is added without an ADR
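
The tamper-detection test above might look something like the following sketch. Everything here is a stand-in: `MiniArtifact` is a hypothetical two-field reduction of `ExecutionArtifact`, `HashMismatchError` is redefined locally, and the hash inputs are assumed to be "every field except `replay_hash`", whereas the real inputs are those listed in `REPLAY_AND_DETERMINISM.md` §2.1. It also uses the stdlib `json` encoder purely for illustration; the project routes serialization through `cross-cutting/serialization.py`.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


class HashMismatchError(Exception):
    """Local stand-in for the project's error class."""


@dataclass(frozen=True)
class MiniArtifact:
    """Hypothetical, heavily reduced artifact used only in this sketch."""
    normalized_data: dict
    paxman_version: str
    replay_hash: str = ""


def compute_hash(artifact: MiniArtifact) -> str:
    """SHA-256 over a canonical JSON encoding of everything but the hash."""
    payload = asdict(artifact)
    payload.pop("replay_hash")  # the hash never covers itself
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(artifact: MiniArtifact) -> MiniArtifact:
    """Return a copy of the artifact with its replay hash filled in."""
    return replace(artifact, replay_hash=compute_hash(artifact))


def verify(artifact: MiniArtifact) -> None:
    """Raise HashMismatchError if the stored hash no longer matches."""
    expected = compute_hash(artifact)
    if artifact.replay_hash != expected:
        raise HashMismatchError(
            f"stored {artifact.replay_hash!r}, recomputed {expected!r}"
        )


sealed = seal(MiniArtifact({"total": "42.00"}, "0.1.0a0"))
verify(sealed)  # an untouched artifact passes

# dataclasses.replace builds a modified copy but keeps the old hash.
tampered = replace(sealed, normalized_data={"total": "43.00"})
try:
    verify(tampered)
except HashMismatchError:
    detected = True
else:
    detected = False
```

Because the dataclass is frozen, `dataclasses.replace` is the natural way to produce the modified copy, and the stale `replay_hash` it carries over is exactly what the check is meant to catch.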
- `import-linter` contract: `artifact/` may NOT import from `api/`; `api/` may import from any layer
- `bandit` security scan clean
- `pip-audit` clean
- Property tests (Sprint 7): Sprint 6 has unit tests and 1 end-to-end smoke.
- Hypothesis strategies (Sprint 7): `paxman.testing` strategies module.
- Performance optimization (Sprint 9): Sprint 6 is correctness-only.
- Golden artifacts (Sprint 7): Sprint 6 produces one smoke artifact; Sprint 7 bootstraps the full set.
- PyPI publish (Sprint 10).
| ID | Deliverable | Effort (id-ed) |
|---|---|---|
| D6.1 | `artifact/artifact.py`: `ExecutionArtifact`, `FieldResult` | 2.0 |
| D6.2 | `artifact/confidence.py`: band mapping | 0.5 |
| D6.3 | `artifact/evidence.py` | 1.0 |
| D6.4 | `artifact/diagnostics.py` | 1.0 |
| D6.5 | `artifact/statistics.py` | 1.0 |
| D6.6 | `artifact/serializer.py` (uses `cross-cutting/serialization.py`) | 2.0 |
| D6.7 | `artifact/_hash.py`: SHA-256 internals | 1.0 |
| D6.8 | `artifact/replay.py`: rehydration + version checks | 3.0 |
| D6.9 | `api/types.py`: re-exports | 2.0 |
| D6.10 | `api/errors.py`: re-exports (12 public errors per `V1_ACCEPTANCE_CRITERIA.md` §1.4: `PaxmanError`, `InvalidContractError`, `ExecutionError`, `CapabilityError`, `InferenceProviderError`, `BudgetExceededError`, `ReconciliationError`, `ReplayError`, `VersionMismatchError`, `HashMismatchError`, `ConfigurationError`, `CapabilityNotFoundError` [added per Oracle review C1; required by `V1_ACCEPTANCE_CRITERIA.md` §1.5]) | 1.0 |
| D6.11 | `api/protocols.py`: re-exports | 1.0 |
| D6.12 | `api/registry.py`: `register_adapter`, `register_capability` | 1.0 |
| D6.13 | `api/version.py`: `__version__` | 0.5 |
| D6.14 | `api/normalize.py`: top-level orchestration | 3.0 |
| D6.15 | `api/replay.py`: `paxman.replay()` | 1.5 |
| D6.16 | `src/paxman/__init__.py`: re-exports (≤30 lines) | 1.0 |
| D6.17 | Unit tests for all Artifact modules | 3.0 |
| D6.18 | Unit tests for all `api/` modules | 2.0 |
| D6.19 | First end-to-end smoke test (`tests/integration/test_smoke_e2e.py`) | 1.5 |
| D6.20 | Replay equality test (byte-equal) | 1.0 |
| D6.21 | Replay tamper detection test | 0.5 |
| D6.22 | Replay version mismatch test | 0.5 |
| D6.23 | `tests/public_api/test_public_api.py`: public API snapshot | 1.0 |
| D6.24 | `import-linter` contract for `artifact/` and `api/` | 0.5 |
| D6.25 | Update `README.md` quickstart to use the real `paxman.normalize()` | 0.5 |

Total: ~33 id-ed (the sum of the per-deliverable efforts above). Sized for 4 engineers × 2 weeks (2 on artifact, 1 on api, 1 on tests + public API).

| Type | Item | Notes |
|---|---|---|
| People | 4 engineers (1 senior, 3 mid-level) | Replay is subtle; needs senior review |
| Tools | All Sprint 1-5 deps | Standard Python dev env |
| Tests | Reconciler + all 5 capabilities from Sprint 3-5 | Done |
| Docs | `REPLAY_AND_DETERMINISM.md`: the full replay model | Read by the replay implementer |
| Decisions | Replay hash inputs (per `REPLAY_AND_DETERMINISM.md` §2.1), already decided | Already in doc |

| Tool | Version | Purpose | Notes |
|---|---|---|---|
| `hashlib` (stdlib) | n/a | SHA-256 for replay hash | Stdlib |
| `json` (stdlib) | n/a | Uses `cross-cutting/serialization.py`, not stdlib directly | Per the anti-pattern in `TESTING_STRATEGY.md` §10.2 |
| `packaging` (PyPA) | latest | Version comparison for replay version checks | New dev dep |
| `hypothesis` | ≥ 6.0 | (Sprint 7) | Not used this sprint |

None.
1. `paxman.normalize(text, InvoiceContract)` returns an `ExecutionArtifact` end-to-end.
2. `paxman.replay(artifact, contract)` returns a byte-equal artifact.
3. Modifying any field of the artifact (via `dataclasses.replace`) and calling `paxman.replay` raises `HashMismatchError`.
4. `paxman.replay` with a different Paxman version (mocked) raises `VersionMismatchError`.
   4b. `paxman.replay` raises `CapabilityNotFoundError` when a pinned capability is no longer registered (per `V1_ACCEPTANCE_CRITERIA.md` §1.5). `CapabilityNotFoundError` is a subclass of `ReplayError` (added in Sprint 1 D1.10).
5. `paxman.register_adapter(MyAdapter())` and `paxman.register_capability(MyCapability())` work.
6. The public API surface is exactly: `paxman.normalize`, `paxman.replay`, `paxman.register_adapter`, `paxman.register_capability`, `paxman.__version__`, plus the public types and errors listed in `V1_ACCEPTANCE_CRITERIA.md` §1.4.
7. `tests/public_api/test_public_api.py` fails if any new symbol is added to `paxman/__init__.py` without an ADR.
8. The artifact contains all required fields: `normalized_data`, `field_results`, `unresolved_fields`, `evidence`, `diagnostics`, `execution_plan`, `replay_hash`, `statistics`.
9. The artifact serializes to a stable JSON (sorted keys, no whitespace, RFC 8785-style).
10. Test coverage on `artifact/` ≥ 95% (V1 acceptance §2.2: replay is critical).
11. Test coverage on `api/` ≥ 90%.
12. `mypy --strict src/paxman` is clean on all 7 subsystems + api.
13. `import-linter` is clean.
14. `bandit` is clean.
15. `pip-audit` is clean.
16. `make ci` is green.
17. The `README.md` quickstart runs end-to-end (manual smoke test by an engineer).
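
As an illustration of the stable-JSON criterion (sorted keys, no whitespace), here is a minimal sketch of what "byte-equal" means in practice. The `canonical_json` helper is hypothetical and calls the stdlib `json` module directly only to keep the example self-contained; the project itself goes through `cross-cutting/serialization.py`. It is also not a full RFC 8785 implementation: that specification additionally fixes number formatting and orders keys by UTF-16 code units, which this sketch does not attempt.

```python
import json


def canonical_json(value) -> bytes:
    """Encode `value` with sorted keys and no insignificant whitespace."""
    text = json.dumps(
        value,
        sort_keys=True,         # key order no longer depends on insertion order
        separators=(",", ":"),  # drop the default spaces after ',' and ':'
        ensure_ascii=False,     # keep non-ASCII text as UTF-8, not \u escapes
        allow_nan=False,        # NaN/Infinity are not valid JSON; fail loudly
    )
    return text.encode("utf-8")


# Same content, different insertion order. Amounts are kept as strings so
# that float formatting cannot affect the bytes.
a = {"replay_hash": "abc", "normalized_data": {"total": "42.00", "currency": "EUR"}}
b = {"normalized_data": {"currency": "EUR", "total": "42.00"}, "replay_hash": "abc"}

same_bytes = canonical_json(a) == canonical_json(b)

# Parsing and re-encoding must reproduce the original bytes exactly.
round_trip_stable = canonical_json(json.loads(canonical_json(a))) == canonical_json(a)

print(canonical_json({"b": 1, "a": [1, 2]}))  # b'{"a":[1,2],"b":1}'
```

The round-trip check is the property that replay equality relies on: an artifact that is serialized, loaded, and serialized again should yield identical bytes.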
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| The replay hash is unstable across Python versions or platforms | Medium | High | Use SHA-256, hex-encode. The hash inputs are explicitly listed in `REPLAY_AND_DETERMINISM.md` §2.1. Add a property test that runs the hash 1000 times and asserts byte-equal output. |
| The orchestrator in `api/normalize.py` mishandles an error path (e.g., Reconciler raises, but the artifact is returned anyway) | Medium | High | Explicitly enumerate the error-handling paths. Test every documented error path. |
| The artifact JSON shape is incompatible with future versions | Medium | Medium | Embed `paxman_version`, `planner_version`, `capability_versions` in the artifact. Document the schema in the `artifact/artifact.py` module docstring. |
| The public API snapshot test is too strict and breaks every PR | Low | Medium | Allow `__version__` and the listed public types. Anything else requires an ADR. |
| `artifact/serializer.py` accidentally re-implements the stdlib JSON encoder (defeats the determinism guarantee) | Low | High | Hard rule: `artifact/serializer.py` MUST delegate to `cross-cutting/serialization.py`. Add a unit test that asserts the imported encoder is the one from cross-cutting. |
| `import-linter` flags the (legitimate) imports that `api/` makes from `contract/`, `planner/`, etc., and rejects `api/normalize.py` for importing from all of them | Low | Low | The `api/` layer is explicitly allowed to import from any layer per `PACKAGE_STRUCTURE.md` §11. Configure `import-linter` to allow this. |
| The first end-to-end smoke test takes >10 seconds | Low | Low | Use a tiny contract (3 fields, plain text input). Performance is a Sprint 9 concern. |
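
One way to keep the public API snapshot test strict without making it brittle is an explicit allowlist that is compared in both directions. The sketch below is an assumption about how `tests/public_api/test_public_api.py` could work, not a description of it: the helper names are hypothetical, the allowlist holds only the five callables and `__version__` named on this page (the real one would also include the public types and errors from `V1_ACCEPTANCE_CRITERIA.md` §1.4), and a throwaway module stands in for `paxman`.

```python
import types

# Partial, illustrative allowlist; the real list also covers the public
# types and errors, which are omitted here.
ALLOWED = frozenset(
    {"normalize", "replay", "register_adapter", "register_capability", "__version__"}
)


def public_names(module: types.ModuleType) -> set[str]:
    """Names without a leading underscore, plus __version__ if present."""
    names = {name for name in vars(module) if not name.startswith("_")}
    if hasattr(module, "__version__"):
        names.add("__version__")
    return names


def surface_diff(module: types.ModuleType, allowed: frozenset) -> tuple[list, list]:
    """Return (unexpected, missing) symbols relative to the allowlist."""
    actual = public_names(module)
    return sorted(actual - allowed), sorted(allowed - actual)


# Stand-in module so the sketch runs without paxman installed.
fake = types.ModuleType("fake_paxman")
for name in ("normalize", "replay", "register_adapter", "register_capability"):
    setattr(fake, name, lambda *args, **kwargs: None)
fake.__version__ = "0.1.0a0"

clean = surface_diff(fake, ALLOWED)        # ([], [])

fake.sneaky_helper = object()              # a symbol added without an ADR
after_leak = surface_diff(fake, ALLOWED)   # (['sneaky_helper'], [])
```

Reporting unexpected and missing names separately keeps the failure message actionable: an addition points at a missing ADR, while a removal points at an accidental break of the documented surface.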
- `../V1_ACCEPTANCE_CRITERIA.md` §1.3 (artifact, api), §1.4 (public API), §1.5 (replay).
- `../PACKAGE_STRUCTURE.md` §8 (`artifact/` module spec), §9 (`api/` module spec).
- `../REPLAY_AND_DETERMINISM.md` §2, §3: replay hash and replay protocol.
- `../ARCHITECTURE.md` §4.6, §4.7, §9.
- `../TESTING_STRATEGY.md` §5, §10.2.
- `../SECURITY.md`: public surface stability (no leaking internal types).