fix: close schema↔validator parity gaps #8
1. TS profile validation: loadProfile() now checks file existence before readFileSync, throwing clean "Unknown profile" error instead of raw ENOENT. Matches Python behavior.
2. produced_by schema tightening: JSON Schema now includes pattern constraint matching the URI format validators already enforce. Third-party JSON Schema validators will now agree with our package validators.
3. extracted_at date-time strictness: both validators now require full ISO 8601 date-time (with T component), rejecting date-only strings. Matches the schema's "format": "date-time" constraint.

Closes the drift where third-party JSON Schema validators would give different verdicts than @synapt-dev/extract validators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
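The existence check in item 1 can be sketched on the Python side as follows. This is a minimal illustration, not the package's code: the function name `load_profile`, the `profiles_dir` parameter, the `<name>.json` file layout, and the choice of `ValueError` are all assumptions.

```python
import json
from pathlib import Path


def load_profile(name, profiles_dir):
    """Load a profile JSON file, failing with a clean error if it is missing."""
    # Assumed layout: one JSON file per profile, named <name>.json.
    path = Path(profiles_dir) / f"{name}.json"
    if not path.is_file():
        # Raise a domain-level error instead of a raw FileNotFoundError (ENOENT).
        raise ValueError(f"Unknown profile: {name}")
    return json.loads(path.read_text(encoding="utf-8"))
```

Checking before reading keeps the error message about the profile name rather than about a filesystem path the caller never supplied.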
Contract read follow-up from Sentinel after rebasing the TS parity branch onto Verified green:
The 3 parity gaps I flagged are now resolved in implementation:
I also updated my stale TS parity expectation for date-only
laynepenney
left a comment
Reran the parity sweep against current fix/schema-validator-parity (f6681ab) using the live hosted schemas fetched via curl, jsonschema (Python), ajv (TS), and both runtime validators.
Important result: I did not find any TS↔Python runtime divergence. They are still in lockstep. The remaining drift is hosted-schema ↔ runtime-validator drift, plus one real JSON-Schema format ambiguity.
1. Schema too loose / validators stricter
These fixtures are accepted by both hosted schema validators (jsonschema + ajv where noted) but rejected by both runtime validators:
- {"produced_by":"gpt-4o-mini"} and {"produced_by":""} on an otherwise valid document
  - schema: valid
  - validators: reject with "produced_by: must be a provider URI (scheme://identifier)"
- {"kind":"session_summary"} and {"kind":""}
  - schema: valid
  - validators: reject (kind is not namespaced)
- {"capabilities":["not_real"]}
  - schema: valid
  - validators: reject unknown capability
- {"summary":""}, {"themes":[""]}
  - schema: valid
  - validators: reject (must be a non-empty string)
- entities[0].name = "", entities[0].type = ""
  - schema: valid
  - validators: reject (must be a non-empty string)
- entities[0].source = {"version":"1"} and same for signals
  - schema: valid
  - validators: reject empty wrapper
- goals[0].text = "", goals[0].entity_refs = ["missing"], goals[0].stated_at = "2026/04/20", goals[0].resolved_at = "2026/04/21"
  - schema: valid
  - validators: reject empty text / dangling ref / bad ISO date
- facts[0].text = ""
  - schema: valid
  - validators: reject (must be a non-empty string)
- extensions = {"prayer": {"version":"1"}}
  - schema: valid
  - validators: reject unscoped extension key
- temporal_refs[0].raw = ""
  - schema: valid
  - validators: reject (must be a non-empty string)
- temporal_refs[0] = {"version":"1","type":"range","raw":"April","resolved":"2026-04-01"}
  - schema: valid
  - validators: reject missing resolved_end
- temporal_refs[0] = {"version":"1","type":"unresolved","raw":"soon","resolved":"2026-04-01"}
  - schema: valid
  - validators: reject resolved on unresolved
- temporal_refs[0].resolved = "2026/04/01" and resolved_end = "2026/04/30"
  - schema: valid
  - validators: reject bad ISO date
- embeddings[0].model = "text-embedding-3-small"
  - schema: valid
  - validators: reject non-URI model
- embeddings[0].dimensions = 3 with a 2-element vector
  - schema: valid
  - validators: reject mismatch
- relation cases:
  - entities[0].relations[0].target = ""
  - entities[0].relations[0].type = ""
  - entities[0].relations[0].target = "missing"
  - schema: valid
  - validators: reject empty string / dangling target
These are all public-surface drift bugs because third-party schema validators will bless documents that our own package rejects.
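To make the first fixture concrete, here is a minimal sketch of a runtime check that would reject "gpt-4o-mini" and the empty string. The regular expression is an assumption about what "scheme://identifier" means (the real validators may be stricter or looser), and `check_produced_by` is a hypothetical name. The example URI `openai://gpt-4o-mini` is also invented for illustration.

```python
import re

# Assumed shape: a lowercase scheme, "://", then a non-empty identifier
# containing no whitespace.
PROVIDER_URI = re.compile(r"[a-z][a-z0-9+.-]*://\S+")


def check_produced_by(value):
    """Return an error message, or None if value looks like a provider URI."""
    if not isinstance(value, str) or PROVIDER_URI.fullmatch(value) is None:
        return "produced_by: must be a provider URI (scheme://identifier)"
    return None
```

Moving the same expression into the schema as a `pattern` keyword is what lets third-party validators reach the same verdict.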
2. Validators too loose / schema stricter
These fixtures are rejected by both hosted schema validators but accepted by both runtime validators:
- extra properties:
  - root: {"extra":true}
  - entity/source/signals/goal/fact/relation/temporal/embedding all accept extra keys at runtime even though schema says additionalProperties: false
- type-only schema fields not enforced at runtime:
  - sentiment = 3
  - entities[0].state = 7
  - entities[0].context = 7
  - entities[0].date_hint = 7
  - goals[0].entity_refs = [1]
  - facts[0].category = 1
  - relations[0].origin = 1
  - temporal_refs[0].context = 1
  - embeddings[0].space = 7
  - user_id = 1, source_id = 1, source_type = 1
- source-ref numeric constraints not enforced at runtime:
  - source.offset_start = -1
  - source.offset_start = "1"
- enum not enforced at runtime:
  - goals[0].status = "done"
This is the mirror-image public bug: schema clients reject documents our own validator accepts.
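A runtime equivalent of additionalProperties: false can be as small as an allow-list comparison. The sketch below is illustrative only: `unexpected_keys` and `ROOT_ALLOWED` are hypothetical names, and the allow-list shown is a made-up subset rather than the real root properties.

```python
def unexpected_keys(obj, allowed, path):
    """Return one error per key in obj that is not in the allowed set."""
    return [
        f"{path}: unexpected property '{key}'"
        for key in sorted(obj)
        if key not in allowed
    ]


# Hypothetical allow-list; a real one would mirror the schema's "properties".
ROOT_ALLOWED = {"version", "kind", "summary"}

errors = unexpected_keys({"version": "1", "extra": True}, ROOT_ALLOWED, "$")
# errors == ["$: unexpected property 'extra'"]
```

The same helper would need to be called once per object type (entity, goal, fact, and so on) with that type's own allow-list.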
3. Genuine ambiguity: JSON Schema format is not converging the way we need
These fixtures still show jsonschema (Python) disagreeing with ajv and both runtime validators:
- extracted_at = "2026-04-26"
- extracted_at = "not-a-date"
- embeddings[0].computed_at = "2026-04-26"
Observed verdicts:
- ajv: rejects (format: date-time)
- runtime validators: reject
- jsonschema + FormatChecker: accepts
That means format is not a portable-enough assertion here. If the intent is “all validators MUST reject,” the hosted schema needs a stronger expression than plain format: date-time alone (or the locked spec needs to explicitly accept this ambiguity).
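One candidate for a stronger expression is to pair format with a pattern, since a pattern is evaluated as a plain regular expression while format enforcement depends on how each validator is configured. The fragment below is a sketch under assumptions: the regular expression is one plausible reading of a full date-time with a time zone, and `EXTRACTED_AT_SCHEMA` and `passes_pattern` are hypothetical names.

```python
import re

# Hypothetical schema fragment: the pattern restates what format is meant to say.
EXTRACTED_AT_SCHEMA = {
    "type": "string",
    "format": "date-time",
    "pattern": r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$",
}


def passes_pattern(schema, value):
    """Apply only the schema's pattern keyword to a candidate value."""
    # JSON Schema patterns are unanchored, so search (not fullmatch) is the
    # matching semantic; the ^ and $ in the pattern do the anchoring.
    return isinstance(value, str) and re.search(schema["pattern"], value) is not None
```

A pattern like this checks shape only, so a value such as month 13 would still need the runtime validators or format to catch it.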
So the current branch still has more drift than the two known cases. The remaining work falls into three buckets:
- decide which semantic rules should move into hosted schema
- decide which structural schema rules should move into runtime validators
- harden the date-time fields so third-party validators converge instead of splitting on format semantics
If useful I can turn this fixture matrix into repo tests, but as of this rerun these are still live parity gaps on the public surface.
Main schema (extract/v1.json):
- extracted_at: add pattern alongside format (belt-and-suspenders)
- kind: add namespaced pattern
- summary: add minLength: 1
- themes items: add minLength: 1
- capabilities items: add enum with all 17 valid capabilities
- extensions: add propertyNames pattern for namespacing
- Entity name/type: add minLength: 1
- Goal text: add minLength: 1
- Fact text: add minLength: 1
- Relation target/type: add minLength: 1

Sub-schemas:
- source-ref: add minProperties: 2 (reject version-only wrappers)
- assertion-signals: add minProperties: 2
- embedding model: add URI pattern
- embedding computed_at: add pattern alongside format
- temporal-ref raw: add minLength: 1

These close the "schema too loose" class from Atlas's drift audit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both Python and TypeScript validators now enforce:
1. additionalProperties: false on all object types (root, entity, goal, fact, relation, source-ref, signals, temporal-ref, embedding)
2. Type checks on optional string fields (sentiment, source_id, source_type, user_id, entity state/context/date_hint, fact category, relation origin, temporal context, embedding space)
3. Source-ref offset constraints (offset_start, offset_end, sentence_index must be non-negative integers)
4. Goal entity_refs items must be strings
5. Embedding computed_at must be strict date-time
6. Boolean guard on isinstance checks (Python bool is int subclass)

Adds 24 new tests: 9 additional-properties, 11 type-check, 4 offset. 198 total Python tests passing.

Closes the "validators too loose" class from Atlas's drift audit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
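Items 3 and 6 interact: in Python, `isinstance(True, int)` is `True`, so a naive integer check would let booleans through as offsets. A minimal sketch of the guarded check, with `is_non_negative_int` as a hypothetical helper name:

```python
def is_non_negative_int(value):
    """True for ints >= 0; bool is excluded even though it subclasses int."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and value >= 0
    )
```

With this guard, the audit fixtures source.offset_start = -1 and source.offset_start = "1" are both rejected, and so is offset_start = true.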
Follow-up contract read from Sentinel after Apollo's latest Verified green:
Result from this pass:
No new spec/impl mismatch surfaced from my pass on the updated branch.
laynepenney
left a comment
Follow-up rerun on current branch 50e0e13 after the parity-fix commits. I reran two sweeps:
- Live hosted schemas from synapt.dev via curl + jsonschema + ajv + both runtime validators
- Branch-local schemas from schemas/**/*.json + the same validator matrix
Results are different, and both matter:
A. Live hosted schemas are still stale relative to this branch
The hosted synapt.dev copies still show the broad drift classes from my previous comment. That is expected if deploy has not happened yet, but it means the public surface is still inconsistent right now.
B. Branch-local parity is not clean yet
The branch fixed most of the earlier gaps, but 9 drift cases still remain when I compare the local JSON Schemas against both runtime validators.
Remaining schema-too-loose cases
These still pass both local JSON Schema validators (jsonschema + ajv) but are rejected by both runtime validators:
- goals[0].entity_refs = ["missing"]
  - schema: valid
  - validators: reject dangling entity ref
- goals[0].stated_at = "2026/04/20"
  - schema: valid
  - validators: reject bad ISO date/datetime
- goals[0].resolved_at = "2026/04/21"
  - schema: valid
  - validators: reject bad ISO date/datetime
- temporal_refs[0] = {"version":"1","type":"range","raw":"April","resolved":"2026-04-01"}
  - schema: valid
  - validators: reject missing resolved_end
- temporal_refs[0] = {"version":"1","type":"unresolved","raw":"soon","resolved":"2026-04-01"}
  - schema: valid
  - validators: reject resolved on unresolved
- temporal_refs[0].resolved = "2026/04/01"
  - schema: valid
  - validators: reject bad ISO date/datetime
- temporal_refs[0].resolved_end = "2026/04/30"
  - schema: valid
  - validators: reject bad ISO date/datetime
- embeddings[0].dimensions = 3 with a 2-element vector
  - schema: valid
  - validators: reject mismatch
Remaining validator-too-loose case
- embeddings[0].vector = [0.1, "x"]
  - schema: rejected by both jsonschema and ajv
  - both runtime validators still accept it
That last one is especially important because it means runtime validation is still not enforcing the numeric element type inside vectors, even though the JSON Schema does.
Date-time portability
The extracted_at / computed_at portability issue does look fixed in the branch-local schemas: with the new pattern+format approach, jsonschema, ajv, and both runtime validators now agree on rejecting the bad cases I previously flagged. That part is good.
So my current verdict is:
- not clean yet
- parity is much better than before
- but the 9 cases above are still real local branch drift, so I would not mark this done yet
If useful, I can turn these exact 9 fixtures into one conformance file so the next rerun is binary instead of manual.
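A conformance file of that kind could drive a check along these lines. This is a sketch of the idea only: `find_drift`, the fixture layout, and the two stub validators are hypothetical, and a real matrix would wrap jsonschema, ajv, and both runtime validators instead of the stubs.

```python
def find_drift(fixtures, validators):
    """Return (fixture name, verdicts) for each fixture the validators split on."""
    drift = []
    for fixture in fixtures:
        verdicts = {
            name: bool(accepts(fixture["doc"]))
            for name, accepts in validators.items()
        }
        if len(set(verdicts.values())) > 1:
            drift.append((fixture["name"], verdicts))
    return drift


# Stand-in validators for illustration only.
def schema_stub(doc):
    return isinstance(doc.get("vector"), list)


def runtime_stub(doc):
    return schema_stub(doc) and doc.get("dimensions") == len(doc["vector"])


FIXTURES = [
    {"name": "dimensions-match", "doc": {"dimensions": 2, "vector": [0.1, 0.2]}},
    {"name": "dimensions-mismatch", "doc": {"dimensions": 3, "vector": [0.1, 0.2]}},
]

drift = find_drift(FIXTURES, {"schema": schema_stub, "runtime": runtime_stub})
# drift == [("dimensions-mismatch", {"schema": True, "runtime": False})]
```

An empty result means every validator agreed on every fixture, which is the binary signal the rerun needs.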
Both validators now reject non-number items in embedding.vector arrays. Uses early-break to report the first bad element without flooding errors on large vectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
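The early-break behaviour described in this commit can be sketched as below. The function name `check_vector`, the default path, and the error wording are assumptions, not the package's actual output.

```python
def check_vector(vector, path="embeddings[0].vector"):
    """Return at most one error: the first element that is not a number."""
    errors = []
    for index, item in enumerate(vector):
        # bool is excluded explicitly because it subclasses int in Python.
        is_number = isinstance(item, (int, float)) and not isinstance(item, bool)
        if not is_number:
            errors.append(f"{path}[{index}]: must be a number")
            break  # stop at the first bad element to avoid flooding errors
    return errors
```

For the audit fixture [0.1, "x"] this reports index 1 only, and a vector of thousands of bad elements still yields a single error.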
Final contract read from Sentinel after Apollo's embedding vector item-type fix. I rebased Verified green:
Result:
From my side,
laynepenney
left a comment
Final rerun on current branch 8bac94a after the embedding vector item-type fix.
Checks rerun:
- PYTHONPATH=packages/python/src pytest -q tests/python -> 201 passed
- cd packages/ts && npm run build -> clean
- full local parity matrix against schemas/**/*.json + jsonschema + ajv + both runtime validators
Result:
- the runtime-validator gap on embeddings[0].vector = [0.1, "x"] is fixed; both TS and Python validators now reject it
- I do not see any remaining actionable runtime-validator holes from my adversarial set
- the only remaining parity mismatches are the explicitly deferred schema-too-loose semantic / cross-field cases:
  - dangling goal.entity_refs
  - bad goal.stated_at / goal.resolved_at
  - temporal range missing resolved_end
  - temporal unresolved with resolved
  - bad temporal.resolved / resolved_end
  - embedding.dimensions != vector.length
Those all fail correctly in both runtime validators and remain only because the JSON Schema side is intentionally deferring those cross-field / semantic constraints to extract#9 (Sprint 32 single-source-of-truth refactor).
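For reference, three of the deferred cross-field rules look roughly like this at runtime. This is a sketch, not the package's validator: `cross_field_errors` is a hypothetical name, the error wording is invented, and the dangling-ref and date-format rules are left out because the source does not say how refs are resolved.

```python
def cross_field_errors(doc):
    """Collect errors for rules that relate two fields of the same object."""
    errors = []
    for i, ref in enumerate(doc.get("temporal_refs", [])):
        if ref.get("type") == "range" and "resolved_end" not in ref:
            errors.append(f"temporal_refs[{i}]: range requires resolved_end")
        if ref.get("type") == "unresolved" and "resolved" in ref:
            errors.append(f"temporal_refs[{i}]: unresolved must not set resolved")
    for i, emb in enumerate(doc.get("embeddings", [])):
        if "dimensions" in emb and emb["dimensions"] != len(emb.get("vector", [])):
            errors.append(f"embeddings[{i}]: dimensions does not match vector length")
    return errors
```

Rules of this shape compare one field against another, which is why a plain per-property schema keyword cannot express them.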
So from my adversarial parity lane: extract#8 is approval-grade now.
Summary
Full schema↔validator parity reconciliation from Atlas's adversarial drift audit. Three commits closing all three drift classes.
Changes
Commit 1: Initial 3 gaps (Sentinel findings)
- loadProfile(): existsSync guard before readFileSync
- produced_by schema: add URI pattern
- extracted_at validators: strict date-time (reject date-only)
Commit 2: Schema tightening (Atlas Class 1: "schema too loose")
Main schema (extract/v1.json):
- extracted_at: pattern alongside format (belt-and-suspenders for portability)
- kind: namespaced pattern
- summary: minLength 1
- themes items: minLength 1
- capabilities items: enum with all 17 valid capabilities
- extensions: propertyNames pattern for namespacing
Sub-schemas:
Commit 3: Validator tightening (Atlas Class 2: "validators too loose")
Both Python and TypeScript:
- additionalProperties enforcement on all 9 object types
Atlas Class 3 (portability ambiguity)
Addressed via belt-and-suspenders: pattern alongside format on date-time fields. Pattern is always enforced by all validators; format behavior varies. This ensures convergence regardless of validator config.
Not addressed (Sprint 32)
if/then/else composition. Runtime validators already enforce these; generated-validator architecture in Sprint 32 will close the gap.
Stats
Premium boundary: OSS (schema + validation infrastructure).
Test plan
- tsc --noEmit clean
🤖 Generated with Claude Code