Sprint 31 ceremony (extract): IL v1 package #5

Merged: laynepenney merged 18 commits into main from sprint-31 (Apr 26, 2026)

@laynepenney commented Apr 26, 2026

Sprint 31 Ceremony: SynaptExtraction IL v1 Package

Ships @synapt-dev/extract (npm) and synapt-extract (PyPI) as a tight beta.

PRs included (7)

| PR | Title | Author |
|----|-------|--------|
| #1 | Schema, validation, and finalization core | Apollo |
| #2 | Composable prompt system | Apollo |
| #3 | Publish prep (README, prompt assets, CI, SHA pinning) | Apollo |
| #4 | Schema path rename (extraction/ → extract/) | Apollo |
| #6 | Switch npm publish to OIDC trusted publishing | Apollo |
| #7 | TypeScript parity test suite (Vitest) | Sentinel |
| #8 | Schema↔validator parity reconciliation (4 commits) | Apollo |

What ships

  • SynaptExtraction IL v1 schema: the main extract schema plus 4 sub-schemas (source-ref, assertion-signals, temporal-ref, embedding)
  • Dual-language validators (Python + TypeScript) with full schema↔validator parity
  • Three-stage finalization pipeline (LLM content → client injection → library normalization)
  • Composable prompt system with 17 capabilities, 3 profiles, dependency closure
  • OIDC trusted publishing for npm (no long-lived secrets)
  • SHA-pinned CI/CD for supply-chain hardening
  • 201 Python tests, TypeScript Vitest parity suite

Stats

  • 17 extraction capabilities with dependency closure
  • 3 prompt profiles (minimal/standard/full)
  • 22 prompt fragment assets
  • 5 JSON schemas (main + 4 sub-schemas)
  • additionalProperties: false enforced on all 9 object types
  • Belt-and-suspenders date-time validation (pattern + format)

Premium boundary: OSS (schema, validation, prompt infrastructure).

Test plan

  • 201 Python tests passing
  • TypeScript Vitest parity suite passing
  • TypeScript tsc --noEmit clean
  • All 5 JSON schemas valid
  • Sentinel approved (test parity)
  • Atlas approved (adversarial sweep)
  • Layne creates v0.1.1 release tag to trigger publish

🤖 Generated with Claude Code

laynepenney and others added 18 commits April 26, 2026 09:38
Implements buildExtractionPrompt() in both TypeScript and Python.

- 17 capability prompt fragments (prompts/v1/*.txt) per locked spec
- Shared preamble/postamble with Mustache-style template variables
- 3 profile files (minimal/standard/full) as capability set shorthand
- Capability dependency closure (e.g., relations auto-includes entity_ids)
- Canonical composition order: primary objects, modifiers, cross-cutting
- Capability-specific rules appended before the text block
- Profile add/remove API for fine-grained customization
- resolve_capabilities() exposed for introspection
- 53 new Python tests, all 109 tests passing
- TypeScript type-checks clean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
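The capability dependency closure and canonical composition order in this commit can be sketched as a worklist traversal followed by a ranked sort. This is a minimal Python sketch under stated assumptions, not the package's own resolve_capabilities(): only the relations → entity_ids dependency comes from this PR, while the rest of DEPENDENCIES, the CANONICAL_ORDER list, and the helper names resolve_closure and in_canonical_order are hypothetical.

```python
from collections.abc import Iterable, Mapping

# Hypothetical dependency map: only "relations" -> "entity_ids" is stated in the PR.
DEPENDENCIES: dict[str, set[str]] = {
    "relations": {"entity_ids"},
}

# Hypothetical canonical order: primary objects, then modifiers, then cross-cutting.
CANONICAL_ORDER: list[str] = [
    "entities", "entity_ids", "goals", "facts", "relations",  # primary (assumed)
    "assertion_signals", "evidence_anchoring",  # modifiers
    "themes",  # cross-cutting (assumed name)
]


def resolve_closure(requested: Iterable[str], deps: Mapping[str, set[str]]) -> set[str]:
    """Return the requested capabilities plus everything they transitively need."""
    resolved: set[str] = set()
    stack = list(requested)
    while stack:
        cap = stack.pop()
        if cap in resolved:
            continue  # already handled; this also makes dependency cycles harmless
        resolved.add(cap)
        stack.extend(deps.get(cap, ()))
    return resolved


def in_canonical_order(caps: set[str], order: list[str]) -> list[str]:
    """Sort capabilities by canonical position; unlisted names go last, alphabetically."""
    rank = {name: i for i, name in enumerate(order)}
    return sorted(caps, key=lambda c: (rank.get(c, len(order)), c))


if __name__ == "__main__":
    caps = resolve_closure(["relations", "facts"], DEPENDENCIES)
    print(in_canonical_order(caps, CANONICAL_ORDER))  # ['entity_ids', 'facts', 'relations']
```

The visited check keeps the closure safe against accidental dependency cycles, and sorting by rank rather than by request order keeps the composed prompt stable however the caller lists capabilities.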
Address Atlas review findings on extract#2:
- Validate capability names upfront; unknown capabilities raise ValueError
  before any file I/O (no more raw FileNotFoundError on bogus.txt)
- Fix template double-escaping: caller-controlled values (categories,
  source_type) are no longer recursively expanded through the template
  engine. Values are treated as opaque strings.
- Reject empty capability sets explicitly
- Reject modifier-only capability sets (assertion_signals/evidence_anchoring
  without entities, goals, or facts)
- Update PROMPTS_DIR to resolve from installed package path first, falling
  back to repo root for development

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
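The review fixes in this commit (fail fast on bad capability sets, treat caller values as opaque strings) might look roughly like this. A minimal sketch, assuming {{name}}-style placeholders with no Mustache sections; KNOWN_CAPABILITIES is a hypothetical subset of the 17 real names, validate_capabilities and render are illustrative helpers rather than the package API, and "modifier-only" is read as assertion_signals or evidence_anchoring present without any of entities, goals, or facts.

```python
import re
from collections.abc import Mapping

# Hypothetical subset of capability names; the real package defines 17.
KNOWN_CAPABILITIES = {
    "entities", "entity_ids", "goals", "facts", "relations",
    "assertion_signals", "evidence_anchoring",
}
PRIMARY = {"entities", "goals", "facts"}
MODIFIERS = {"assertion_signals", "evidence_anchoring"}

# {{ name }} placeholders only; Mustache sections and partials are not handled.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def validate_capabilities(caps: set[str]) -> None:
    """Fail fast, before any prompt file is opened."""
    if not caps:
        raise ValueError("capability set is empty")
    unknown = sorted(caps - KNOWN_CAPABILITIES)
    if unknown:
        raise ValueError(f"unknown capabilities: {', '.join(unknown)}")
    if caps & MODIFIERS and not caps & PRIMARY:
        raise ValueError("modifiers require at least one of entities, goals, facts")


def render(template: str, values: Mapping[str, str]) -> str:
    """Substitute placeholders in one pass; values are inserted as opaque text."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing template variable: {name}")
        # re.sub does not rescan this return value, so "{{...}}" inside a
        # caller-supplied value is never expanded a second time.
        return values[name]

    return _PLACEHOLDER.sub(substitute, template)
```

Because re.sub never rescans its own output, render("Categories: {{categories}}", {"categories": "{{source_type}}"}) returns "Categories: {{source_type}}" instead of expanding the inner placeholder. A callable replacement also bypasses re.sub's backslash handling, so values containing backslash sequences pass through untouched.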
feat: composable prompt system (recall#794)
- CI workflow: Python 3.10-3.13 test matrix + TypeScript type-check
- npm publish workflow: triggered on GitHub release, provenance enabled
- PyPI publish workflow: trusted publishing via gh-action-pypi-publish
- package.json: publishConfig with public access and provenance
- pyproject.toml: classifiers, schema URL, documentation URL
- README: package overview, quick start (TS + Python), pipeline docs

Trusted publishing setup (npm token, PyPI environment) is a manual step.

Ref synapt-dev/recall#795

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address review findings on extract#3:
- Fix pyproject.toml readme path: create packages/python/README.md instead
  of referencing ../../README.md which breaks setuptools
- Remove PEP 639-incompatible license classifier
- Add prompt asset bundling: npm prepack copies prompts/ into package,
  Python pyproject.toml declares package-data for prompts/**
- Add build-python CI job to catch build failures before publish
- SHA-pin all GitHub Actions (actions/checkout, setup-node, setup-python,
  gh-action-pypi-publish) for supply-chain hardening
- Add .gitignore entries for build-time prompt copies

npm still uses NPM_TOKEN for auth; full OIDC trusted publishing requires
linking the package on npmjs.com (Layne setup). PyPI uses OIDC via
gh-action-pypi-publish (no secrets needed, already configured).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
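The prompt-asset bundling in this commit pairs with the earlier PROMPTS_DIR change (installed package path first, repo root as a development fallback). One way to express that lookup, as a sketch: the function name and its two path parameters are hypothetical, and the real module may derive its paths differently.

```python
from pathlib import Path


def resolve_prompts_dir(package_dir: Path, repo_root: Path) -> Path:
    """Return the first existing prompts/ directory: installed package, then repo root."""
    for candidate in (package_dir / "prompts", repo_root / "prompts"):
        if candidate.is_dir():
            return candidate
    # Neither location exists: fail with a message naming both paths.
    raise FileNotFoundError(f"no prompts/ directory under {package_dir} or {repo_root}")
```

In a module, package_dir might be Path(__file__).resolve().parent, with repo_root some fixed number of parents above it; both derivations are assumptions about the repository layout.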
feat: publish prep for npm + PyPI (recall#795)
Canonical path is synapt.dev/schemas/extract/v1.json, not extraction/.
Updated directory name, $id field, and all references in README, Python
README, pyproject.toml, and test fixtures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: rename schemas/extraction/ to schemas/extract/ per locked spec
Remove NPM_TOKEN secret now that trusted publishing is configured on
npmjs.com (synapt-dev/extract → publish-npm.yml). Subsequent releases
authenticate via GitHub OIDC token exchange with Sigstore provenance
attestations. No long-lived secrets needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. TS profile validation: loadProfile() now checks file existence
   before readFileSync, throwing clean "Unknown profile" error
   instead of raw ENOENT. Matches Python behavior.

2. produced_by schema tightening: JSON Schema now includes pattern
   constraint matching the URI format validators already enforce.
   Third-party JSON Schema validators will now agree with our
   package validators.

3. extracted_at date-time strictness: both validators now require
   full ISO 8601 date-time (with T component), rejecting date-only
   strings. Matches the schema's "format": "date-time" constraint.

Closes the drift where third-party JSON Schema validators would
give different verdicts than @synapt-dev/extract validators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
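The extracted_at strictness fix can be illustrated with a check that requires the T separator and an explicit offset, then confirms the calendar values are real. This is a sketch: the regex is an assumption (the package's actual pattern is not shown in this PR). It follows RFC 3339, which JSON Schema's "format": "date-time" refers to, but accepts only uppercase T and Z, a common tightening.

```python
import re
from datetime import datetime

# Assumed pattern: uppercase "T" separator and a mandatory offset ("Z" or +HH:MM).
# The package's real pattern is not shown in this PR.
_DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})"
)


def is_strict_date_time(value: object) -> bool:
    """Accept full date-times only; date-only strings and non-strings fail."""
    if not isinstance(value, str) or _DATE_TIME.fullmatch(value) is None:
        return False
    # The regex checks shape; strptime rejects impossible dates and times
    # (e.g. February 30). The offset is only shape-checked.
    try:
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```

A date-only string such as "2026-04-26" fails the shape check, while "2026-02-30T00:00:00Z" matches the shape but is rejected by strptime.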
Main schema (extract/v1.json):
- extracted_at: add pattern alongside format (belt-and-suspenders)
- kind: add namespaced pattern
- summary: add minLength: 1
- themes items: add minLength: 1
- capabilities items: add enum with all 17 valid capabilities
- extensions: add propertyNames pattern for namespacing
- Entity name/type: add minLength: 1
- Goal text: add minLength: 1
- Fact text: add minLength: 1
- Relation target/type: add minLength: 1

Sub-schemas:
- source-ref: add minProperties: 2 (reject version-only wrappers)
- assertion-signals: add minProperties: 2
- embedding model: add URI pattern
- embedding computed_at: add pattern alongside format
- temporal-ref raw: add minLength: 1

These close the "schema too loose" class from Atlas's drift audit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
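The new minProperties: 2 and minLength: 1 keywords have simple semantics that a hand-rolled validator can mirror. A sketch with hypothetical helper names; the version and source_id keys used below are placeholders, since the PR does not list the source-ref fields.

```python
from typing import Any


def check_min_properties(obj: dict[str, Any], minimum: int, path: str) -> list[str]:
    """JSON Schema minProperties: the object needs at least `minimum` keys."""
    if len(obj) < minimum:
        return [f"{path}: {len(obj)} properties is below minProperties {minimum}"]
    return []


def check_min_length(value: Any, minimum: int, path: str) -> list[str]:
    """JSON Schema minLength: counts code points, as Python's len() does on str.

    Like JSON Schema, this keyword ignores non-string values; type errors are
    assumed to be reported by a separate type check.
    """
    if isinstance(value, str) and len(value) < minimum:
        return [f"{path}: length {len(value)} is below minLength {minimum}"]
    return []
```

With minimum=2, a version-only wrapper like {"version": 1} produces one error, while {"version": 1, "source_id": "msg-1"} passes.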
Both Python and TypeScript validators now enforce:

1. additionalProperties: false on all object types (root, entity,
   goal, fact, relation, source-ref, signals, temporal-ref, embedding)
2. Type checks on optional string fields (sentiment, source_id,
   source_type, user_id, entity state/context/date_hint, fact
   category, relation origin, temporal context, embedding space)
3. Source-ref offset constraints (offset_start, offset_end,
   sentence_index must be non-negative integers)
4. Goal entity_refs items must be strings
5. Embedding computed_at must be strict date-time
6. Boolean guard on isinstance checks (Python's bool is a subclass of int,
   so True/False would otherwise pass integer checks)

Adds 24 new tests: 9 additional-properties, 11 type-check, 4 offset.
198 total Python tests passing.

Closes the "validators too loose" class from Atlas's drift audit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
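Items 1, 3, and 6 in this commit reduce to two small checks: a closed-object key check for additionalProperties: false, and an integer test that excludes bool. A sketch with hypothetical helper names; only the offset field names come from this commit.

```python
from typing import Any

# Offset field names from the PR; the rest of the source-ref shape is not modeled.
OFFSET_FIELDS = ("offset_start", "offset_end", "sentence_index")


def is_non_negative_int(value: Any) -> bool:
    """True for 0, 1, 2, ...; False for bool, which Python treats as an int subclass.

    This is stricter than JSON Schema draft 6+, where 1.0 also counts as an
    integer; a float like 1.0 is rejected here.
    """
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def check_closed_object(obj: dict[str, Any], allowed: set[str], path: str) -> list[str]:
    """additionalProperties: false, reporting every undeclared key in sorted order."""
    return [f"{path}: unexpected property {key!r}" for key in sorted(set(obj) - allowed)]


def check_offsets(ref: dict[str, Any], path: str = "source_ref") -> list[str]:
    """Each offset field, when present, must be a non-negative integer."""
    return [
        f"{path}.{key}: expected non-negative integer"
        for key in OFFSET_FIELDS
        if key in ref and not is_non_negative_int(ref[key])
    ]
```

Without the bool guard, {"offset_start": True} would pass as offset 1, because isinstance(True, int) is true in Python.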
Both validators now reject non-number items in embedding.vector
arrays. Uses early-break to report the first bad element without
flooding errors on large vectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
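The early-break behavior can be sketched as a loop that returns on the first non-number. A sketch with a hypothetical helper name and error format; JSON booleans are excluded explicitly for the same bool-is-int reason noted in the validator commit.

```python
from typing import Any


def check_vector(vector: Any, path: str = "embedding.vector") -> list[str]:
    """Report only the first non-number element, so a bad large vector
    yields one error instead of one per element."""
    if not isinstance(vector, list):
        return [f"{path}: expected array"]
    for index, item in enumerate(vector):
        # JSON booleans are not numbers, but Python bool passes isinstance(x, int).
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            return [f"{path}[{index}]: expected number, got {type(item).__name__}"]
    return []
```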
fix: switch npm publish to OIDC trusted publishing
fix: close schema↔validator parity gaps
test: add TypeScript parity suite for extract
@laynepenney merged commit 08292aa into main Apr 26, 2026
6 checks passed
@laynepenney deleted the sprint-31 branch April 26, 2026 23:00