47 changes: 47 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,47 @@
name: Bug Report
description: Report a bug in schema validation, finalization, or prompt generation
labels: ["bug"]
body:
  - type: dropdown
    id: package
    attributes:
      label: Package
      options:
        - "@synapt-dev/extract (npm)"
        - "synapt-extract (PyPI)"
        - "JSON Schema"
        - "Prompt system"
    validations:
      required: true

  - type: input
    id: version
    attributes:
      label: Version
      placeholder: "0.3.0"
    validations:
      required: true

  - type: textarea
    id: description
    attributes:
      label: Description
      description: What happened, and what did you expect?
    validations:
      required: true

  - type: textarea
    id: reproduction
    attributes:
      label: Reproduction
      description: Minimal code or extraction document that triggers the bug.
      render: typescript
    validations:
      required: true

  - type: textarea
    id: environment
    attributes:
      label: Environment
      description: "Runtime (Node, Deno, Python), version, and OS."
      placeholder: "Node 22.0.0, macOS 15.1"
5 changes: 5 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
blank_issues_enabled: false
contact_links:
  - name: Security Vulnerability
    url: mailto:security@synapt.dev
    about: Report security vulnerabilities privately via email. Do not open a public issue.
38 changes: 38 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,38 @@
name: Feature Request
description: Propose a new extraction capability, schema field, or API change
labels: ["enhancement"]
body:
  - type: dropdown
    id: area
    attributes:
      label: Area
      options:
        - "Schema (new field or sub-schema)"
        - "Validation (new check or constraint)"
        - "Finalization (pipeline behavior)"
        - "Prompt system (capability or profile)"
        - "Other"
    validations:
      required: true

  - type: textarea
    id: proposal
    attributes:
      label: Proposal
      description: What should change? Include the schema shape or API signature if applicable.
    validations:
      required: true

  - type: textarea
    id: motivation
    attributes:
      label: Motivation
      description: What use case does this enable? What problem does it solve?
    validations:
      required: true

  - type: textarea
    id: compatibility
    attributes:
      label: Compatibility
      description: "Is this additive (v1.x safe) or breaking (requires v2)? Would existing documents need migration?"
72 changes: 71 additions & 1 deletion .github/workflows/ci.yml
@@ -36,7 +36,7 @@ jobs:

      - run: pip install build

-      - run: test -d ../../prompts && cp -r ../../prompts src/synapt_extract/prompts || true
+      - run: rm -rf src/synapt_extract/prompts && test -d ../../prompts && cp -r ../../prompts src/synapt_extract/prompts || true

      - run: python -m build

@@ -55,3 +55,73 @@ jobs:
      - run: npm ci

      - run: npx tsc --noEmit

  test-typescript:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: packages/ts
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
        with:
          node-version: "22"

      - run: npm ci

      - run: npx vitest run

  reproducibility:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
        with:
          node-version: "22"

      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: "3.12"

      - name: Verify npm pack determinism
        working-directory: packages/ts
        run: |
          npm ci
          npm run build
          mkdir -p /tmp/npm-pack-1 /tmp/npm-pack-2
          npm pack --pack-destination /tmp/npm-pack-1
          npm pack --pack-destination /tmp/npm-pack-2
          echo "=== Pack 1 ==="
          sha256sum /tmp/npm-pack-1/*.tgz
          echo "=== Pack 2 ==="
          sha256sum /tmp/npm-pack-2/*.tgz
          HASH1=$(sha256sum /tmp/npm-pack-1/*.tgz | cut -d' ' -f1)
          HASH2=$(sha256sum /tmp/npm-pack-2/*.tgz | cut -d' ' -f1)
          if [ "$HASH1" != "$HASH2" ]; then
            echo "::error::npm pack is not deterministic"
            exit 1
          fi
          echo "npm pack determinism verified: $HASH1"

      - name: Verify Python build determinism
        working-directory: packages/python
        env:
          SOURCE_DATE_EPOCH: "1704067200"
        run: |
          pip install build
          rm -rf src/synapt_extract/prompts && test -d ../../prompts && cp -r ../../prompts src/synapt_extract/prompts || true
          python -m build --outdir /tmp/py-dist-1
          rm -rf src/synapt_extract.egg-info build
          python -m build --outdir /tmp/py-dist-2
          # Verify wheel determinism (the installable artifact).
          # sdist (.tar.gz) is excluded: setuptools embeds wall-clock
          # mtimes in PAX extended headers, a known upstream limitation.
          HASH1=$(sha256sum /tmp/py-dist-1/*.whl | cut -d' ' -f1)
          HASH2=$(sha256sum /tmp/py-dist-2/*.whl | cut -d' ' -f1)
          if [ "$HASH1" != "$HASH2" ]; then
            echo "::error::Python wheel build is not deterministic"
            exit 1
          fi
          echo "Python wheel determinism verified: $HASH1"
10 changes: 9 additions & 1 deletion .github/workflows/publish-npm.yml
@@ -5,7 +5,7 @@ on:
    types: [published]

permissions:
-  contents: read
+  contents: write
  id-token: write

jobs:
@@ -27,3 +27,11 @@ jobs:
      - run: npm run build

      - run: npm publish --provenance --access public

      - name: Generate SBOM
        run: npm sbom --omit=dev --sbom-format cyclonedx > sbom.cdx.json

      - name: Upload SBOM to release
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh release upload ${{ github.event.release.tag_name }} sbom.cdx.json --clobber
2 changes: 1 addition & 1 deletion .github/workflows/publish-pypi.yml
@@ -24,7 +24,7 @@ jobs:

      - run: pip install build

-      - run: test -d ../../prompts && cp -r ../../prompts src/synapt_extract/prompts || true
+      - run: rm -rf src/synapt_extract/prompts && test -d ../../prompts && cp -r ../../prompts src/synapt_extract/prompts || true

      - run: python -m build

71 changes: 71 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,76 @@
# Changelog

## v0.3.0

v1.2 spec: 8 new extraction fields, 5 new sub-schemas, sentiment dual-shape, entity/goal sub-schema promotion.

### New sub-schemas

- `schemas/entity/v1.json` -- promoted from inline `$defs` to standalone with `$id`
- `schemas/goal/v1.json` -- promoted from inline `$defs` to standalone with `$id`
- `schemas/question/v1.json` -- questions raised in source text
- `schemas/action/v1.json` -- concrete next-steps with origin tracking (extracted vs proposed_from_goals)
- `schemas/decision/v1.json` -- directional commitments identified in source
- `schemas/sentiment/v1.json` -- structured sentiment with valence/intensity/confidence
- `schemas/source-metadata/v1.json` -- source document metadata (token count, modality, format)

### New extraction fields

- `keywords`: surface lexical terms (sibling to themes; keywords are specific terms, themes are topical categories)
- `questions`: questions raised in the source text, with optional `directed_to` entity ref
- `actions`: action items with required `origin` field ("extracted" or "proposed_from_goals")
- `decisions`: directional commitments with optional `decided_at` timestamp
- `language`: IETF BCP 47 language tag (e.g. "en-US", "es", "pt-BR")
- `source_metadata`: source document metadata for normalization across lengths and formats
- `confidence`: extraction-level overall confidence score (0.0 to 1.0)
- `sentiment` dual-shape: accepts string (v1.0) or structured SynaptSentiment object (v1.2)

### New capabilities

`keywords`, `structured_sentiment`, `questions`, `actions`, `decisions`, `language`, `source_metadata`, `confidence` (8 new, 25 total)

### Entity enhancements

- `aliases` field on entities for per-extraction same-entity grouping

### Prompt system

- 8 new prompt fragment files for all new capabilities
- `CAPABILITY_RULES` for structured_sentiment, actions, and keywords
- Updated full.json profile with all 25 capabilities
- `CANONICAL_ORDER` updated for deterministic prompt composition

### Finalize pipeline

- `detectCapabilities` detects all 8 new capabilities from payload structure
- Sub-schema version injection for questions, actions, decisions, sentiment object, and source_metadata
- Evidence anchoring and assertion signals detection extended to questions, actions, and decisions

### Validators

- Entity ID cross-referencing extended to actions and decisions (dangling entity_ref detection)
- 5 new sub-schema validators with additionalProperties enforcement
- Sentiment dual-shape dispatch (string vs object)
- BCP 47 language tag validation
- Confidence bounds validation (0.0 to 1.0)
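
A minimal sketch of two of the checks listed above, the confidence bounds check and dangling entity reference detection. This is not the package's validator code: the function names are hypothetical, and the key names `id` (on entities) and `entity_refs` (on actions and decisions) are assumptions made for illustration.

```python
from typing import Any


def check_confidence(doc: dict[str, Any]) -> list[str]:
    """Return errors if `confidence` is present but not a number in 0.0 to 1.0."""
    value = doc.get("confidence")
    if value is None:
        return []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return ["confidence must be a number"]
    if not 0.0 <= value <= 1.0:
        return [f"confidence {value} is outside 0.0 to 1.0"]
    return []


def find_dangling_entity_refs(doc: dict[str, Any]) -> list[str]:
    """Report references in actions/decisions that match no entity id."""
    # Assumed shape: each entity may carry an "id" key.
    known_ids = {e["id"] for e in doc.get("entities", []) if "id" in e}
    errors = []
    for field in ("actions", "decisions"):
        for index, item in enumerate(doc.get(field, [])):
            # Assumed key name: "entity_refs" holds a list of entity ids.
            for ref in item.get("entity_refs", []):
                if ref not in known_ids:
                    errors.append(f"{field}[{index}]: dangling entity ref {ref!r}")
    return errors
```

Both functions return a list of messages rather than raising, so a caller can collect every problem in a document in one pass.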

### Tests

- 219 TypeScript tests, 273 Python tests, 15 conformance cases
- 9 new conformance fixtures for v1.2 fields

### Compatibility

v1.2 is additive. v1.0 and v1.1 documents remain valid. The `sentiment` field now accepts either a string (v1.0) or a SynaptSentiment object (v1.2). Readers MUST branch on string vs object, same pattern as `produced_by`.
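
As an illustration of the string-versus-object branch, a reader might tag the shape before using the value. The helper name `read_sentiment` and the tag strings are hypothetical, not part of the package; the sketch assumes only that the document is a parsed JSON dict.

```python
from typing import Any


def read_sentiment(doc: dict[str, Any]) -> tuple[str, Any]:
    """Tag the `sentiment` field by shape so callers can branch safely."""
    value = doc.get("sentiment")
    if value is None:
        return ("absent", None)
    if isinstance(value, str):
        # v1.0 shape: a plain string.
        return ("string", value)
    if isinstance(value, dict):
        # v1.2 shape: a structured SynaptSentiment object, passed through as-is.
        return ("object", value)
    raise TypeError(f"unexpected sentiment type: {type(value).__name__}")
```

Raising on any other type keeps a malformed document from being silently treated as one of the two valid shapes.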

### Prompt gap fixes (from v0.2.0)

Three gaps discovered during v0.2.0 dogfooding are now fixed:

1. `goals.txt` now always mentions `entity_refs` as a required field (was only prompted with `goal_entity_refs` capability)
2. `temporal_refs.txt` now describes the full sub-schema shape (`type`, `resolved_end`, `context`) instead of just `raw` and `resolved`
3. `goal_timing.txt` now explicitly scopes `stated_at`/`resolved_at` to goals only, preventing LLM from adding these to entities

## v0.2.0

v1.1 spec: typed SynaptProducer schema, additive backwards-compat, in-place v1.x = additive only policy.
37 changes: 30 additions & 7 deletions README.md
@@ -8,12 +8,20 @@ Any text + Any LLM -> SynaptExtraction (IL) -> @synapt/memory (intelligence)

This repo contains the v1 schema, types, validators, finalization pipeline, and composable prompt system in both TypeScript and Python.

-## Packages
+## Install

| Package | Registry | Install |
|---------|----------|---------|
-| `@synapt-dev/extract` | npm | `npm install @synapt-dev/extract` |
-| `synapt-extract` | PyPI | `pip install synapt-extract` |
+| `@synapt-dev/extract` | npm | `npm install @synapt-dev/extract@0.3.0` |
+| `synapt-extract` | PyPI | `pip install synapt-extract==0.3.0` |

**Deno:**

```typescript
import { buildExtractionPrompt } from "npm:@synapt-dev/extract@0.3.0";
```

**Version pinning:** Always pin to an exact version (`@0.3.0`, `==0.3.0`). Do not use ranges (`^0.3.0`, `~0.3.0`, `>=0.3.0`). The IL schema evolves across minor versions (v1.1 added `produced_by` object form, v1.2 added 8 new fields). Pinning prevents unexpected schema changes from affecting your extraction pipeline.
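
One way to enforce the exact-pin rule for the PyPI package is a small check over requirement lines. This is a sketch under stated assumptions: `is_exact_pin` is a hypothetical helper, not part of `synapt-extract`, and it only recognizes the plain `name==X.Y.Z` form (no extras, markers, or pre-release suffixes).

```python
import re

# Matches "name==X.Y.Z" and nothing else: ranges, wildcards, and bare names fail.
_EXACT_PIN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*==\d+\.\d+\.\d+")


def is_exact_pin(requirement: str) -> bool:
    """Return True if a requirement line pins one exact X.Y.Z version."""
    return _EXACT_PIN.fullmatch(requirement.strip()) is not None
```

For example, `is_exact_pin("synapt-extract==0.3.0")` is true, while `synapt-extract>=0.3.0` and `synapt-extract~=0.3.0` are rejected.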

## Quick start

Expand Down Expand Up @@ -81,11 +89,11 @@ SynaptExtraction documents are assembled in three stages:

## Prompt profiles

-| Profile | Model class | Capabilities |
+| Profile | Model class | Capabilities (25 total) |
|---------|------------|--------------|
| `minimal` | 3B-7B local | entities, entity_state, goals, themes, summary |
| `standard` | GPT-4o-mini, Haiku | + entity_context, goal_timing, facts, temporal_refs, sentiment, evidence_anchoring |
-| `full` | GPT-4o, Sonnet, Opus | + entity_ids, goal_entity_refs, relations, relation_origin, assertion_signals, temporal_classes |
+| `full` | GPT-4o, Sonnet, Opus | + entity_ids, goal_entity_refs, keywords, structured_sentiment, questions, actions, decisions, relations, relation_origin, assertion_signals, temporal_classes, language, source_metadata, confidence |

## JSON Schema

@@ -95,7 +103,20 @@ The canonical schema is hosted at:
https://synapt.dev/schemas/extract/v1.json
```

-Sub-schemas: `source-ref/v1.json`, `embedding/v1.json`, `assertion-signals/v1.json`, `temporal-ref/v1.json`.
+Sub-schemas: `source-ref/v1.json`, `embedding/v1.json`, `assertion-signals/v1.json`, `temporal-ref/v1.json`, `producer/v1.json`, `entity/v1.json`, `goal/v1.json`, `question/v1.json`, `action/v1.json`, `decision/v1.json`, `sentiment/v1.json`, `source-metadata/v1.json`.

## Compatibility

v1.x schema updates are additive only. Breaking changes require v2.

- v1.0 documents remain valid under v1.2 validators
- `produced_by` accepts string (v1.0) or SynaptProducer object (v1.1+)
- `sentiment` accepts string (v1.0) or SynaptSentiment object (v1.2+)
- Readers MUST branch on string vs object for both fields

## Supply chain

Releases are published with Sigstore provenance via npm OIDC trusted publishing and PyPI trusted publishing. Each GitHub Release includes a CycloneDX SBOM (`sbom.cdx.json`).

## Repo structure

@@ -106,10 +127,12 @@ extract/
python/ # synapt-extract (Python, PyPI)
schemas/ # JSON Schema files (language-agnostic)
prompts/
-v1/ # Capability prompt fragments
+v1/ # 25 capability prompt fragments + preamble/postamble
profiles/ # Profile definitions (minimal, standard, full)
tests/
python/ # Python test suite
conformance/ # Cross-language conformance fixtures
docs/ # Design documents
```

## License