Semantic Data Interchange Format
Compact, semantic and canonicalizable structured data
for AI agents, deterministic workflows and human-auditable records.
pip install sdif-format
PyPI package · v1.0.0 release · Documentation
- Compact: Less repeated structure. Fewer wasted tokens.
- Semantic: Tables, relations, metadata and intent.
- Canonical: Stable output for hashing, signing and comparison.
- Auditable: Designed to be read, reviewed and trusted.
SDIF (Semantic Data Interchange Format) is a compact, canonicalizable and AI-friendly data format for structured information that needs to move cleanly between humans, tools, agents and deterministic workflows.
It is designed for cases where data should be:
- small enough to be efficient in AI context windows;
- structured enough for machines;
- readable enough for humans;
- deterministic enough for hashing, signing and reproducible workflows;
- semantic enough to express tables, relations, metadata and intent.
@sdif 1.0
kind Plan
id release.v1
title "Release readiness plan"
items[id,status,owner,evidence]:
R1 done build "reports/build.md"
R2 open qa "reports/tests.md"
R3 done security "reports/audit.md"
rel:
release.v1 validated_by R1
release.v1 blocked_by R2
release.v1 governed_by R3
Structured information closer to a document,
while still behaving like a contract.
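To make the shape of the example above concrete, here is a minimal Python sketch that reads exactly that document into plain dictionaries. It is not the official parser and does not use the `sdif` package. The function name `parse_example` and the syntax rules it applies (scalar `key value` lines, a `name[columns]:` table header, a `rel:` section of three-token relations, double-quoted strings) are assumptions inferred from this one example, not taken from the specification.

```python
import re
import shlex

EXAMPLE = '''\
@sdif 1.0
kind Plan
id release.v1
title "Release readiness plan"
items[id,status,owner,evidence]:
R1 done build "reports/build.md"
R2 open qa "reports/tests.md"
R3 done security "reports/audit.md"
rel:
release.v1 validated_by R1
release.v1 blocked_by R2
release.v1 governed_by R3
'''

# Assumed table header shape: name[col1,col2,...]:
TABLE_HEADER = re.compile(r"^(\w+)\[([^\]]*)\]:$")


def parse_example(text):
    """Hypothetical reader for the example above; not the SDIF reference parser."""
    doc = {"version": None, "fields": {}, "tables": {}, "relations": []}
    section = None  # None = scalar fields, "rel", or ("table", name, columns)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@sdif"):
            doc["version"] = line.split()[1]
            continue
        match = TABLE_HEADER.match(line)
        if match:
            name = match.group(1)
            columns = [c.strip() for c in match.group(2).split(",")]
            doc["tables"][name] = []
            section = ("table", name, columns)
            continue
        if line == "rel:":
            section = "rel"
            continue
        tokens = shlex.split(line)  # splits on whitespace, honours double quotes
        if section is None:
            doc["fields"][tokens[0]] = " ".join(tokens[1:])
        elif section == "rel":
            subject, predicate, target = tokens
            doc["relations"].append((subject, predicate, target))
        else:
            _, name, columns = section
            doc["tables"][name].append(dict(zip(columns, tokens)))
    return doc


doc = parse_example(EXAMPLE)
print(doc["fields"]["title"])
print(doc["tables"]["items"][1])
print(doc["relations"])
```

The sketch only shows that the column names are declared once and then applied to every row, and that relations sit next to the data they describe. For real use, rely on the parser shipped with the package and on the specification.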
This GitHub organization hosts the official SDIF ecosystem: the core format, reference tooling, benchmarks, examples and syntax integrations.
- CORE FORMAT: Specification, parser, canonicalizer and CLI for the Semantic Data Interchange Format.
- MEASUREMENT: Reproducible benchmark datasets and reports for comparing SDIF with existing formats.
- SYNTAX TOOLING: Tree-sitter grammar foundation for syntax highlighting and editor integrations.
Repository map
| Repository | Purpose |
|---|---|
| sdif | Core format, specification, parser, canonicalization and CLI |
| sdif-benchmarks | Benchmark datasets, reports and comparison tooling |
| tree-sitter-sdif | Grammar, syntax highlighting and editor integration foundation |
| .github | Organization profile, shared community files and public metadata |
Modern software workflows exchange more structured context than ever.
That context moves through APIs, files, prompts, agents, documentation systems, CI pipelines, benchmarks and human reviews. The usual formats each solve part of the problem, but none of them quite matches this new middle ground.
- JSON: Universal and reliable, but noisy when repeated records dominate.
- YAML: Readable, but often too permissive for deterministic workflows.
- CSV: Compact, but loses structure, relations and meaning very quickly.
- Markdown: Great for humans, but not enough when data must be parsed and verified.
SDIF tries to sit in that gap.
Not as a replacement for every format, but as a focused layer for structured data that needs to remain compact, meaningful, reviewable and reproducible.
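The point about JSON being noisy when repeated records dominate can be checked with a few lines of Python. The sketch below serializes the same three records as compact JSON and as a header-once table modelled on the example earlier in this document, then compares byte counts. The helper names `cell` and `header_once` and the quoting rule are illustrative assumptions, not the SDIF writer or its canonical rules.

```python
import json
import re

records = [
    {"id": "R1", "status": "done", "owner": "build", "evidence": "reports/build.md"},
    {"id": "R2", "status": "open", "owner": "qa", "evidence": "reports/tests.md"},
    {"id": "R3", "status": "done", "owner": "security", "evidence": "reports/audit.md"},
]

# Assumed quoting rule: values made only of word characters and dots stay bare.
BARE = re.compile(r"^[\w.]+$")


def cell(value):
    text = str(value)
    return text if BARE.match(text) else json.dumps(text)


def header_once(name, rows):
    """Write column names once, then one whitespace-separated line per row."""
    columns = list(rows[0])
    lines = [f"{name}[{','.join(columns)}]:"]
    lines += [" ".join(cell(row[c]) for c in columns) for row in rows]
    return "\n".join(lines) + "\n"


as_json = json.dumps(records, separators=(",", ":"))
as_table = header_once("items", records)

json_bytes = len(as_json.encode("utf-8"))
table_bytes = len(as_table.encode("utf-8"))
print(f"compact JSON: {json_bytes} bytes, header-once table: {table_bytes} bytes")
```

The saving comes from not repeating the four key names on every record, so it grows with the number of rows. Byte counts are only a proxy: token counts depend on the tokenizer in use, which is why measured benchmarks matter more than a sketch like this.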
- Repeated structure should not require repeated noise. SDIF aims to reduce unnecessary bytes and tokens while keeping documents understandable.
- A good SDIF file should be inspectable in a plain text editor. Reviewability is part of the format, not a side effect.
- Equivalent data should be able to produce deterministic bytes. That matters for hashing, signing, reproducibility and comparison.
- Data is more than rows and fields. SDIF treats relations, metadata, context and intent as first-class concerns.
- Token efficiency, stable structure and low ambiguity are core design goals, especially for agentic and LLM-assisted workflows.
- SDIF should be useful, testable and implementable. The format should not require heroics to parse or adopt.
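Why deterministic bytes matter for hashing and signing is easiest to see with a small experiment. The sketch below uses JSON from the Python standard library as a stand-in, because SDIF's own canonicalization rules are defined in the specification and are not reproduced here. The function names `canonical_bytes` and `digest` are hypothetical.

```python
import hashlib
import json

# The same data, written with a different key order.
a = {"id": "release.v1", "kind": "Plan", "title": "Release readiness plan"}
b = {"title": "Release readiness plan", "kind": "Plan", "id": "release.v1"}


def canonical_bytes(data):
    """One fixed serialization: sorted keys, fixed separators, UTF-8."""
    text = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(data):
    return hashlib.sha256(canonical_bytes(data)).hexdigest()


# Naive serialization keeps insertion order, so equal data yields different bytes.
naive_a = json.dumps(a).encode("utf-8")
naive_b = json.dumps(b).encode("utf-8")
print("naive bytes equal:    ", naive_a == naive_b)

# A canonical form removes that freedom, so digests and signatures stay stable.
print("canonical digest equal:", digest(a) == digest(b))
```

A format with a canonical form built in lets every tool agree on those bytes without a separate convention, which is the property the principles above ask for.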
- Specification: Stable v1.0
- Python tooling: Parser, CLI, canonicalization and validation
- Distribution: Available on PyPI as sdif-format
SDIF v1.0.0 is available as a public Python package:
pip install sdif-format
import sdif
The current focus is on adoption, documentation, conformance and ecosystem tooling:
- keep the v1.0 format contract stable;
- improve examples and documentation;
- expand conformance fixtures;
- publish reproducible benchmarks;
- improve editor and syntax tooling;
- gather feedback from real-world datasets and AI workflows.
We prefer evidence over claims. Benchmarks, golden files and reproducible examples are part of the product, not marketing decoration.
SDIF is not trying to replace JSON, YAML, CSV, Markdown, XML, Parquet or Protocol Buffers.
Those formats are useful and battle-tested.
SDIF focuses on a narrower problem:
compact, semantic, canonicalizable structured data
that can move cleanly between humans, machines and AI systems.
That focus is intentional.
We are still early, so the most valuable contributions are not only code.
Good criticism is welcome. Vague hype is less useful.
We want SDIF to be boring in the best possible way.
| Clear syntax | Small files | Stable output |
| Readable examples | Useful errors | Reproducible benchmarks |
A format should not require heroics to implement. If SDIF works, it should feel obvious after you use it.
The best place to follow the project is this GitHub organization.
Useful links:
- Core repository: https://github.com/sdif-format/sdif
- Python package: https://pypi.org/project/sdif-format/
- Documentation: https://sdif-format.github.io/
- Issues and feedback: https://github.com/sdif-format/sdif/issues
Constructive criticism, real datasets, benchmark ideas and parser feedback are especially welcome.
