SDIF Format

Semantic Data Interchange Format

Compact, semantic and canonicalizable structured data
for AI agents, deterministic workflows and human-auditable records.

What is SDIF? · Ecosystem · Why it exists · Status · Contributing

pip install sdif-format

PyPI package · v1.0.0 release · Documentation

Compact

Less repeated structure.
Fewer wasted tokens.

Semantic

Tables, relations,
metadata and intent.

Canonical

Stable output for hashing,
signing and comparison.

Auditable

Designed to be read,
reviewed and trusted.

What is SDIF?

SDIF (Semantic Data Interchange Format) is a compact, canonicalizable and AI-friendly data format for structured information that needs to move cleanly between humans, tools, agents and deterministic workflows.

It is designed for cases where data should be:

small enough to be efficient in AI context windows;
structured enough for machines;
readable enough for humans;
deterministic enough for hashing, signing and reproducible workflows;
semantic enough to express tables, relations, metadata and intent.

@sdif 1.0

kind Plan
id release.v1
title "Release readiness plan"

items[id,status,owner,evidence]:
  R1	done	build	"reports/build.md"
  R2	open	qa	"reports/tests.md"
  R3	done	security	"reports/audit.md"

rel:
  release.v1 validated_by R1
  release.v1 blocked_by R2
  release.v1 governed_by R3

Structured information closer to a document,
while still behaving like a contract.
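
To make the shape of the `items[...]` block in the example concrete, here is a minimal Python sketch that reads it into a list of dictionaries. It is not the `sdif` package's parser and does not use its API: the function name `parse_table_block` is hypothetical, and the rules it applies (a `name[col,...]:` header, indented tab-separated rows, optional double quotes around a cell) are assumptions inferred from the single example above, not taken from the specification.

```python
import re

# Hypothetical helper, NOT the sdif package's parser. The grammar is inferred
# from the one example in this README, not from the SDIF specification.
HEADER = re.compile(r"^(\w+)\[([^\]]*)\]:\s*$")


def parse_table_block(text):
    """Return (name, rows); rows is a list of dicts keyed by column name."""
    name, columns, rows = None, [], []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            name = match.group(1)
            columns = [c.strip() for c in match.group(2).split(",")]
            continue
        if name is not None:
            # Assumption: a blank or unindented line ends the block.
            if not line.startswith((" ", "\t")) or not line.strip():
                break
            # Assumption: cells are tab-separated and may be double-quoted.
            cells = [cell.strip().strip('"') for cell in line.strip().split("\t")]
            if len(cells) != len(columns):
                raise ValueError(
                    f"expected {len(columns)} cells, got {len(cells)}: {line!r}"
                )
            rows.append(dict(zip(columns, cells)))
    return name, rows


EXAMPLE = (
    "items[id,status,owner,evidence]:\n"
    '  R1\tdone\tbuild\t"reports/build.md"\n'
    '  R2\topen\tqa\t"reports/tests.md"\n'
    '  R3\tdone\tsecurity\t"reports/audit.md"\n'
)

name, rows = parse_table_block(EXAMPLE)
open_items = [row["id"] for row in rows if row["status"] == "open"]
print(name, len(rows), open_items)  # prints: items 3 ['R2']
```

The column names appear once in the header, so each row carries only values. For real documents, use the reference parser from the `sdif` repository rather than a sketch like this one.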

Ecosystem

This GitHub organization hosts the official SDIF ecosystem: the core format, reference tooling, benchmarks, examples and syntax integrations.

CORE FORMAT

sdif

Specification, parser, canonicalizer and CLI for the Semantic Data Interchange Format.

Explore sdif →

MEASUREMENT

sdif-benchmarks

Reproducible benchmark datasets and reports for comparing SDIF with existing formats.

View benchmarks →

SYNTAX TOOLING

tree-sitter-sdif

Tree-sitter grammar foundation for syntax highlighting and editor integrations.

Open grammar →

Repository map

Repository	Purpose
`sdif`	Core format, specification, parser, canonicalization and CLI
`sdif-benchmarks`	Benchmark datasets, reports and comparison tooling
`tree-sitter-sdif`	Grammar, syntax highlighting and editor integration foundation
`.github`	Organization profile, shared community files and public metadata

Why it exists

Modern software workflows exchange more structured context than ever.

That context moves through APIs, files, prompts, agents, documentation systems, CI pipelines, benchmarks and human reviews. The usual formats each solve part of the problem, but none quite fits this middle ground.

JSON

Universal and reliable, but noisy when repeated records dominate.

YAML

Readable, but often too permissive for deterministic workflows.

CSV

Compact, but loses structure, relations and meaning very quickly.

Markdown

Great for humans, but not enough when data must be parsed and verified.

SDIF tries to sit in that gap.

Not as a replacement for every format, but as a focused layer for structured data that needs to remain compact, meaningful, reviewable and reproducible.
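
The point about JSON becoming noisy when repeated records dominate can be checked with a small Python sketch. It serializes the same three records as compact JSON and as a header-plus-rows layout modelled loosely on the `items[...]` example, then compares byte counts. The row layout and its quoting rule (quote any value that is not purely alphanumeric) are assumptions made for illustration, not output of the `sdif` tooling. Byte counts are also only a rough proxy for tokens, and three records are not a benchmark.

```python
import json

records = [
    {"id": "R1", "status": "done", "owner": "build", "evidence": "reports/build.md"},
    {"id": "R2", "status": "open", "owner": "qa", "evidence": "reports/tests.md"},
    {"id": "R3", "status": "done", "owner": "security", "evidence": "reports/audit.md"},
]

# Compact JSON still repeats every key in every record.
as_json = json.dumps(records, separators=(",", ":"))


def cell(value):
    # Assumed quoting rule for this sketch only: quote non-alphanumeric values.
    return value if value.isalnum() else json.dumps(value)


# Header-plus-rows layout: keys stated once, one tab-separated line per record.
columns = list(records[0])
header = "items[" + ",".join(columns) + "]:"
body = ["  " + "\t".join(cell(r[c]) for c in columns) for r in records]
as_rows = "\n".join([header, *body])

json_bytes = len(as_json.encode("utf-8"))
rows_bytes = len(as_rows.encode("utf-8"))
print(f"JSON: {json_bytes} bytes, header+rows: {rows_bytes} bytes")
```

The gap grows with the number of records, because the per-record cost of repeating keys is paid only once in the header-plus-rows form. The `sdif-benchmarks` repository is the place for measured comparisons.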

Designed for

AI workflows

Agent memory snapshots
Compact context payloads
AI-friendly summaries
Tool-to-tool exchange
Structured prompt artifacts

Engineering

Project plans
Roadmaps
Registries
Manifests
Technical specifications

Verification

Benchmark reports
Canonical records
Hashable datasets
Golden files
Comparison-friendly artifacts

Design principles

Compact by default

Repeated structure should not require repeated noise. SDIF aims to reduce unnecessary bytes and tokens while keeping documents understandable.

Human-auditable

A good SDIF file should be inspectable in a plain text editor. Reviewability is part of the format, not a side effect.

Canonicalizable

Equivalent data should be able to produce deterministic bytes. That matters for hashing, signing, reproducibility and comparison.

Semantic

Data is more than rows and fields. SDIF treats relations, metadata, context and intent as first-class concerns.

AI-friendly

Token efficiency, stable structure and low ambiguity are core design goals, especially for agentic and LLM-assisted workflows.

Practical first

SDIF should be useful, testable and implementable. The format should not require heroics to parse or adopt.
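
The canonicalization principle (equivalent data producing deterministic bytes) can be illustrated with a short Python sketch. It uses JSON with sorted keys as a stand-in, because SDIF's canonical form is defined by the specification and is not reproduced here. The function name `canonical_digest` is hypothetical and is not part of the `sdif` package.

```python
import hashlib
import json


def canonical_digest(data):
    """Hash a deterministic serialization of data.

    JSON with sorted keys is a stand-in here, NOT SDIF's canonical form.
    """
    canonical = json.dumps(
        data, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same data, written with keys in a different order.
a = {"kind": "Plan", "id": "release.v1", "title": "Release readiness plan"}
b = {"title": "Release readiness plan", "id": "release.v1", "kind": "Plan"}

same_digest = canonical_digest(a) == canonical_digest(b)
same_naive_text = json.dumps(a) == json.dumps(b)
print(same_digest, same_naive_text)  # prints: True False
```

Without a canonical form, the two serializations differ byte for byte even though the data is equal, so hashes and signatures would not match. A canonical form removes that source of variation.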

Status

Specification

Stable
v1.0

Python tooling

Parser, CLI,
canonicalization and validation

Distribution

Available on PyPI as
sdif-format

SDIF v1.0.0 is available as a public Python package:

pip install sdif-format

import sdif

The current focus is on adoption, documentation, conformance and ecosystem tooling:

keep the v1.0 format contract stable;
improve examples and documentation;
expand conformance fixtures;
publish reproducible benchmarks;
improve editor and syntax tooling;
gather feedback from real-world datasets and AI workflows.

We prefer evidence over claims. Benchmarks, golden files and reproducible examples are part of the product, not marketing decoration.

What SDIF is not

SDIF is not trying to replace JSON, YAML, CSV, Markdown, XML, Parquet or Protocol Buffers.

Those formats are useful and battle-tested.

SDIF focuses on a narrower problem:

compact, semantic, canonicalizable structured data
that can move cleanly between humans, machines and AI systems.

That focus is intentional.

Contributing

We are still early, so the most valuable contributions are not only code.

Useful contributions

Test SDIF with real datasets
Find ambiguous syntax or edge cases
Compare SDIF against existing formats
Improve documentation and examples
Build small tools around the format

Especially welcome

Reproducible benchmarks
Golden files
Parser feedback
AI workflow experiments
Constructive criticism

Good criticism is welcome. Vague hype is less useful.

Project philosophy

We want SDIF to be boring in the best possible way.

Clear syntax	Small files	Stable output
Readable examples	Useful errors	Reproducible benchmarks

A format should not require heroics to implement. If SDIF works, it should feel obvious after you use it.

Contact

The best place to follow the project is this GitHub organization.

Useful links:

Core repository: https://github.com/sdif-format/sdif
Python package: https://pypi.org/project/sdif-format/
Documentation: https://sdif-format.github.io/
Issues and feedback: https://github.com/sdif-format/sdif/issues

Constructive criticism, real datasets, benchmark ideas and parser feedback are especially welcome.
