@sdif-format

SDIF Format

Semantic Data Interchange Format. A compact, canonicalizable, AI-friendly data format for structured, auditable machine workflows.

Semantic Data Interchange Format

Compact, semantic and canonicalizable structured data
for AI agents, deterministic workflows and human-auditable records.

What is SDIF? · Ecosystem · Why it exists · Status · Contributing



pip install sdif-format

PyPI package · v1.0.0 release · Documentation


  • Compact: less repeated structure, fewer wasted tokens.
  • Semantic: tables, relations, metadata and intent.
  • Canonical: stable output for hashing, signing and comparison.
  • Auditable: designed to be read, reviewed and trusted.


What is SDIF?

SDIF (Semantic Data Interchange Format) is a compact, canonicalizable and AI-friendly data format for structured information that needs to move cleanly between humans, tools, agents and deterministic workflows.

It is designed for cases where data should be:

  • small enough to be efficient in AI context windows;
  • structured enough for machines;
  • readable enough for humans;
  • deterministic enough for hashing, signing and reproducible workflows;
  • semantic enough to express tables, relations, metadata and intent.

@sdif 1.0

kind Plan
id release.v1
title "Release readiness plan"

items[id,status,owner,evidence]:
  R1	done	build	"reports/build.md"
  R2	open	qa	"reports/tests.md"
  R3	done	security	"reports/audit.md"

rel:
  release.v1 validated_by R1
  release.v1 blocked_by R2
  release.v1 governed_by R3

Structured information closer to a document,
while still behaving like a contract.
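
To make the shape of the example above concrete, here is a minimal sketch of how its three parts (top-level fields, the items table and the rel block) could be read with plain Python. It is an illustration only, not the reference parser: the function name parse_sketch, the two-space indentation rule, the tab-separated cells and the quote handling are all assumptions inferred from this one sample, and the real grammar is defined by the SDIF specification.

```python
import re

# The sample from this page, with explicit tab characters between table cells.
SAMPLE = "\n".join([
    "@sdif 1.0",
    "",
    "kind Plan",
    "id release.v1",
    'title "Release readiness plan"',
    "",
    "items[id,status,owner,evidence]:",
    '  R1\tdone\tbuild\t"reports/build.md"',
    '  R2\topen\tqa\t"reports/tests.md"',
    '  R3\tdone\tsecurity\t"reports/audit.md"',
    "",
    "rel:",
    "  release.v1 validated_by R1",
    "  release.v1 blocked_by R2",
    "  release.v1 governed_by R3",
])

# Matches a table header such as: items[id,status,owner,evidence]:
TABLE_HEADER = re.compile(r"^(\w+)\[([^\]]*)\]:$")


def unquote(value):
    """Strip one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def parse_sketch(text):
    """Hypothetical reader for the sample above; NOT the reference SDIF parser."""
    doc = {"version": None, "fields": {}, "tables": {}, "rel": []}
    block = None  # ("table", name, columns) or ("rel",) while inside an indented block
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate sections in this sample
        if line.startswith("  "):
            if block is None:
                raise ValueError(f"indented line outside a block: {line!r}")
            if block[0] == "table":
                _, name, columns = block
                cells = [unquote(cell) for cell in line[2:].split("\t")]
                doc["tables"][name].append(dict(zip(columns, cells)))
            else:
                subject, predicate, obj = line.split()
                doc["rel"].append((subject, predicate, obj))
            continue
        block = None  # any unindented line closes the open block
        header = TABLE_HEADER.match(line)
        if line.startswith("@sdif "):
            doc["version"] = line.split()[1]
        elif header:
            name, columns = header.group(1), header.group(2).split(",")
            doc["tables"][name] = []
            block = ("table", name, columns)
        elif line == "rel:":
            block = ("rel",)
        else:
            key, _, value = line.partition(" ")
            doc["fields"][key] = unquote(value)
    return doc


doc = parse_sketch(SAMPLE)
print(doc["fields"]["title"])
print(doc["tables"]["items"][1])
print(doc["rel"])
```

The table header names each column once, so every row carries only values; the rel block reads as subject, predicate, object triples.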



Ecosystem

This GitHub organization hosts the official SDIF ecosystem: the core format, reference tooling, benchmarks, examples and syntax integrations.

CORE FORMAT

sdif

Specification, parser, canonicalizer and CLI for the Semantic Data Interchange Format.

Explore sdif →

MEASUREMENT

sdif-benchmarks

Reproducible benchmark datasets and reports for comparing SDIF with existing formats.

View benchmarks →

SYNTAX TOOLING

tree-sitter-sdif

Tree-sitter grammar foundation for syntax highlighting and editor integrations.

Open grammar →


Repository map
| Repository | Purpose |
| --- | --- |
| sdif | Core format, specification, parser, canonicalization and CLI |
| sdif-benchmarks | Benchmark datasets, reports and comparison tooling |
| tree-sitter-sdif | Grammar, syntax highlighting and editor integration foundation |
| .github | Organization profile, shared community files and public metadata |


Why it exists

Modern software workflows exchange more structured context than ever.

That context moves through APIs, files, prompts, agents, documentation systems, CI pipelines, benchmarks and human reviews. The usual formats each solve part of the problem, but none of them quite fits this middle ground.

  • JSON: universal and reliable, but noisy when repeated records dominate.
  • YAML: readable, but often too permissive for deterministic workflows.
  • CSV: compact, but loses structure, relations and meaning very quickly.
  • Markdown: great for humans, but not enough when data must be parsed and verified.

SDIF tries to sit in that gap.

Not as a replacement for every format, but as a focused layer for structured data that needs to remain compact, meaningful, reviewable and reproducible.



Designed for

AI workflows

  • Agent memory snapshots
  • Compact context payloads
  • AI-friendly summaries
  • Tool-to-tool exchange
  • Structured prompt artifacts

Engineering

  • Project plans
  • Roadmaps
  • Registries
  • Manifests
  • Technical specifications

Verification

  • Benchmark reports
  • Canonical records
  • Hashable datasets
  • Golden files
  • Comparison-friendly artifacts


Design principles

Compact by default

Repeated structure should not require repeated noise. SDIF aims to reduce unnecessary bytes and tokens while keeping documents understandable.
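
As a rough illustration of "repeated noise", the sketch below serializes the same three records twice: once as a JSON array of objects, where every field name recurs in every record, and once in a header-plus-rows layout modeled on the items table shown earlier. It compares UTF-8 byte counts only; token counts depend on the tokenizer and are not measured here. The table layout is hand-built for this comparison and is an assumption, not output from the SDIF tooling.

```python
import json

columns = ["id", "status", "owner", "evidence"]
rows = [
    ["R1", "done", "build", "reports/build.md"],
    ["R2", "open", "qa", "reports/tests.md"],
    ["R3", "done", "security", "reports/audit.md"],
]

# JSON array of objects: every field name is repeated in every record.
as_json = json.dumps(
    [dict(zip(columns, row)) for row in rows],
    separators=(",", ":"),  # most compact JSON spelling, to keep the comparison fair
)

# Header-plus-rows layout: field names appear once, cells are tab-separated,
# and the last column is quoted as in the sample on this page.
header = "items[" + ",".join(columns) + "]:"
lines = ["  " + "\t".join(row[:3] + ['"' + row[3] + '"']) for row in rows]
as_table = "\n".join([header] + lines)

json_size = len(as_json.encode("utf-8"))
table_size = len(as_table.encode("utf-8"))
print("JSON bytes: ", json_size)
print("table bytes:", table_size)
```

The gap widens as the number of rows grows, because the per-record cost of the field names is paid once in the header instead of once per row.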

Human-auditable

A good SDIF file should be inspectable in a plain text editor. Reviewability is part of the format, not a side effect.

Canonicalizable

Equivalent data should be able to produce deterministic bytes. That matters for hashing, signing, reproducibility and comparison.
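
The sketch below shows why deterministic bytes matter for hashing. It uses JSON from the Python standard library as a stand-in, because SDIF's own canonicalization rules are not described on this page; canonical_bytes is a hypothetical helper, not part of the sdif package. Two records with identical content but different key order hash differently when serialized naively, and identically once a canonical form is enforced.

```python
import hashlib
import json


def canonical_bytes(record):
    """Serialize with sorted keys and fixed separators so equal data yields equal bytes.

    JSON stand-in for illustration only; this is not SDIF's canonical form.
    """
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(data):
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


a = {"id": "R1", "status": "done", "owner": "build"}
b = {"owner": "build", "status": "done", "id": "R1"}  # same data, different key order

# Naive serialization keeps insertion order, so the bytes (and hashes) differ.
naive_a = json.dumps(a).encode("utf-8")
naive_b = json.dumps(b).encode("utf-8")
print(digest(naive_a) == digest(naive_b))  # False

# Canonical serialization removes the ordering difference.
print(digest(canonical_bytes(a)) == digest(canonical_bytes(b)))  # True
```

The same property is what makes signatures, golden files and byte-level diffs meaningful: a change in the hash then signals a change in the data, not in the way it happened to be written.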

Semantic

Data is more than rows and fields. SDIF treats relations, metadata, context and intent as first-class concerns.

AI-friendly

Token efficiency, stable structure and low ambiguity are core design goals, especially for agentic and LLM-assisted workflows.

Practical first

SDIF should be useful, testable and implementable. The format should not require heroics to parse or adopt.



Status

  • Specification: stable, v1.0
  • Python tooling: parser, CLI, canonicalization and validation
  • Distribution: available on PyPI as sdif-format

SDIF v1.0.0 is available as a public Python package. Install it with pip, then import the sdif module:

pip install sdif-format
import sdif

The current focus is on adoption, documentation, conformance and ecosystem tooling:

  • keep the v1.0 format contract stable;
  • improve examples and documentation;
  • expand conformance fixtures;
  • publish reproducible benchmarks;
  • improve editor and syntax tooling;
  • gather feedback from real-world datasets and AI workflows.

We prefer evidence over claims. Benchmarks, golden files and reproducible examples are part of the product, not marketing decoration.



What SDIF is not

SDIF is not trying to replace JSON, YAML, CSV, Markdown, XML, Parquet or Protocol Buffers.

Those formats are useful and battle-tested.

SDIF focuses on a narrower problem:

compact, semantic, canonicalizable structured data
that can move cleanly between humans, machines and AI systems.

That focus is intentional.



Contributing

We are still early, so the most valuable contributions are not only code.

Useful contributions

  • Test SDIF with real datasets
  • Find ambiguous syntax or edge cases
  • Compare SDIF against existing formats
  • Improve documentation and examples
  • Build small tools around the format

Especially welcome

  • Reproducible benchmarks
  • Golden files
  • Parser feedback
  • AI workflow experiments
  • Constructive criticism

Good criticism is welcome. Vague hype is less useful.



Project philosophy

We want SDIF to be boring in the best possible way.

Clear syntax · Small files · Stable output
Readable examples · Useful errors · Reproducible benchmarks

A format should not require heroics to implement. If SDIF works, it should feel obvious after you use it.



Contact

The best place to follow the project is this GitHub organization.

Constructive criticism, real datasets, benchmark ideas and parser feedback are especially welcome.
