Deterministic CLI for converting structured data into compact, prompt-ready text for LLM workflows.
Quick Commands · CLI · Automation
It is intentionally narrow:
- input: JSON, YAML, TOML, CSV, TOON
- output: TOON, TSV, YAML, compact JSON
- routing: versioned heuristic profiles
- stats: byte-based size estimates, not exact tokenizer counts
The point is reproducible formatting for shell pipelines, not universal token optimality.
Install with Homebrew:

```sh
brew install mlnja/tap/llmfmt
```

Download the latest Debian package for your architecture and install it with dpkg:
```sh
VERSION=0.1.0
curl -fsSL -o /tmp/llmfmt_${VERSION}_amd64.deb \
  https://github.com/mlnja/llmfmt/releases/download/v${VERSION}/llmfmt_${VERSION}_amd64.deb
sudo dpkg -i /tmp/llmfmt_${VERSION}_amd64.deb
llmfmt --version
```

For arm64:

```sh
VERSION=0.1.0
curl -fsSL -o /tmp/llmfmt_${VERSION}_arm64.deb \
  https://github.com/mlnja/llmfmt/releases/download/v${VERSION}/llmfmt_${VERSION}_arm64.deb
sudo dpkg -i /tmp/llmfmt_${VERSION}_arm64.deb
llmfmt --version
```

GitHub READMEs do not support tabbed code groups, so the examples below are grouped as separate command/output blocks.
```sh
llmfmt users.json
```

```text
id name role
1 alice admin
2 bob user
3 charlie user
```

```sh
llmfmt users.json --output-format toon
```

```toon
users[3]{id,name,role}:
1,alice,admin
2,bob,user
3,charlie,user
```
```sh
curl -s https://jsonplaceholder.typicode.com/users | llmfmt --wrap
```

```yaml
- address:
    city: Gwenborough
    geo:
      lat: '-37.3159'
      lng: '81.1496'
...
```
```sh
curl -s https://jsonplaceholder.typicode.com/users | llmfmt --stats off
```

```yaml
- address:
    city: Gwenborough
    geo:
      lat: '-37.3159'
      lng: '81.1496'
...
```
Most structured data shown to LLMs is still dumped as JSON, even when that is verbose or awkward to scan. llmfmt sits in the middle of a pipeline and rewrites the same data into a format that is often smaller and easier for a model to read.
The product wedge is:
- deterministic output
- stable routing
- stdin/stdout-first CLI ergonomics
- a small native binary
llmfmt performs the same steps on every run:
- Read from stdin or a file.
- Detect the input format, unless `--input-format` is set.
- Parse into `serde_json::Value`.
- Canonicalize object key ordering for deterministic behavior.
- Analyze the shape of the data.
- Select an output format through a frozen profile.
- Emit the rendered payload and a size estimate.
For the same input bytes, profile, and flags, the selected format and payload should be stable.
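A rough sketch of the canonicalization step: a sorted map makes object key order independent of insertion order. This uses a hand-rolled value type for illustration, not the tool's real `serde_json` plumbing.

```rust
use std::collections::BTreeMap;

// Minimal stand-in for serde_json::Value; illustration only.
enum Value {
    Str(String),
    // BTreeMap iterates keys in sorted order, which gives the
    // deterministic object key ordering described above.
    Object(BTreeMap<String, Value>),
}

// Build an object whose key order does not depend on insertion order.
fn canonical_object(pairs: Vec<(String, Value)>) -> Value {
    Value::Object(pairs.into_iter().collect())
}

fn main() {
    let v = canonical_object(vec![
        ("name".to_string(), Value::Str("alice".to_string())),
        ("id".to_string(), Value::Str("1".to_string())),
    ]);
    if let Value::Object(map) = &v {
        let keys: Vec<&str> = map.keys().map(|k| k.as_str()).collect();
        // Keys come back sorted regardless of the order they went in.
        assert_eq!(keys, ["id", "name"]);
    }
}
```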
Supported inputs:
- JSON
- YAML
- TOML
- CSV
- TOON
Detection priority:
- JSON
- TOON
- CSV/TSV-style delimited text
- YAML
- TOML
If auto-detection fails, pass `--input-format` explicitly.
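A toy version of that priority order might look like the function below. The prefix checks are invented for illustration and are far cruder than the real detector.

```rust
// Illustrative sniffing in the documented priority order. The real
// detector is more thorough; these surface checks are an assumption.
fn detect_format(input: &str) -> &'static str {
    let s = input.trim_start();
    let first_line = s.lines().next().unwrap_or("");
    if s.starts_with('{') || s.starts_with('[') {
        "json"
    } else if first_line.contains("]{") && first_line.ends_with(':') {
        "toon" // e.g. "users[3]{id,name,role}:"
    } else if first_line.contains(',') || first_line.contains('\t') {
        "csv"
    } else if first_line.contains(": ") {
        "yaml"
    } else {
        "toml"
    }
}

fn main() {
    assert_eq!(detect_format("[{\"id\":1}]"), "json");
    assert_eq!(detect_format("users[3]{id,name,role}:\n1,alice,admin"), "toon");
    assert_eq!(detect_format("id,name\n1,alice"), "csv");
    assert_eq!(detect_format("city: Gwenborough"), "yaml");
}
```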
Output formats:

- `toon`: best for uniform arrays of objects.
- `tsv`: best for flat dense tables with scalar cells only.
- `yaml`: best for moderately nested but still human-scannable structures.
- `json-compact`: fallback for deep, irregular, or mixed structures.
csv is supported as input, but TSV is the preferred tabular output.
Routing is fully delegated to the active profile.
Current built-in profiles:

- `20260410`
- `20260411`

`latest` resolves to `20260411`.
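One way to picture profile resolution, with `latest` as an alias for the newest frozen id. This is a hypothetical helper; the real resolver is internal to the binary.

```rust
// Frozen profile ids in release order; "latest" aliases the newest.
const PROFILES: [&str; 2] = ["20260410", "20260411"];

// Hypothetical resolver: unknown ids are rejected rather than guessed.
fn resolve_profile(id: &str) -> Option<&'static str> {
    if id == "latest" {
        return PROFILES.last().copied();
    }
    PROFILES.iter().copied().find(|p| *p == id)
}

fn main() {
    assert_eq!(resolve_profile("latest"), Some("20260411"));
    assert_eq!(resolve_profile("20260410"), Some("20260410"));
    assert_eq!(resolve_profile("unknown"), None);
}
```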
Current routing intent:
- flat dense object arrays with few fields prefer TSV
- wider uniform object arrays prefer TOON
- moderate nesting prefers YAML
- deep or irregular structures fall back to compact JSON
Profile ids are emitted in stats output so behavior is diagnosable and reproducible.
The router currently works from this summary:
```rust
pub struct DataAnalysis {
    pub depth: usize,
    pub row_count: usize,
    pub field_count: usize,
    pub sparsity: f32,
    pub uniformity: f32,
    pub has_nested_arrays: bool,
    pub is_uniform_object_array: bool,
    pub is_flat_object_array: bool,
    pub is_deeply_nested: bool,
}
```

These fields are internal routing inputs, not a stable public API contract yet.
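The routing intent could be sketched as a pure function over this summary. The thresholds below (4 fields, 0.9 uniformity, depth 3) are invented for illustration, not the values of any real profile.

```rust
// Mirror of the README's routing summary; only the fields this
// sketch uses are included.
struct DataAnalysis {
    depth: usize,
    field_count: usize,
    uniformity: f32,
    is_uniform_object_array: bool,
    is_flat_object_array: bool,
    is_deeply_nested: bool,
}

// Illustrative router following the stated intent; thresholds are
// invented for this sketch, not taken from a real profile.
fn route(a: &DataAnalysis) -> &'static str {
    if a.is_flat_object_array && a.field_count <= 4 {
        "tsv"
    } else if a.is_uniform_object_array && a.uniformity >= 0.9 {
        "toon"
    } else if !a.is_deeply_nested && a.depth <= 3 {
        "yaml"
    } else {
        "json-compact"
    }
}

fn main() {
    let flat = DataAnalysis {
        depth: 2,
        field_count: 3,
        uniformity: 1.0,
        is_uniform_object_array: true,
        is_flat_object_array: true,
        is_deeply_nested: false,
    };
    assert_eq!(route(&flat), "tsv");
}
```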
```text
llmfmt [INPUT]
  --input-format <json|yaml|toml|csv|toon>
  --output-format <toon|tsv|yaml|json-compact>
  --profile <latest|YYYYMMDD>
  --wrap
  --stats <text|json|off>
  -o <file>
```
Defaults:
- profile: `latest`
- stats: `text`
```sh
printf '[{"id":1,"name":"alice"},{"id":2,"name":"bob"}]' | llmfmt
```

Stdout:

```text
id name
1 alice
2 bob
```

Stderr:

```text
tsv | size 47B→22B (-53%) [auto|estimate:bytes|profile:20260410]
```
These examples assume llmfmt is already installed and available on PATH.
Convert a file with auto-routing:
```sh
llmfmt users.json
```

Force TOON output from a JSON file:

```sh
llmfmt users.json --output-format toon
```

Force compact JSON from TOON input:

```sh
llmfmt users.toon --output-format json-compact --stats off
```

Convert YAML config data:

```sh
llmfmt config.yaml
```

Convert CSV input to TSV:

```sh
llmfmt metrics.csv --output-format tsv
```

Write output to a file:

```sh
llmfmt users.json --output-format toon -o /tmp/users.toon
```

Wrap the selected output in a fenced block:

```sh
llmfmt users.json --wrap
```

Use curl in a shell pipeline:

```sh
curl -s https://jsonplaceholder.typicode.com/users | llmfmt --wrap
```

Use curl with a forced format and no stats:

```sh
curl -s https://jsonplaceholder.typicode.com/users \
  | llmfmt --output-format toon --stats off
```

Wrapped output:

```sh
printf '{"users":[{"id":1,"name":"alice","role":"admin"}]}' \
  | llmfmt --output-format toon --wrap
```

```toon
users[1]{id,name,role}:
1,alice,admin
```
## Stats
Stats are byte-based estimates. They are useful for relative comparisons but should not be called token counts.
Text mode:
```text
tsv | size 47B→22B (-53%) [auto|estimate:bytes|profile:20260410]
```

JSON mode emits:

- `input_format`
- `output_format`
- `profile`
- `forced`
- `estimate_kind`
- `input_bytes`
- `output_bytes`
- `delta_percent`
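Assuming `delta_percent` is derived from the two byte counts (the exact rounding rule is a guess here), the arithmetic is simple:

```rust
// Byte-based size delta matching the "-53%" style of the text stats
// line. The rounding behavior is an assumption for this sketch.
fn delta_percent(input_bytes: u64, output_bytes: u64) -> i64 {
    if input_bytes == 0 {
        return 0;
    }
    let diff = output_bytes as f64 - input_bytes as f64;
    (diff / input_bytes as f64 * 100.0).round() as i64
}

fn main() {
    // The 47B -> 22B example from the stats line above.
    assert_eq!(delta_percent(47, 22), -53);
}
```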
To suppress stats entirely:
```sh
llmfmt users.json --stats off
```

Or suppress stderr at the shell level:

```sh
llmfmt users.json 2>/dev/null
```

TSV output is only allowed when:
- the top level is an array of objects, or a single-key object containing such an array
- every row is an object
- every field is scalar
Nested arrays and objects are rejected for TSV output.
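Under those rules, an eligibility check could be sketched as follows, using a hand-rolled value type; unwrapping the single-key object case is omitted for brevity.

```rust
// Minimal stand-in value type; illustration only.
enum Value {
    Scalar(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

// Sketch of the rule above: every row must be an object and every
// field must be scalar, otherwise TSV is rejected.
fn tsv_eligible(v: &Value) -> bool {
    match v {
        Value::Array(rows) => rows.iter().all(|row| match row {
            Value::Object(fields) => fields
                .iter()
                .all(|(_, f)| matches!(f, Value::Scalar(_))),
            _ => false,
        }),
        _ => false,
    }
}

fn main() {
    let ok = Value::Array(vec![Value::Object(vec![(
        "id".to_string(),
        Value::Scalar("1".to_string()),
    )])]);
    let nested = Value::Array(vec![Value::Object(vec![(
        "geo".to_string(),
        Value::Object(vec![]),
    )])]);
    assert!(tsv_eligible(&ok));
    assert!(!tsv_eligible(&nested)); // nested object field -> rejected
}
```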
TOON parsing and emission use the official toon-format crate.
This repo includes:
- GitHub Actions CI for `check`, `clippy`, `fmt`, and `test`
- GitHub Actions release packaging for Linux and macOS tarballs
- a `just update-tap` helper to update a Homebrew formula in a sibling tap repo
Release assets are expected under mlnja/llmfmt. The tap updater defaults to that repo and can still be overridden if needed.
Out of scope:

- exact model-specific token counting
- field filtering or truncation
- runtime-loaded routing configs
- lossy abbreviation of keys or values
- replacing JSON in storage or APIs
Run the test suite:

```sh
cargo test
```