llmfmt

Deterministic CLI for converting structured data into compact, prompt-ready text for LLM workflows.

It is intentionally narrow:

input: JSON, YAML, TOML, CSV, TOON
output: TOON, TSV, YAML, compact JSON
routing: versioned heuristic profiles
stats: byte-based size estimates, not exact tokenizer counts

The point is reproducible formatting for shell pipelines, not universal token optimality.

Installation

Homebrew

brew install mlnja/tap/llmfmt

Linux

Download the Debian package for your architecture and install it with dpkg. The commands below pin VERSION; change it to the release you want:

Linux x86_64

VERSION=0.1.0
curl -fsSL -o /tmp/llmfmt_${VERSION}_amd64.deb \
  https://github.com/mlnja/llmfmt/releases/download/v${VERSION}/llmfmt_${VERSION}_amd64.deb
sudo dpkg -i /tmp/llmfmt_${VERSION}_amd64.deb
llmfmt --version

Linux ARM64

VERSION=0.1.0
curl -fsSL -o /tmp/llmfmt_${VERSION}_arm64.deb \
  https://github.com/mlnja/llmfmt/releases/download/v${VERSION}/llmfmt_${VERSION}_arm64.deb
sudo dpkg -i /tmp/llmfmt_${VERSION}_arm64.deb
llmfmt --version

Quickstart

GitHub READMEs cannot render tabbed panels, so each example below is shown as a command followed by its output in a separate block.

Auto-route a local JSON file

llmfmt users.json

id	name	role
1	alice	admin
2	bob	user
3	charlie	user

Force TOON output

llmfmt users.json --output-format toon

users[3]{id,name,role}:
  1,alice,admin
  2,bob,user
  3,charlie,user

Wrap API output for direct prompt use

curl -s https://jsonplaceholder.typicode.com/users | llmfmt --wrap

```yaml
- address:
    city: Gwenborough
    geo:
      lat: '-37.3159'
      lng: '81.1496'
...
```

Run quietly in scripts

curl -s https://jsonplaceholder.typicode.com/users | llmfmt --stats off

- address:
    city: Gwenborough
    geo:
      lat: '-37.3159'
      lng: '81.1496'
...

Why this exists

Most structured data shown to LLMs is still dumped as JSON, even when that is verbose or awkward to scan. llmfmt sits in the middle of a pipeline and rewrites the same data into a format that is often smaller and easier for a model to read.

The product wedge is:

deterministic output
stable routing
stdin/stdout-first CLI ergonomics
a small native binary

Conversion model

llmfmt performs the same steps on every run:

Read from stdin or a file.
Detect the input format, unless --input-format is set.
Parse into serde_json::Value.
Canonicalize object key ordering for deterministic behavior.
Analyze the shape of the data.
Select an output format through a frozen profile.
Emit the rendered payload and a size estimate.

For the same input bytes, profile, and flags, the selected format and payload should be stable.
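The canonicalization step can be sketched as a recursive key sort. This is a minimal illustration, not the tool's code: the `Value` enum below is a hypothetical stand-in for `serde_json::Value` so the snippet compiles with the standard library alone, and lexicographic ordering of keys is an assumption.

```rust
// Hypothetical stand-in for serde_json::Value, kept std-only for the sketch.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Array(Vec<Value>),
    // Key/value pairs in the order they were parsed.
    Object(Vec<(String, Value)>),
}

// Recursively sort object keys so equal data always has one layout.
// Assumption: keys are ordered lexicographically by byte value.
fn canonicalize(value: Value) -> Value {
    match value {
        Value::Array(items) => {
            // Array order is data, so only the elements are canonicalized.
            Value::Array(items.into_iter().map(canonicalize).collect())
        }
        Value::Object(mut pairs) => {
            pairs.sort_by(|a, b| a.0.cmp(&b.0));
            Value::Object(
                pairs
                    .into_iter()
                    .map(|(key, inner)| (key, canonicalize(inner)))
                    .collect(),
            )
        }
        scalar => scalar,
    }
}
```

Running `canonicalize` twice gives the same result as running it once, which is the property later steps rely on for stable output. Duplicate keys are not handled in this sketch.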

Input formats

Supported inputs:

JSON
YAML
TOML
CSV
TOON

Detection priority:

JSON
TOON
CSV/TSV-style delimited text
YAML
TOML

If auto-detection fails, pass --input-format.
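A priority list like the one above amounts to "first probe that accepts the input wins". The sketch below shows only that ordering logic. The `Probe` type, `detect`, and both `looks_like_*` functions are hypothetical names; the toy probes are crude stand-ins for real parsers.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputFormat {
    Json,
    Toon,
    Csv,
    Yaml,
    Toml,
}

// A probe answers "could this text be my format?".
type Probe = fn(&str) -> bool;

// Walk the probes in priority order and return the first format that matches.
fn detect(input: &str, probes: &[(InputFormat, Probe)]) -> Option<InputFormat> {
    probes
        .iter()
        .find(|(_, probe)| probe(input))
        .map(|(format, _)| *format)
}

// Toy stand-in: a real probe would attempt a full JSON parse.
fn looks_like_json(input: &str) -> bool {
    let trimmed = input.trim_start();
    trimmed.starts_with('{') || trimmed.starts_with('[')
}

// Toy stand-in: a real probe would check for consistent column counts.
fn looks_like_delimited(input: &str) -> bool {
    input
        .lines()
        .next()
        .map_or(false, |line| line.contains(',') || line.contains('\t'))
}
```

With JSON listed before the delimited probe, an input such as `[1,2]` resolves to JSON even though it also contains a comma. When every probe declines, `detect` returns `None`, which corresponds to the case where `--input-format` is needed.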

Output formats

toon: best for uniform arrays of objects.
tsv: best for flat, dense tables with scalar cells only.
yaml: best for moderately nested but still human-scannable structures.
json-compact: fallback for deep, irregular, or mixed structures.

CSV is supported as input only; TSV is the tabular output format.
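To make the TSV shape concrete, here is a minimal rendering sketch. `render_tsv` is a hypothetical helper, not the tool's emitter. It assumes cells are already strings and contain no tabs or newlines, since this README does not describe how such characters are escaped.

```rust
// Render a header row and data rows as tab-separated lines.
// Assumption: cells contain no tab or newline characters.
fn render_tsv(header: &[&str], rows: &[Vec<String>]) -> String {
    let mut out = String::new();
    out.push_str(&header.join("\t"));
    out.push('\n');
    for row in rows {
        out.push_str(&row.join("\t"));
        out.push('\n');
    }
    out
}
```

For a header of `id`, `name` and the rows `1, alice` and `2, bob`, this yields three lines totalling 22 bytes including newlines.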

Routing

Routing is fully delegated to the active profile.

Current built-in profiles:

20260410
20260411
latest resolves to 20260411
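Resolving `latest` against a fixed list of dated profiles might look like the sketch below. The `PROFILES` table and `resolve_profile` are hypothetical names. The sketch relies on YYYYMMDD ids sorting chronologically when compared as strings.

```rust
// Built-in profile ids as listed in this README.
static PROFILES: [&str; 2] = ["20260410", "20260411"];

// Map a requested profile name to a concrete built-in id.
// "latest" picks the greatest id; anything else must match exactly.
fn resolve_profile(requested: &str) -> Option<&'static str> {
    if requested == "latest" {
        return PROFILES.iter().copied().max();
    }
    PROFILES.iter().copied().find(|id| *id == requested)
}
```

Pinning an explicit id in scripts keeps routing unchanged when a newer profile ships, while `latest` follows the newest one.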

Current routing intent:

flat dense object arrays with few fields prefer TSV
wider uniform object arrays prefer TOON
moderate nesting prefers YAML
deep or irregular structures fall back to compact JSON

Profile ids are emitted in stats output so behavior is diagnosable and reproducible.

Data analysis

The router currently works from this summary:

pub struct DataAnalysis {
    pub depth: usize,
    pub row_count: usize,
    pub field_count: usize,
    pub sparsity: f32,
    pub uniformity: f32,
    pub has_nested_arrays: bool,
    pub is_uniform_object_array: bool,
    pub is_flat_object_array: bool,
    pub is_deeply_nested: bool,
}

These fields are internal routing inputs, not a stable public API contract yet.
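As an illustration of how a frozen profile could turn this summary into a format choice, the sketch below encodes the routing intent from the Routing section. It is not the shipped profile: `route`, `OutputFormat`, and both thresholds are hypothetical, and the "irregular" case is not modelled.

```rust
#[allow(dead_code)]
#[derive(Debug, Default)]
pub struct DataAnalysis {
    pub depth: usize,
    pub row_count: usize,
    pub field_count: usize,
    pub sparsity: f32,
    pub uniformity: f32,
    pub has_nested_arrays: bool,
    pub is_uniform_object_array: bool,
    pub is_flat_object_array: bool,
    pub is_deeply_nested: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Toon,
    Tsv,
    Yaml,
    JsonCompact,
}

// Hypothetical thresholds; the real profiles may use different values.
const MAX_TSV_FIELDS: usize = 4;
const MAX_TSV_SPARSITY: f32 = 0.1;

// Pure function of the analysis, so the same summary always routes the same way.
pub fn route(analysis: &DataAnalysis) -> OutputFormat {
    if analysis.is_deeply_nested {
        return OutputFormat::JsonCompact;
    }
    let dense = analysis.sparsity <= MAX_TSV_SPARSITY;
    let narrow = analysis.field_count <= MAX_TSV_FIELDS;
    if analysis.is_flat_object_array && dense && narrow {
        return OutputFormat::Tsv;
    }
    if analysis.is_uniform_object_array {
        return OutputFormat::Toon;
    }
    OutputFormat::Yaml
}
```

Keeping the decision a pure function with constants baked in is one way to get the "frozen" behaviour: changing a threshold means shipping a new profile id rather than editing an existing one.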

CLI

llmfmt [INPUT]
  --input-format <json|yaml|toml|csv|toon>
  --output-format <toon|tsv|yaml|json-compact>
  --profile <latest|YYYYMMDD>
  --wrap
  --stats <text|json|off>
  -o <file>

Defaults:

profile: latest
stats: text

Example

printf '[{"id":1,"name":"alice"},{"id":2,"name":"bob"}]' | llmfmt

Stdout:

id	name
1	alice
2	bob

Stderr:

tsv | size 47B→22B (-53%) [auto|estimate:bytes|profile:20260410]
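The numbers in that stderr line follow from the byte lengths of the input and output. The sketch below reproduces the line under two assumptions: the percentage is rounded to the nearest integer, and a forced format would be labelled `forced` (only `auto` appears in this README). `delta_percent` and `stats_line` are hypothetical names.

```rust
// Signed size change as a whole percentage of the input size.
// Assumption: rounding to the nearest integer.
fn delta_percent(input_bytes: usize, output_bytes: usize) -> i64 {
    if input_bytes == 0 {
        return 0;
    }
    let delta = output_bytes as f64 - input_bytes as f64;
    (delta / input_bytes as f64 * 100.0).round() as i64
}

// Format a stats line in the shape shown in this README.
fn stats_line(
    format: &str,
    input_bytes: usize,
    output_bytes: usize,
    forced: bool,
    profile: &str,
) -> String {
    // Assumption: "forced" is the label used when --output-format is set.
    let mode = if forced { "forced" } else { "auto" };
    format!(
        "{} | size {}B→{}B ({:+}%) [{}|estimate:bytes|profile:{}]",
        format,
        input_bytes,
        output_bytes,
        delta_percent(input_bytes, output_bytes),
        mode,
        profile
    )
}
```

The example input is 47 bytes and the TSV output is 22 bytes including newlines, so the change is (22 - 47) / 47, roughly -53%.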

Quick commands

These examples assume llmfmt is already installed and available on PATH.

Convert a file with auto-routing:

llmfmt users.json

Force TOON output from a JSON file:

llmfmt users.json --output-format toon

Force compact JSON from TOON input:

llmfmt users.toon --output-format json-compact --stats off

Convert YAML config data:

llmfmt config.yaml

Convert CSV input to TSV:

llmfmt metrics.csv --output-format tsv

Write output to a file:

llmfmt users.json --output-format toon -o /tmp/users.toon

Wrap the selected output in a fenced block:

llmfmt users.json --wrap

Use curl in a shell pipeline:

curl -s https://jsonplaceholder.typicode.com/users | llmfmt --wrap

Use curl with a forced format and no stats:

curl -s https://jsonplaceholder.typicode.com/users \
  | llmfmt --output-format toon --stats off

Wrapped output:

printf '{"users":[{"id":1,"name":"alice","role":"admin"}]}' \
  | llmfmt --output-format toon --wrap

```toon
users[1]{id,name,role}:
  1,alice,admin
```

Stats

Stats are byte-based estimates. They are useful for relative comparisons but should not be called token counts.

Text mode:

tsv | size 47B→22B (-53%) [auto|estimate:bytes|profile:20260410]

JSON mode emits:

input_format
output_format
profile
forced
estimate_kind
input_bytes
output_bytes
delta_percent

To suppress stats entirely:

llmfmt users.json --stats off

Or suppress stderr at the shell level:

llmfmt users.json 2>/dev/null

Validation

TSV output is only allowed when:

the top level is an array of objects, or a single-key object containing such an array
every row is an object
every field is scalar

Nested arrays and objects are rejected for TSV output.
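The TSV rules above can be expressed as a small predicate. This is a sketch under stated assumptions, not the tool's validator: `Value` is a hypothetical stand-in for `serde_json::Value`, `table_rows` and `tsv_allowed` are illustrative names, and an empty array is treated as allowed, which the README does not specify.

```rust
// Hypothetical stand-in for serde_json::Value, kept std-only for the sketch.
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

// Scalars are everything except arrays and objects.
fn is_scalar(value: &Value) -> bool {
    !matches!(value, Value::Array(_) | Value::Object(_))
}

// Find the row array: either the top level itself, or the sole
// value of a single-key object.
fn table_rows(value: &Value) -> Option<&[Value]> {
    match value {
        Value::Array(rows) => Some(rows.as_slice()),
        Value::Object(pairs) if pairs.len() == 1 => match &pairs[0].1 {
            Value::Array(rows) => Some(rows.as_slice()),
            _ => None,
        },
        _ => None,
    }
}

// True when every row is an object and every cell is scalar.
fn tsv_allowed(value: &Value) -> bool {
    match table_rows(value) {
        Some(rows) => rows.iter().all(|row| match row {
            Value::Object(fields) => fields.iter().all(|(_, cell)| is_scalar(cell)),
            _ => false,
        }),
        None => false,
    }
}
```

A payload shaped like `{"users":[{"id":1,"name":"alice"}]}` passes, while any row holding a nested array or object fails the check.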

TOON parsing and emission use the official toon-format crate.

Automation

This repo includes:

GitHub Actions CI for check, clippy, fmt, and test
GitHub Actions release packaging for Linux and macOS tarballs
a just update-tap helper to update a Homebrew formula in a sibling tap repo

Release assets are expected under mlnja/llmfmt. The tap updater defaults to that repo and can still be overridden if needed.

Non-goals

exact model-specific token counting
field filtering or truncation
runtime-loaded routing configs
lossy abbreviation of keys or values
replacing JSON in storage or APIs

Development

cargo test
