jakeliu/mllang

MLLANG — Markdown Language for AI Agents


Compact text-surface protocol for AI-agent state. Lives inside markdown fenced blocks. Cross-vendor ratified across 6+ LLM families.


What it solves

Multi-agent workflows need to pass state between LLM turns. JSON-RPC is verbose. Plain prose drifts. Every team invents its own slot syntax. MLLANG is a small shared convention that fits inside markdown fenced blocks and uses 50–70% fewer tokens than JSON-RPC for the same state.


30-second example

# Refactor auth module

Workflow: switch sessions → JWT, gate on test=pass, rollback ready.

```mllang
V:0.1.r1; I:auth-001; G:{task=refactor_auth, from=sessions, to=jwt};
S:{users_db=preserve, api_compat=hard}; D:[migration_plan, rollback_path];
R:[downtime, token_leak]; N:@K -> implement; H:test=pass; P:0.85;
EN: Refactor auth sessions → JWT, preserve DB + API compat.
```

One-line workflow summary above the fenced block, full state inside the packet, EN: line for human skim. No content duplication. Agents parse the block; humans glance at the summary. Long prose is opt-in (mode="verbose") for human-authored docs.

See docs/markdown-embedded.md for the full pattern.
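To make the slot layout concrete, here is a minimal sketch that splits the example packet into slots using only the standard library. It is not the reference parser: `parse_value` and `parse_packet` are hypothetical helpers, and the sketch assumes that slots are `;`-separated, that `;` never appears inside a value, and that the `EN:` shadow sits on its own line.

```python
PACKET = (
    "V:0.1.r1; I:auth-001; G:{task=refactor_auth, from=sessions, to=jwt};\n"
    "S:{users_db=preserve, api_compat=hard}; D:[migration_plan, rollback_path];\n"
    "R:[downtime, token_leak]; N:@K -> implement; H:test=pass; P:0.85;\n"
    "EN: Refactor auth sessions → JWT, preserve DB + API compat."
)


def parse_value(raw):
    """Turn {k=v, ...} into a dict, [a, b] into a list, anything else into a string."""
    raw = raw.strip()
    if raw.startswith("{") and raw.endswith("}"):
        pairs = (item.split("=", 1) for item in raw[1:-1].split(",") if item.strip())
        return {key.strip(): value.strip() for key, value in pairs}
    if raw.startswith("[") and raw.endswith("]"):
        return [item.strip() for item in raw[1:-1].split(",") if item.strip()]
    return raw


def parse_packet(text):
    """Hypothetical helper: map slot letters to parsed values (assumed grammar)."""
    slots = {}
    body_lines = []
    for line in text.splitlines():
        if line.startswith("EN:"):
            # The English shadow is free text, so keep it out of the ';' split.
            slots["EN"] = line[3:].strip()
        else:
            body_lines.append(line)
    for chunk in " ".join(body_lines).split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, raw = chunk.partition(":")
        slots[key.strip()] = parse_value(raw)
    return slots


slots = parse_packet(PACKET)
print(slots["G"]["to"])  # jwt
print(slots["N"])        # @K -> implement
print(slots["H"])        # test=pass
```

For real use, prefer `parse` from the `mllang` package, which is what the conformance suite scores against.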


Quick start

from mllang import Packet, embed_in_markdown, extract_summary_and_packet

# Build a packet
p = Packet(
    version="0.1.r1",
    thread_id="auth-001",
    goal={"task": "refactor_auth", "to": "jwt"},
    state={"users_db": "preserve"},
    next_agent="@K -> implement",
    halt="test=pass",
    confidence=0.85,
    en_shadow="Refactor auth sessions → JWT, preserve DB.",
)

# Embed in markdown (summary mode = default, tight)
md = embed_in_markdown(
    p,
    title="Refactor auth module",
    summary="Workflow: switch sessions → JWT, gate on test=pass.",
)

# Pull both channels back out
summary, packet = extract_summary_and_packet(md)
print(summary)              # the workflow summary
print(packet.next_agent)    # @K -> implement
print(packet.halt)          # test=pass

Three modes: summary (default, agent-loaded files), verbose (long prose for human-authored docs), packet_only (pure agent pipelines).
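The sketch below is a standalone approximation of what the three modes might emit and how both channels can be pulled back out. It does not import or reproduce `embed_in_markdown`: the `render` and `extract` helpers, their parameters, and the exact layout per mode are assumptions made for illustration.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block


def render(packet_text, title="", summary="", prose="", mode="summary"):
    """Hypothetical renderer: lay out a packet according to the chosen mode."""
    block = f"{FENCE}mllang\n{packet_text}\n{FENCE}"
    if mode == "packet_only":
        return block  # pure agent pipeline: no heading, no summary
    parts = [f"# {title}" if title else "", summary]
    if mode == "verbose":
        parts.append(prose)  # long prose is opt-in
    elif mode != "summary":
        raise ValueError(f"unknown mode: {mode}")
    parts.append(block)
    return "\n\n".join(part for part in parts if part)


BLOCK_RE = re.compile(FENCE + r"mllang\n(.*?)\n" + FENCE, re.DOTALL)


def extract(md):
    """Return (summary, packet_text); summary is the first non-heading line above the block."""
    match = BLOCK_RE.search(md)
    if match is None:
        raise ValueError("no mllang block found")
    above = md[: match.start()].splitlines()
    candidates = [line for line in above if line.strip() and not line.startswith("#")]
    return (candidates[0] if candidates else ""), match.group(1)


packet_text = "V:0.1.r1; I:auth-001; N:@K -> implement; H:test=pass; P:0.85;"
md = render(
    packet_text,
    title="Refactor auth module",
    summary="Workflow: switch sessions → JWT, gate on test=pass.",
)
summary, recovered = extract(md)
print(summary)
print(recovered == packet_text)  # True
```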


Where it fits

MLLANG composes with existing standards rather than replacing them:

| Layer | Standard | Role |
|---|---|---|
| Doc / agent rules | AGENTS.md, llms.txt, Cursor rules | host markdown containing MLLANG blocks |
| Tool calls | MCP (Model Context Protocol) | carry MLLANG packets as tool payload |
| Agent transport | A2A, ACP | carry MLLANG packets as message body |
| Runtime state | MLLANG | THIS LAYER: slot grammar, halt enum, confidence |

MLLANG sits inside MCP/A2A as payload and inside markdown files as runtime state. It does not compete with them.

Relationship to RecursiveMAS

RecursiveMAS (arXiv 2604.25917) solves multi-agent collaboration by exchanging latent vectors between recursive agent rounds — elegant when you control the model weights, but it requires direct activation access (Qwen / Llama / Gemma / DeepSeek self-hosted) and cannot drive closed APIs like Claude / GPT / Gemini.

MLLANG is the text-surface approximation of the same goal. Loses the latent-recursion benefit. Gains universal compatibility — runs everywhere RecursiveMAS can't (closed APIs, mixed-vendor loops, paste-based workflows). A team running RecursiveMAS internally can still emit MLLANG packets as the text log of their latent rounds. Orthogonal layers; not competing.


Why MLLANG vs alternatives

| | JSON-RPC | YAML frontmatter | LangGraph state | MLLANG |
|---|---|---|---|---|
| Format | JSON-RPC envelope | YAML | Python dict | Compact ASCII |
| Single-line | No | No | No | Yes |
| Token cost | High | High | High | 50–70% lower |
| Embed in markdown | No | header only | No | YES (fenced block) |
| Human readable | Verbose | Yes | No | 5-min learn |
| Confidence built-in | No | No | No | P: slot |
| Halt enum | No | No | No | 9-way native |
| Multi-agent handoff | Manual | Manual | Framework-locked | N: slot |

Cross-vendor ratification

MLLANG v0.1 was tested across these LLM families before locking:

| Family | Model | Status |
|---|---|---|
| Anthropic | Claude Sonnet 4.6 / Opus 4.7 | ✅ ratified |
| OpenAI | GPT-5.5 Thinking, Codex (gpt-5.5) | ✅ ratified |
| Google | Gemini Pro web, Gemini Flash CLI | ✅ ratified |
| Google (local) | Gemma-4 26B (AIR backend) | ✅ ratified |
| Google (local) | Gemma-4 4B (local, ~4B params) | ✅ ratified (small-model compactness test: PASS) |
| DeepSeek | DeepSeek V3 | ✅ ratified (P:0.92, 1 round) |
| Moonshot | Kimi K2.6 | ✅ ratified (P:0.88, 1 round) |
| Alibaba | Qwen 3.6 | ✅ ratified (P:0.85, 1 round) |

Mean ratification confidence: 0.93 (mean of 0.95 across the original 6 vendors and 0.883 across the 3-vendor Chinese pass, weighted by vendor count). Spec locked 2026-05-19.
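The 0.93 figure can be reproduced from the numbers quoted here. The snippet below is only a worked check of the arithmetic. It assumes the overall mean weights each vendor equally, and it takes the 0.95 mean for the original six vendors as given rather than recomputing it from the audit trail.

```python
# Per-model confidences from the Chinese-family rows of the table.
chinese_scores = [0.92, 0.88, 0.85]
chinese_mean = sum(chinese_scores) / len(chinese_scores)

# Stated mean and vendor count for the original spec-locking pass.
original_mean, original_count = 0.95, 6

# Vendor-weighted mean across all nine vendors.
total_count = original_count + len(chinese_scores)
overall_mean = (original_mean * original_count + sum(chinese_scores)) / total_count

print(round(chinese_mean, 3))  # 0.883
print(round(overall_mean, 2))  # 0.93
```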

Audit trail: examples/05_negotiation_trace.jsonl (original 6), examples/ratification_chinese_2026-05-20.jsonl (Chinese pass).

The Chinese-family pass converged in 1 round per model (vs 3 rounds for the original spec-locking pass) because the spec was already locked — each backend only needed to acknowledge and emit a valid packet, not negotiate the grammar.


Repo structure

mllang/
├── spec/                  Spec (v0.1 locked + v0.2 draft)
├── parser/                Python reference parser (pure stdlib)
├── examples/              5 generic packet examples + markdown-embedded demo
├── bootstrap/             Role prompts (orchestrator, critic, implementer, synthesizer)
├── conformance/           Test suite — any parser implementation can score against it
├── codex-cli-integration/ Conservative Codex CLI session parser + report wrapper
├── docs/                  GitHub Pages site
├── rfc/                   Quarterly RFC proposals
└── .github/workflows/     CI: conformance test on PR

Install

pip install mllang-protocol

PyPI distribution name is mllang-protocol (the bare mllang name was already held by an unrelated 2021 ML library on PyPI). The Python import path is still mllang:

from mllang import Packet, parse, embed_in_markdown, sanitize

Editable install from the repo also works:

git clone https://github.com/jakeliu/mllang.git
cd mllang
pip install -e parser/


Contributing

See CONTRIBUTING.md. Quarterly RFC review windows. Multi-vendor test required for any spec change.


Sister package — shim-engine

Same repo, separate PyPI listing. Local-first observability for LLM-agent loops — token / latency / cost / outcome distribution in a JSONL log, aggregated by a small CLI. Works with any model. Optional MLLANG awareness for 5x more signal.

pip install shim-engine               # standalone (any LLM)
pip install 'shim-engine[mllang]'     # auto-extract halt / confidence / agent code from MLLANG responses
pip install 'mllang-protocol[shim]'  # same as above, reverse install order

Killer demo:

$ shim-engine-report calls.jsonl
200 packets logged
mean token reduction:  56.8% vs JSON-RPC equivalent  (200/200 packets MLLANG-tagged)
estimated tokens saved: 9,536
p50 latency: 4.32s    p99: 7.92s
mean P: 0.81
halt distribution:     test=pass 62% | <=> 18% | accept 9% | escalate@H 7% | risk!high 4%
model distribution:    gpt-5.5-thinking 42% | claude-opus 32% | gemini-pro 13% | gemma-26b 10% | deepseek-v3 4%
agent distribution:    @G 23% | @X 21% | @K 20% | @M 19% | @C 17%

Full guide: docs/shim.md and shim/README.md.
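As a rough illustration of the aggregation behind such a report, the sketch below summarizes a JSONL log with the standard library. The field names (`halt`, `latency_s`, `confidence`) and the sample rows are hypothetical; the actual shim-engine log schema is described in docs/shim.md.

```python
import json
from collections import Counter
from statistics import mean, median

# Hypothetical log rows, one JSON object per line (made-up values).
SAMPLE_LOG = """\
{"halt": "test=pass", "latency_s": 4.1, "confidence": 0.85}
{"halt": "test=pass", "latency_s": 4.5, "confidence": 0.80}
{"halt": "accept", "latency_s": 3.9, "confidence": 0.90}
{"halt": "escalate@H", "latency_s": 7.5, "confidence": 0.65}
"""


def summarize(jsonl_text):
    """Aggregate packet count, median latency, mean confidence and halt shares."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    halts = Counter(row["halt"] for row in rows)
    return {
        "packets": len(rows),
        "p50_latency_s": median(row["latency_s"] for row in rows),
        "mean_confidence": mean(row["confidence"] for row in rows),
        "halt_share": {halt: count / len(rows) for halt, count in halts.most_common()},
    }


report = summarize(SAMPLE_LOG)
print(report["packets"])                  # 4
print(report["halt_share"]["test=pass"])  # 0.5
```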


Claude Code skill — /mllang

One-keystroke MLLANG inside Claude Code (Anthropic's official CLI). The skill loads the bootstrap, instruments sub-agent (Agent tool) calls via a PostToolUse hook, and lets you check live token-savings at any time.

Install:

curl -sL https://raw.githubusercontent.com/jakeliu/mllang/main/claude-code-skill/install.sh | bash
pip install 'shim-engine[mllang]'    # optional, enables /mllang report

Then in Claude Code:

/mllang             # load bootstrap, ratify session
/mllang load critic # adopt the critic role's conventions
/mllang report      # token-savings report from sub-agent calls this session
/mllang status      # one-liner summary
/mllang spec        # 5-line MLLANG v0.1 summary

The sub-agent log is privacy-redacted (slot SHAPES only, no slot values) and lives at ~/.claude/mllang-shim/session.jsonl. Full guide: claude-code-skill/README.md.


Codex CLI integration

Conservative Path 1 for Codex CLI. No hooks are installed. The integration parses Codex's persisted session JSONL under ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl, records one shim-engine row per completed turn, and reports real tokens in/out from Codex token_count events.

Install:

bash codex-cli-integration/install.sh
pip install 'shim-engine[mllang]'    # optional, enables reports

Use:

python3 ~/.codex/mllang-integration/scripts/codex_session_parser.py --latest
bash ~/.codex/mllang-integration/scripts/codex-mllang-report.sh

Budget source is tagged honestly: B-slot when the final MLLANG packet self-reports B:{tokens_in=..., tokens_out=..., time=...}, otherwise session-token_count, otherwise none. Capture boundary is top-level Codex turn, not sub-agent. Full guide: codex-cli-integration/README.md.
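The fallback order for the budget source can be sketched as a small function. This is an illustration, not the integration's code: `budget_source` is a hypothetical helper, the regex assumes the `B:{...}` slot holds comma-separated `key=value` pairs, and the sample values are made up.

```python
import re

# Assumed shape of the self-reported budget slot: B:{key=value, key=value, ...}
B_SLOT_RE = re.compile(r"B:\{([^}]*)\}")


def budget_source(final_packet, session_token_count=None):
    """Return (source_tag, budget) using the order B-slot > session-token_count > none."""
    match = B_SLOT_RE.search(final_packet or "")
    if match:
        pairs = (item.split("=", 1) for item in match.group(1).split(",") if "=" in item)
        return "B-slot", {key.strip(): value.strip() for key, value in pairs}
    if session_token_count is not None:
        # Fall back to token counts recorded in the Codex session log.
        return "session-token_count", dict(session_token_count)
    return "none", {}


print(budget_source("H:accept; B:{tokens_in=1200, tokens_out=340, time=4.2s};")[0])  # B-slot
print(budget_source("H:accept;", {"tokens_in": 900, "tokens_out": 200})[0])          # session-token_count
print(budget_source("H:accept;")[0])                                                 # none
```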


MCP server (Claude Desktop / Cline / Zed)

pip install 'mllang-protocol[mcp]'
mllang-mcp-server

Or drop into claude_desktop_config.json:

{
  "mcpServers": {
    "mllang": { "command": "mllang-mcp-server" }
  }
}

Exposes 7 tools (mllang_parse / mllang_compose / mllang_validate / mllang_embed_in_markdown / mllang_extract_summary / mllang_sanitize / mllang_spec). End-to-end test in conformance/test_mcp_server.py drives every tool over real MCP stdio. Full guide: docs/mcp.md.
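For orientation, this is roughly what a call to one of these tools looks like on the wire. MCP uses JSON-RPC 2.0 and invokes tools with the `tools/call` method; the `packet` argument name is an assumption here (check docs/mcp.md for the real input schema), and a real client would first complete the MCP initialize handshake.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


# "packet" is a hypothetical argument name for mllang_parse.
request = tools_call_request(
    1,
    "mllang_parse",
    {"packet": "V:0.1.r1; I:auth-001; H:test=pass; P:0.85;"},
)
print(request)
```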


Telemetry & privacy

MLLANG ships with a built-in sanitize() function. Telemetry is opt-in only, off by default, and the library never auto-enables it.

# default — nothing sent
export MLLANG_TELEMETRY=off

# opt-in levels:
export MLLANG_TELEMETRY=shape        # slot presence + halt + confidence (recommended)
export MLLANG_TELEMETRY=structured   # add map keys + verb names + counts
export MLLANG_TELEMETRY=full         # add redacted values (research consent only)

from mllang import sanitize, sanitize_to_json

payload = sanitize(packet)         # None unless env var set
line = sanitize_to_json(packet)    # None or one-line JSON

Slot shapes are public; slot values stay private. Thread ids are hashed (<I:hash:<sha256_12>>), file paths and tool args are never logged, and a leak-detector refuses payloads that still contain emails / paths / API-key patterns / long quoted strings.
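The two mechanisms described above, hashed thread ids and a leak detector, can be sketched as follows. This is not the library's `sanitize()`: the helper names and the regex patterns are illustrative assumptions, and the sketch assumes `<sha256_12>` means the first 12 hex characters of an unsalted SHA-256 digest.

```python
import hashlib
import re


def hash_thread_id(thread_id):
    """Replace a thread id with a 12-hex-char SHA-256 prefix (assumed format)."""
    digest = hashlib.sha256(thread_id.encode("utf-8")).hexdigest()[:12]
    return f"<I:hash:{digest}>"


# Illustrative patterns only; a production detector would need a broader set.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "path": re.compile(r"(?:^|[\s\"'=])(?:~|\.{0,2})/[\w.-]+(?:/[\w.-]+)*"),
    "api_key": re.compile(r"\b(?:sk|pk|ghp)[-_][A-Za-z0-9_-]{16,}"),
    "long_quote": re.compile(r"\"[^\"]{40,}\"|'[^']{40,}'"),
}


def find_leaks(payload_text):
    """Return the sorted names of all leak patterns found in the payload."""
    return sorted(name for name, pattern in LEAK_PATTERNS.items() if pattern.search(payload_text))


def refuse_if_leaky(payload_text):
    """Raise instead of emitting a payload that still looks sensitive."""
    leaks = find_leaks(payload_text)
    if leaks:
        raise ValueError(f"payload refused, possible leaks: {leaks}")
    return payload_text


print(hash_thread_id("auth-001"))
print(find_leaks('{"slots": ["V", "I", "H", "P"], "halt": "test=pass"}'))        # []
print(find_leaks("contact alice@example.com about /home/alice/project/auth.py"))  # ['email', 'path']
```

Refusing the whole payload, rather than trying to scrub the match, keeps the failure mode conservative: a false positive costs one telemetry row, while a false negative could leak a value.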

Full slot-by-slot rules and before/after examples: docs/telemetry.md. Disclosure policy and IP-leak bug bounty: .github/SECURITY.md.


Status

  • v0.1 spec: LOCKED 2026-05-19
  • v0.2 spec: draft, RFC window open
  • Parser: pure Python 3, no external deps
  • Conformance: 160-packet test suite (parse / halt / roundtrip / markdown extract / sanitize / embed)
  • Telemetry: opt-in sanitize() with 4 levels + leak-detector defense

License

Apache 2.0. See LICENSE.


Citation

If you use MLLANG in research or production, please cite:

MLLANG: Markdown Language for AI Agents (2026)
github.com/jakeliu/mllang
