orate

Programmatic decoding over local LLM inference. The developer writes a generator; the model is an oracle consulted at marked yield points. Structured output, tool calls, and agent control flow collapse into one primitive: yield.

Pre-alpha. Built for the "Built with Opus 4.7" Claude Code hackathon (Apr 21–26, 2026), problem statement #2: Build For What's Next.

The argument, in four acts

Act 1 — Schemas are a ceiling. A JSON schema can declare types. It cannot declare "a prime whose digits sum to 10" or "a word that is both a fruit and a color." These are value-level properties; types are the wrong vocabulary for them. See examples/act_01_schema_breaks.py.
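
To make the gap concrete, here is a minimal sketch in plain Python. It does not use orate, and the helper names (matches_schema, matches_property, is_prime, digit_sum) are illustrative assumptions rather than library functions. The type-level check accepts every integer in range; the value-level property holds for only three of them.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def digit_sum(n: int) -> int:
    return sum(int(c) for c in str(abs(n)))


def matches_schema(value: object) -> bool:
    """Roughly what {"type": "integer", "minimum": 10, "maximum": 99} expresses."""
    return isinstance(value, int) and not isinstance(value, bool) and 10 <= value <= 99


def matches_property(value: int) -> bool:
    """The value-level property: a prime whose digits sum to 10."""
    return is_prime(value) and digit_sum(value) == 10


schema_ok = [v for v in range(100) if matches_schema(v)]
property_ok = [v for v in schema_ok if matches_property(v)]

print(len(schema_ok))  # 90 values pass the type-level check
print(property_ok)     # [19, 37, 73]
```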

Act 2 — Predicates move the bar. A where= predicate on a gen spec turns a value-level constraint into a first-class citizen. On reject the accept set tightens and the engine re-samples; reject_message injects a steering hint. Deterministic correctness, no dice. See examples/act_02_predicate_fixes.py.
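
The reject-and-tighten loop described above can be sketched in plain Python. This is a sketch under stated assumptions, not the library's implementation: the accept set is an explicit Python set rather than a grammar, propose stands in for the model, and Exhausted stands in for GrammarExhausted. All names here are hypothetical.

```python
from typing import Callable, Iterable, List, Set


class Exhausted(Exception):
    """Stand-in for the library's GrammarExhausted (assumed name for this sketch)."""


def sample_where(
    candidates: Iterable[int],
    where: Callable[[int], bool],
    propose: Callable[[Set[int], List[str]], int],
    reject_message: Callable[[int], str],
) -> int:
    accept = set(candidates)   # values still allowed
    hints: List[str] = []      # steering notes accumulated from rejects
    while accept:
        value = propose(accept, hints)
        if where(value):
            return value
        accept.discard(value)  # tighten: this value cannot be proposed again
        hints.append(reject_message(value))
    raise Exhausted("no candidate satisfies the predicate")


def smallest_first(accept: Set[int], hints: List[str]) -> int:
    # Hypothetical deterministic proposer; a real engine would decode from the model.
    return min(accept)


result = sample_where(
    range(10, 100),
    where=lambda v: v % 5 == 0 and v % 7 == 0,
    propose=smallest_first,
    reject_message=lambda v: f"{v} is not divisible by both 5 and 7",
)
print(result)  # 35
```

Because every reject removes a value from a finite set, the loop either returns a value that satisfies the predicate or raises; it never returns a wrong value.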

Act 3 — Programs subsume tool-calling. A coroutine's yield is a decision point. So is a tool call. So is a sub-agent handoff. Why are these three different APIs today? They need not be:

@program
def turn():
    action = yield gen.choice(["attack", "speak"])
    if action == "attack":
        target = yield gen.choice(["dragon", "goblin", "ghost"])
        weakness = yield gen.tool(lookup_enemy_weakness, enemy=target)  # tool-as-yield
        attack = yield gen.struct(
            weapon=gen.choice(["sword", "bow", "staff"]),
            stance=gen.choice(["aggressive", "defensive"]),
        )
        damage = yield gen.integer(1, 10 + (5 if weakness != "none" else 0))
        return {"target": target, "weakness": weakness, "attack": attack, "damage": damage}
    line = yield gen.string(max_len=140)
    return {"line": line}

One @program. One KV cache. One engine. No separate tool-use API. See examples/act_03_unified_yield.py.
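
One way to see why a single primitive can cover both cases: a runner only needs to drive a generator with send(). The sketch below uses plain tuples as specs and a scripted oracle in place of a model. The spec shapes, run, and lookup_weakness are assumptions made for illustration; they are not orate's API.

```python
from typing import Any, Callable, Generator, Tuple

# Hypothetical spec shapes: ("choice", options) or ("tool", (fn, kwargs)).
Spec = Tuple[str, Any]


def run(program: Generator[Spec, Any, Any], oracle: Callable[[list], Any]) -> Any:
    """Answer each yielded spec, then return the generator's return value."""
    try:
        spec = next(program)
        while True:
            kind, payload = spec
            if kind == "choice":
                answer = oracle(payload)   # the model decides
            elif kind == "tool":
                fn, kwargs = payload
                answer = fn(**kwargs)      # ordinary Python runs
            else:
                raise ValueError(f"unknown spec kind: {kind}")
            spec = program.send(answer)
    except StopIteration as stop:
        return stop.value


def lookup_weakness(enemy: str) -> str:
    return {"dragon": "ice", "ghost": "light"}.get(enemy, "none")


def turn():
    target = yield ("choice", ["dragon", "goblin", "ghost"])
    weakness = yield ("tool", (lookup_weakness, {"enemy": target}))
    return {"target": target, "weakness": weakness}


# The scripted oracle always picks the first option.
outcome = run(turn(), oracle=lambda options: options[0])
print(outcome)  # {'target': 'dragon', 'weakness': 'ice'}
```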

Act 4 — The model writes the program. For a new task, the model authors its own @program at runtime — a typed AST over a small DSL, grammar-constrained by the same machinery that filtered scalar values in Act 2. The program is verified against the task's demonstrations; mismatches tighten the program-level grammar and inject the diff as natural-language context. The same mechanism, one level up.

Applied to ARC-AGI-2: the model proposes a transformation rule (a Program over the grid DSL), orate verifies it on the training demonstrations, and Phase-C retry surfaces describe_mismatch(...) as a steering note. See examples/act_04_arc_sketch.py and src/orate/arc/solve.py.
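
The verify-then-describe step can be illustrated with a toy version. The sketch assumes a two-primitive DSL and a program represented as a list of primitive names; the real DSL has 14 primitives and a typed AST, and the describe_mismatch below is a simplified stand-in, not the function in src/orate/arc/verify.py.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # assumed non-empty and rectangular

# Toy stand-ins for DSL primitives.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def execute(program: List[str], grid: Grid) -> Grid:
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def describe_mismatch(expected: Grid, actual: Grid) -> str:
    if len(expected) != len(actual) or len(expected[0]) != len(actual[0]):
        return (
            f"shape {len(actual)}x{len(actual[0])}, "
            f"expected {len(expected)}x{len(expected[0])}"
        )
    wrong = sum(e != a for er, ar in zip(expected, actual) for e, a in zip(er, ar))
    return f"{wrong} cell(s) differ"


def verify_on_train(program: List[str], pairs: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return None if every demonstration is reproduced, else a steering note."""
    for i, (inp, out) in enumerate(pairs):
        actual = execute(program, inp)
        if actual != out:
            return f"demo {i}: {describe_mismatch(out, actual)}"
    return None


train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print(verify_on_train(["transpose"], train))  # demo 0: 4 cell(s) differ
print(verify_on_train(["flip_h"], train))     # None
```

In a retry loop, the returned note would be fed back to the proposer as context for the next candidate program.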

Status snapshot

Kernel: @program decorator + generator runner; gen.choice / integer / string / boolean / struct / tool primitives with deterministic grammar tightening on where= reject; Phase-B context injection via reject_message; Phase-C whole-program retry via @program(whole_program_retries=N) + reject_program(msg).
Engine: XGrammarEngine running llama-cpp-python + XGrammar locally, with forced-token (jump-forward-decode) optimization and grammar-mask on every sample. Tested against /Users/maltelandgren/models/qwen2.5-*.gguf. MockEngine for offline tests.
ARC: data loader (task JSON + grid helpers), ASCII + PNG rendering (canonical 10-color palette), 14-primitive DSL executor, verifier with describe_mismatch (for Phase-C retry context), solve_task proposer that runs the full meta-programming loop.
Tests: 80 unit tests + 12 model-gated (tests/test_xgrammar_engine.py, tests/test_gen_against_local.py) passing.
Lines of code: ~2000 (src) + ~1200 (tests) + ~400 (examples).
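
The forced-token optimization mentioned under Engine can be sketched as follows, on the usual reading of jump-forward decoding: when the grammar leaves exactly one legal next token, emit it without consulting the model. The sketch assumes single-character tokens and a finite, prefix-free set of strings standing in for the grammar; allowed_next, decode, and score are hypothetical names, not XGrammarEngine internals.

```python
from typing import Callable, Dict, Set, Tuple


def allowed_next(prefix: str, language: Set[str]) -> Set[str]:
    """Characters that keep `prefix` a prefix of some string in the language."""
    return {
        s[len(prefix)]
        for s in language
        if s.startswith(prefix) and len(s) > len(prefix)
    }


def decode(
    language: Set[str], score: Callable[[str], Dict[str, float]]
) -> Tuple[str, int]:
    """Greedy decode over a prefix-free language; returns (text, model calls)."""
    text, calls = "", 0
    while text not in language:
        legal = allowed_next(text, language)
        if len(legal) == 1:
            text += next(iter(legal))  # forced token: no model call needed
            continue
        logits = score(text)           # model consulted only at real choices
        calls += 1
        text += max(sorted(legal), key=lambda ch: logits.get(ch, float("-inf")))
    return text, calls


# Scripted scores standing in for a model that prefers "f" over "t".
text, calls = decode({"true", "false"}, score=lambda prefix: {"t": 0.1, "f": 0.9})
print(text, calls)  # false 1
```

Only the first character is a real decision here; the remaining four are determined by the grammar, so the model is called once.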

Install

python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"           # kernel + tests
.venv/bin/pip install -e ".[local,dev]"     # + llama-cpp-python + xgrammar (local engine)
.venv/bin/pip install -e ".[arc,dev]"       # + matplotlib (PNG rendering for ARC)

Quickstart

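from orate import gen, program
from orate.engine.xgrammar import XGrammarEngine

# Helper predicates for the where= clause, defined here so the snippet is self-contained.
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def digit_sum(n):
    return sum(int(c) for c in str(n))
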
@program
def two_digit_prime_with_digit_sum_10():
    n = yield gen.integer(
        10, 99,
        where=lambda v: is_prime(v) and digit_sum(v) == 10,
        reject_message=lambda v: f"{v} is not a prime whose digits sum to 10",
    )
    return n

engine = XGrammarEngine(model_path="/path/to/qwen2.5-0.5b-instruct-q4_k_m.gguf")
engine.prime("You answer with a single value matching the stated constraint.\n\nQ: a two-digit prime whose digits sum to 10\nA: ")

n = two_digit_prime_with_digit_sum_10().run(engine=engine)
# => 19 (or 37 or 73 — the three satisfying values in range)

For the ARC demo:

git clone --depth 1 https://github.com/arcprize/ARC-AGI-2 arc-data/ARC-AGI-2
.venv/bin/python examples/act_04_arc_sketch.py

Layout

src/orate/
  program.py          # @program decorator + runner (Phase-C retry)
  gen.py              # gen.choice/integer/string/boolean/struct/tool + tightening
  engine/
    protocol.py       # Engine Protocol + optional capabilities
    mock.py           # MockEngine — random sampler, seeded
    xgrammar.py       # XGrammarEngine — local grammar-constrained decoding
  arc/
    data.py           # ArcTask + task JSON loader
    render.py         # grid_to_ascii / save_grid_png (matplotlib, 10-color palette)
    dsl.py            # 14 transformation primitives + Program AST + execute
    verify.py         # verify_on_train + describe_mismatch (Phase-C retry context)
    solve.py          # solve_task — the meta-programming proposer (Act 4)

examples/
  act_01_schema_breaks.py
  act_02_predicate_fixes.py
  act_03_unified_yield.py
  act_04_arc_sketch.py
  smoke_local.py      # runs against real Qwen2.5 locally

tests/                  # 80 unit + 12 model-gated

Design stance

Determinism by default. The engine uses argmax over grammar-masked logits. Stochastic sampling is a future explicit opt-in, not a hidden correctness mechanism.
Grammar is the guarantee; the model is the proposer. where= predicates never silently drop constraints — on exhaustion we raise GrammarExhausted, we do not return a wrong value.
Engine-agnostic authoring layer. Every example ran against MockEngine before a real model touched it. Swapping in XGrammarEngine or (future) an API-backed engine changes the proposer's quality, not the program's correctness.
Local first. The library's truest form constrains inference at the logit level — something only a controlled inference stack exposes. API fallbacks are structural (JSON mode + retry), not fundamental.
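
The "argmax over grammar-masked logits" rule from the first point can be sketched in a few lines. This is an illustration over plain Python lists, with masked_argmax as a hypothetical helper name; it does not reproduce the engine's actual code path.

```python
import math
from typing import Sequence


def masked_argmax(logits: Sequence[float], allowed: Sequence[bool]) -> int:
    """Index of the highest logit among allowed tokens; ties go to the lowest index."""
    if len(logits) != len(allowed):
        raise ValueError("logits and mask must have the same length")
    if not any(allowed):
        raise ValueError("grammar mask allows no token")
    masked = [x if ok else -math.inf for x, ok in zip(logits, allowed)]
    return max(range(len(masked)), key=masked.__getitem__)


logits = [2.5, 9.0, 1.0, 4.0]       # token 1 is the model's favourite
mask = [True, False, True, True]    # but the grammar forbids it
print(masked_argmax(logits, mask))  # 3
```

With no random draw involved, the same logits and the same mask always yield the same token, which is what makes retries reproducible.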

License

MIT. See LICENSE.
