Contracts for LLM calls. Validate every response, retry with smarter models, catch bad answers before production.
Companion gem for ruby_llm.
```ruby
response = RubyLLM.chat(model: "gpt-4.1-mini").ask(prompt)
parsed = JSON.parse(response.content) # crashes when the LLM returns prose
priority = parsed["priority"]         # "urgent"? "CRITICAL"? nil?
```

JSON parsing crashes. Wrong values slip through. You switch models and quality drops silently.
Same prompt, wrapped in a contract:
```ruby
class ClassifyTicket < RubyLLM::Contract::Step::Base
  prompt <<~PROMPT
    Classify this support ticket by priority.
    Return JSON with a "priority" field.

    {input}
  PROMPT

  validate("valid priority") { |o| %w[low medium high urgent].include?(o[:priority]) }

  retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]
end

result = ClassifyTicket.run(ticket_text)
result.ok?              # => true
result.parsed_output    # => {priority: "high"}
result.trace[:attempts] # => [{model: "gpt-4.1-nano", status: :ok}]
```

Bad JSON? `:parse_error`. Wrong value? `:validation_failed` and an auto-retry on a smarter model. Network timeout? Auto-retry. All with cost tracking.
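The escalation behavior can be sketched in plain Ruby. This is an illustration of the idea, not the gem's internals; `run_with_escalation`, `call_model`, and `fake_llm` are hypothetical names introduced here:

```ruby
# Illustrative sketch of model escalation: try cheap models first,
# move to a stronger one whenever the contract (validator) fails.
# `call_model` is a stand-in for a real LLM request.
def run_with_escalation(models, input, validator, call_model)
  attempts = []
  models.each do |model|
    output = call_model.call(model, input)
    ok = validator.call(output)
    attempts << { model: model, status: ok ? :ok : :validation_failed }
    return { ok: true, output: output, attempts: attempts } if ok
  end
  { ok: false, output: nil, attempts: attempts }
end

# Fake LLM: the cheap model answers badly, the bigger one correctly.
fake_llm = lambda do |model, _input|
  model == "gpt-4.1-nano" ? { priority: "CRITICAL" } : { priority: "high" }
end
validator = ->(o) { %w[low medium high urgent].include?(o[:priority]) }

result = run_with_escalation(%w[gpt-4.1-nano gpt-4.1-mini], "ticket", validator, fake_llm)
# result[:attempts] records a :validation_failed on nano, then :ok on mini
```

The cheap model is only billed once per failed attempt, which is why most traffic staying on the smallest model keeps costs down.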
`{input}` is a gem placeholder (not Ruby `#{}` interpolation). It is replaced at runtime with the value you pass to `run()`.
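Conceptually the substitution is plain string templating; a sketch of the idea, not the gem's implementation:

```ruby
# Sketch of {input} substitution (illustrative, not the gem's code):
# the placeholder is replaced verbatim with the value passed to run().
template = <<~PROMPT
  Classify this support ticket by priority.

  {input}
PROMPT

ticket_text = "Server is down for all customers"
prompt = template.gsub("{input}", ticket_text)
# prompt now contains the ticket text where {input} was
```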
```ruby
gem "ruby_llm-contract"
```

```ruby
RubyLLM.configure { |c| c.openai_api_key = ENV["OPENAI_API_KEY"] }
RubyLLM::Contract.configure { |c| c.default_model = "gpt-4.1-mini" }
```

Works with any ruby_llm provider (OpenAI, Anthropic, Gemini, etc.).
- Validated responses — `validate` blocks catch wrong answers; `output_schema` enforces JSON structure via the provider AND client-side
- Model escalation — `retry_policy models: %w[nano mini full]` starts cheap and auto-escalates when the contract fails. 90% of requests succeed on nano: ~$40/mo instead of ~$200 at 10k requests.
- Cost control — `max_input` and `max_cost` refuse before calling the LLM. Zero tokens spent on oversized input.
- Eval in CI — `expect(MyStep).to pass_eval("smoke")` verifies your contract offline, with zero API calls. No other Ruby gem does this.
- Defensive parsing — code fences, BOM, prose wrapping, `null` responses: 14 edge cases handled
- Pipeline — chain steps with fail-fast. A hallucination in step 1 stops before step 2 runs.
- Testing — `RubyLLM::Contract::Adapters::Test` for deterministic specs, plus a `satisfy_contract` RSpec matcher
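A few of those parsing edge cases can be sketched in plain Ruby. This is an illustration of the technique only; the gem's own parser handles 14 cases and `extract_json` is a hypothetical name:

```ruby
require "json"

# Sketch of defensive JSON extraction: strip a byte-order mark,
# unwrap a fenced code block, and cut surrounding prose before parsing.
def extract_json(raw)
  text = raw.sub(/\A\uFEFF/, "")                  # BOM
  fenced = text[/`{3}(?:json)?\s*(.*?)`{3}/m, 1]  # fenced code block
  text = fenced if fenced
  text = text[/\{.*\}/m] || text                  # prose around the object
  JSON.parse(text)
rescue JSON::ParserError
  nil                                             # signal a parse failure
end

extract_json(%q(Sure! {"priority": "high"} Let me know.))
# => {"priority" => "high"}
```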
`output_schema` vs `with_schema`: `with_schema` asks the provider to return specific JSON. `output_schema` does the same (it calls `with_schema` under the hood) and additionally validates client-side. Cheap models sometimes ignore the schema; `output_schema` catches that.
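The client-side half of that check can be illustrated in plain Ruby. A sketch of the idea only, not the gem's validator; `conforms?` and `REQUIRED` are names invented here:

```ruby
# Sketch: even when the provider is asked for a schema, verify the
# parsed output client-side, because cheap models sometimes ignore it.
REQUIRED = { "priority" => String }.freeze

def conforms?(parsed, required = REQUIRED)
  return false unless parsed.is_a?(Hash)
  required.all? { |key, type| parsed[key].is_a?(type) }
end

conforms?({ "priority" => "high" }) # => true
conforms?({ "priority" => nil })    # => false
conforms?("not even a hash")        # => false
```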
Nested schemas need `object do ... end`:

```ruby
# WRONG — array of strings:
array :groups do
  string :who
end

# RIGHT — array of objects:
array :groups do
  object do
    string :who
  end
end
```

A schema validates shape, not meaning. The LLM can return `{"priority": "low"}` for a data loss incident: valid JSON, wrong answer. Always add `validate` blocks.
| Guide | Covers |
|---|---|
| Getting Started | Features walkthrough, model escalation, eval, structured/dynamic prompts |
| Best Practices | 6 patterns for bulletproof validates |
| Output Schema | Full schema reference + constraints |
| Pipeline | Multi-step composition, timeout, fail-fast |
| Testing | Test adapter, RSpec matchers |
| Prompt AST | Node types, interpolation |
| Architecture | Module diagram |
v0.2 — eval that matters:

- Dataset eval with `add_case input:, expected:` (partial matching)
- Online eval — real LLM calls, compare output vs expected
- CI gate — `pass_eval("regression").with_minimum_score(0.8)`
- Model comparison — same dataset on nano vs mini vs full
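Partial matching can be sketched as a subset comparison. This is an assumption about the intended semantics, for illustration only; `partial_match?` is a name invented here:

```ruby
# Sketch of partial matching: an eval case passes when every key in
# `expected` appears in the actual output with the same value; extra
# keys in the output are ignored. (The gem's semantics may differ.)
def partial_match?(expected, actual)
  expected.all? { |key, value| actual[key] == value }
end

partial_match?({ priority: "high" }, { priority: "high", reason: "outage" }) # => true
partial_match?({ priority: "high" }, { priority: "low" })                    # => false
```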
v0.3:
- Regression baselines — compare eval results with previous run
- Eval persistence — store history for drift detection
v0.4:
- Auto-routing — learn which model works for which input patterns
- Contract-level dashboard
MIT