
ruby_llm-contract

Contracts for LLM calls. Validate every response, retry with smarter models, catch bad answers before production.

Companion gem for ruby_llm.

The problem

response = RubyLLM.chat(model: "gpt-4.1-mini").ask(prompt)
parsed = JSON.parse(response.content)  # crashes when LLM returns prose
priority = parsed["priority"]          # "urgent"? "CRITICAL"? nil?

JSON parsing crashes. Wrong values slip through. You switch models and quality drops silently.

The fix

Same prompt, wrapped in a contract:

class ClassifyTicket < RubyLLM::Contract::Step::Base
  prompt <<~PROMPT
    Classify this support ticket by priority.
    Return JSON with a "priority" field.

    {input}
  PROMPT

  validate("valid priority") { |o| %w[low medium high urgent].include?(o[:priority]) }
  retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]
end

result = ClassifyTicket.run(ticket_text)
result.ok?               # => true
result.parsed_output     # => {priority: "high"}
result.trace[:attempts]  # => [{model: "gpt-4.1-nano", status: :ok}]

Bad JSON? :parse_error. Wrong value? :validation_failed and auto-retry on a smarter model. Network timeout? Auto-retry. All with cost tracking.
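The defensive-parsing behavior can be approximated in plain Ruby. This is a simplified sketch (extract_json and its fallback order are illustrative, not the gem's internals): strip a BOM, unwrap Markdown code fences, tolerate surrounding prose, then parse.

```ruby
require "json"

FENCE = /`{3}(?:json)?\s*(.*?)`{3}/m  # a Markdown code fence around the JSON

# Illustrative sketch of defensive JSON extraction, not the gem's API.
def extract_json(raw)
  text = raw.sub(/\A\uFEFF/, "").strip  # drop a UTF-8 BOM
  fenced = text[FENCE, 1]
  text = fenced.strip if fenced         # unwrap a fenced code block
  text = text[/\{.*\}/m] || text        # tolerate prose around the JSON
  JSON.parse(text, symbolize_names: true)
end

extract_json('Sure! {"priority": "high"} Let me know.')  # => {priority: "high"}
```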

{input} is a gem placeholder (not Ruby #{} interpolation); it is replaced at runtime with the value you pass to run().
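In plain Ruby the substitution amounts to a literal token replacement; a minimal sketch (render here is illustrative, not the gem's actual template engine):

```ruby
# Illustrative sketch: {input} is a literal token in the template,
# swapped in at run time, unlike Ruby's compile-time #{} interpolation.
TEMPLATE = <<~PROMPT
  Classify this support ticket by priority.
  Return JSON with a "priority" field.

  {input}
PROMPT

def render(template, input)
  template.gsub("{input}", input.to_s)
end

puts render(TEMPLATE, "Checkout page returns 500 on submit")
```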

Install

gem "ruby_llm-contract"
RubyLLM.configure { |c| c.openai_api_key = ENV["OPENAI_API_KEY"] }
RubyLLM::Contract.configure { |c| c.default_model = "gpt-4.1-mini" }

Works with any ruby_llm provider (OpenAI, Anthropic, Gemini, etc).

What you get

  • Validated responses — validate blocks catch wrong answers; output_schema enforces JSON structure via the provider AND client-side
  • Model escalation — retry_policy models: %w[nano mini full] starts cheap and auto-escalates when the contract fails. 90% of requests succeed on nano: ~$40/mo instead of ~$200 at 10k requests.
  • Cost control — max_input and max_cost refuse before calling the LLM. Zero tokens spent on oversized input.
  • Eval in CI — expect(MyStep).to pass_eval("smoke") verifies your contract offline, zero API calls. No other Ruby gem does this.
  • Defensive parsing — code fences, BOM, prose wrapping, null responses: 14 edge cases handled
  • Pipeline — chain steps with fail-fast. A hallucination in step 1 stops before step 2 runs.
  • Testing — RubyLLM::Contract::Adapters::Test for deterministic specs, plus a satisfy_contract RSpec matcher
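Model escalation can be pictured as a simple loop: try the cheapest model first and climb only when the contract fails. A behavioral sketch in plain Ruby (run_with_escalation is illustrative, not the gem's internals; the block stands in for one LLM call):

```ruby
MODELS = %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]

# Behavioral sketch of retry_policy: walk the model list, record each
# attempt, and stop at the first model whose output passes validation.
def run_with_escalation(models, validator)
  attempts = []
  models.each do |model|
    output = yield(model)  # one LLM call (stubbed below)
    status = validator.call(output) ? :ok : :validation_failed
    attempts << { model: model, status: status }
    return { ok: true, output: output, attempts: attempts } if status == :ok
  end
  { ok: false, output: nil, attempts: attempts }
end

validator = ->(o) { %w[low medium high urgent].include?(o[:priority]) }
result = run_with_escalation(MODELS, validator) do |model|
  # Stub: nano returns a bad value, the next model gets it right.
  model == "gpt-4.1-nano" ? { priority: "URGENT" } : { priority: "urgent" }
end
result[:attempts].map { |a| a[:status] }  # => [:validation_failed, :ok]
```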

Gotchas

output_schema vs with_schema: with_schema asks the provider to return specific JSON. output_schema does the same (calls with_schema under the hood) plus validates client-side. Cheap models sometimes ignore schema — output_schema catches that.
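The client-side half amounts to a local shape check after the provider responds; a simplified sketch (conforms? is illustrative, not the gem's API):

```ruby
# Illustrative client-side shape check: even when the provider was asked
# for structured output, verify the shape locally, because cheap models
# sometimes ignore the schema.
def conforms?(output)
  output.is_a?(Hash) && output[:priority].is_a?(String)
end

conforms?({ priority: "high" })  # => true
conforms?({ priority: nil })     # => false, provider ignored the schema
conforms?("Here is my answer")   # => false, prose instead of JSON
```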

Nested schema needs object do...end:

# WRONG — array of strings:
array :groups do; string :who; end

# RIGHT — array of objects:
array :groups do; object do; string :who; end; end

Schema validates shape, not meaning. LLM returns {"priority": "low"} for a data loss incident — valid JSON, wrong answer. Always add validate blocks.
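A hypothetical semantic check makes the difference concrete (the DATA_LOSS pattern and plausible_priority? helper are illustrative, not part of the gem):

```ruby
# Hypothetical semantic check: a ticket mentioning data loss must not be
# classified below "high", even though {"priority": "low"} is valid JSON.
DATA_LOSS = /data loss|corrupt|deleted/i

def plausible_priority?(ticket, output)
  return true unless ticket.match?(DATA_LOSS)
  %w[high urgent].include?(output[:priority])
end

plausible_priority?("Backups corrupted, data loss overnight", { priority: "low" })
# => false: schema-valid, semantically wrong
plausible_priority?("Backups corrupted, data loss overnight", { priority: "urgent" })
# => true
```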

Docs

  • Getting Started — features walkthrough, model escalation, eval, structured/dynamic prompts
  • Best Practices — 6 patterns for bulletproof validates
  • Output Schema — full schema reference + constraints
  • Pipeline — multi-step composition, timeout, fail-fast
  • Testing — test adapter, RSpec matchers
  • Prompt AST — node types, interpolation
  • Architecture — module diagram

Roadmap

v0.2 — eval that matters:

  • Dataset eval with add_case input:, expected: (partial matching)
  • Online eval — real LLM calls, compare output vs expected
  • CI gate — pass_eval("regression").with_minimum_score(0.8)
  • Model comparison — same dataset on nano vs mini vs full

v0.3:

  • Regression baselines — compare eval results with previous run
  • Eval persistence — store history for drift detection

v0.4:

  • Auto-routing — learn which model works for which input patterns
  • Contract-level dashboard

License

MIT
