examples: add 11_fallback_showcase.rb — runnable 'aha moment' demo by justi · Pull Request #22 · justi/ruby_llm-contract

justi · 2026-04-23T00:36:25Z

Adds the lowest-friction entry point for evaluating the gem: a runnable showcase that prints the fallback loop end-to-end, with zero API keys.

What it does

`ruby examples/11_fallback_showcase.rb` runs two contrasting steps in sequence:

Part A — without the contract's validate("not a refusal")
The Test adapter returns in-schema refusal JSON ({"tldr": "I cannot help..."}). Schema passes. The refusal is what parsed_output hands back — demonstrating exactly what happens in production without a contract.

Part B — with the full SummarizeArticle + retry_policy
Test adapter returns [REFUSAL_RESPONSE, GOOD_RESPONSE]. Attempt 1 (gpt-4.1-nano) rejected by validate, attempt 2 (gpt-4.1-mini) succeeds. Output includes the per-attempt trace so the escalation is visible, not described.

Sample output:

======================================================================
A — Without the contract's 'not a refusal' validate:
======================================================================
status:       :ok            # schema passes — no guard
tldr shipped: "I cannot help with that request."
              ^^ this would render into the UI card, apologising to the user

======================================================================
B — With the contract + retry_policy fallback:
======================================================================
status:             :ok
final model:        "gpt-4.1-mini"
total attempts:     2

Per-attempt trace:
  attempt 1  model=gpt-4.1-nano   status=validation_failed
  attempt 2  model=gpt-4.1-mini   status=ok
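
The two parts can be imitated in plain Ruby to make the mechanics concrete. This is a minimal sketch and not the gem's implementation: `QueueAdapter`, `run_with_fallback`, the two lambdas, and the placeholder summary text are hypothetical names invented for illustration, and the refusal regex is deliberately simpler than the one in the showcase.

```ruby
require "json"

# Hypothetical stand-ins: none of these names come from ruby_llm-contract.
REFUSAL = '{"tldr": "I cannot help with that request."}'
GOOD    = '{"tldr": "A short placeholder summary of the article."}'

# Stub adapter: hands back canned responses in order, like a test double.
class QueueAdapter
  def initialize(responses)
    @responses = responses.dup
  end

  def call(_model, _prompt)
    @responses.shift
  end
end

# Schema-style check: the output only needs a String under :tldr.
schema_ok = ->(out) { out[:tldr].is_a?(String) }
# Content check: reject outputs that open like a refusal.
not_a_refusal = ->(out) { !out[:tldr].match?(/\A(i cannot|i can'?t|sorry\b)/i) }

# Try each model in order; stop at the first output that passes every check.
def run_with_fallback(adapter, models, checks)
  trace = []
  models.each do |model|
    raw = adapter.call(model, "Summarize the article")
    output = JSON.parse(raw, symbolize_names: true)
    status = checks.all? { |check| check.call(output) } ? :ok : :validation_failed
    trace << { model: model, status: status }
    return { status: :ok, output: output, trace: trace } if status == :ok
  end
  { status: :validation_failed, output: nil, trace: trace }
end

# Part A: schema only, so the refusal is accepted.
naive = run_with_fallback(QueueAdapter.new([REFUSAL]), ["gpt-4.1-nano"], [schema_ok])

# Part B: schema plus content check, with a second model to fall back to.
guarded = run_with_fallback(
  QueueAdapter.new([REFUSAL, GOOD]),
  ["gpt-4.1-nano", "gpt-4.1-mini"],
  [schema_ok, not_a_refusal]
)

puts "A shipped: #{naive[:output][:tldr].inspect}"
guarded[:trace].each_with_index do |attempt, i|
  puts "attempt #{i + 1}  model=#{attempt[:model]}  status=#{attempt[:status]}"
end
```

The point of the sketch is the contrast: with only the schema-style check the refusal counts as success, while adding the content check turns the first attempt into a failure that triggers the next model in the list.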

Why this file, in this location

Real-user feedback: "hard to understand what the gem gives me". The why.md guide (shipped in #21) describes the failure modes; this showcase lets developers see the fallback loop run in 30 seconds before committing to reading docs.

bin/demo is not a Ruby convention (standard is bin/console / bin/setup via bundle gem). This project already has examples/*.rb as runnable showcases — placing the demo there matches the existing pattern.

No version bump

Per instruction: docs + examples only, gem stays at 0.7.2.

Copilot

Pull request overview

Adds a runnable “aha moment” fallback demo and aligns docs/output terminology around model fallback (including updated summary output labels and refreshed guides).

Changes:

Add examples/11_fallback_showcase.rb plus examples index updates to provide a zero-API-key runnable fallback demonstration.
Rename retry-optimizer / model-comparison terminal labels to “fallback” terminology (and adjust specs accordingly).
Large documentation refresh across guides (new “Why contracts?” guide, rewrites/edits to Getting Started / Testing / Pipeline / Schema / Optimization, etc.) and bump release metadata to 0.7.2.

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| spec/ruby_llm/contract/eval/retry_optimizer_spec.rb | Updates assertions for renamed summary labels. |
| lib/ruby_llm/contract/version.rb | Bumps gem version to 0.7.2. |
| lib/ruby_llm/contract/eval/retry_optimizer.rb | Renames summary labels and adds `hardest_eval` alias. |
| lib/ruby_llm/contract/eval/model_comparison.rb | Renames production-mode table headers and adjusts column widths. |
| examples/README.md | Updates examples index and adds fallback showcase entry. |
| examples/11_fallback_showcase.rb | New runnable fallback/refusal + retry_policy demonstration using Test adapter. |
| docs/guide/why.md | New “Why contracts?” guide with failure-mode framing and pointers to the showcase. |
| docs/guide/testing.md | Refines testing guide examples and guidance. |
| docs/guide/prompt_ast.md | Expands prompt-AST guide with richer narrative/examples. |
| docs/guide/pipeline.md | Updates pipeline guide to the `SummarizeArticle` narrative. |
| docs/guide/output_schema.md | Fixes DSL constraint keyword docs (snake_case) and refines narrative. |
| docs/guide/optimizing_retry_policy.md | Major rewrite around “cheapest viable fallback list” workflow. |
| docs/guide/migration.md | Updates migration examples and adds cross-links. |
| docs/guide/getting_started.md | Rewrites walkthrough using `SummarizeArticle` consistently. |
| docs/guide/eval_first.md | Refines eval-first narrative and examples. |
| docs/guide/best_practices.md | Updates validate patterns and “fallback” terminology. |
| docs/architecture.md | Updates architecture terminology and module layout docs. |
| README.md | Adds “Do I need this?” and updates docs index and narrative. |
| Gemfile.lock | Updates locked gem version to 0.7.2. |
| CHANGELOG.md | Adds 0.7.2 entry documenting changes. |


Copilot · 2026-04-23T00:41:06Z

 module RubyLLM
  module Contract
-    VERSION = "0.7.1"
+    VERSION = "0.7.2"


PR description says there is "No version bump" and the gem stays at 0.7.2, but this change bumps VERSION from 0.7.1 to 0.7.2 (and Gemfile.lock / CHANGELOG are updated accordingly). Please reconcile by either updating the PR description to reflect the version bump or dropping the version change if it’s not intended for this PR.

Copilot · 2026-04-23T00:41:06Z

 | 6 | Sections | Labeled context blocks (heredoc replacement, with before/after) |
 | 7 | Hash input | Multiple fields with auto-interpolation |
-| 8 | 2-arity invariants | Cross-validate output against input |
+| 8 | 2-arity validates | Cross-validate output against input |


In the 00_basics step table, “2-arity validates” reads like a verb; for clarity and consistency with the rest of the docs/examples, consider renaming this to “2-arity validate blocks” or “2-arity validations”.

Suggested change

-| 8 | 2-arity validates | Cross-validate output against input |
+| 8 | 2-arity validate blocks | Cross-validate output against input |
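
For readers unfamiliar with the term, "2-arity" refers to a block that takes two arguments, here the output and the input, so a check can compare one against the other. A rough plain-Ruby sketch of that idea follows; `run_check` and both lambdas are hypothetical and are not the gem's DSL.

```ruby
# Hypothetical helper: calls a check with (output) or (output, input) by arity.
def run_check(check, output, input)
  check.arity == 2 ? check.call(output, input) : check.call(output)
end

# 1-arity: looks at the output alone.
has_tldr = ->(output) { !output[:tldr].to_s.empty? }
# 2-arity: cross-validates the output against the input.
shorter_than_input = ->(output, input) { output[:tldr].length < input.length }

input  = "A long article body. " * 20
output = { tldr: "A short summary." }

puts run_check(has_tldr, output, input)
puts run_check(shorter_than_input, output, input)
```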

Copilot · 2026-04-23T00:41:06Z

 Real-world before/after: classify Reddit threads as PROMO/FILLER/SKIP.
-Shows ID matching, enum validation, score consistency invariants.
+Shows ID matching, enum validation, score consistency validates.

 ## 02_generate_comment.rb — Comment generation

 Real-world before/after: generate Reddit comments with persona.
-Shows sections, banned openings, link presence, length constraints, 2-arity invariants.
+Shows sections, banned openings, link presence, length constraints, 2-arity validates.

 ## 03_target_audience.rb — Audience profiling

 Real-world before/after: generate target audience profiles.
-Shows cascade failure prevention, locale validation, structural invariants.
+Shows cascade failure prevention, locale validation, cross-field validates.


Several sentences use “validates” as a noun (e.g., “score consistency validates”, “2-arity validates”, “cross-field validates”), which is ungrammatical and a bit confusing in a docs index. Suggest changing to “validation”, “validate checks”, or “validate blocks” for each of these phrases.

Copilot · 2026-04-23T00:41:06Z

+Examples 00–03, 05, 07–10 use the test adapter by default — no API keys needed.
+Example 04 requires an API key.


The “Running” section says examples 07–10 use the test adapter by default, but 11_fallback_showcase.rb is also test-adapter-only. Also, the comment “Real LLM — requires Ollama or API key” doesn’t match example 04’s stated requirement (provider API key). Please update this section so the adapter requirements are accurate and consistent.

Suggested change

-Examples 00–03, 05, 07–10 use the test adapter by default — no API keys needed.
-Example 04 requires an API key.
+Examples 00–03, 05, 07–11 use the test adapter by default — no API keys needed.
+Example 04 requires Ollama or a provider API key.

Copilot · 2026-04-23T00:41:07Z

+  rule    "Return valid JSON only."                   # appended as separate system message
+  section "AUDIENCE", "Rails developers"              # labeled system message: [AUDIENCE]\n...
+  example input:  "Ruby 3.4 ships frozen strings...", # user/assistant few-shot pair
+          output: '{"tldr":"...","takeaways":[...],"tone":"analytical"}'


The prompt AST example’s few-shot output: string isn’t valid JSON ([... ] / ...). Since the surrounding text says “Return valid JSON only”, this is easy to copy/paste into a prompt and accidentally teach invalid JSON. Consider making the example output syntactically valid JSON (using real placeholder strings/arrays).

Suggested change

-          output: '{"tldr":"...","takeaways":[...],"tone":"analytical"}'
+          output: '{"tldr":"Placeholder summary","takeaways":["Placeholder takeaway 1","Placeholder takeaway 2"],"tone":"analytical"}'
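
The difference between the two strings is easy to confirm with Ruby's standard `json` library. A minimal sketch, where `valid_json?` is a hypothetical helper written for this check:

```ruby
require "json"

# Returns true when the string parses as JSON, false otherwise.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

elided   = '{"tldr":"...","takeaways":[...],"tone":"analytical"}'
concrete = '{"tldr":"Placeholder summary","takeaways":["Placeholder takeaway 1","Placeholder takeaway 2"],"tone":"analytical"}'

puts valid_json?(elided)   # false: a bare ... inside an array is not a JSON value
puts valid_json?(concrete) # true
```

The quoted `"..."` is a legal JSON string; only the unquoted `[...]` breaks parsing.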

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


Copilot · 2026-04-23T01:09:09Z

+# Real LLM — requires a provider API key (OpenAI, Anthropic, Gemini, etc.):
 ruby examples/04_real_llm.rb


The comment now implies that running the “Real LLM” example always requires a provider API key, but this repo/gem also supports running via Ollama (no API key). Consider adjusting the wording to include Ollama as an option to avoid misleading users.

Copilot · 2026-04-23T01:09:10Z

+
+puts "status:       #{naive_result.status.inspect}            # schema passes — no guard"
+puts "tldr shipped: #{naive_result.parsed_output[:tldr].inspect}"
+puts "              ^^ this would render into the UI card, apologising to the user"


Spelling inconsistency: this message uses UK spelling (“apologising”), but the rest of the repo/docs use US spelling (e.g., “behavior”). Consider changing to “apologizing” for consistency.

Suggested change

-puts "              ^^ this would render into the UI card, apologising to the user"
+puts "              ^^ this would render into the UI card, apologizing to the user"

Copilot · 2026-04-23T01:09:10Z

+  # For production, expand to match your provider + model mix; consider also
+  # rejecting responses where the takeaways repeat a template ("please provide
+  # more context"). Real apps accumulate these heuristics from production logs.
+  REFUSAL_PATTERN = /\A(i\s+(cannot|can.?t|am unable|apologi[sz]e)|i['\s]?m\s+(unable|sorry)|sorry\b|as an ai)/i.freeze


REFUSAL_PATTERN is hard to read/modify as a single long regex line. Consider rewriting it using extended regex mode (/x) or composing from an array of prefix patterns so future tweaks (adding phrases/providers) are less error-prone.

Suggested change

-  REFUSAL_PATTERN = /\A(i\s+(cannot|can.?t|am unable|apologi[sz]e)|i['\s]?m\s+(unable|sorry)|sorry\b|as an ai)/i.freeze
+  # Fragments are regex source, so join them with "|" rather than Regexp.union,
+  # which would escape the metacharacters in String arguments.
+  REFUSAL_PREFIX_PATTERNS = [
+    "i\\s+(?:cannot|can.?t|am unable|apologi[sz]e)",
+    "i['\\s]?m\\s+(?:unable|sorry)",
+    "sorry\\b",
+    "as an ai"
+  ].freeze
+  REFUSAL_PATTERN = /\A(?:#{REFUSAL_PREFIX_PATTERNS.join("|")})/i.freeze
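
When a pattern is composed from string fragments like these, how the fragments are combined matters: `Regexp.union` escapes String arguments, whereas joining with `|` keeps them as regex source. The sketch below compares the two approaches on one sample sentence; `FRAGMENTS`, `escaped`, and `joined` are names chosen for this illustration only.

```ruby
# Fragments are regex source text, not literal phrases.
FRAGMENTS = [
  "i\\s+(?:cannot|can.?t|am unable|apologi[sz]e)",
  "i['\\s]?m\\s+(?:unable|sorry)",
  "sorry\\b",
  "as an ai"
].freeze

# Regexp.union escapes String arguments, so metacharacters become literals.
escaped = /\A(?:#{Regexp.union(FRAGMENTS).source})/i
# Joining with "|" keeps the fragments as regex source.
joined  = /\A(?:#{FRAGMENTS.join("|")})/i

sample = "I cannot help with that request."
puts escaped.match?(sample) # false: looks for the literal text "i\s+(?:cannot|..."
puts joined.match?(sample)  # true
```

An alternative with the same effect is to store the fragments as Regexp literals, which `Regexp.union` combines without escaping.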

Lowest-friction entry point for developers evaluating the gem. Zero API keys, Test adapter simulates cheap-model in-schema refusal, contract rejects it, retry_policy escalates to mid-tier model. Prints per-attempt trace so the fallback loop is visible, not described.

Two parts in sequence:
- Part A: runs a stripped step without the "not a refusal" validate. Refusal JSON ships to parsed_output, which demonstrates what happens without a contract (the schema passes; the user sees an apology).
- Part B: runs the full SummarizeArticle with validate + retry_policy. Attempt 1 (nano) rejected on validate, attempt 2 (mini) succeeds.

Side updates:
- examples/README.md: new entry + added to "Running" list
- docs/guide/why.md: Failure 3 (refusal-as-valid-JSON) now points to the showcase so why.md readers have a 30-second path to seeing it

Placed in examples/ (not bin/) to match this project's existing convention: bundle gem's bin/console and bin/setup are the only standard executables; demo scripts conventionally live under examples/.

Runs cleanly end-to-end; 1341 specs pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Audit pass: every runnable example should tell the reader what to expect without forcing them to clone and run. 00 and 02 use Ruby-style inline `# => :ok` comments; 06 and 07 already had Expected output blocks in the header comment (from PR #22). Closing the gap for the other four.

- 01_real_llm.rb: inline `# =>` after each puts showing typical values; added "Example console output" block (numbers vary because the example calls a real LLM).
- 03_summarize_with_keywords.rb: "Expected output" block in the header with the full probability-sorted keyword table.
- 04_summarize_and_translate.rb: "Expected output" block in the header; also guards the Total cost print. `result.trace.total_cost` is nil under the Test adapter and was printing "Total cost: $". Now prints "$0.0 (Test adapter)" when cost is nil.
- 05_eval_dataset.rb: "Expected output" block in the header covering both runs and the inline eval_case section.

Verified after the change: 7/7 examples run clean (01 skipped, needs key), 1311 specs pass, Test adapter cost path prints the labelled fallback.

No version bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
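
The nil-cost guard described for 04_summarize_and_translate.rb could look roughly like the sketch below. The `Trace` struct and `format_total_cost` are hypothetical stand-ins; only the `total_cost` attribute and the printed wording are taken from the commit message.

```ruby
# Hypothetical stand-in for a trace object; only total_cost matters here.
Trace = Struct.new(:total_cost)

# Builds the cost line, labelling the nil case instead of printing a bare "$".
def format_total_cost(trace)
  cost = trace.total_cost
  cost.nil? ? "Total cost: $0.0 (Test adapter)" : "Total cost: $#{cost}"
end

puts format_total_cost(Trace.new(nil))
puts format_total_cost(Trace.new(0.0042))
```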

Adoption-friction release. No runtime behavior changes: every delta is in `docs/`, `examples/`, or `spec/integration/` (plus version.rb / Gemfile.lock bumps). Upgrading from 0.7.2 picks up the expanded guide set, the consolidated runnable showcases, and one extra integration spec.

Consolidates 7 merged PRs (#21–#27) into one release:
- #21 Guide rewrite + adoption friction (why.md, "Do I need this?", outcome labels, TL;DR boxes)
- #22 Runnable aha-moment showcases (fallback + retry variants)
- #23 architecture.md refresh + docs/ideas untracked
- #24 Schema pitfall fix (5 example files) + expected output coverage
- #25 Examples consolidation: drop Reddit, renumber 00-06, restore pipeline + real-LLM minimal
- #26 Rails integration FAQ guide (7 pre-emptive questions)
- #27 Pipeline-level run_eval coverage; closes the "09 STEP 5" known issue from 0.7.2

Copilot review of the CHANGELOG itself flagged two inaccuracies before merge:
- "No gem-level code changes" replaced with "No runtime behavior changes" so version.rb / Gemfile.lock bumps are not misrepresented.
- Stale `examples/09_eval_dataset.rb` reference updated to current `05_eval_dataset.rb` after the renumber.

Verification: 1287 specs pass, 6/6 test-adapter examples run clean, bundle install resolves 0.7.3. Full changelog entry on main in CHANGELOG.md.

Copilot AI review requested due to automatic review settings · April 23, 2026 00:36

Copilot started reviewing on behalf of justi · April 23, 2026 00:36

Copilot AI reviewed · Apr 23, 2026

justi force-pushed the examples/fallback-showcase branch 2 times, most recently from 1ebc15b to 127ddd4 · April 23, 2026 01:03

justi requested a review from Copilot · April 23, 2026 01:04

Copilot started reviewing on behalf of justi · April 23, 2026 01:05

Copilot AI reviewed · Apr 23, 2026

justi force-pushed the examples/fallback-showcase branch 2 times, most recently from 9b6653c to dcb427c · April 23, 2026 03:36

justi force-pushed the examples/fallback-showcase branch from dcb427c to 639c4d2 · April 23, 2026 03:54

justi merged commit 729a2f2 into main Apr 23, 2026
1 check passed

justi mentioned this pull request Apr 23, 2026

examples: fix array-without-object schema pitfall + expand Expected output coverage #24

Merged

This was referenced Apr 23, 2026

docs: add Rails integration FAQ guide (pre-emptive adoption answers) #26

Merged

0.7.3: adoption-friction release (docs + examples consolidation) #28

Merged
