examples: add 11_fallback_showcase.rb — runnable 'aha moment' demo #22
Pull request overview
Adds a runnable “aha moment” fallback demo and aligns docs/output terminology around model fallback (including updated summary output labels and refreshed guides).
Changes:
- Add `examples/11_fallback_showcase.rb` plus examples index updates to provide a zero-API-key runnable fallback demonstration.
- Rename retry-optimizer / model-comparison terminal labels to “fallback” terminology (and adjust specs accordingly).
- Large documentation refresh across guides (new “Why contracts?” guide, rewrites/edits to Getting Started / Testing / Pipeline / Schema / Optimization, etc.) and bump release metadata to 0.7.2.
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| spec/ruby_llm/contract/eval/retry_optimizer_spec.rb | Updates assertions for renamed summary labels. |
| lib/ruby_llm/contract/version.rb | Bumps gem version to 0.7.2. |
| lib/ruby_llm/contract/eval/retry_optimizer.rb | Renames summary labels and adds hardest_eval alias. |
| lib/ruby_llm/contract/eval/model_comparison.rb | Renames production-mode table headers and adjusts column widths. |
| examples/README.md | Updates examples index and adds fallback showcase entry. |
| examples/11_fallback_showcase.rb | New runnable fallback/refusal + retry_policy demonstration using Test adapter. |
| docs/guide/why.md | New “Why contracts?” guide with failure-mode framing and pointers to the showcase. |
| docs/guide/testing.md | Refines testing guide examples and guidance. |
| docs/guide/prompt_ast.md | Expands prompt-AST guide with richer narrative/examples. |
| docs/guide/pipeline.md | Updates pipeline guide to the SummarizeArticle narrative. |
| docs/guide/output_schema.md | Fixes DSL constraint keyword docs (snake_case) and refines narrative. |
| docs/guide/optimizing_retry_policy.md | Major rewrite around “cheapest viable fallback list” workflow. |
| docs/guide/migration.md | Updates migration examples and adds cross-links. |
| docs/guide/getting_started.md | Rewrites walkthrough using SummarizeArticle consistently. |
| docs/guide/eval_first.md | Refines eval-first narrative and examples. |
| docs/guide/best_practices.md | Updates validate patterns and “fallback” terminology. |
| docs/architecture.md | Updates architecture terminology and module layout docs. |
| README.md | Adds “Do I need this?” and updates docs index and narrative. |
| Gemfile.lock | Updates locked gem version to 0.7.2. |
| CHANGELOG.md | Adds 0.7.2 entry documenting changes. |
| module RubyLLM | ||
| module Contract | ||
| VERSION = "0.7.1" | ||
| VERSION = "0.7.2" |
PR description says there is "No version bump" and the gem stays at 0.7.2, but this change bumps VERSION from 0.7.1 to 0.7.2 (and Gemfile.lock / CHANGELOG are updated accordingly). Please reconcile by either updating the PR description to reflect the version bump or dropping the version change if it’s not intended for this PR.
| | 6 | Sections | Labeled context blocks (heredoc replacement, with before/after) | | ||
| | 7 | Hash input | Multiple fields with auto-interpolation | | ||
| | 8 | 2-arity invariants | Cross-validate output against input | | ||
| | 8 | 2-arity validates | Cross-validate output against input | |
In the 00_basics step table, “2-arity validates” reads like a verb; for clarity and consistency with the rest of the docs/examples, consider renaming this to “2-arity validate blocks” or “2-arity validations”.
| | 8 | 2-arity validates | Cross-validate output against input | | |
| | 8 | 2-arity validate blocks | Cross-validate output against input | |
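A 2-arity check receives both the parsed output and the original input, which is what makes cross-validation possible. The sketch below is plain Ruby, not the gem's DSL: `IDS_MATCH` and the `:threads` / `:items` keys are hypothetical names chosen for illustration only.

```ruby
# Hypothetical stand-in for a 2-arity validate block: a lambda taking
# (output, input) that returns true when the output covers exactly the
# IDs that were present in the input.
IDS_MATCH = lambda do |output, input|
  expected = input[:threads].map { |thread| thread[:id] }.sort
  actual   = output[:items].map { |item| item[:id] }.sort
  expected == actual
end

INPUT       = { threads: [{ id: 1 }, { id: 2 }] }.freeze
GOOD_OUTPUT = { items: [{ id: 2, label: "SKIP" }, { id: 1, label: "PROMO" }] }.freeze
BAD_OUTPUT  = { items: [{ id: 1, label: "PROMO" }, { id: 3, label: "FILLER" }] }.freeze

puts IDS_MATCH.call(GOOD_OUTPUT, INPUT) # true
puts IDS_MATCH.call(BAD_OUTPUT, INPUT)  # false
```

A 1-arity check could only inspect the output in isolation; the second argument is what lets the check notice an ID the model invented.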
| Real-world before/after: classify Reddit threads as PROMO/FILLER/SKIP. | ||
| Shows ID matching, enum validation, score consistency invariants. | ||
| Shows ID matching, enum validation, score consistency validates. | ||
| ## 02_generate_comment.rb — Comment generation | ||
| Real-world before/after: generate Reddit comments with persona. | ||
| Shows sections, banned openings, link presence, length constraints, 2-arity invariants. | ||
| Shows sections, banned openings, link presence, length constraints, 2-arity validates. | ||
| ## 03_target_audience.rb — Audience profiling | ||
| Real-world before/after: generate target audience profiles. | ||
| Shows cascade failure prevention, locale validation, structural invariants. | ||
| Shows cascade failure prevention, locale validation, cross-field validates. |
Several sentences use “validates” as a noun (e.g., “score consistency validates”, “2-arity validates”, “cross-field validates”), which is ungrammatical and a bit confusing in a docs index. Suggest changing to “validation”, “validate checks”, or “validate blocks” for each of these phrases.
| Examples 00–03, 05, 07–10 use the test adapter by default — no API keys needed. | ||
| Example 04 requires an API key. |
The “Running” section says examples 07–10 use the test adapter by default, but 11_fallback_showcase.rb is also test-adapter-only. Also, the comment “Real LLM — requires Ollama or API key” doesn’t match example 04’s stated requirement (provider API key). Please update this section so the adapter requirements are accurate and consistent.
| Examples 00–03, 05, 07–10 use the test adapter by default — no API keys needed. | |
| Example 04 requires an API key. | |
| Examples 00–03, 05, 07–11 use the test adapter by default — no API keys needed. | |
| Example 04 requires Ollama or a provider API key. |
| rule "Return valid JSON only." # appended as separate system message | ||
| section "AUDIENCE", "Rails developers" # labeled system message: [AUDIENCE]\n... | ||
| example input: "Ruby 3.4 ships frozen strings...", # user/assistant few-shot pair | ||
| output: '{"tldr":"...","takeaways":[...],"tone":"analytical"}' |
The prompt AST example’s few-shot output: string isn’t valid JSON ([... ] / ...). Since the surrounding text says “Return valid JSON only”, this is easy to copy/paste into a prompt and accidentally teach invalid JSON. Consider making the example output syntactically valid JSON (using real placeholder strings/arrays).
| output: '{"tldr":"...","takeaways":[...],"tone":"analytical"}' | |
| output: '{"tldr":"Placeholder summary","takeaways":["Placeholder takeaway 1","Placeholder takeaway 2"],"tone":"analytical"}' |
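The difference between the two few-shot strings can be checked mechanically. This is a minimal sketch using Ruby's standard `json` library; `valid_json?`, `ELIDED`, and `FILLED` are hypothetical names for this example, not part of the gem.

```ruby
require "json"

# Returns true when the string parses as JSON, false otherwise.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# The original example output, with elided placeholders.
ELIDED = '{"tldr":"...","takeaways":[...],"tone":"analytical"}'
# The suggested replacement, with real placeholder strings.
FILLED = '{"tldr":"Placeholder summary","takeaways":["Placeholder takeaway 1","Placeholder takeaway 2"],"tone":"analytical"}'

puts valid_json?(ELIDED) # false: [...] is not a valid JSON array
puts valid_json?(FILLED) # true
```

A check like this could run in a docs lint step so that few-shot examples never teach a model invalid JSON.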
1ebc15b to 127ddd4
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| # Real LLM — requires a provider API key (OpenAI, Anthropic, Gemini, etc.): | ||
| ruby examples/04_real_llm.rb |
The comment now implies that running the “Real LLM” example always requires a provider API key, but this repo/gem also supports running via Ollama (no API key). Consider adjusting the wording to include Ollama as an option to avoid misleading users.
| puts "status: #{naive_result.status.inspect} # schema passes — no guard" | ||
| puts "tldr shipped: #{naive_result.parsed_output[:tldr].inspect}" | ||
| puts " ^^ this would render into the UI card, apologising to the user" |
Spelling inconsistency: this message uses UK spelling (“apologising”), but the rest of the repo/docs use US spelling (e.g., “behavior”). Consider changing to “apologizing” for consistency.
| puts " ^^ this would render into the UI card, apologising to the user" | |
| puts " ^^ this would render into the UI card, apologizing to the user" |
| # For production, expand to match your provider + model mix; consider also | ||
| # rejecting responses where the takeaways repeat a template ("please provide | ||
| # more context"). Real apps accumulate these heuristics from production logs. | ||
| REFUSAL_PATTERN = /\A(i\s+(cannot|can.?t|am unable|apologi[sz]e)|i['\s]?m\s+(unable|sorry)|sorry\b|as an ai)/i.freeze |
REFUSAL_PATTERN is hard to read/modify as a single long regex line. Consider rewriting it using extended regex mode (/x) or composing from an array of prefix patterns so future tweaks (adding phrases/providers) are less error-prone.
| REFUSAL_PATTERN = /\A(i\s+(cannot|can.?t|am unable|apologi[sz]e)|i['\s]?m\s+(unable|sorry)|sorry\b|as an ai)/i.freeze | |
| REFUSAL_PREFIX_PATTERNS = [ | |
| "i\\s+(?:cannot|can.?t|am unable|apologi[sz]e)", | |
| "i['\\s]?m\\s+(?:unable|sorry)", | |
| "sorry\\b", | |
| "as an ai" | |
| ].freeze | |
| REFUSAL_PATTERN = /\A(?:#{REFUSAL_PREFIX_PATTERNS.join("|")})/i.freeze |
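One caveat when composing the pattern from an array of strings: `Regexp.union` escapes plain strings, so fragments such as `\s+` or `(?:...)` would be matched literally rather than as regex syntax. The sketch below joins the fragments with `|` instead, which keeps them live. The `refusal?` helper is a hypothetical name added for illustration; the fragments themselves are the ones from the suggestion.

```ruby
# Prefix fragments written as regex source strings. Joining with "|"
# keeps the metacharacters active; Regexp.union would escape them.
REFUSAL_PREFIX_PATTERNS = [
  "i\\s+(?:cannot|can.?t|am unable|apologi[sz]e)",
  "i['\\s]?m\\s+(?:unable|sorry)",
  "sorry\\b",
  "as an ai"
].freeze

REFUSAL_PATTERN = /\A(?:#{REFUSAL_PREFIX_PATTERNS.join("|")})/i

# Hypothetical helper: true when the text opens with a refusal phrase.
def refusal?(text)
  REFUSAL_PATTERN.match?(text.to_s.strip)
end

puts refusal?("I cannot help with that.")        # true
puts refusal?("I'm sorry, but I can't do that.") # true
puts refusal?("Placeholder summary")             # false
```

Adding a phrase then means appending one string to the array, with no need to touch the anchoring or the flags.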
9b6653c to dcb427c
Lowest-friction entry point for developers evaluating the gem. Zero API keys: the Test adapter simulates a cheap-model in-schema refusal, the contract rejects it, and retry_policy escalates to a mid-tier model. Prints a per-attempt trace so the fallback loop is visible, not described.

Two parts in sequence:
- Part A: runs a stripped step without the "not a refusal" validate. Refusal JSON ships to parsed_output, which demonstrates what happens without a contract (the schema passes; the user sees an apology).
- Part B: runs the full SummarizeArticle with validate + retry_policy. Attempt 1 (nano) is rejected on validate, attempt 2 (mini) succeeds.

Side updates:
- examples/README.md: new entry + added to "Running" list
- docs/guide/why.md: Failure 3 (refusal-as-valid-JSON) now points to the showcase so why.md readers have a 30-second path to seeing it

Placed in examples/ (not bin/) to match this project's existing convention: bundle gem's bin/console and bin/setup are the only standard executables; demo scripts conventionally live under examples/.

Runs cleanly end-to-end; 1341 specs pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dcb427c to 639c4d2
Audit pass: every runnable example should tell the reader what to expect without forcing them to clone and run. 00 and 02 use Ruby-style inline `# => :ok` comments; 06 and 07 already had Expected output blocks in the header comment (from PR #22). Closing the gap for the other four.

- 01_real_llm.rb: inline `# =>` after each puts showing typical values; added "Example console output" block (numbers vary because the example calls a real LLM).
- 03_summarize_with_keywords.rb: "Expected output" block in the header with the full probability-sorted keyword table.
- 04_summarize_and_translate.rb: "Expected output" block in the header; also guards the Total cost print. `result.trace.total_cost` is nil under the Test adapter and was printing "Total cost: $". Now prints "$0.0 (Test adapter)" when cost is nil.
- 05_eval_dataset.rb: "Expected output" block in the header covering both runs and the inline eval_case section.

Verified after the change: 7/7 examples run clean (01 skipped, needs key), 1311 specs pass, Test adapter cost path prints the labelled fallback.

No version bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
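The nil guard on the cost print can be sketched as a small formatter. `format_total_cost` is a hypothetical helper name; the commit message states only the behavior (a labelled fallback when the cost is nil), not how the example implements it.

```ruby
# Hypothetical helper: formats a trace's total cost for printing.
# Under a test adapter the cost is nil, so a labelled fallback is used
# instead of interpolating nil into the string (which prints "$").
def format_total_cost(cost)
  return "$0.0 (Test adapter)" if cost.nil?

  "$#{cost}"
end

puts "Total cost: #{format_total_cost(nil)}"    # Total cost: $0.0 (Test adapter)
puts "Total cost: #{format_total_cost(0.0012)}" # Total cost: $0.0012
```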
Adoption-friction release. No runtime behavior changes: every delta is in `docs/`, `examples/`, or `spec/integration/` (plus version.rb / Gemfile.lock bumps). Upgrading from 0.7.2 picks up the expanded guide set, the consolidated runnable showcases, and one extra integration spec.

Consolidates 7 merged PRs (#21–#27) into one release:
- #21 Guide rewrite + adoption friction (why.md, "Do I need this?", outcome labels, TL;DR boxes)
- #22 Runnable aha-moment showcases (fallback + retry variants)
- #23 architecture.md refresh + docs/ideas untracked
- #24 Schema pitfall fix (5 example files) + expected output coverage
- #25 Examples consolidation: drop Reddit, renumber 00-06, restore pipeline + real-LLM minimal
- #26 Rails integration FAQ guide (7 pre-emptive questions)
- #27 Pipeline-level run_eval coverage, which closes the "09 STEP 5" known issue from 0.7.2

Copilot review of the CHANGELOG itself flagged two inaccuracies before merge:
- "No gem-level code changes" replaced with "No runtime behavior changes" so version.rb / Gemfile.lock bumps are not misrepresented.
- Stale `examples/09_eval_dataset.rb` reference updated to current `05_eval_dataset.rb` after the renumber.

Verification: 1287 specs pass, 6/6 test-adapter examples run clean, bundle install resolves 0.7.3. Full changelog entry on main in CHANGELOG.md.
Adds the lowest-friction entry point for evaluating the gem: a runnable showcase that prints the fallback loop end-to-end, with zero API keys.
What it does
`ruby examples/11_fallback_showcase.rb` runs two contrasting steps in sequence:

Part A: without the contract's `validate("not a refusal")`
The Test adapter returns in-schema refusal JSON (`{"tldr": "I cannot help..."}`). Schema passes. The refusal is what `parsed_output` hands back, demonstrating exactly what happens in production without a contract.

Part B: with the full `SummarizeArticle` + `retry_policy`
The Test adapter returns `[REFUSAL_RESPONSE, GOOD_RESPONSE]`. Attempt 1 (gpt-4.1-nano) is rejected by validate, attempt 2 (gpt-4.1-mini) succeeds. Output includes the per-attempt trace so the escalation is visible, not described.

Sample output:
Why this file, in this location
Real-user feedback: "hard to understand what the gem gives me". The why.md guide (shipped in #21) describes the failure modes; this showcase lets developers see the fallback loop run in 30 seconds before committing to reading docs.
`bin/demo` is not a Ruby convention (standard is `bin/console` / `bin/setup` via `bundle gem`). This project already has `examples/*.rb` as runnable showcases, so placing the demo there matches the existing pattern.

No version bump
Per instruction: docs + examples only, gem stays at 0.7.2.
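The two-part flow under "What it does" can be approximated in plain Ruby without the gem. This is a simplified sketch under stated assumptions, not the showcase itself: `run_with_fallback`, the status symbols, the shortened refusal regex, and the canned responses are all hypothetical stand-ins for the Test adapter, the contract's validate, and `retry_policy`.

```ruby
require "json"

# Shortened refusal check, for illustration only.
REFUSAL = /\A(?:i\s+(?:cannot|can.?t)|sorry\b|as an ai)/i

# Canned responses stand in for a test adapter: a refusal, then a good answer.
RESPONSES = [
  '{"tldr":"I cannot help with that request."}',
  '{"tldr":"Placeholder summary of the article."}'
].freeze
MODELS = %w[gpt-4.1-nano gpt-4.1-mini].freeze

def schema_ok?(parsed)
  parsed.is_a?(Hash) && parsed["tldr"].is_a?(String)
end

# Tries each model in order; stops at the first attempt that passes the
# schema check and (when guard is true) the refusal check.
def run_with_fallback(models, responses, guard: true)
  trace = []
  models.each_with_index do |model, index|
    parsed = JSON.parse(responses.fetch(index))
    status =
      if !schema_ok?(parsed)
        :schema_failed
      elsif guard && REFUSAL.match?(parsed["tldr"])
        :validation_failed
      else
        :ok
      end
    trace << { attempt: index + 1, model: model, status: status }
    return { status: :ok, output: parsed, trace: trace } if status == :ok
  end
  { status: :failed, output: nil, trace: trace }
end

# Part A: no guard, so the refusal passes the schema and ships.
part_a = run_with_fallback(MODELS.first(1), RESPONSES.first(1), guard: false)
puts part_a[:output]["tldr"]

# Part B: the guard rejects attempt 1 and the loop escalates to attempt 2.
part_b = run_with_fallback(MODELS, RESPONSES)
part_b[:trace].each { |t| puts "attempt #{t[:attempt]} #{t[:model]}: #{t[:status]}" }
```

The point of the contrast is that the schema check alone cannot tell a refusal from a summary, since both are a string under `tldr`; only the content-level check triggers the escalation.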