examples: add 11_fallback_showcase.rb — runnable 'aha moment' demo #22

Merged
justi merged 1 commit into main from examples/fallback-showcase on Apr 23, 2026

@justi justi commented Apr 23, 2026

Adds the lowest-friction entry point for evaluating the gem: a runnable showcase that prints the fallback loop end-to-end, with zero API keys.

What it does

`ruby examples/11_fallback_showcase.rb` runs two contrasting steps in sequence:

Part A — without the contract's validate("not a refusal")
The Test adapter returns in-schema refusal JSON ({"tldr": "I cannot help..."}). Schema passes. The refusal is what parsed_output hands back — demonstrating exactly what happens in production without a contract.

Part B — with the full SummarizeArticle + retry_policy
Test adapter returns [REFUSAL_RESPONSE, GOOD_RESPONSE]. Attempt 1 (gpt-4.1-nano) rejected by validate, attempt 2 (gpt-4.1-mini) succeeds. Output includes the per-attempt trace so the escalation is visible, not described.

Sample output:

======================================================================
A — Without the contract's 'not a refusal' validate:
======================================================================
status:       :ok            # schema passes — no guard
tldr shipped: "I cannot help with that request."
              ^^ this would render into the UI card, apologising to the user

======================================================================
B — With the contract + retry_policy fallback:
======================================================================
status:             :ok
final model:        "gpt-4.1-mini"
total attempts:     2

Per-attempt trace:
  attempt 1  model=gpt-4.1-nano   status=validation_failed
  attempt 2  model=gpt-4.1-mini   status=ok
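
For readers skimming the thread, the loop behind this output can be approximated in plain Ruby. This is a minimal sketch, not the gem's API: `CannedAdapter`, `run_with_fallback`, `SCHEMA_OK`, `NOT_REFUSAL`, and the text of the good response are hypothetical stand-ins. Only the model names, the refusal string, and the status labels are taken from the sample output above.

```ruby
require "json"

# Hypothetical stand-ins for illustration; these are NOT the gem's API.
REFUSAL_RESPONSE = '{"tldr": "I cannot help with that request."}'
GOOD_RESPONSE    = '{"tldr": "Placeholder summary of the article."}'

# Canned adapter: hands back queued responses in order, like a test double.
class CannedAdapter
  def initialize(responses)
    @responses = responses.dup
  end

  def call(_model)
    @responses.shift
  end
end

# Two checks: a schema-like shape check and a content-level refusal check.
SCHEMA_OK   = ->(out) { out[:tldr].is_a?(String) }
NOT_REFUSAL = ->(out) { !out[:tldr].match?(/\A(i cannot|i can.?t|sorry\b|as an ai)/i) }

# Try each model in order; stop at the first output that passes every check.
def run_with_fallback(adapter, models, checks)
  trace = []
  models.each do |model|
    output = JSON.parse(adapter.call(model), symbolize_names: true)
    status = checks.all? { |check| check.call(output) } ? :ok : :validation_failed
    trace << { model: model, status: status }
    return { status: :ok, output: output, trace: trace } if status == :ok
  end
  { status: :validation_failed, output: nil, trace: trace }
end

MODELS = ["gpt-4.1-nano", "gpt-4.1-mini"].freeze

# Part A: shape check only, so the refusal is accepted on attempt 1.
part_a = run_with_fallback(CannedAdapter.new([REFUSAL_RESPONSE]), MODELS, [SCHEMA_OK])

# Part B: shape + refusal check, so attempt 1 fails and attempt 2 succeeds.
part_b = run_with_fallback(
  CannedAdapter.new([REFUSAL_RESPONSE, GOOD_RESPONSE]), MODELS, [SCHEMA_OK, NOT_REFUSAL]
)

puts "A shipped: #{part_a[:output][:tldr].inspect}"
part_b[:trace].each_with_index do |attempt, index|
  puts "attempt #{index + 1}  model=#{attempt[:model]}  status=#{attempt[:status]}"
end
```

The point of the sketch is the contrast: with only a shape check the refusal counts as success, while adding one content-level check is enough to trigger escalation to the next model.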

Why this file, in this location

Real-user feedback: "hard to understand what the gem gives me". The why.md guide (shipped in #21) describes the failure modes; this showcase lets developers see the fallback loop run in 30 seconds before committing to reading docs.

bin/demo is not a Ruby convention (standard is bin/console / bin/setup via bundle gem). This project already has examples/*.rb as runnable showcases — placing the demo there matches the existing pattern.

No version bump

Per instruction: docs + examples only, gem stays at 0.7.2.

Copilot AI review requested due to automatic review settings April 23, 2026 00:36

Copilot AI left a comment

Pull request overview

Adds a runnable “aha moment” fallback demo and aligns docs/output terminology around model fallback (including updated summary output labels and refreshed guides).

Changes:

  • Add examples/11_fallback_showcase.rb plus examples index updates to provide a zero-API-key runnable fallback demonstration.
  • Rename retry-optimizer / model-comparison terminal labels to “fallback” terminology (and adjust specs accordingly).
  • Large documentation refresh across guides (new “Why contracts?” guide, rewrites/edits to Getting Started / Testing / Pipeline / Schema / Optimization, etc.) and bump release metadata to 0.7.2.

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| spec/ruby_llm/contract/eval/retry_optimizer_spec.rb | Updates assertions for renamed summary labels. |
| lib/ruby_llm/contract/version.rb | Bumps gem version to 0.7.2. |
| lib/ruby_llm/contract/eval/retry_optimizer.rb | Renames summary labels and adds hardest_eval alias. |
| lib/ruby_llm/contract/eval/model_comparison.rb | Renames production-mode table headers and adjusts column widths. |
| examples/README.md | Updates examples index and adds fallback showcase entry. |
| examples/11_fallback_showcase.rb | New runnable fallback/refusal + retry_policy demonstration using Test adapter. |
| docs/guide/why.md | New “Why contracts?” guide with failure-mode framing and pointers to the showcase. |
| docs/guide/testing.md | Refines testing guide examples and guidance. |
| docs/guide/prompt_ast.md | Expands prompt-AST guide with richer narrative/examples. |
| docs/guide/pipeline.md | Updates pipeline guide to the SummarizeArticle narrative. |
| docs/guide/output_schema.md | Fixes DSL constraint keyword docs (snake_case) and refines narrative. |
| docs/guide/optimizing_retry_policy.md | Major rewrite around “cheapest viable fallback list” workflow. |
| docs/guide/migration.md | Updates migration examples and adds cross-links. |
| docs/guide/getting_started.md | Rewrites walkthrough using SummarizeArticle consistently. |
| docs/guide/eval_first.md | Refines eval-first narrative and examples. |
| docs/guide/best_practices.md | Updates validate patterns and “fallback” terminology. |
| docs/architecture.md | Updates architecture terminology and module layout docs. |
| README.md | Adds “Do I need this?” and updates docs index and narrative. |
| Gemfile.lock | Updates locked gem version to 0.7.2. |
| CHANGELOG.md | Adds 0.7.2 entry documenting changes. |

  module RubyLLM
    module Contract
-     VERSION = "0.7.1"
+     VERSION = "0.7.2"

Copilot AI Apr 23, 2026

PR description says there is "No version bump" and the gem stays at 0.7.2, but this change bumps VERSION from 0.7.1 to 0.7.2 (and Gemfile.lock / CHANGELOG are updated accordingly). Please reconcile by either updating the PR description to reflect the version bump or dropping the version change if it’s not intended for this PR.

Copilot uses AI. Check for mistakes.
Comment thread examples/README.md
  | 6 | Sections | Labeled context blocks (heredoc replacement, with before/after) |
  | 7 | Hash input | Multiple fields with auto-interpolation |
- | 8 | 2-arity invariants | Cross-validate output against input |
+ | 8 | 2-arity validates | Cross-validate output against input |

Copilot AI Apr 23, 2026

In the 00_basics step table, “2-arity validates” reads like a verb; for clarity and consistency with the rest of the docs/examples, consider renaming this to “2-arity validate blocks” or “2-arity validations”.

Suggested change
- | 8 | 2-arity validates | Cross-validate output against input |
+ | 8 | 2-arity validate blocks | Cross-validate output against input |

Comment thread examples/README.md
Comment on lines 25 to +36
  Real-world before/after: classify Reddit threads as PROMO/FILLER/SKIP.
- Shows ID matching, enum validation, score consistency invariants.
+ Shows ID matching, enum validation, score consistency validates.

  ## 02_generate_comment.rb — Comment generation

  Real-world before/after: generate Reddit comments with persona.
- Shows sections, banned openings, link presence, length constraints, 2-arity invariants.
+ Shows sections, banned openings, link presence, length constraints, 2-arity validates.

  ## 03_target_audience.rb — Audience profiling

  Real-world before/after: generate target audience profiles.
- Shows cascade failure prevention, locale validation, structural invariants.
+ Shows cascade failure prevention, locale validation, cross-field validates.

Copilot AI Apr 23, 2026

Several sentences use “validates” as a noun (e.g., “score consistency validates”, “2-arity validates”, “cross-field validates”), which is ungrammatical and a bit confusing in a docs index. Suggest changing to “validation”, “validate checks”, or “validate blocks” for each of these phrases.

Comment thread examples/README.md Outdated
Comment on lines +139 to +140
Examples 00–03, 05, 07–10 use the test adapter by default — no API keys needed.
Example 04 requires an API key.

Copilot AI Apr 23, 2026

The “Running” section says examples 07–10 use the test adapter by default, but 11_fallback_showcase.rb is also test-adapter-only. Also, the comment “Real LLM — requires Ollama or API key” doesn’t match example 04’s stated requirement (provider API key). Please update this section so the adapter requirements are accurate and consistent.

Suggested change
- Examples 00–03, 05, 07–10 use the test adapter by default — no API keys needed.
- Example 04 requires an API key.
+ Examples 00–03, 05, 07–11 use the test adapter by default — no API keys needed.
+ Example 04 requires Ollama or a provider API key.

Comment thread docs/guide/prompt_ast.md
rule "Return valid JSON only." # appended as separate system message
section "AUDIENCE", "Rails developers" # labeled system message: [AUDIENCE]\n...
example input: "Ruby 3.4 ships frozen strings...", # user/assistant few-shot pair
output: '{"tldr":"...","takeaways":[...],"tone":"analytical"}'

Copilot AI Apr 23, 2026

The prompt AST example’s few-shot output: string isn’t valid JSON ([... ] / ...). Since the surrounding text says “Return valid JSON only”, this is easy to copy/paste into a prompt and accidentally teach invalid JSON. Consider making the example output syntactically valid JSON (using real placeholder strings/arrays).

Suggested change
- output: '{"tldr":"...","takeaways":[...],"tone":"analytical"}'
+ output: '{"tldr":"Placeholder summary","takeaways":["Placeholder takeaway 1","Placeholder takeaway 2"],"tone":"analytical"}'
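
The difference between the two strings can be checked with Ruby's standard `json` library. This is a small sketch for illustration; the `valid_json?` helper is a hypothetical name, not part of the gem:

```ruby
require "json"

# The original few-shot output (with literal elisions) and the suggested replacement.
elided = '{"tldr":"...","takeaways":[...],"tone":"analytical"}'
placeholder = '{"tldr":"Placeholder summary","takeaways":["Placeholder takeaway 1",' \
              '"Placeholder takeaway 2"],"tone":"analytical"}'

# Hypothetical helper: true when the text parses as JSON, false otherwise.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts valid_json?(elided)       # false: [...] is not a valid JSON array
puts valid_json?(placeholder)  # true
```

The `"..."` string value is fine on its own; it is the bare `[...]` array that makes the original fail to parse.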

@justi justi force-pushed the examples/fallback-showcase branch 2 times, most recently from 1ebc15b to 127ddd4 Compare April 23, 2026 01:03
@justi justi requested a review from Copilot April 23, 2026 01:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


Comment thread examples/README.md Outdated
Comment on lines 135 to 136
# Real LLM — requires a provider API key (OpenAI, Anthropic, Gemini, etc.):
ruby examples/04_real_llm.rb

Copilot AI Apr 23, 2026

The comment now implies that running the “Real LLM” example always requires a provider API key, but this repo/gem also supports running via Ollama (no API key). Consider adjusting the wording to include Ollama as an option to avoid misleading users.

Comment thread examples/11_fallback_showcase.rb Outdated

puts "status: #{naive_result.status.inspect} # schema passes — no guard"
puts "tldr shipped: #{naive_result.parsed_output[:tldr].inspect}"
puts " ^^ this would render into the UI card, apologising to the user"

Copilot AI Apr 23, 2026

Spelling inconsistency: this message uses UK spelling (“apologising”), but the rest of the repo/docs use US spelling (e.g., “behavior”). Consider changing to “apologizing” for consistency.

Suggested change
- puts " ^^ this would render into the UI card, apologising to the user"
+ puts " ^^ this would render into the UI card, apologizing to the user"

Comment thread examples/11_fallback_showcase.rb Outdated
# For production, expand to match your provider + model mix; consider also
# rejecting responses where the takeaways repeat a template ("please provide
# more context"). Real apps accumulate these heuristics from production logs.
REFUSAL_PATTERN = /\A(i\s+(cannot|can.?t|am unable|apologi[sz]e)|i['\s]?m\s+(unable|sorry)|sorry\b|as an ai)/i.freeze

Copilot AI Apr 23, 2026

REFUSAL_PATTERN is hard to read/modify as a single long regex line. Consider rewriting it using extended regex mode (/x) or composing from an array of prefix patterns so future tweaks (adding phrases/providers) are less error-prone.

Suggested change
- REFUSAL_PATTERN = /\A(i\s+(cannot|can.?t|am unable|apologi[sz]e)|i['\s]?m\s+(unable|sorry)|sorry\b|as an ai)/i.freeze
+ REFUSAL_PREFIX_PATTERNS = [
+   "i\\s+(?:cannot|can.?t|am unable|apologi[sz]e)",
+   "i['\\s]?m\\s+(?:unable|sorry)",
+   "sorry\\b",
+   "as an ai"
+ ].freeze
+ REFUSAL_PATTERN = /\A(?:#{Regexp.union(REFUSAL_PREFIX_PATTERNS).source})/i.freeze
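
A caveat worth checking before applying the array-based variant: `Regexp.union` escapes String arguments, so fragments such as `\\s+` or `(?:...)` would be matched as literal text rather than as regex syntax. The following is a minimal sketch of one alternative that joins the fragments with `|` instead. The constant name `REFUSAL_PREFIX_SOURCES` and the `escaped` comparison variable are hypothetical, and the sample sentences are made up for illustration:

```ruby
# Regex source fragments, one per refusal prefix (hypothetical constant name).
REFUSAL_PREFIX_SOURCES = [
  "i\\s+(?:cannot|can.?t|am unable|apologi[sz]e)",
  "i['\\s]?m\\s+(?:unable|sorry)",
  "sorry\\b",
  "as an ai"
].freeze

# Join with "|" so the fragments keep their regex meaning.
REFUSAL_PATTERN = Regexp.new(
  "\\A(?:#{REFUSAL_PREFIX_SOURCES.join('|')})",
  Regexp::IGNORECASE
)

# For comparison: Regexp.union quotes each String, so metacharacters go literal.
escaped = Regexp.new(
  "\\A(?:#{Regexp.union(REFUSAL_PREFIX_SOURCES).source})",
  Regexp::IGNORECASE
)

puts REFUSAL_PATTERN.match?("I cannot help with that request.")  # true
puts REFUSAL_PATTERN.match?("I'm sorry, but I can't do that.")   # true
puts REFUSAL_PATTERN.match?("Ruby 3.4 ships new warnings.")      # false
puts escaped.match?("I cannot help with that request.")          # false
```

Fragments without metacharacters (such as `as an ai`) behave the same either way, which makes the escaping easy to miss in a quick manual test.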

@justi justi force-pushed the examples/fallback-showcase branch 2 times, most recently from 9b6653c to dcb427c Compare April 23, 2026 03:36
Lowest-friction entry point for developers evaluating the gem. Zero API
keys, Test adapter simulates cheap-model in-schema refusal, contract
rejects it, retry_policy escalates to mid-tier model. Prints per-attempt
trace so the fallback loop is visible, not described.

Two parts in sequence:

- Part A: runs a stripped step without the "not a refusal" validate.
  Refusal JSON ships to parsed_output — demonstrates what happens
  without a contract (the schema passes; the user sees an apology).
- Part B: runs the full SummarizeArticle with validate + retry_policy.
  Attempt 1 (nano) rejected on validate, attempt 2 (mini) succeeds.

Side updates:
- examples/README.md: new entry + added to "Running" list
- docs/guide/why.md: Failure 3 (refusal-as-valid-JSON) now points to
  the showcase so why.md readers have a 30-second path to seeing it

Placed in examples/ (not bin/) to match this project's existing
convention — bundle gem's bin/console/bin/setup are the only standard
executables; demo scripts conventionally live under examples/.

Runs cleanly end-to-end; 1341 specs pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@justi justi force-pushed the examples/fallback-showcase branch from dcb427c to 639c4d2 Compare April 23, 2026 03:54
@justi justi merged commit 729a2f2 into main Apr 23, 2026
1 check passed
justi added a commit that referenced this pull request Apr 23, 2026
Audit pass: every runnable example should tell the reader what to expect
without forcing them to clone and run. 00 and 02 use Ruby-style inline
`# => :ok` comments; 06 and 07 already had Expected output blocks in the
header comment (from PR #22). Closing the gap for the other four.

- 01_real_llm.rb: inline `# =>` after each puts showing typical values;
  added "Example console output" block (numbers vary because the example
  calls a real LLM).
- 03_summarize_with_keywords.rb: "Expected output" block in the header
  with the full probability-sorted keyword table.
- 04_summarize_and_translate.rb: "Expected output" block in the header;
  also guards the Total cost print — `result.trace.total_cost` is nil
  under the Test adapter and was printing "Total cost:        $".
  Now prints "$0.0 (Test adapter)" when cost is nil.
- 05_eval_dataset.rb: "Expected output" block in the header covering
  both runs and the inline eval_case section.

Verified after the change: 7/7 examples run clean (01 skipped, needs key),
1311 specs pass, Test adapter cost path prints the labelled fallback.

No version bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
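
The nil-cost guard described above for 04_summarize_and_translate.rb might look roughly like the sketch below. The helper name `format_total_cost` and the six-decimal format for non-nil costs are assumptions for illustration; only the `"$0.0 (Test adapter)"` label and the `Total cost:` prefix come from the commit message.

```ruby
# Hypothetical helper; the name and the non-nil format are assumptions.
def format_total_cost(cost)
  # Under the Test adapter the cost is nil, so print a labelled fallback.
  return "$0.0 (Test adapter)" if cost.nil?

  format("$%.6f", cost)
end

puts "Total cost:        #{format_total_cost(nil)}"
puts "Total cost:        #{format_total_cost(0.5)}"
```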
justi added a commit that referenced this pull request Apr 24, 2026
Adoption-friction release. No runtime behavior changes — every delta is in `docs/`, `examples/`, or `spec/integration/` (plus version.rb / Gemfile.lock bumps). Upgrading from 0.7.2 picks up the expanded guide set, the consolidated runnable showcases, and one extra integration spec.

Consolidates 7 merged PRs (#21–#27) into one release:

- #21 Guide rewrite + adoption friction (why.md, "Do I need this?", outcome labels, TL;DR boxes)
- #22 Runnable aha-moment showcases (fallback + retry variants)
- #23 architecture.md refresh + docs/ideas untracked
- #24 Schema pitfall fix (5 example files) + expected output coverage
- #25 Examples consolidation — drop Reddit, renumber 00-06, restore pipeline + real-LLM minimal
- #26 Rails integration FAQ guide (7 pre-emptive questions)
- #27 Pipeline-level run_eval coverage — closes the "09 STEP 5" known issue from 0.7.2

Copilot review of the CHANGELOG itself flagged two inaccuracies before merge:
- "No gem-level code changes" replaced with "No runtime behavior changes" so version.rb / Gemfile.lock bumps are not misrepresented.
- Stale `examples/09_eval_dataset.rb` reference updated to current `05_eval_dataset.rb` after the renumber.

Verification: 1287 specs pass, 6/6 test-adapter examples run clean, bundle install resolves 0.7.3.

Full changelog entry on main in CHANGELOG.md.