examples: fix array-without-object schema pitfall + expand Expected output coverage #24
Merged
RubyLLM::Schema DSL silently ignores all but the first child declaration when
array items are not wrapped in `object do...end`. The resulting JSON schema
says `items: {type: "string"}` instead of `items: {type: "object", ...}`.
This matches the behaviour documented in spec/ruby_llm/contract/
nested_schema_spec.rb:71 ("WRONG: array without object wrapper produces flat
string items").
At runtime, anyone who copied one of these examples and pointed it at a real
LLM (or a Test adapter returning Hashes) would see "translations[0]: expected
string, got Hash" validation errors that point at the wrong thing (the data)
rather than the actual bug (the schema).
Affected files:
- examples/01_classify_threads.rb — array :threads
- examples/04_real_llm.rb — array :decisions, :action_items (x2 classes),
:analyses with nested :issues
- examples/05_output_schema.rb — array :groups
- examples/07_keyword_extraction.rb — array :topics (keywords was fixed in #22)
- examples/08_translation.rb — array :segments, :translations, :reviews
Every array child block is now wrapped in `object do...end`. All 11 examples
(04 skipped — needs API key) still run clean end-to-end. 1341 specs pass.
No version bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…EADME

Real-user feedback via chat: "check all examples — output should be visible in README so the user doesn't have to run them just to see what to expect".

Examples 11 and 12 already had Expected output blocks (PR #22 and #23). 00, 04, 05, 07, 08 have feature tables that already document structure. This commit closes the gap for 01, 02, 03, 09, 10: each gets a short "Expected output" section matching what the script actually prints.

Output blocks are byte-verified against `bundle exec ruby examples/NN_*.rb`. Where the full output is long (09 has 5 steps, 10 prints a big pipeline table), the block is abridged to the load-bearing lines.

No version bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Updates examples to avoid a known output_schema DSL pitfall for arrays of objects, and expands examples/README.md with “Expected output” sections so readers can see what each script prints without running it.
Changes:
- Wrap array item declarations in `object do ... end` across multiple examples to ensure arrays validate as arrays of objects (not strings).
- Add/expand “Expected output” blocks in `examples/README.md` for examples 01–03, 09, and 10.
- Adjust nested array schemas in complex examples (real LLM + translation + keyword/topic pipelines) to match intended JSON structure.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| examples/README.md | Adds “Expected output” blocks for several examples (abridged where appropriate). |
| examples/01_classify_threads.rb | Fixes array :threads schema by wrapping per-item fields in object do ... end. |
| examples/04_real_llm.rb | Fixes multiple array-of-object schemas (decisions, action_items, analyses, nested issues). |
| examples/05_output_schema.rb | Fixes array :groups schema to correctly model items as objects. |
| examples/07_keyword_extraction.rb | Fixes keywords and topics arrays to be arrays of objects. |
| examples/08_translation.rb | Fixes segments, translations, and reviews arrays to be arrays of objects. |
Codex review of PR #24 flagged the "abridged across 5 steps" heading as misleading — the block shows steps 2-4 only. Step 1 is dataset setup and step 5 (pipeline check) has a known additional-property validation failure tracked separately (out of scope for this PR). Heading now states the exact scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
justi added a commit that referenced this pull request on Apr 24, 2026
…nown issue) (#27)

Closes the "09 STEP 5 pipeline evaluation fails" known issue flagged in PR #24 and adds coverage that prevents regression.

## Root cause

The failure was **example code**, not a gem bug: the Test adapter was returning a JSON blob with keys for every step, and one step's strict schema legitimately rejected the extras. PR #25 removed the broken section along with the Reddit / support examples, which closed the symptom but left pipeline-level run_eval with zero runnable examples and zero integration coverage.

## What this PR adds

**1. Runnable pipeline run_eval in examples/04_summarize_and_translate.rb**

~20 lines at the end of the file: define_eval on the pipeline, Test adapter with a response per step, run_eval call, inline expected output. The eval matches on the final review step's overall_verdict, demonstrating that pipeline expectations target the last step's output.

**2. spec/integration/pipeline_eval_spec.rb (3 cases)**

- Happy path: end-to-end run_eval scores the final-step output 1.0.
- Final-step mismatch: eval scores 0.0 and surfaces the diff.
- Fail-fast propagation: a validate rejection in an intermediate step propagates to the report (asserts step_status == :validation_failed and details include the validate label, proving the validate path, not the schema, is being exercised).

## Reviews addressed

- **Copilot**: flagged that the original fail-fast case used `tldr: "x" * 500`, which short-circuits on the step's `max_length: 200` schema before the validate runs. Fix: removed max_length from the test-only schema and used a 50-char tldr that passes schema and trips the validate. Added step_status + details assertions so the test would fail loudly if it ever regressed back to schema-rejection.

## Verification

- 1287 specs pass (was 1284 + 3 new).
- 6/6 examples with the Test adapter run clean.

## No version bump

The follow-up bumps to 0.7.3 with a CHANGELOG entry that references this PR.
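The ordering problem Copilot flagged can be sketched in plain Ruby. This is a toy step runner written for illustration, not the gem's code: `run_step`, the `:schema_failed` status, and the "tldr ends with a period" validate label are all hypothetical. Only `:validation_failed`, `max_length: 200`, and the 500-char and 50-char inputs come from the description above.

```ruby
# Toy step runner (hypothetical): the schema check runs first and short-circuits,
# so a validate block is only reached when the output already satisfies the schema.
def run_step(output, max_length:, validates:)
  if max_length && output[:tldr].length > max_length
    # :schema_failed is an invented status name for this sketch.
    return { step_status: :schema_failed, details: ["tldr: longer than #{max_length}"] }
  end

  # Collect the labels of every validate whose check returns false.
  failed = validates.reject { |_label, check| check.call(output) }.keys
  return { step_status: :validation_failed, details: failed } unless failed.empty?

  { step_status: :ok, details: [] }
end

# Hypothetical validate rule, keyed by its label.
VALIDATES = { "tldr ends with a period" => ->(out) { out[:tldr].end_with?(".") } }.freeze

# Original test shape: 500 chars trips max_length: 200 before any validate runs.
ORIGINAL = run_step({ tldr: "x" * 500 }, max_length: 200, validates: VALIDATES)

# Fixed test shape: no max_length, 50 chars, so the validate is what rejects it.
FIXED = run_step({ tldr: "x" * 50 }, max_length: nil, validates: VALIDATES)

puts ORIGINAL[:step_status] # schema_failed
puts FIXED[:step_status]    # validation_failed
```

Asserting on both the status and the label in `details` is what keeps such a test from passing for the wrong reason.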
justi added a commit that referenced this pull request on Apr 24, 2026
Adoption-friction release. No runtime behavior changes: every delta is in `docs/`, `examples/`, or `spec/integration/` (plus version.rb / Gemfile.lock bumps). Upgrading from 0.7.2 picks up the expanded guide set, the consolidated runnable showcases, and one extra integration spec.

Consolidates 7 merged PRs (#21–#27) into one release:

- #21 Guide rewrite + adoption friction (why.md, "Do I need this?", outcome labels, TL;DR boxes)
- #22 Runnable aha-moment showcases (fallback + retry variants)
- #23 architecture.md refresh + docs/ideas untracked
- #24 Schema pitfall fix (5 example files) + expected output coverage
- #25 Examples consolidation: drop Reddit, renumber 00-06, restore pipeline + real-LLM minimal
- #26 Rails integration FAQ guide (7 pre-emptive questions)
- #27 Pipeline-level run_eval coverage: closes the "09 STEP 5" known issue from 0.7.2

Copilot review of the CHANGELOG itself flagged two inaccuracies before merge:

- "No gem-level code changes" replaced with "No runtime behavior changes" so version.rb / Gemfile.lock bumps are not misrepresented.
- Stale `examples/09_eval_dataset.rb` reference updated to current `05_eval_dataset.rb` after the renumber.

Verification: 1287 specs pass, 6/6 test-adapter examples run clean, bundle install resolves 0.7.3.

Full changelog entry on main in CHANGELOG.md.
Two coordinated examples/ cleanups, two commits for review clarity.
Commit 1 — fix array-without-object schema pitfall (5 files)
`RubyLLM::Schema` DSL silently ignores all but the first child declaration when array items are not wrapped in `object do...end`. The resulting JSON schema has `items: {type: "string"}` instead of `items: {type: "object", ...}`. See `spec/ruby_llm/contract/nested_schema_spec.rb:71`: this exact pitfall has a dedicated spec titled "WRONG: array without object wrapper produces flat string items".

Anyone copying one of these examples to real code and feeding in a Hash would hit "translations[0]: expected string, got Hash" validation errors pointing at the data rather than the actual bug (the schema).
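To make the failure mode concrete, here is a minimal sketch in plain Ruby. It does not use the gem: `validate` is a toy validator written for this illustration, the two schemas are hand-written Hashes in the shapes described above, and the `:lang` / `:text` fields are hypothetical.

```ruby
# Toy JSON-schema-style validator (illustration only, not the gem's implementation).
# Returns a list of error strings; an empty list means the value is valid.
def validate(value, schema, path)
  case schema[:type]
  when "string"
    value.is_a?(String) ? [] : ["#{path}: expected string, got #{value.class}"]
  when "object"
    return ["#{path}: expected object, got #{value.class}"] unless value.is_a?(Hash)

    schema.fetch(:properties, {}).flat_map do |key, sub|
      validate(value[key], sub, "#{path}.#{key}")
    end
  when "array"
    return ["#{path}: expected array, got #{value.class}"] unless value.is_a?(Array)

    value.each_with_index.flat_map do |item, i|
      validate(item, schema[:items], "#{path}[#{i}]")
    end
  else
    []
  end
end

# Shape produced when the array block is NOT wrapped in `object do...end`.
FLAT_SCHEMA = { type: "array", items: { type: "string" } }.freeze

# Shape produced once the item fields are wrapped in `object do...end`.
# The :lang and :text properties are hypothetical.
OBJECT_SCHEMA = {
  type: "array",
  items: {
    type: "object",
    properties: { lang: { type: "string" }, text: { type: "string" } }
  }
}.freeze

DATA = [{ lang: "es", text: "Hola" }].freeze

FLAT_ERRORS = validate(DATA, FLAT_SCHEMA, "translations")
OBJECT_ERRORS = validate(DATA, OBJECT_SCHEMA, "translations")

puts FLAT_ERRORS.inspect   # ["translations[0]: expected string, got Hash"]
puts OBJECT_ERRORS.inspect # []
```

The data is identical in both runs; only the schema differs, which is why the error message is misleading.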
Fixed in:
- `examples/01_classify_threads.rb` — `array :threads`
- `examples/04_real_llm.rb` — `array :decisions`, `:action_items` (x2 classes), `:analyses` with nested `:issues`
- `examples/05_output_schema.rb` — `array :groups`
- `examples/07_keyword_extraction.rb` — `array :keywords` and `array :topics`
- `examples/08_translation.rb` — `array :segments`, `:translations`, `:reviews`

Every array child block is now wrapped in `object do...end`.

Commit 2 — add Expected output blocks to 01, 02, 03, 09, 10
Real-user feedback: "check all examples — output should be visible in README so the user doesn't have to run them just to see what to expect".
Examples 11 and 12 already had Expected output blocks (PRs #22 and #23). 00, 04, 05, 07, 08 have feature tables that already describe structure. This commit closes the gap for 01, 02, 03, 09, 10.
Each output block is byte-verified against the actual script output. Long outputs (09 has 5 steps, 10 prints a pipeline table) are abridged to the load-bearing lines.
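The PR does not say how the verification was performed. One way an abridged block could be checked is to require that every line of the README block appears in the script's output, in the same order, with other lines allowed in between. The helper name `abridged_match?` and the sample strings below are hypothetical.

```ruby
# Hypothetical helper: true when every line of `expected` occurs in `actual`
# in the same order (an ordered subsequence), so abridged blocks still verify.
def abridged_match?(expected, actual)
  remaining = actual.lines.map(&:chomp)
  expected.lines.map(&:chomp).all? do |line|
    idx = remaining.index(line)
    next false if idx.nil?

    # Only search after the matched line so ordering is enforced.
    remaining = remaining.drop(idx + 1)
    true
  end
end

# Invented sample output standing in for what a script might print.
ACTUAL_OUTPUT = "STEP 1\nloading dataset\nSTEP 2\nscore: 1.0\n"

puts abridged_match?("STEP 1\nscore: 1.0\n", ACTUAL_OUTPUT) # true
puts abridged_match?("score: 1.0\nSTEP 1\n", ACTUAL_OUTPUT) # false
```

For unabridged blocks a plain `expected == actual` comparison is the stricter byte-for-byte check.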
Scope
Three things that surfaced during this audit but are out of scope:
- `examples/09_eval_dataset.rb` STEP 5 (pipeline eval section) fails with `additional property not allowed`: a different bug (strict schema + pipeline key threading). Separate PR.

Verification
No version bump
Docs + example code only; gem stays at 0.7.2.