Skip to content

examples: fix array-without-object schema pitfall + expand Expected output coverage#24

Merged
justi merged 3 commits into
mainfrom
examples/schema-fixes-and-expected-output
Apr 23, 2026
Merged

examples: fix array-without-object schema pitfall + expand Expected output coverage#24
justi merged 3 commits into
mainfrom
examples/schema-fixes-and-expected-output

Conversation

@justi
Copy link
Copy Markdown
Owner

@justi justi commented Apr 23, 2026

Two coordinated examples/ cleanups, two commits for review clarity.

Commit 1 — fix array-without-object schema pitfall (5 files)

RubyLLM::Schema DSL silently ignores all but the first child declaration when array items are not wrapped in object do...end. The resulting JSON schema has items: {type: "string"} instead of items: {type: "object", ...}. See spec/ruby_llm/contract/nested_schema_spec.rb:71 — this exact pitfall has a dedicated spec titled "WRONG: array without object wrapper produces flat string items".

Anyone copying one of these examples to real code and feeding in a Hash would hit "translations[0]: expected string, got Hash" validation errors pointing at the data rather than the actual bug (the schema).

Fixed in:

  • examples/01_classify_threads.rbarray :threads
  • examples/04_real_llm.rbarray :decisions, :action_items (x2 classes), :analyses with nested :issues
  • examples/05_output_schema.rbarray :groups
  • examples/07_keyword_extraction.rbarray :keywords and array :topics
  • examples/08_translation.rbarray :segments, :translations, :reviews

Every array child block is now wrapped in object do...end.

Commit 2 — add Expected output blocks to 01, 02, 03, 09, 10

Real-user feedback: "check all examples — output should be visible in README so the user doesn't have to run them just to see what to expect".

Examples 11 and 12 already had Expected output blocks (PRs #22 and #23). 00, 04, 05, 07, 08 have feature tables that already describe structure. This commit closes the gap for 01, 02, 03, 09, 10.

Each output block is byte-verified against the actual script output. Long outputs (09 has 5 steps, 10 prints a pipeline table) are abridged to the load-bearing lines.

Scope

Three things that surfaced during this audit but are out of scope:

  • examples/09_eval_dataset.rb STEP 5 (pipeline eval section) fails with additional property not allowed — different bug (strict schema + pipeline key threading). Separate PR.
  • Adding Ruby-style inline comments to puts-heavy examples — larger stylistic change, would bloat this PR.
  • Updating README entries for 00 / 04 / 05 / 07 / 08 — they already have feature tables; adding redundant Expected output blocks would dilute the existing structure.

Verification

  • All 11 examples (04 skipped — needs API key) run clean end-to-end.
  • 1341 specs pass.
  • Expected output blocks in README verified character-accurate against actual script output.

No version bump

Docs + example code only; gem stays at 0.7.2.

justi and others added 2 commits April 23, 2026 13:58
RubyLLM::Schema DSL silently ignores all but the first child declaration when
array items are not wrapped in `object do...end`. The resulting JSON schema
says `items: {type: "string"}` instead of `items: {type: "object", ...}`.
This matches the behaviour documented in spec/ruby_llm/contract/
nested_schema_spec.rb:71 ("WRONG: array without object wrapper produces flat
string items").

At runtime, anyone who copied one of these examples and pointed it at a real
LLM (or a Test adapter returning Hashes) would see "translations[0]: expected
string, got Hash" validation errors that point at the wrong thing (the data)
rather than the actual bug (the schema).

Affected files:
- examples/01_classify_threads.rb — array :threads
- examples/04_real_llm.rb — array :decisions, :action_items (x2 classes),
  :analyses with nested :issues
- examples/05_output_schema.rb — array :groups
- examples/07_keyword_extraction.rb — array :topics (keywords was fixed in #22)
- examples/08_translation.rb — array :segments, :translations, :reviews

Every array child block is now wrapped in `object do...end`. All 11 examples
(04 skipped — needs API key) still run clean end-to-end. 1341 specs pass.

No version bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…EADME

Real-user feedback via chat: "check all examples — output should be visible
in README so the user doesn't have to run them just to see what to expect".

Examples 11 and 12 already had Expected output blocks (PR #22 and #23).
00, 04, 05, 07, 08 have feature tables that already document structure.
This commit closes the gap for 01, 02, 03, 09, 10 — each gets a short
"Expected output" section matching what the script actually prints.

Output blocks are byte-verified against `bundle exec ruby examples/NN_*.rb`.
Where the full output is long (09 has 5 steps, 10 prints a big pipeline
table), the block is abridged to the load-bearing lines.

No version bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 23, 2026 07:21
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates examples to avoid a known output_schema DSL pitfall for arrays of objects, and expands examples/README.md with “Expected output” sections so readers can see what each script prints without running it.

Changes:

  • Wrap array item declarations in object do ... end across multiple examples to ensure arrays validate as arrays of objects (not strings).
  • Add/expand “Expected output” blocks in examples/README.md for examples 01–03, 09, and 10.
  • Adjust nested array schemas in complex examples (real LLM + translation + keyword/topic pipelines) to match intended JSON structure.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
examples/README.md Adds “Expected output” blocks for several examples (abridged where appropriate).
examples/01_classify_threads.rb Fixes array :threads schema by wrapping per-item fields in object do ... end.
examples/04_real_llm.rb Fixes multiple array-of-object schemas (decisions, action_items, analyses, nested issues).
examples/05_output_schema.rb Fixes array :groups schema to correctly model items as objects.
examples/07_keyword_extraction.rb Fixes keywords and topics arrays to be arrays of objects.
examples/08_translation.rb Fixes segments, translations, and reviews arrays to be arrays of objects.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Codex review of PR #24 flagged the "abridged across 5 steps" heading as
misleading — the block shows steps 2-4 only. Step 1 is dataset setup and
step 5 (pipeline check) has a known additional-property validation failure
tracked separately (out of scope for this PR).

Heading now states the exact scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@justi justi merged commit 6172db7 into main Apr 23, 2026
1 check passed
justi added a commit that referenced this pull request Apr 24, 2026
…nown issue) (#27)

Closes the "09 STEP 5 pipeline evaluation fails" known issue flagged in PR #24 and adds coverage that prevents regression.

## Root cause

The failure was **example code**, not a gem bug — the Test adapter was returning a JSON blob with keys for every step, and one step's strict schema legitimately rejected the extras. PR #25 removed the broken section along with the Reddit / support examples, which closed the symptom but left pipeline-level run_eval with zero runnable example and zero integration coverage.

## What this PR adds

**1. Runnable pipeline run_eval in examples/04_summarize_and_translate.rb**

~20 lines at the end of the file: define_eval on the pipeline, Test adapter with a response per step, run_eval call, inline expected output. The eval matches on the final review step's overall_verdict, demonstrating that pipeline expectations target the last step's output.

**2. spec/integration/pipeline_eval_spec.rb (3 cases)**

- Happy path: end-to-end run_eval scores the final-step output 1.0.
- Final-step mismatch: eval scores 0.0 and surfaces the diff.
- Fail-fast propagation: a validate rejection in an intermediate step propagates to the report (asserts step_status == :validation_failed and details include the validate label, proving the validate path — not the schema — is being exercised).

## Reviews addressed

- **Copilot**: flagged that the original fail-fast case used `tldr: "x" * 500`, which short-circuits on the step's `max_length: 200` schema before the validate runs. Fix: removed max_length from the test-only schema and used a 50-char tldr that passes schema and trips the validate. Added step_status + details assertions so the test would fail loudly if it ever regressed back to schema-rejection.

## Verification

- 1287 specs pass (was 1284 + 3 new).
- 6/6 examples with the Test adapter run clean.

## No version bump

The follow-up bumps to 0.7.3 with a CHANGELOG entry that references this PR.
justi added a commit that referenced this pull request Apr 24, 2026
Adoption-friction release. No runtime behavior changes — every delta is in `docs/`, `examples/`, or `spec/integration/` (plus version.rb / Gemfile.lock bumps). Upgrading from 0.7.2 picks up the expanded guide set, the consolidated runnable showcases, and one extra integration spec.

Consolidates 7 merged PRs (#21#27) into one release:

- #21 Guide rewrite + adoption friction (why.md, "Do I need this?", outcome labels, TL;DR boxes)
- #22 Runnable aha-moment showcases (fallback + retry variants)
- #23 architecture.md refresh + docs/ideas untracked
- #24 Schema pitfall fix (5 example files) + expected output coverage
- #25 Examples consolidation — drop Reddit, renumber 00-06, restore pipeline + real-LLM minimal
- #26 Rails integration FAQ guide (7 pre-emptive questions)
- #27 Pipeline-level run_eval coverage — closes the "09 STEP 5" known issue from 0.7.2

Copilot review of the CHANGELOG itself flagged two inaccuracies before merge:
- "No gem-level code changes" replaced with "No runtime behavior changes" so version.rb / Gemfile.lock bumps are not misrepresented.
- Stale `examples/09_eval_dataset.rb` reference updated to current `05_eval_dataset.rb` after the renumber.

Verification: 1287 specs pass, 6/6 test-adapter examples run clean, bundle install resolves 0.7.3.

Full changelog entry on main in CHANGELOG.md.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants