0.7.2: align output labels + rewrite optimizing_retry_policy guide (17.7k → 6.4k) by justi · Pull Request #18 · justi/ruby_llm-contract

justi · 2026-04-22T14:08:05Z

Two coupled changes that close the gap between the new mid/senior-focused README and the guide it links to as "Find the cheapest viable fallback list".

Output label changes (non-breaking)

print_summary now prints terminology consistent with README:

| Before | After |
| --- | --- |
| `Constraining eval: X` | `Hardest eval: X` |
| `Suggested chain:` | `Suggested fallback list:` |
| column `single-shot` | `first-attempt` |
| column `escalation` | `fallback %` |

Programmatic metric names deliberately unchanged to avoid a breaking API bump: single_shot_cost, single_shot_latency_ms, escalation_rate. A hardest_eval alias is added to RetryOptimizer::Result for the narrative accessor.
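A minimal sketch of the aliasing pattern described above, for readers who want to see why it is non-breaking. The `Result` class here is a hypothetical stand-in, not the gem's `RetryOptimizer::Result`, and the original accessor name `constraining_eval` is an assumption inferred from the old printed label; the PR text does not confirm it. The eval name and metric values are invented for illustration.

```ruby
# Hypothetical stand-in for RetryOptimizer::Result; only the aliasing
# pattern is the point. `constraining_eval` is an assumed accessor name.
class Result
  attr_reader :constraining_eval, :metrics

  def initialize(constraining_eval:, metrics:)
    @constraining_eval = constraining_eval
    @metrics = metrics
  end

  # New narrative name. The old accessor keeps working, so existing
  # callers are unaffected.
  alias_method :hardest_eval, :constraining_eval
end

# Illustrative values only; metric keys keep their pre-rename names.
result = Result.new(
  constraining_eval: "long_articles",
  metrics: { single_shot_cost: 0.0012, escalation_rate: 0.25 }
)

puts result.hardest_eval      # new name
puts result.constraining_eval # old name, same value
```

Because only printed labels and an added alias change, code that reads `single_shot_cost` or `escalation_rate` from the metrics keeps working without modification.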

Two spec assertions updated; full suite 1341 examples, 0 failures.

Guide rewrite

docs/guide/optimizing_retry_policy.md rewritten from 17.7k to 6.4k chars using the same method as the README pass. It continues the SummarizeArticle narrative from the README rather than introducing ClassifyThread / MyStep placeholders.

Two rounds of codex review (mid/senior Ruby dev persona):

Round 1: BIGGER REWORK — offline positioned as optimization (but every candidate returns the same sample_response score); terminology regressions; fabricated production_mode output.
Round 2: ONE OR TWO TWEAKS → both applied (explicit 'Order matters' on fallback list; 'Reference' section retitled 'Programmatic API names').

Structural cuts:

Offline repositioned as wiring check; LIVE=1 RUNS=3 is the primary command.
Real output samples from an actual run against Test adapter.
"Manual procedure" / duplicated troubleshooting / gpt-5-specific reasoning-effort case studies removed.

Note: PR #17 (fix broken guide links for budget caps + prompt_ast orphan) is still open on the docs/readme-fix-broken-guide-links branch. It is a separate concern, and the two PRs can merge in either order.

…_retry_policy guide

Two coupled changes that together close the gap between the new mid/senior-focused README and the guide it links to as "Find the cheapest viable fallback list".

Changed: output labels

`print_summary` now prints terminology consistent with README:

- Constraining eval: X → Hardest eval: X
- Suggested chain: → Suggested fallback list:
- column "single-shot" → "first-attempt"
- column "escalation" → "fallback %"

Programmatic metric names are deliberately unchanged to avoid a breaking API bump: `single_shot_cost`, `single_shot_latency_ms`, `escalation_rate`. A `hardest_eval` alias is added to `RetryOptimizer::Result` for the narrative accessor.

Two spec assertions updated; full suite 1341 examples, 0 failures.

Docs: optimizing_retry_policy.md

Rewritten from 17.7k to 6.4k characters, same radical-cut style as the README pass. Continues the `SummarizeArticle` narrative from README rather than introducing ClassifyThread / MyStep placeholders.

Structural fixes from two rounds of codex review:

- Offline mode repositioned as a wiring check (every candidate returns the same sample_response score), real optimization via `LIVE=1 RUNS=3` as the primary command.
- Sample outputs captured from an actual run against Test adapter so the format matches what `print_summary` really prints, not a plausible-looking invention.
- "Suggested fallback list" rows annotated with "Order matters" so two entries don't read as options rather than a chain.
- "Manual procedure" / duplicated troubleshooting / gpt-5-specific reasoning-effort case studies cut; moved to follow-up docs if ever needed.
- `Programmatic API names` section at the end names the metrics on Report / AggregatedReport so Kasia-style readers don't feel the guide is inconsistent with the code.

Copilot

Pull request overview

This PR bumps ruby_llm-contract to 0.7.2 and aligns human-facing optimization output and documentation terminology with the updated README narrative, while keeping programmatic metric/API names stable.

Changes:

Rename print_summary labels in RetryOptimizer::Result and production-mode comparison tables (non-breaking; metric keys unchanged).
Add hardest_eval accessor alias to RetryOptimizer::Result for narrative consistency.
Rewrite docs/guide/optimizing_retry_policy.md to a shorter, README-aligned “fallback list” walkthrough with updated sample output.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

| File | Description |
| --- | --- |
| spec/ruby_llm/contract/eval/retry_optimizer_spec.rb | Updates output assertions to match new labels. |
| lib/ruby_llm/contract/version.rb | Bumps gem version to 0.7.2. |
| lib/ruby_llm/contract/eval/retry_optimizer.rb | Adds `hardest_eval` alias and renames printed labels. |
| lib/ruby_llm/contract/eval/model_comparison.rb | Renames production-mode table headers (`first-attempt`, `fallback %`). |
| docs/guide/optimizing_retry_policy.md | Rewrites/shortens guide and aligns terminology + sample outputs. |
| Gemfile.lock | Updates locked gem version to 0.7.2. |
| CHANGELOG.md | Adds 0.7.2 entry describing label alignment + guide rewrite. |

Copilot · 2026-04-22T14:11:42Z

+          lines = [format("  %-#{chain_width}s  %-12s  %-10s  %-14s  %-9s  %s",
+                          "Chain", "first-attempt", "fallback %", "effective cost", "latency", "score")]
          lines << "  #{"-" * (chain_width + 60)}"



In production_mode_table, the header uses a 12-char column for first-attempt, but format_production_row still formats that column as 11 chars (%-11s). This causes the production-mode table columns to be misaligned (and the separator width chain_width + 60 is now off by 1). Update the row formatter (and separator length) to match the new header widths.

justi · 2026-04-22T16:47:57Z

Copilot inline finding on model_comparison.rb addressed in commit c32a634.

format_production_row now uses %-13s to match the 13-char header width for first-attempt, and the separator was bumped from chain_width + 60 → chain_width + 62 to cover the 2 extra chars versus the previous single-shot label. Both the header row and data rows now share the same column widths; separator matches.
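The alignment issue discussed in this thread can be reproduced in isolation. The sketch below is not the gem's `production_mode_table` or `format_production_row`; `COLUMNS`, `ROW_FORMAT`, the three helper methods, the widths, and the sample row values are all hypothetical. It relies on one property of Ruby's `format`: a width such as `%-12s` is a minimum, so a 13-character label like `first-attempt` is printed in full rather than truncated, which silently shifts every column after it. Deriving the header, rows, and separator from one shared definition is one way to keep them from drifting apart.

```ruby
# Hypothetical column definitions: [label, width]. Not the gem's real layout.
COLUMNS = [
  ["first-attempt", 13],
  ["fallback %", 10],
  ["score", 5]
].freeze

# One format string shared by the header and every data row.
ROW_FORMAT = COLUMNS.map { |_label, width| "%-#{width}s" }.join("  ").freeze

def header_line
  format(ROW_FORMAT, *COLUMNS.map(&:first))
end

def row_line(*values)
  format(ROW_FORMAT, *values)
end

# Separator length is derived from the same widths plus the 2-space gaps.
def separator_line
  "-" * (COLUMNS.sum { |_label, width| width } + 2 * (COLUMNS.size - 1))
end

# A width smaller than the label pads less but never truncates:
overflow = format("%-12s|", "first-attempt") # => "first-attempt|"

puts header_line
puts separator_line
puts row_line("$0.0012", "25%", "0.91") # illustrative values only
```

With this shape, renaming a label only requires updating its entry in `COLUMNS`; the row formatter and separator follow automatically.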

justi · 2026-04-22T16:52:15Z

Consolidated into PR #21 as commits 1208a1d + d91b13e. Copilot inline finding on column widths carried forward in the fix commit.

Copilot AI review requested due to automatic review settings April 22, 2026 14:08

Copilot started reviewing on behalf of justi April 22, 2026 14:08

Copilot AI reviewed Apr 22, 2026

justi mentioned this pull request Apr 22, 2026

docs(getting-started): rewrite for SummarizeArticle narrative + cut duplication (8.7k → 6.1k) #19

Closed

model_comparison: fix production-mode table column widths after header rename

c32a634

justi mentioned this pull request Apr 22, 2026

0.7.2 + batch: guide optimizations (optimizing_retry_policy + getting_started + output_schema + terminal labels) #21

Merged

justi closed this Apr 22, 2026

justi deleted the docs/optimizing-retry-policy-rewrite branch April 22, 2026 16:52

justi wants to merge 2 commits into main from docs/optimizing-retry-policy-rewrite
