0.7.2: align output labels + rewrite optimizing_retry_policy guide (17.7k → 6.4k)#18

Closed
justi wants to merge 2 commits into main from docs/optimizing-retry-policy-rewrite

Conversation

@justi justi commented Apr 22, 2026

Two coupled changes that close the gap between the new mid/senior-focused README and the guide it links to as "Find the cheapest viable fallback list".

Output label changes (non-breaking)

print_summary now prints terminology consistent with README:

| Before | After |
| --- | --- |
| Constraining eval: X | Hardest eval: X |
| Suggested chain: | Suggested fallback list: |
| column single-shot | column first-attempt |
| column escalation | column fallback % |

Programmatic metric names deliberately unchanged to avoid a breaking API bump: single_shot_cost, single_shot_latency_ms, escalation_rate. A hardest_eval alias is added to RetryOptimizer::Result for the narrative accessor.
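
A minimal sketch of how a narrative alias can sit next to an unchanged programmatic name. The struct and the `constraining_eval` accessor below are assumptions for illustration; this PR only states that `hardest_eval` is added as an alias on `RetryOptimizer::Result`, not which accessor it points to.

```ruby
# Hypothetical stand-in for RetryOptimizer::Result; the real class has more fields.
# `constraining_eval` is an assumed name for the pre-existing accessor.
result_class = Struct.new(:constraining_eval, :escalation_rate, keyword_init: true) do
  # Narrative alias: same value under the README-friendly name.
  # The original accessor keeps working, so nothing breaks for existing callers.
  alias_method :hardest_eval, :constraining_eval
end

result = result_class.new(constraining_eval: "long_articles", escalation_rate: 0.12)

puts result.hardest_eval    # same value as result.constraining_eval
puts result.escalation_rate # programmatic metric name is unchanged
```

Because the alias only adds a method, callers that read the existing accessors see no change.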

Two spec assertions updated; full suite 1341 examples, 0 failures.

Guide rewrite

docs/guide/optimizing_retry_policy.md rewritten from 17.7k → 6.4k chars using the same method as the README pass. Continues the SummarizeArticle narrative from README rather than introducing ClassifyThread / MyStep placeholders.

Two rounds of codex review (mid/senior Ruby dev persona):

  • Round 1: BIGGER REWORK — offline positioned as optimization (but every candidate returns the same sample_response score); terminology regressions; fabricated production_mode output.
  • Round 2: ONE OR TWO TWEAKS → both applied (explicit 'Order matters' on fallback list; 'Reference' section retitled 'Programmatic API names').

Structural cuts:

  • Offline repositioned as wiring check; LIVE=1 RUNS=3 is the primary command.
  • Real output samples from an actual run against Test adapter.
  • "Manual procedure" / duplicated troubleshooting / gpt-5-specific reasoning-effort case studies removed.

Note: PR #17 (fix broken guide links for budget caps + prompt_ast orphan) is still open on the docs/readme-fix-broken-guide-links branch. It is a separate concern, so the two PRs can merge in either order.

…_retry_policy guide

Two coupled changes that together close the gap between the new
mid/senior-focused README and the guide it links to as
"Find the cheapest viable fallback list".

Changed — output labels

`print_summary` now prints terminology consistent with README:

  Constraining eval: X        →  Hardest eval: X
  Suggested chain:            →  Suggested fallback list:
  column "single-shot"        →  "first-attempt"
  column "escalation"         →  "fallback %"

Programmatic metric names are deliberately unchanged to avoid a
breaking API bump: `single_shot_cost`, `single_shot_latency_ms`,
`escalation_rate`. A `hardest_eval` alias is added to
`RetryOptimizer::Result` for the narrative accessor.

Two spec assertions updated; full suite 1341 examples, 0 failures.

Docs — optimizing_retry_policy.md

Rewritten from 17.7k to 6.4k characters, same radical-cut style as
the README pass. Continues the `SummarizeArticle` narrative from
README rather than introducing ClassifyThread / MyStep placeholders.

Structural fixes from two rounds of codex review:
- Offline mode repositioned as a wiring check (every candidate
  returns the same sample_response score), real optimization via
  `LIVE=1 RUNS=3` as the primary command.
- Sample outputs captured from an actual run against Test adapter
  so the format matches what `print_summary` really prints, not a
  plausible-looking invention.
- "Suggested fallback list" rows annotated with "Order matters" so
  two entries read as a chain rather than as separate options.
- "Manual procedure" / duplicated troubleshooting / gpt-5-specific
  reasoning-effort case studies cut — moved to follow-up docs if
  ever needed.
- `Programmatic API names` section at the end names the metrics on
  Report / AggregatedReport so Kasia-style readers don't feel the
  guide is inconsistent with the code.
Copilot AI review requested due to automatic review settings April 22, 2026 14:08

Copilot AI left a comment

Pull request overview

This PR bumps ruby_llm-contract to 0.7.2 and aligns human-facing optimization output and documentation terminology with the updated README narrative, while keeping programmatic metric/API names stable.

Changes:

  • Rename print_summary labels in RetryOptimizer::Result and production-mode comparison tables (non-breaking; metric keys unchanged).
  • Add hardest_eval accessor alias to RetryOptimizer::Result for narrative consistency.
  • Rewrite docs/guide/optimizing_retry_policy.md to a shorter, README-aligned “fallback list” walkthrough with updated sample output.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| spec/ruby_llm/contract/eval/retry_optimizer_spec.rb | Updates output assertions to match new labels. |
| lib/ruby_llm/contract/version.rb | Bumps gem version to 0.7.2. |
| lib/ruby_llm/contract/eval/retry_optimizer.rb | Adds hardest_eval alias and renames printed labels. |
| lib/ruby_llm/contract/eval/model_comparison.rb | Renames production-mode table headers (first-attempt, fallback %). |
| docs/guide/optimizing_retry_policy.md | Rewrites/shortens guide and aligns terminology + sample outputs. |
| Gemfile.lock | Updates locked gem version to 0.7.2. |
| CHANGELOG.md | Adds 0.7.2 entry describing label alignment + guide rewrite. |

Comment on lines 75 to 78:

    lines = [format(" %-#{chain_width}s %-12s %-10s %-14s %-9s %s",
                    "Chain", "first-attempt", "fallback %", "effective cost", "latency", "score")]
    lines << " #{"-" * (chain_width + 60)}"

Copilot AI Apr 22, 2026

In production_mode_table, the header uses a 12-char column for first-attempt, but format_production_row still formats that column as 11 chars (%-11s). This causes the production-mode table columns to be misaligned (and the separator width chain_width + 60 is now off by 1). Update the row formatter (and separator length) to match the new header widths.
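
The mismatch can be reproduced in isolation. The snippet below is only an illustration of how Ruby's `format` treats a minimum field width, not code from the gem, and the row value is made up. A `%-12s` field pads but never truncates, so the 13-character label `first-attempt` produces a header cell wider than an 11-character row cell.

```ruby
# The label is 13 characters long, so it overflows a 12-wide field.
header_cell = format("%-12s", "first-attempt")
# Hypothetical row value, left-justified and padded to 11 characters.
row_cell = format("%-11s", "$0.0004")

puts header_cell.length # 13
puts row_cell.length    # 11
```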

justi commented Apr 22, 2026

Copilot inline finding on model_comparison.rb addressed in commit c32a634.

format_production_row now uses %-13s to match the 13-char header width for first-attempt, and the separator was bumped from chain_width + 60 to chain_width + 62 to cover the 2 extra chars versus the previous single-shot label. Both the header row and data rows now share the same column widths, and the separator matches.
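
One way to keep header, rows, and separator from drifting apart again is to derive all three from a single list of widths. This is a sketch under assumptions, not the code in commit c32a634: `column_specs`, `row_format`, `production_table`, and the sample row are hypothetical, and only the `first-attempt` width of 13 comes from the comment above.

```ruby
# Hypothetical column spec as [label, width] pairs.
# Only the 13 for "first-attempt" is taken from the PR discussion.
def column_specs
  [
    ["first-attempt", 13],
    ["fallback %", 10],
    ["effective cost", 14],
    ["latency", 9]
  ]
end

# Build one format string that both the header and every data row use.
def row_format(chain_width)
  widths = [chain_width] + column_specs.map { |_label, width| width }
  " " + widths.map { |w| "%-#{w}s" }.join(" ") + " %s"
end

def production_table(rows, chain_width:)
  fmt = row_format(chain_width)
  header = format(fmt, "Chain", *column_specs.map(&:first), "score")
  # The separator length follows the header, so it cannot go stale.
  lines = [header, " #{"-" * (header.length - 1)}"]
  rows.each { |row| lines << format(fmt, *row) }
  lines
end

# Made-up row purely to show alignment.
rows = [["model-a -> model-b", "$0.0004", "12%", "$0.0005", "850ms", "0.95"]]
puts production_table(rows, chain_width: 20)
```

With a shared format string, renaming a label only requires changing its width in one place.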

justi commented Apr 22, 2026

Consolidated into PR #21 as commits 1208a1d + d91b13e. Copilot inline finding on column widths carried forward in the fix commit.

@justi justi closed this Apr 22, 2026
@justi justi deleted the docs/optimizing-retry-policy-rewrite branch April 22, 2026 16:52