0.7.2: align output labels + rewrite optimizing_retry_policy guide (17.7k → 6.4k) #18
justi wants to merge 2 commits into
…_retry_policy guide

Two coupled changes that together close the gap between the new mid/senior-focused README and the guide it links to as "Find the cheapest viable fallback list".

Changed: output labels

`print_summary` now prints terminology consistent with README:
- Constraining eval: X → Hardest eval: X
- Suggested chain: → Suggested fallback list:
- column "single-shot" → "first-attempt"
- column "escalation" → "fallback %"

Programmatic metric names are deliberately unchanged to avoid a breaking API bump: `single_shot_cost`, `single_shot_latency_ms`, `escalation_rate`. A `hardest_eval` alias is added to `RetryOptimizer::Result` for the narrative accessor.

Two spec assertions updated; full suite 1341 examples, 0 failures.

Docs: optimizing_retry_policy.md

Rewritten from 17.7k to 6.4k characters, same radical-cut style as the README pass. Continues the `SummarizeArticle` narrative from README rather than introducing ClassifyThread / MyStep placeholders.

Structural fixes from two rounds of codex review:
- Offline mode repositioned as a wiring check (every candidate returns the same sample_response score), real optimization via `LIVE=1 RUNS=3` as the primary command.
- Sample outputs captured from an actual run against Test adapter so the format matches what `print_summary` really prints, not a plausible-looking invention.
- "Suggested fallback list" rows annotated with "Order matters" so two entries don't read as options rather than a chain.
- "Manual procedure" / duplicated troubleshooting / gpt-5-specific reasoning-effort case studies cut; moved to follow-up docs if ever needed.
- `Programmatic API names` section at the end names the metrics on Report / AggregatedReport so Kasia-style readers don't feel the guide is inconsistent with the code.
Pull request overview
This PR bumps ruby_llm-contract to 0.7.2 and aligns human-facing optimization output and documentation terminology with the updated README narrative, while keeping programmatic metric/API names stable.
Changes:
- Rename `print_summary` labels in `RetryOptimizer::Result` and production-mode comparison tables (non-breaking; metric keys unchanged).
- Add `hardest_eval` accessor alias to `RetryOptimizer::Result` for narrative consistency.
- Rewrite `docs/guide/optimizing_retry_policy.md` to a shorter, README-aligned “fallback list” walkthrough with updated sample output.
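A non-breaking accessor alias of the kind described above could look roughly like the sketch below. The `Result` class here is a hypothetical stand-in, and `constraining_eval` is an assumed name for the pre-existing accessor; neither is taken from the gem's source.

```ruby
# Hypothetical stand-in for RetryOptimizer::Result; the real class is not shown in this PR page.
# `constraining_eval` is an ASSUMED name for the accessor that already existed.
class Result
  attr_reader :constraining_eval

  def initialize(constraining_eval:)
    @constraining_eval = constraining_eval
  end

  # New narrative-friendly name. The old accessor keeps working,
  # so existing callers are not affected.
  alias_method :hardest_eval, :constraining_eval
end

# Illustrative eval name only.
result = Result.new(constraining_eval: "long_articles")
puts result.hardest_eval
puts result.constraining_eval
```

Both calls return the same object, which is why an alias like this would not require a breaking version bump.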
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| spec/ruby_llm/contract/eval/retry_optimizer_spec.rb | Updates output assertions to match new labels. |
| lib/ruby_llm/contract/version.rb | Bumps gem version to 0.7.2. |
| lib/ruby_llm/contract/eval/retry_optimizer.rb | Adds hardest_eval alias and renames printed labels. |
| lib/ruby_llm/contract/eval/model_comparison.rb | Renames production-mode table headers (first-attempt, fallback %). |
| docs/guide/optimizing_retry_policy.md | Rewrites/shortens guide and aligns terminology + sample outputs. |
| Gemfile.lock | Updates locked gem version to 0.7.2. |
| CHANGELOG.md | Adds 0.7.2 entry describing label alignment + guide rewrite. |
    lines = [format(" %-#{chain_width}s %-12s %-10s %-14s %-9s %s",
    "Chain", "first-attempt", "fallback %", "effective cost", "latency", "score")]
    lines << " #{"-" * (chain_width + 60)}"
In `production_mode_table`, the header uses a 12-char column for `first-attempt`, but `format_production_row` still formats that column as 11 chars (`%-11s`). This causes the production-mode table columns to be misaligned (and the separator width `chain_width + 60` is now off by 1). Update the row formatter (and separator length) to match the new header widths.
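One way to keep this kind of drift from recurring is to derive the header, the separator, and every row from a single column spec. The sketch below is an illustration of that idea, not the gem's code: `COLUMNS`, `ROW_FORMAT`, the helper names, the widths, and the sample row values are all hypothetical.

```ruby
# Hypothetical column spec: [label, width]. The widths are illustrative,
# chosen to fit the labels; they are NOT the gem's actual values.
COLUMNS = [
  ["Chain", 24],
  ["first-attempt", 14],
  ["fallback %", 10],
  ["effective cost", 14],
  ["latency", 9],
  ["score", 5]
].freeze

# One format string shared by the header and every data row,
# so their column widths cannot diverge.
ROW_FORMAT = ("  " + COLUMNS.map { |_, width| "%-#{width}s" }.join(" ")).freeze

def header_line
  format(ROW_FORMAT, *COLUMNS.map(&:first))
end

def separator_line
  # Column widths plus the single spaces between columns.
  "  " + "-" * (COLUMNS.sum { |_, width| width } + COLUMNS.size - 1)
end

def row_line(values)
  format(ROW_FORMAT, *values)
end

# Made-up sample values, for layout only.
sample = ["nano -> mini", "$0.0010", "12%", "$0.0013", "420ms", "0.94"]

puts header_line
puts separator_line
puts row_line(sample)
```

With this layout, changing a label or a width touches one line, and the separator length follows automatically instead of relying on a hand-maintained constant.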
Two coupled changes that close the gap between the new mid/senior-focused README and the guide it links to as "Find the cheapest viable fallback list".
Output label changes (non-breaking)
`print_summary` now prints terminology consistent with README:
| Before | After |
|---|---|
| Constraining eval: X | Hardest eval: X |
| Suggested chain: | Suggested fallback list: |
| single-shot | first-attempt |
| escalation | fallback % |
Programmatic metric names deliberately unchanged to avoid a breaking API bump: `single_shot_cost`, `single_shot_latency_ms`, `escalation_rate`. A `hardest_eval` alias is added to `RetryOptimizer::Result` for the narrative accessor.
Two spec assertions updated; full suite 1341 examples, 0 failures.
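The split between display labels and programmatic names can be pictured as a lookup from stable metric keys to printed text. This is a minimal sketch under assumptions: `DISPLAY_LABELS` and `summary_lines` are hypothetical names, the pairing of each key with a label is a guess at how the columns map, and the values are invented.

```ruby
# Hypothetical mapping from stable metric keys to human-facing labels.
# Only the labels change between releases; the keys stay as they were.
DISPLAY_LABELS = {
  single_shot_cost: "first-attempt",
  escalation_rate: "fallback %"
}.freeze

# Made-up values, for illustration only.
metrics = { single_shot_cost: 0.0010, escalation_rate: 0.12 }

# Build printable lines without renaming anything in the data itself.
def summary_lines(metrics)
  metrics.map { |key, value| format("%-14s %s", DISPLAY_LABELS.fetch(key), value) }
end

puts summary_lines(metrics)
```

Code that reads `metrics[:escalation_rate]` keeps working unchanged, while anything that prints goes through the label table.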
Guide rewrite
`docs/guide/optimizing_retry_policy.md` rewritten from 17.7k → 6.4k chars using the same method as the README pass. Continues the `SummarizeArticle` narrative from README rather than introducing `ClassifyThread` / `MyStep` placeholders.
Two rounds of codex review (mid/senior Ruby dev persona):
Structural cuts:
`LIVE=1 RUNS=3` is the primary command.
Note: PR #17 (fix broken guide links for budget caps + prompt_ast orphan) is still open on the `docs/readme-fix-broken-guide-links` branch; separate concern, merge order either way.