
[docs] docs: condense agentic-observability-kit page (21% reduction)#28539

Merged
pelikhan merged 1 commit into main from docs/unbloat-agentic-observability-kit-3e6975e366d4d3d8 on Apr 26, 2026

Conversation

@github-actions
Contributor

Reduces docs/src/content/docs/patterns/agentic-observability-kit.md from 318 to 251 lines (67 lines / 21% removed) by converting verbose prose into tables and compact lists, without losing any technical content.

What was improved

  • Scope section — Removed 10-line description of org/enterprise scope that was already fully covered in the "Deployment by scope" section below it. Replaced with a single cross-reference sentence.
  • "What it analyzes" + "What it computes" sections — Merged two related sections and tightened from 18 lines to 11. The "Effective Tokens matter because..." explanation was folded into the bullet point definition.
  • Visual chart descriptions — Converted four ###-headed subsections (each with a description paragraph + "This is the fastest way to answer:" sentence) into a single 4-row table. Reduced from ~35 lines to ~10 lines while keeping all axis/metric details. Added a concise outcome-adjusted reading note.
  • Metric glossary — Converted five verbose paragraphs (each ending with "It exists to answer one question quickly...") into a 6-row table. Reduced from 20 lines to 8.
  • Calibration from a real repository sample — Converted three "First/Second/Third" narrative paragraphs into a numbered list with bold headings. Reduced from 11 to 5 lines.
  • Domain-specific reading — Converted five prose paragraphs into a table with Domain / Key question / Notes columns. Reduced from 16 to 7 lines.
  • COGS reduction — Changed from "First/Second/Third/Fourth/Fifth" narrative to a bullet list with bold headings; removed the "Fifth" point (portfolio cleanup) which duplicates content in the "Portfolio review capabilities" section.

Estimated reduction

| Metric | Before | After |
|--------|--------|-------|
| Lines  | 318    | 251   |

Reduction: 67 lines (21%)

Screenshots

Visual screenshots were unavailable due to network isolation between the Playwright container and the agent container. The page was verified to render correctly via curl (HTTP 200, correct title returned).

🗜️ Compressed by Documentation Unbloat

Convert verbose prose sections into tables and compact lists,
removing ~67 lines (21%) without losing technical content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pelikhan pelikhan marked this pull request as ready for review April 26, 2026 10:00
Copilot AI review requested due to automatic review settings April 26, 2026 10:00
@pelikhan pelikhan merged commit 15f6876 into main Apr 26, 2026
3 checks passed
@pelikhan pelikhan deleted the docs/unbloat-agentic-observability-kit-3e6975e366d4d3d8 branch April 26, 2026 10:00
@github-actions github-actions Bot mentioned this pull request Apr 26, 2026
Copilot AI left a comment


Pull request overview

Condenses the Agentic Observability Kit documentation page by replacing verbose prose with tables and compact lists while aiming to preserve the same technical guidance.

Changes:

  • Condensed Scope and cross-referenced org/enterprise deployment to the “Deployment by scope” section.
  • Merged/tightened the “What it analyzes / computes” content into a single narrative + bullet list.
  • Replaced multi-paragraph sections (visual chart explanations, metric glossary, calibration guidance, domain-specific reading, COGS reduction rationale) with tables and shorter lists.
| File | Description |
|------|-------------|
| docs/src/content/docs/patterns/agentic-observability-kit.md | Refactors multiple explanatory sections into tables/lists to reduce length while keeping the page usable for quick reference. |

Copilot's findings


  • Files reviewed: 1/1 changed files
  • Comments generated: 4

The kit is built around the deterministic lineage data returned by `gh aw logs`. Instead of treating every workflow run as an isolated event, it prefers `episodes[]` and `edges[]` so orchestrator and worker runs are analyzed as one logical execution. This avoids misreading delegated runs in isolation and makes cost, risk, and control signals easier to attribute.

The episode rollups include aggregate fields such as total runs, total tokens, total estimated cost, blocked requests, MCP failures, risky nodes, and a suggested routing hint for follow-up. When the summary data is not sufficient, the workflow can audit a small number of individual runs to explain the latest regression, a new MCP failure, or a changed write posture.
Built around `gh aw logs`, the kit prefers `episodes[]` and `edges[]` to analyze orchestrator and worker runs as one logical execution, avoiding misreads of delegated runs in isolation. When episode summaries are insufficient, it audits individual runs to explain regressions or MCP failures. For portfolio review, it uses targeted workflow-file inspection to confirm trigger or schedule overlap.

Copilot AI Apr 26, 2026


The condensed “What it analyzes” paragraph no longer mentions auditing runs to explain control changes like new write posture, but the source workflow explicitly calls out “new write posture” as a key weak-control signal. Consider re-adding that example (or adjusting wording) so the docs match the workflow’s stated signals.

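The episode rollup described in the paragraphs above can be sketched as follows. This is a minimal illustration, not the actual `gh aw logs` implementation: the `Run` fields and rollup keys mirror the aggregate fields the docs list (total runs, total tokens, total estimated cost, blocked requests, MCP failures, risky nodes, suggested route), but the real schema and routing logic are assumptions here.

```python
# Sketch: rolling up a lineage of orchestrator + worker runs into one episode.
# Field names and the routing rule are illustrative, not the `gh aw logs` schema.
from dataclasses import dataclass

@dataclass
class Run:
    tokens: int
    estimated_cost: float
    blocked_requests: int
    mcp_failures: int
    risky: bool

def rollup(runs: list[Run]) -> dict:
    """Aggregate per-run records so delegated runs are read as one execution."""
    risky_nodes = sum(r.risky for r in runs)
    summary = {
        "total_runs": len(runs),
        "total_tokens": sum(r.tokens for r in runs),
        "total_estimated_cost": round(sum(r.estimated_cost for r in runs), 4),
        "blocked_requests": sum(r.blocked_requests for r in runs),
        "mcp_failures": sum(r.mcp_failures for r in runs),
        "risky_nodes": risky_nodes,
    }
    # Hypothetical routing hint: escalate episodes showing any control failure,
    # otherwise leave them for the periodic report.
    summary["suggested_route"] = (
        "escalate"
        if summary["blocked_requests"] or summary["mcp_failures"] or risky_nodes
        else "report-only"
    )
    return summary
```

A caller would feed this the runs belonging to one `episodes[]` entry, then drill into individual runs only when the summary cannot explain a regression.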
Comment on lines +146 to +151
| Chart | What it shows | Key question |
|-------|---------------|--------------|
| **Episode Risk-Cost Frontier** | Episodes in cost-risk space (x=cost, y=risk score derived from risky nodes/MCP failures/blocked requests, size=run count) | Which execution chains sit on the cost-risk frontier? |
| **Workflow Stability Matrix** | Workflow-by-metric heatmap of instability signals (risky run rate, fallback rate, poor-control rate, etc.) | Which workflows are chronically unstable vs. noisy in one dimension? |
| **Repository Portfolio Map** | Scatter by cost/value proxy; quadrants labeled keep/optimize/simplify/review | Which workflows deserve investment, simplification, or a decision? |
| **Workflow Overlap Matrix** | Workflow-by-workflow similarity heatmap (task domain, trigger/schedule, fingerprints) | Which workflows solve the same problem closely enough to justify consolidation? |

Copilot AI Apr 26, 2026


In the chart summary table, several axis/metric details from the longer prose appear to have been dropped/softened (e.g., Episode Risk-Cost Frontier y-axis inputs like poor-control + escalation eligibility, Stability Matrix column list uses “etc.”, Portfolio Map omits point size = run count, Overlap Matrix omits name/token overlap + assessment-pattern signals). If the goal is “no technical content loss,” please make these rows explicit or reference the exact metric definitions used by the workflow.

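For the Episode Risk-Cost Frontier row, the y-axis risk score could be computed along these lines. The weights are hypothetical placeholders; the table only states that the score is derived from risky nodes, MCP failures, and blocked requests.

```python
# Sketch: a composite risk score for the Episode Risk-Cost Frontier chart.
# The weight values are hypothetical; the docs name only the three inputs.
def risk_score(risky_nodes: int, mcp_failures: int, blocked_requests: int) -> float:
    weights = {"risky_nodes": 3.0, "mcp_failures": 2.0, "blocked_requests": 1.0}
    return (weights["risky_nodes"] * risky_nodes
            + weights["mcp_failures"] * mcp_failures
            + weights["blocked_requests"] * blocked_requests)
```

Plotting each episode at (total estimated cost, `risk_score(...)`), with point size proportional to run count, reproduces the chart the table describes.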
Effective Tokens matter because raw token counts alone are not comparable across models or token classes. The implementation normalizes token classes first, then applies a model multiplier. This makes it easier to compare a cache-heavy run against an output-heavy run, or a lightweight model against a more expensive one, without collapsing everything into raw token totals.
- Episode-level rollups for lineage, risk, blocked requests, MCP failures, and suggested route
- Per-run metrics: duration, action minutes, token usage, turns, warnings, and `estimated_cost`
- Effective Tokens — a normalized metric weighting input, output, cache-read, and cache-write tokens by per-model multiplier, enabling cross-run and cross-model comparisons

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Effective Tokens” bullet reads as if token classes are weighted by a per-model multiplier. Elsewhere in the docs/spec, Effective Tokens apply token-class weights and model-relative multipliers as separate steps. Rephrase this definition (and/or link to the Effective Tokens specification) to avoid implying a different computation.

Suggested change: replace

> - Effective Tokens — a normalized metric weighting input, output, cache-read, and cache-write tokens by per-model multiplier, enabling cross-run and cross-model comparisons

with:

> - Effective Tokens — a normalized metric that first applies token-class weights to input, output, cache-read, and cache-write tokens, then applies the model-relative multiplier to support cross-run and cross-model comparisons

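The two-step computation this suggestion describes can be sketched as below. All weight and multiplier values are hypothetical placeholders, not the documented Effective Tokens spec values; the point is only that class weighting and the model multiplier are separate steps.

```python
# Sketch of the two-step Effective Tokens computation: token-class weights
# first, then a model-relative multiplier. All numeric values are hypothetical.
CLASS_WEIGHTS = {"input": 1.0, "output": 4.0, "cache_read": 0.1, "cache_write": 1.25}
MODEL_MULTIPLIERS = {"small-model": 0.25, "large-model": 1.0}

def effective_tokens(usage: dict[str, int], model: str) -> float:
    # Step 1: weight each token class so cache-heavy and output-heavy runs
    # become comparable instead of collapsing into raw totals.
    weighted = sum(CLASS_WEIGHTS[cls] * usage.get(cls, 0) for cls in CLASS_WEIGHTS)
    # Step 2: apply the model-relative multiplier as a distinct step.
    return weighted * MODEL_MULTIPLIERS[model]
```

Under this shape, a cache-heavy run on a lightweight model and an output-heavy run on an expensive model land on one comparable scale.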
- **Overbuilt workflows**: Resource-heavy runs, repeated `latest_success` comparisons, or overkill assessments signal candidates for smaller models, tighter prompts, or deterministic automation.
- **Avoidable control failures**: Repeated blocked requests, MCP failures, or poor-control assessments mean tokens and Actions minutes are going to retries and fallback paths rather than useful work.
- **Hidden orchestration costs**: Episode rollups expose the true aggregate cost of distributed workflows that dispatch workers or chain `workflow_run` triggers.
- **Low-priority optimization**: Escalation logic groups repeated problems into a single actionable report, so owners focus on the highest-value fixes rather than one issue per workflow.

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bullet label “Low-priority optimization” doesn’t match the description (which is about prioritizing highest-value fixes). Rename the label to reflect prioritization rather than low priority to avoid confusion.

Suggested change: replace

> - **Low-priority optimization**: Escalation logic groups repeated problems into a single actionable report, so owners focus on the highest-value fixes rather than one issue per workflow.

with:

> - **Prioritized optimization**: Escalation logic groups repeated problems into a single actionable report, so owners focus on the highest-value fixes rather than one issue per workflow.


Labels

`automation` · `doc-unbloat` · `documentation` (Improvements or additions to documentation)


3 participants