feat: Expand otel docs & add spec#33380
Conversation
|
✅ smoke-ci: safeoutputs CLI comment + comment-memory run (26117451137)
|
Comment MemoryNote This comment is managed by comment memory.It stores persistent context for this thread in the code block at the top of this comment.
|
There was a problem hiding this comment.
Pull request overview
Adds a normative OpenTelemetry/OTLP observability specification for gh-aw and introduces an experimental OpenTelemetry setup guide in the docs, along with doc-site navigation/redirect updates.
Changes:
- Added
specs/otel-observability-spec.mddefining the requiredobservability.otlpbehavior, runtime env contract, gateway integration, local mirrors, and compliance tests. - Added
docs/src/content/docs/experimental/opentelemetry.mdxdescribing OTLP write-side export and MCP read-side querying for common backends. - Updated docs navigation/redirects (
docs/astro.config.mjs) and the agent factory status page (docs/src/content/docs/agent-factory-status.mdx).
Show a summary per file
| File | Description |
|---|---|
specs/otel-observability-spec.md |
New spec documenting the observability.otlp contract and conformance expectations. |
docs/src/content/docs/experimental/opentelemetry.mdx |
New experimental setup guide for OTLP export + MCP-based querying. |
docs/src/content/docs/agent-factory-status.mdx |
Adds/adjusts workflow entries in the status table. |
docs/astro.config.mjs |
Adds redirects and sidebar entry for the new experimental OpenTelemetry page. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (2)
docs/src/content/docs/experimental/opentelemetry.mdx:219
- This links to
/gh-aw/grafana/gh-aw-observability-starter.json, but there is nografana/directory or dashboard JSON published with the docs (and the links validator will treat this as a broken local link). Either add the referenced JSON underdocs/public/grafana/(or another published location) or update the link to a real, versioned location (e.g., a GitHub blob URL).
#### Import a starter dashboard template
If you want a dashboard people can upload instead of building from scratch, start with [gh-aw-observability-starter.json](/gh-aw/grafana/gh-aw-observability-starter.json).
docs/src/content/docs/experimental/opentelemetry.mdx:184
- Same issue as above: this uses a site-local link to a
.github/workflows/...path (/gh-aw/.github/workflows/shared/mcp/datadog.md), which won’t exist in the generated docs output and is likely to fail link validation. Prefer a GitHub blob URL or link to the Imports docs that show how to reference it.
5. **Run one workflow to write and one workflow to read.**
Trigger a workflow that exports spans through OTLP, then run a Datadog-backed investigation workflow that imports [shared/mcp/datadog.md](/gh-aw/.github/workflows/shared/mcp/datadog.md).
If write-side OTLP and read-side MCP point at different Datadog sites, export may succeed while the investigation workflow reads the wrong account or environment.
- Files reviewed: 4/4 changed files
- Comments generated: 1
| In this model, the workflow does not emit new spans by itself. It gives the agent a tool that can query existing traces and spans from an external backend. | ||
|
|
||
| > Keep both write-side OTLP configuration and read-side MCP configuration in [shared workflow files](/gh-aw/reference/imports/) and import them where needed. In this repository, that usually means [shared/otlp.md](/gh-aw/.github/workflows/shared/otlp.md) for the combined Sentry and Grafana OTLP pattern, [shared/mcp/grafana.md](/gh-aw/.github/workflows/shared/mcp/grafana.md) or [shared/mcp/datadog.md](/gh-aw/.github/workflows/shared/mcp/datadog.md) for read access, and [shared/otel-queries.md](/gh-aw/.github/workflows/shared/otel-queries.md) for the query playbook. | ||
|
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /grill-with-docs and /zoom-out to review this documentation-heavy PR that adds comprehensive OpenTelemetry setup guides and a formal observability specification.
Why These Skills?
/grill-with-docs: This PR introduces 610 lines of new documentation and formal specifications. Need to verify terminology consistency with existing domain language and ensure the new concepts integrate cleanly withgh-aw's existing vocabulary around observability, OTLP, and MCP./zoom-out: Large documentation additions require checking architectural fit — does this observability story align with the broadergh-awpatterns around shared imports, security, and runtime contracts?
Key Themes
Terminology & Documentation Quality (/grill-with-docs):
- ✅ Excellent use of existing domain terms:
observability.otlp, shared imports pattern, MCP gateway - ✅ Clear distinction between write-side OTLP export and read-side MCP querying
⚠️ Minor typo: "experimetnal" in PR description (line 72 of experimental doc)⚠️ Missing cross-reference from experimental guide back to reference docs⚠️ Relative paths in spec references could break in different contexts
Architectural Fit (/zoom-out):
- ✅ Spec structure is outstanding: conformance levels, RFC 2119 keywords, implementation mapping table
- ✅ Security requirements explicitly stated (§8): credentials masked, headers not in JSON config
- ✅ Multi-backend support (Datadog, Grafana, Sentry) follows existing patterns
⚠️ Spec scope says "no vendor-specific behavior" but mandates Sentry header rewriting (pragmatic but inconsistent)⚠️ Implementation mapping table (§9) is brilliant but will drift without CI enforcement
Positive Highlights
- ✅ Specification-first approach: Having a normative spec (
specs/otel-observability-spec.md) is a huge win for maintainability - ✅ Implementation mapping table (§9): Directly links spec sections to code files — this is exactly what
/zoom-outrecommends for keeping abstractions aligned - ✅ Conformance testing requirements (§10): Test IDs mapped to spec requirements ensure behavior stays synchronized
- ✅ Clear setup instructions: Backend-specific tabs make it easy for users to follow the right path
- ✅ Security-first: Credentials never in gateway JSON, always via environment variables
Suggestions
- Fix typo in PR description and experimental doc ("experimetnal" → "experimental")
- Add cross-reference from experimental guide to reference docs
- Strengthen spec-to-code link: Add CI check to verify implementation mapping table stays current
- Clarify scope: Either move Sentry header rewriting to informative appendix or update §1.2 scope to include "vendor protocol adaptations"
Verdict
This is high-quality documentation work that follows specification best practices and integrates well with gh-aw's existing architecture. The issues flagged are minor polish items — the core observability contract is solid.
Recommending COMMENT (informative feedback, no blocking issues).
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 7M
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in |
Summary
Expands OpenTelemetry documentation with a new experimental guide and comprehensive specification defining OTLP observability conformance for
gh-aw.Changes
Documentation
docs/src/content/docs/experimental/opentelemetry.mdx): 278-line quickstart covering OTLP export, visualization with Jaeger, environment variables, span structure, and troubleshootingdocs/astro.config.mjs): Added OpenTelemetry sidebar entry and redirect from/guides/opentelemetry/to/experimental/opentelemetry/docs/src/content/docs/agent-factory-status.mdx): Added LintMonster and UK AI Operational Resilience workflows, consolidated Smoke OTEL entriesSpecification
specs/opentelemetry-observability.md): Comprehensive specification defining:OTEL_EXPORTER_OTLP_ENDPOINT, workflow overrides), header injection, TLS settingsImpact
Testing
Documentation builds successfully with new navigation and redirects. Specification provides compliance test scenarios for future validation.