@corneliusludmann
Contributor

What

Add nested phase spans to OpenTelemetry tracing for detailed build timeline visualization.

Why

Phase spans provide:

  • Detailed timeline visualization of build phases (prep, pull, lint, test, build, package)
  • Individual phase error tracking
  • Better observability into build performance bottlenecks

Changes

Core Implementation

  • PhaseAwareReporter Interface: Optional interface that reporters can implement in addition to Reporter, following the Go idiom of small opt-in interfaces (like io.Closer, io.Seeker)
    • PackageBuildPhaseStarted(pkg, phase) - creates phase span
    • PackageBuildPhaseFinished(pkg, phase, err) - ends phase span with error status
  • OTelReporter: Implements PhaseAwareReporter
    • Added phaseSpans map[string]trace.Span for tracking
    • Creates phase spans as children of package spans
    • Removed phase duration attributes (now captured in nested spans)
  • executeBuildPhase: Uses type assertion to call phase-aware reporters
    • No changes needed to existing reporters (backward compatible)
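The optional-interface pattern described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this PR: `Package`, `PackageBuildPhase`, the reduced `Reporter` interface, `legacyReporter`, `phaseReporter`, and the `run` callback parameter of `executeBuildPhase` are simplified stand-ins assumed for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Package and PackageBuildPhase are simplified stand-ins for leeway's real types.
type Package struct{ Name string }
type PackageBuildPhase string

// Reporter is a reduced, hypothetical version of the existing interface.
type Reporter interface {
	PackageBuildStarted(pkg *Package)
}

// PhaseAwareReporter is the optional extension. Reporters may or may not implement it.
type PhaseAwareReporter interface {
	PackageBuildPhaseStarted(pkg *Package, phase PackageBuildPhase)
	PackageBuildPhaseFinished(pkg *Package, phase PackageBuildPhase, err error)
}

// legacyReporter implements only Reporter and needs no changes.
type legacyReporter struct{}

func (legacyReporter) PackageBuildStarted(pkg *Package) {}

// phaseReporter implements both interfaces and records what it was told.
type phaseReporter struct{ events []string }

func (r *phaseReporter) PackageBuildStarted(pkg *Package) {}

func (r *phaseReporter) PackageBuildPhaseStarted(pkg *Package, phase PackageBuildPhase) {
	r.events = append(r.events, fmt.Sprintf("start %s/%s", pkg.Name, phase))
}

func (r *phaseReporter) PackageBuildPhaseFinished(pkg *Package, phase PackageBuildPhase, err error) {
	r.events = append(r.events, fmt.Sprintf("finish %s/%s err=%v", pkg.Name, phase, err))
}

// executeBuildPhase runs one phase and notifies the reporter only if it opts in.
func executeBuildPhase(rep Reporter, pkg *Package, phase PackageBuildPhase, run func() error) error {
	par, ok := rep.(PhaseAwareReporter) // type assertion: does this reporter track phases?
	if ok {
		par.PackageBuildPhaseStarted(pkg, phase)
	}
	err := run()
	if ok {
		par.PackageBuildPhaseFinished(pkg, phase, err)
	}
	return err
}

func main() {
	pkg := &Package{Name: "component:package-1"}
	rep := &phaseReporter{}
	_ = executeBuildPhase(rep, pkg, "prep", func() error { return nil })
	_ = executeBuildPhase(rep, pkg, "test", func() error { return errors.New("tests failed") })
	// A reporter without phase support is simply skipped.
	_ = executeBuildPhase(legacyReporter{}, pkg, "build", func() error { return nil })
	for _, e := range rep.events {
		fmt.Println(e)
	}
}
```

Because the assertion happens at the call site, reporters that only implement `Reporter` compile and behave exactly as before.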

Span Hierarchy

leeway.build (root)
├── leeway.package (component:package-1)
│   ├── leeway.phase (prep)
│   ├── leeway.phase (build)
│   └── leeway.phase (test)
└── leeway.package (component:package-2)
    ├── leeway.phase (prep)
    └── leeway.phase (build)

Phase Span Attributes

Attribute          | Type   | Description
leeway.phase.name  | string | Phase name (prep, pull, lint, test, build, package)

Span Status:

  • OK: Phase completed successfully
  • ERROR: Phase failed (error details in span events)
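To make the hierarchy, the `leeway.phase.name` attribute, and the status rules concrete, here is a small standard-library model. It deliberately does not use the OpenTelemetry SDK: the `span` type, `startPhase`, `finishPhase`, and the `"exception: ..."` event text are assumptions made for illustration. In the real reporter the equivalent steps would presumably go through the SDK's span API.

```go
package main

import (
	"errors"
	"fmt"
)

type status string

const (
	statusUnset status = "UNSET"
	statusOK    status = "OK"
	statusError status = "ERROR"
)

// span is a toy model of a trace span, not the OpenTelemetry type.
type span struct {
	name     string
	attrs    map[string]string
	status   status
	events   []string
	children []*span
}

// errExample is a sample phase failure used in main.
var errExample = errors.New("tests failed")

func newSpan(name string) *span {
	return &span{name: name, attrs: map[string]string{}, status: statusUnset}
}

// startChild creates a span nested under s.
func (s *span) startChild(name string) *span {
	c := newSpan(name)
	s.children = append(s.children, c)
	return c
}

// startPhase creates a leeway.phase span as a child of the package span.
func startPhase(pkgSpan *span, phase string) *span {
	ps := pkgSpan.startChild("leeway.phase")
	ps.attrs["leeway.phase.name"] = phase
	return ps
}

// finishPhase sets OK on success, or ERROR plus an event carrying the error details.
func finishPhase(ps *span, err error) {
	if err != nil {
		ps.status = statusError
		ps.events = append(ps.events, "exception: "+err.Error())
		return
	}
	ps.status = statusOK
}

func main() {
	root := newSpan("leeway.build")
	pkg := root.startChild("leeway.package")
	finishPhase(startPhase(pkg, "prep"), nil)
	finishPhase(startPhase(pkg, "test"), errExample)
	for _, c := range pkg.children {
		fmt.Println(c.name, c.attrs["leeway.phase.name"], c.status, c.events)
	}
}
```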

Testing

  • Added reporter_otel_phase_test.go with 5 comprehensive tests:
    • Phase span creation and attributes
    • Error handling and status codes
    • Span hierarchy verification
    • Interface implementation checks
    • Edge case handling
  • Updated existing TestOTelReporter_PhaseDurations to verify phase spans
  • All 18 OTel tests passing ✅
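An interface implementation check of the kind listed above is often written as a compile-time assertion plus a runtime type assertion. The sketch below assumes simplified string-based method signatures and an empty-bodied `OTelReporter`. The helper `implementsPhaseAware` is hypothetical and not taken from reporter_otel_phase_test.go.

```go
package main

import "fmt"

// PhaseAwareReporter uses simplified string parameters for this example.
type PhaseAwareReporter interface {
	PackageBuildPhaseStarted(pkg, phase string)
	PackageBuildPhaseFinished(pkg, phase string, err error)
}

// OTelReporter here is a stub that only counts calls.
type OTelReporter struct{ started, finished int }

func (r *OTelReporter) PackageBuildPhaseStarted(pkg, phase string) {
	r.started++
}

func (r *OTelReporter) PackageBuildPhaseFinished(pkg, phase string, err error) {
	r.finished++
}

// Compile-time check: the build fails if *OTelReporter stops satisfying the interface.
var _ PhaseAwareReporter = (*OTelReporter)(nil)

// implementsPhaseAware performs the same check at runtime via type assertion.
func implementsPhaseAware(v any) bool {
	_, ok := v.(PhaseAwareReporter)
	return ok
}

func main() {
	fmt.Println(implementsPhaseAware(&OTelReporter{})) // prints "true"
	fmt.Println(implementsPhaseAware(struct{}{}))      // prints "false"
}
```

Note that with pointer receivers only `*OTelReporter` satisfies the interface, not the value type `OTelReporter`.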

Documentation

  • Updated README.md with span hierarchy diagram
  • Updated docs/observability.md with:
    • Detailed span hierarchy
    • Phase span attributes
    • Updated context propagation

Design Decisions

  1. Optional Interface Pattern: Avoids breaking changes to Reporter interface
  2. Nested Spans: Better timeline visualization than attributes
  3. Type Assertion: executeBuildPhase detects phase support at runtime; currently only OTelReporter implements phase tracking
  4. Thread Safety: All operations protected by sync.RWMutex
  5. Memory Management: Phase spans cleaned up after completion
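Decisions 4 and 5 can be illustrated with a mutex-guarded map whose entries are deleted as soon as a phase ends. This is a sketch under assumptions: the `phaseTracker` type, the `"pkg/phase"` key format, and the toy `phaseSpan` struct are invented for the example. The PR only states that `phaseSpans` is a `map[string]trace.Span` protected by `sync.RWMutex`.

```go
package main

import (
	"fmt"
	"sync"
)

// phaseSpan is a toy stand-in for an OpenTelemetry span.
type phaseSpan struct {
	phase string
	ended bool
}

// phaseTracker guards the span map so packages can build concurrently.
type phaseTracker struct {
	mu         sync.RWMutex
	phaseSpans map[string]*phaseSpan
}

func newPhaseTracker() *phaseTracker {
	return &phaseTracker{phaseSpans: make(map[string]*phaseSpan)}
}

// phaseKey builds the map key. The format is an assumption for this sketch.
func phaseKey(pkg, phase string) string { return pkg + "/" + phase }

func (t *phaseTracker) start(pkg, phase string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.phaseSpans[phaseKey(pkg, phase)] = &phaseSpan{phase: phase}
}

// finish ends the span and removes it from the map. It reports whether a span was found.
func (t *phaseTracker) finish(pkg, phase string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	key := phaseKey(pkg, phase)
	s, ok := t.phaseSpans[key]
	if !ok {
		return false // finish without start: nothing to end
	}
	s.ended = true
	delete(t.phaseSpans, key) // cleanup so the map does not grow over the build
	return true
}

// active returns the number of open phase spans under a read lock.
func (t *phaseTracker) active() int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return len(t.phaseSpans)
}

// buildAll simulates concurrent package builds, each running three phases.
func buildAll(t *phaseTracker, pkgs []string) {
	var wg sync.WaitGroup
	for _, pkg := range pkgs {
		wg.Add(1)
		go func(pkg string) {
			defer wg.Done()
			for _, phase := range []string{"prep", "build", "test"} {
				t.start(pkg, phase)
				t.finish(pkg, phase)
			}
		}(pkg)
	}
	wg.Wait()
}

func main() {
	t := newPhaseTracker()
	buildAll(t, []string{"component:package-1", "component:package-2"})
	fmt.Println("active phase spans after build:", t.active()) // prints 0
}
```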

Related

Closes CLC-2107
Depends on #288 (CLC-2106)

corneliusludmann and others added 13 commits November 20, 2025 16:04
Add OpenTelemetry infrastructure for distributed tracing with support
for W3C Trace Context propagation from CI systems.

- Add telemetry package with tracer initialization, shutdown, and trace context parsing
- Implement OTelReporter with build and package span creation
- Add CLI flags: --otel-endpoint, --trace-parent, --trace-state
- Capture build metrics, package timing, and GitHub Actions context
- Thread-safe concurrent package builds with RWMutex
- Graceful degradation when tracing fails
- Comprehensive tests with in-memory exporters
- Documentation in docs/observability.md and README.md

Closes CLC-2106

Co-authored-by: Ona <no-reply@ona.com>
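For readers unfamiliar with the value passed via --trace-parent, the sketch below validates a W3C `traceparent` string (version `00`: 32-hex trace ID, 16-hex parent ID, 2-hex flags). It is a standard-library illustration only. The `traceParent` type and `parseTraceParent` function are hypothetical, and the telemetry package in this PR may well rely on the OpenTelemetry SDK's propagator instead.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// traceParent holds the fields of a version-00 traceparent value.
type traceParent struct {
	TraceID  string
	ParentID string
	Sampled  bool
}

// isLowerHex reports whether s has exactly n lowercase hexadecimal characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseTraceParent parses "00-<trace-id>-<parent-id>-<flags>".
func parseTraceParent(h string) (traceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 {
		return traceParent{}, errors.New("traceparent: expected 4 dash-separated fields")
	}
	version, traceID, parentID, flags := parts[0], parts[1], parts[2], parts[3]
	if version != "00" {
		return traceParent{}, errors.New("traceparent: only version 00 is handled in this sketch")
	}
	// All-zero IDs are invalid per the specification.
	if !isLowerHex(traceID, 32) || strings.Trim(traceID, "0") == "" {
		return traceParent{}, errors.New("traceparent: invalid trace-id")
	}
	if !isLowerHex(parentID, 16) || strings.Trim(parentID, "0") == "" {
		return traceParent{}, errors.New("traceparent: invalid parent-id")
	}
	if !isLowerHex(flags, 2) {
		return traceParent{}, errors.New("traceparent: invalid trace-flags")
	}
	f, err := strconv.ParseUint(flags, 16, 8)
	if err != nil {
		return traceParent{}, err
	}
	// Bit 0 of the flags byte is the "sampled" flag.
	return traceParent{TraceID: traceID, ParentID: parentID, Sampled: f&0x01 == 1}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tp.TraceID, tp.ParentID, tp.Sampled, err)
}
```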
Enable Docker support in the development container for building and
testing Docker packages.

Co-authored-by: Ona <no-reply@ona.com>
Move tracer shutdown from getBuildOpts to build command Run function
to ensure spans are flushed after the build completes, not immediately
after getBuildOpts returns.

- Change getBuildOpts to return shutdown function
- Update all callers to handle the new return value
- Set OTEL_EXPORTER_OTLP_ENDPOINT from CLI flag before InitTracer

Tested with Jaeger and confirmed traces are now properly sent.

Co-authored-by: Ona <no-reply@ona.com>
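The shutdown change described in this commit follows a common Go shape: the setup function returns a cleanup function, and the caller defers it so that it runs only after the build has finished. The sketch below is an assumption-laden model, not the actual cmd/build.go code: `buildOpts`, the toy `exporter`, `runBuild`, the extra `exp` parameter, and the `localhost:4318` endpoint are all invented for the example.

```go
package main

import (
	"context"
	"fmt"
)

// buildOpts is a stand-in for the real build options.
type buildOpts struct{ endpoint string }

// exporter is a toy span exporter that buffers span names until flushed.
type exporter struct {
	pending []string
	flushed []string
}

// getBuildOpts returns the options together with a shutdown function.
// Nothing is flushed here, so spans created later are not lost.
func getBuildOpts(endpoint string, exp *exporter) (buildOpts, func(context.Context) error) {
	shutdown := func(ctx context.Context) error {
		exp.flushed = append(exp.flushed, exp.pending...)
		exp.pending = nil
		return nil
	}
	return buildOpts{endpoint: endpoint}, shutdown
}

// runBuild models the build command's Run function.
func runBuild(exp *exporter) (err error) {
	_, shutdown := getBuildOpts("localhost:4318", exp)
	// Deferred: runs after the build completes, not when getBuildOpts returns.
	defer func() {
		if serr := shutdown(context.Background()); serr != nil && err == nil {
			err = serr
		}
	}()
	// The build records spans while it runs.
	exp.pending = append(exp.pending, "leeway.build", "leeway.package")
	return nil
}

func main() {
	exp := &exporter{}
	if err := runBuild(exp); err != nil {
		fmt.Println("build failed:", err)
	}
	fmt.Println("flushed spans:", exp.flushed)
}
```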
- Fix errcheck warnings in test files (check tp.Shutdown errors)
- Fix errcheck warnings for os.Setenv/Unsetenv in tests
- Run go fmt on modified files

Co-authored-by: Ona <no-reply@ona.com>
Add comprehensive tests for GitHub Actions environment variable handling:
- TestOTelReporter_GitHubAttributes: Verifies all 10 GitHub attributes
  are correctly added to spans when running in GitHub Actions
- TestOTelReporter_NoGitHubAttributes: Verifies no GitHub attributes
  are added when not running in GitHub Actions

Coverage improvement:
- addGitHubAttributes: 9.1% → 100%

Co-authored-by: Ona <no-reply@ona.com>
Change InitTracer to accept endpoint as parameter instead of reading
from environment variable. This improves testability and follows
separation of concerns.

Changes:
- InitTracer now takes endpoint as parameter: InitTracer(ctx, endpoint)
- Remove confusing env var manipulation in cmd/build.go
- Simplify test (no need to manipulate environment)
- Make configuration flow clearer: flag → function parameter

Benefits:
- Easier to test (no env var side effects)
- Clearer API (explicit parameter vs implicit env var)
- Consistent with other reporters (e.g., SegmentReporter)
- No circular dependency between flag default and env var

Co-authored-by: Ona <no-reply@ona.com>
- Add OTEL_EXPORTER_OTLP_INSECURE env var and --otel-insecure flag
- Default to secure TLS connections (production-ready)
- Fix version retrieval to use actual leeway.Version
- Update documentation with Honeycomb production examples
- Add TLS configuration section to observability docs

Addresses review feedback on PR #288

Co-authored-by: Ona <no-reply@ona.com>
- Remove docs/README.md (unnecessary file)
- Shorten OpenTelemetry section in README.md
- Link to detailed observability.md documentation

Addresses review comments on PR #288

Co-authored-by: Ona <no-reply@ona.com>
- Document that SDK automatically reads OTEL_EXPORTER_OTLP_HEADERS
- Add OTEL_EXPORTER_OTLP_TRACES_HEADERS to environment variables list
- Clarify no additional code configuration is required
- Improve Honeycomb example description

Co-authored-by: Ona <no-reply@ona.com>
Not needed for the PR

Co-authored-by: Ona <no-reply@ona.com>
- Fix trailing whitespace in reporter_otel_test.go
- Update observability.md to accurately reflect current implementation
- Clarify that phase durations are captured as attributes, not spans
- Move phase-level spans to future enhancements section

Co-authored-by: Ona <no-reply@ona.com>
- Add 6 new test functions covering error handling, test coverage attributes,
  phase durations, package status counts, and memory cleanup
- Verify memory management: packageSpans and packageCtxs maps are properly
  cleaned up after PackageBuildFinished
- Fix whitespace consistency in cmd/build.go (use tabs instead of spaces)
- All 13 OTelReporter tests passing

Co-authored-by: Ona <no-reply@ona.com>
Add PhaseAwareReporter optional interface to enable phase-level span
creation without breaking existing reporters. Phase spans are created
as children of package spans for detailed build timeline visualization.

Changes:
- Define PhaseAwareReporter interface with phase start/finish methods
- Implement phase span tracking in OTelReporter
- Modify executeBuildPhase to call phase-aware reporters via type assertion
- Remove phase duration attributes (now captured in nested spans)
- Add comprehensive phase span tests
- Update documentation with span hierarchy and phase attributes

Closes CLC-2107

Co-authored-by: Ona <no-reply@ona.com>