| 1 | +# Observability |
| 2 | + |
| 3 | +Leeway supports distributed tracing using OpenTelemetry to provide visibility into build performance and behavior. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +OpenTelemetry tracing in Leeway captures: |
| 8 | +- Build lifecycle (start to finish) |
| 9 | +- Individual package builds |
| 10 | +- Build phase durations (prep, pull, lint, test, build, package) |
| 11 | +- Cache hit/miss information |
| 12 | +- GitHub Actions context (when running in CI) |
| 13 | +- Parent trace context propagation from CI systems |
| 14 | + |
| 15 | +## Architecture |
| 16 | + |
| 17 | +### Span Hierarchy |
| 18 | + |
| 19 | +``` |
| 20 | +Root Span (leeway.build) |
| 21 | +├── Package Span 1 (leeway.package) |
| 22 | +│ ├── Phase: prep |
| 23 | +│ ├── Phase: pull |
| 24 | +│ ├── Phase: lint |
| 25 | +│ ├── Phase: test |
| 26 | +│ └── Phase: build |
| 27 | +├── Package Span 2 (leeway.package) |
| 28 | +└── Package Span N (leeway.package) |
| 29 | +``` |
| 30 | + |
| 31 | +- **Root Span**: Created when `BuildStarted` is called, represents the entire build operation |
| 32 | +- **Package Spans**: Created for each package being built, as children of the root span |
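| 33 | +- **Phase Spans**: (Future) Individual build phases within each package. They are not emitted yet; until then, phase timings are recorded on the package span as `leeway.package.phase.{phase}.duration_ms` attributes |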
| 34 | + |
| 35 | +### Context Propagation |
| 36 | + |
| 37 | +Leeway supports W3C Trace Context propagation, allowing builds to be part of larger distributed traces: |
| 38 | + |
| 39 | +1. **Parent Context**: Accepts `traceparent` and `tracestate` headers from upstream systems |
| 40 | +2. **Root Context**: Creates a root span linked to the parent context |
| 41 | +3. **Package Context**: Each package span is a child of the root span |
| 42 | + |
| 43 | +## Configuration |
| 44 | + |
| 45 | +### Environment Variables |
| 46 | + |
| 47 | +- `OTEL_EXPORTER_OTLP_ENDPOINT`: OTLP endpoint URL (e.g., `localhost:4318`) |
| 48 | +- `TRACEPARENT`: W3C Trace Context traceparent header (format: `00-{trace-id}-{span-id}-{flags}`) |
| 49 | +- `TRACESTATE`: W3C Trace Context tracestate header (optional) |
| 50 | + |
| 51 | +### CLI Flags |
| 52 | + |
| 53 | +- `--otel-endpoint`: OTLP endpoint URL (overrides `OTEL_EXPORTER_OTLP_ENDPOINT`) |
| 54 | +- `--trace-parent`: W3C traceparent header (overrides `TRACEPARENT`) |
| 55 | +- `--trace-state`: W3C tracestate header (overrides `TRACESTATE`) |
| 56 | + |
| 57 | +### Precedence |
| 58 | + |
| 59 | +CLI flags take precedence over environment variables: |
| 60 | +``` |
| 61 | +CLI flag → Environment variable → Default (disabled) |
| 62 | +``` |
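
The precedence chain above can be sketched as a small shell function. This is an illustration only, not Leeway's code: `resolve_setting` is a hypothetical helper, and it assumes that an empty flag value counts as "not set".

```bash
# Hypothetical helper illustrating the documented precedence; not part of leeway.
# Usage: resolve_setting "<CLI flag value>" "<environment variable value>"
resolve_setting() {
  flag_value=$1
  env_value=$2
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"   # CLI flag wins when it is set
  elif [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"    # otherwise fall back to the environment
  else
    printf '\n'                   # neither is set: empty result, tracing stays disabled
  fi
}
```

For example, `resolve_setting "" "$OTEL_EXPORTER_OTLP_ENDPOINT"` prints the environment value, while passing a non-empty first argument prints that argument instead.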
| 63 | + |
| 64 | +## Span Attributes |
| 65 | + |
| 66 | +### Root Span Attributes |
| 67 | + |
| 68 | +| Attribute | Type | Description | Example | |
| 69 | +|-----------|------|-------------|---------| |
| 70 | +| `leeway.version` | string | Leeway version | `"0.7.0"` | |
| 71 | +| `leeway.workspace.root` | string | Workspace root path | `"/workspace"` | |
| 72 | +| `leeway.target.package` | string | Target package being built | `"components/server:app"` | |
| 73 | +| `leeway.target.version` | string | Target package version | `"abc123def"` | |
| 74 | +| `leeway.packages.total` | int | Total packages in build | `42` | |
| 75 | +| `leeway.packages.cached` | int | Packages cached locally | `35` | |
| 76 | +| `leeway.packages.remote` | int | Packages in remote cache | `5` | |
| 77 | +| `leeway.packages.downloaded` | int | Packages downloaded | `3` | |
| 78 | +| `leeway.packages.to_build` | int | Packages to build | `2` | |
| 79 | + |
| 80 | +### Package Span Attributes |
| 81 | + |
| 82 | +| Attribute | Type | Description | Example | |
| 83 | +|-----------|------|-------------|---------| |
| 84 | +| `leeway.package.name` | string | Package full name | `"components/server:app"` | |
| 85 | +| `leeway.package.type` | string | Package type | `"go"`, `"yarn"`, `"docker"`, `"generic"` | |
| 86 | +| `leeway.package.version` | string | Package version | `"abc123def"` | |
| 87 | +| `leeway.package.builddir` | string | Build directory | `"/tmp/leeway/build/..."` | |
| 88 | +| `leeway.package.last_phase` | string | Last completed phase | `"build"` | |
| 89 | +| `leeway.package.duration_ms` | int64 | Total build duration (ms) | `15234` | |
| 90 | +| `leeway.package.phase.{phase}.duration_ms` | int64 | Phase duration (ms) | `5432` | |
| 91 | +| `leeway.package.test.coverage_percentage` | int | Test coverage % | `85` | |
| 92 | +| `leeway.package.test.functions_with_test` | int | Functions with tests | `42` | |
| 93 | +| `leeway.package.test.functions_without_test` | int | Functions without tests | `8` | |
| 94 | + |
| 95 | +### GitHub Actions Attributes |
| 96 | + |
| 97 | +When running in GitHub Actions (`GITHUB_ACTIONS=true`), the following attributes are added to the root span: |
| 98 | + |
| 99 | +| Attribute | Environment Variable | Description | |
| 100 | +|-----------|---------------------|-------------| |
| 101 | +| `github.workflow` | `GITHUB_WORKFLOW` | Workflow name | |
| 102 | +| `github.run_id` | `GITHUB_RUN_ID` | Unique run identifier | |
| 103 | +| `github.run_number` | `GITHUB_RUN_NUMBER` | Run number | |
| 104 | +| `github.job` | `GITHUB_JOB` | Job name | |
| 105 | +| `github.actor` | `GITHUB_ACTOR` | User who triggered the workflow | |
| 106 | +| `github.repository` | `GITHUB_REPOSITORY` | Repository name | |
| 107 | +| `github.ref` | `GITHUB_REF` | Git ref | |
| 108 | +| `github.sha` | `GITHUB_SHA` | Commit SHA | |
| 109 | +| `github.server_url` | `GITHUB_SERVER_URL` | GitHub server URL | |
| 110 | +| `github.workflow_ref` | `GITHUB_WORKFLOW_REF` | Workflow reference | |
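
When debugging a CI run, it can help to print which of these attributes the current environment would yield. The sketch below follows the table's mapping (`GITHUB_RUN_ID` becomes `github.run_id`, and so on). `print_github_attributes` is a hypothetical helper, not a Leeway command, and skipping unset or empty variables is an assumption rather than documented behavior.

```bash
# Hypothetical debugging helper; not part of leeway.
# Prints key=value lines for the github.* attributes listed in the table above.
print_github_attributes() {
  # The attributes are only documented for GITHUB_ACTIONS=true.
  [ "${GITHUB_ACTIONS:-}" = "true" ] || return 0
  for var in GITHUB_WORKFLOW GITHUB_RUN_ID GITHUB_RUN_NUMBER GITHUB_JOB \
             GITHUB_ACTOR GITHUB_REPOSITORY GITHUB_REF GITHUB_SHA \
             GITHUB_SERVER_URL GITHUB_WORKFLOW_REF; do
    # Indirect lookup of the variable named in $var (names come from the fixed list above).
    eval "value=\${$var:-}"
    # Assumption: unset or empty variables are skipped.
    [ -n "$value" ] || continue
    # GITHUB_RUN_ID -> github.run_id
    key="github.$(printf '%s' "${var#GITHUB_}" | tr '[:upper:]' '[:lower:]')"
    printf '%s=%s\n' "$key" "$value"
  done
}
```

Outside GitHub Actions the function prints nothing.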
| 111 | + |
| 112 | +## Usage Examples |
| 113 | + |
| 114 | +### Basic Usage |
| 115 | + |
| 116 | +```bash |
| 117 | +# Set OTLP endpoint |
| 118 | +export OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 |
| 119 | + |
| 120 | +# Build with tracing enabled |
| 121 | +leeway build :my-package |
| 122 | +``` |
| 123 | + |
| 124 | +### With CLI Flags |
| 125 | + |
| 126 | +```bash |
| 127 | +leeway build :my-package \ |
| 128 | + --otel-endpoint=localhost:4318 |
| 129 | +``` |
| 130 | + |
| 131 | +### With Parent Trace Context |
| 132 | + |
| 133 | +```bash |
| 134 | +# Propagate trace context from CI system |
| 135 | +leeway build :my-package \ |
| 136 | + --otel-endpoint=localhost:4318 \ |
| 137 | + --trace-parent="00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" |
| 138 | +``` |
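
If the CI system does not provide a trace context of its own, a wrapper script could mint one. The sketch below assumes a POSIX shell with `od` and `/dev/urandom` available. `random_hex` and `new_traceparent` are hypothetical helper names. Note that the trace will only look complete in the backend if the upstream system also exports a span with the generated span ID; otherwise the Leeway root span points at a parent the backend never receives.

```bash
# Hypothetical helpers for a CI step without its own tracing SDK; not part of leeway.
random_hex() {
  # Read $1 random bytes and print them as lowercase hex without separators.
  od -A n -N "$1" -t x1 /dev/urandom | tr -dc '0-9a-f'
}

new_traceparent() {
  # version 00, random 16-byte trace-id, random 8-byte span-id, flags 01 (sampled)
  printf '00-%s-%s-01\n' "$(random_hex 16)" "$(random_hex 8)"
}
```

The result can be passed straight to the flag shown above, for example `--trace-parent="$(new_traceparent)"`.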
| 139 | + |
| 140 | +### In GitHub Actions |
| 141 | + |
| 142 | +```yaml |
| 143 | +name: Build |
| 144 | +on: [push] |
| 145 | + |
| 146 | +jobs: |
| 147 | + build: |
| 148 | + runs-on: ubuntu-latest |
| 149 | + steps: |
| 150 | + - uses: actions/checkout@v4 |
| 151 | + |
| 152 | + - name: Build with tracing |
| 153 | + env: |
| 154 | + OTEL_EXPORTER_OTLP_ENDPOINT: ${{ secrets.OTEL_ENDPOINT }} |
| 155 | + run: | |
| 156 | + leeway build :my-package |
| 157 | +``` |
| 158 | + |
| 159 | +### With Jaeger (Local Development) |
| 160 | + |
| 161 | +```bash |
| 162 | +# Start Jaeger all-in-one |
| 163 | +docker run -d --name jaeger \ |
| 164 | + -p 4318:4318 \ |
| 165 | + -p 16686:16686 \ |
| 166 | + jaegertracing/all-in-one:latest |
| 167 | + |
| 168 | +# Build with tracing |
| 169 | +export OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4318 |
| 170 | +leeway build :my-package |
| 171 | + |
| 172 | +# View traces at http://localhost:16686 |
| 173 | +``` |
| 174 | + |
| 175 | +## Error Handling |
| 176 | + |
| 177 | +Leeway implements graceful degradation for tracing: |
| 178 | + |
| 179 | +- **Tracer initialization failures**: Logged as warnings, build continues without tracing |
| 180 | +- **Span creation failures**: Logged as warnings, build continues |
| 181 | +- **OTLP endpoint unavailable**: Spans are buffered and flushed on shutdown (with timeout) |
| 182 | +- **Invalid trace context**: Logged as warning, new trace is started |
| 183 | + |
| 184 | +Tracing failures never cause build failures. |
| 185 | + |
| 186 | +## Performance Considerations |
| 187 | + |
| 188 | +- **Overhead**: Minimal (<1% in typical builds) |
| 189 | +- **Concurrent builds**: Thread-safe with RWMutex protection |
| 190 | +- **Shutdown timeout**: 5 seconds to flush pending spans |
| 191 | +- **Batch export**: Spans are batched for efficient export |
| 192 | + |
| 193 | +## Troubleshooting |
| 194 | + |
| 195 | +### No spans appearing in backend |
| 196 | + |
| 197 | +1. Verify OTLP endpoint is reachable: |
| 198 | + ```bash |
| 199 | + curl -v http://localhost:4318/v1/traces |
| 200 | + ``` |
| 201 | + |
| 202 | +2. Check leeway logs for warnings: |
| 203 | + ```bash |
| 204 | + leeway build :package 2>&1 | grep -i otel |
| 205 | + ``` |
| 206 | + |
| 207 | +3. Verify environment variables: |
| 208 | + ```bash |
| 209 | + echo $OTEL_EXPORTER_OTLP_ENDPOINT |
| 210 | + ``` |
| 211 | + |
| 212 | +### Invalid trace context errors |
| 213 | + |
| 214 | +Validate traceparent format: |
| 215 | +``` |
| 216 | +Format: 00-{32-hex-trace-id}-{16-hex-span-id}-{2-hex-flags} |
| 217 | +Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 |
| 218 | +``` |
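
The format above can be checked before invoking Leeway. This is a minimal pre-flight sketch, not Leeway's own validation: `is_valid_traceparent` is a hypothetical helper. It also rejects all-zero trace and span IDs, which the W3C Trace Context specification treats as invalid.

```bash
# Hypothetical pre-flight check; leeway performs its own validation.
is_valid_traceparent() {
  # Shape: version 00, 32 hex trace-id, 16 hex span-id, 2 hex flags (lowercase hex).
  printf '%s\n' "$1" |
    grep -Eq '^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$' || return 1
  trace_id=$(printf '%s' "$1" | cut -d- -f2)
  span_id=$(printf '%s' "$1" | cut -d- -f3)
  # Reject IDs that consist only of zeros.
  case $trace_id in *[!0]*) ;; *) return 1 ;; esac
  case $span_id in *[!0]*) ;; *) return 1 ;; esac
  return 0
}
```

A script could guard the build with `is_valid_traceparent "$TRACEPARENT" || echo "invalid traceparent" >&2`.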
| 219 | + |
| 220 | +### Spans not linked to parent |
| 221 | + |
| 222 | +Pass `traceparent`, and also `tracestate` if the upstream system supplies one: |
| 223 | +```bash |
| 224 | +leeway build :package \ |
| 225 | + --trace-parent="00-..." \ |
| 226 | + --trace-state="..." |
| 227 | +``` |
| 228 | + |
| 229 | +## Implementation Details |
| 230 | + |
| 231 | +### Thread Safety |
| 232 | + |
| 233 | +- Single `sync.RWMutex` protects `packageCtxs` and `packageSpans` maps |
| 234 | +- Safe for concurrent package builds |
| 235 | +- Read locks for lookups, write locks for modifications |
| 236 | + |
| 237 | +### Shutdown |
| 238 | + |
| 239 | +- Automatic shutdown with 5-second timeout |
| 240 | +- Registered as deferred function in `getBuildOpts` |
| 241 | +- Ensures all spans are flushed before exit |
| 242 | + |
| 243 | +### Testing |
| 244 | + |
| 245 | +Tests use in-memory exporters (`tracetest.NewInMemoryExporter()`) to verify: |
| 246 | +- Span creation and hierarchy |
| 247 | +- Attribute correctness |
| 248 | +- Concurrent package builds |
| 249 | +- Parent context propagation |
| 250 | +- Graceful degradation with nil tracer |
| 251 | + |
| 252 | +## Future Enhancements |
| 253 | + |
| 254 | +- Phase-level spans for detailed timing |
| 255 | +- Custom span events for build milestones |
| 256 | +- Metrics integration (build duration histograms, cache hit rates) |
| 257 | +- Sampling configuration |
| 258 | +- Additional exporters (Zipkin, Prometheus) |