feat(otel): add OpenTelemetry ingest, query, and frontend traces UI #18

Merged
dviejokfs merged 11 commits into main from feat/add-otel
Feb 27, 2026

@dviejokfs
Contributor

Description

Add a complete OpenTelemetry observability stack to Temps — from OTLP/HTTP protobuf ingest through TimescaleDB storage to a frontend traces visualization UI.

Backend (temps-otel crate)

  • OTLP/HTTP protobuf ingest for traces, metrics, and logs (gzip/zstd decompression)
  • Dual auth: API keys (tk_) and deployment tokens (dt_), with header-based and path-based ingest routes
  • TimescaleDB storage with hypertables, continuous aggregates, compression, and retention
  • Query API: filter/list spans, get trace, query metrics with time_bucket, query logs, pipeline stats, health summaries, insights
  • Rate limiting, storage quota checks, anomaly detection, health compute service
  • OpenAPI annotations on all 12+ endpoints, 117 passing unit tests
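
The dual-auth bullet above can be illustrated with a small prefix check. This is a minimal sketch, not the `temps-otel` implementation: `TokenKind` and `classify_token` are hypothetical names, and the only facts taken from this PR are the `tk_` (API key) and `dt_` (deployment token) prefixes.

```rust
/// Credential kinds accepted by the OTel ingest routes, keyed by token prefix.
/// Hypothetical type for illustration; the real crate may model this differently.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum TokenKind {
    /// Token starting with "tk_".
    ApiKey,
    /// Token starting with "dt_".
    DeploymentToken,
}

/// Classify a raw token by its prefix.
/// Returns `None` for unknown prefixes or when nothing follows the prefix.
pub fn classify_token(raw: &str) -> Option<TokenKind> {
    let (kind, rest) = if let Some(rest) = raw.strip_prefix("tk_") {
        (TokenKind::ApiKey, rest)
    } else if let Some(rest) = raw.strip_prefix("dt_") {
        (TokenKind::DeploymentToken, rest)
    } else {
        return None;
    };
    if rest.is_empty() {
        None
    } else {
        Some(kind)
    }
}
```

A header-based route and a path-based route could both funnel the extracted string through one function like this before looking the token up, so the two route styles share a single validation path.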

Auth & Permissions

  • OtelRead/OtelWrite permissions added
  • deployment_id added to deployment tokens for full OTel context propagation
  • Migration for the new column

Frontend

  • Traces list with filtering (time range, service, status, trace ID search) and trace-level error aggregation
  • Trace detail with span waterfall visualization, span detail panel, and refresh button
  • Setup section with environment selector, OTLP endpoint, and Next.js code snippets
  • Sidebar nav item added

Notable fixes

  • Protobuf Span.flags changed from uint32 to fixed32 per OTLP v1.1.0+ spec
  • Removed server-side tail sampling — sampling is the client SDK's responsibility
  • Fixed TraceDetail data extraction (spans are read from data.data, not data.spans)
  • Fixed status code comparisons (the API returns uppercase ERROR/OK)
  • Fixed waterfall duration label visibility for wide spans
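
To show why the `uint32` to `fixed32` change matters on the wire, here is a minimal sketch of the two protobuf value encodings. The helper names are hypothetical and only the value bytes are encoded (no field tag); a `uint32` is a variable-length varint, while a `fixed32` is always four little-endian bytes, so a decoder expecting one cannot read the other.

```rust
/// Encode a u32 as a protobuf varint (wire type 0), the encoding used by `uint32`.
/// Each byte carries 7 bits of payload; the high bit marks "more bytes follow".
pub fn encode_varint(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Encode a u32 as protobuf `fixed32` (wire type 5): always four little-endian bytes.
pub fn encode_fixed32(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}
```

For the value `0x101`, `encode_varint` yields two bytes (`0x81 0x02`) while `encode_fixed32` yields four (`0x01 0x01 0x00 0x00`), which is why a schema declared with the wrong type mis-parses spans sent by SDKs that follow the newer spec.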

Type of change

  • New feature (non-breaking change that adds functionality)

Checklist

  • I have written tests that cover the changes
  • All new and existing tests pass (cargo test --lib)
  • cargo check --lib passes with no warnings
  • My commits follow the Conventional Commits format
  • I have updated documentation where necessary

Related issues

Ref #17

… integration

- Introduced `temps-otel` crate to the workspace and updated dependencies in `Cargo.toml`.
- Added `OtelRead` and `OtelWrite` permissions to the `Permission` enum in `temps-auth`.
- Registered `OtelPlugin` in the console API for OpenTelemetry metrics, traces, and logs collection.
- Created migration for OpenTelemetry tables in the database.
- Updated relevant files to integrate OpenTelemetry functionality across the application.
- Added `.env` to `.gitignore` to prevent sensitive information from being tracked.
- Updated `Cargo.toml` to include new crates: `temps-environments`, `temps-screenshots`, and `temps-embeddings`.
- Added `tower` and `uuid` dependencies to `Cargo.lock` and `Cargo.toml`.
- Enhanced `CHANGELOG.md` with new features related to PostgreSQL backups and preset providers.
- Updated `docker-compose.yml` for PostgreSQL configuration to support WAL-G for backups.
- Improved CLI error handling and added source map management commands in `temps-cli`.
- Refined analytics event handling and introduced console event ingestion in `temps-analytics-events`.
- Added resource monitoring tab in the project sidebar and a dedicated monitoring settings page with per-environment CPU, memory, and disk metrics.
- Introduced `status_code_class` query parameter for proxy log stats endpoints to filter by status code classes (e.g., "2xx", "3xx").
- Implemented TimescaleDB compression and retention policies for the `proxy_logs` hypertable, optimizing data management.
- Enabled `cargo clippy` pre-commit hook to catch lint issues before CI, improving code quality.
- Updated various components and API types to support new monitoring functionalities and enhance user experience.
- Complete temps-otel crate: OTLP/HTTP protobuf ingest (traces, metrics, logs),
  query handlers, TimescaleDB storage, rate limiting, quota checks, anomaly
  detection, health summaries, and sidecar config generation
- Auth: support tk_ (API key) and dt_ (deployment token) authentication for
  OTel ingest with path-based and header-based routes
- Frontend: Traces list page with filtering (time range, service, status),
  trace detail page with span waterfall visualization and span detail panel,
  setup section with OTLP endpoint and Next.js code snippets
- Add deployment_id to deployment tokens for OTel context propagation
- Fix protobuf Span.flags from uint32 to fixed32 per OTLP v1.1.0+ spec
- Remove server-side tail sampling (sampling is client SDK responsibility)
- Add OtelRead/OtelWrite permissions, plugin registered in console
- 117 passing unit tests, zero clippy warnings
- Add protobuf-compiler installation to all CI jobs that compile the
  workspace (check, clippy, build-tests, unit-tests, integration-tests)
- Add temps-otel to unit-b test group
- Add OTel feature entry to CHANGELOG.md [Unreleased] section
- Document protoc and wasm-pack as prerequisites in CONTRIBUTING.md
  with platform-specific installation instructions
- Add changelog reminder to PR checklist
Add GET /otel/trace-summaries that returns one row per trace (grouped by
trace_id) with root span name, service name, deployment environment,
span count, error count, and duration. This fixes the pagination bug
where the old endpoint returned flat spans, so pagination counted spans
rather than traces and only ~5 traces were displayed per page.
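
The grouping described above can be sketched in memory as follows. This is an illustration under assumptions, not the TimescaleDB query: `SpanRow`, `TraceRow`, and `summarize_traces` are hypothetical names, only a subset of the summary fields is modeled, and the root span is taken to be the one without a parent.

```rust
use std::collections::BTreeMap;

/// Flat span as the old endpoint would return it (hypothetical shape).
#[derive(Debug, Clone)]
pub struct SpanRow {
    pub trace_id: String,
    pub parent_span_id: Option<String>,
    pub name: String,
    pub is_error: bool,
    pub start_ns: u64,
    pub end_ns: u64,
}

/// One row per trace (hypothetical, simplified summary).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceRow {
    pub trace_id: String,
    pub root_span_name: Option<String>,
    pub span_count: usize,
    pub error_count: usize,
    pub duration_ns: u64,
}

/// Group spans by trace_id and aggregate counts and overall duration.
/// Output is ordered by trace_id because a BTreeMap is used.
pub fn summarize_traces(spans: &[SpanRow]) -> Vec<TraceRow> {
    // trace_id -> (summary, earliest start, latest end)
    let mut groups: BTreeMap<&str, (TraceRow, u64, u64)> = BTreeMap::new();
    for span in spans {
        let entry = groups.entry(span.trace_id.as_str()).or_insert_with(|| {
            let row = TraceRow {
                trace_id: span.trace_id.clone(),
                root_span_name: None,
                span_count: 0,
                error_count: 0,
                duration_ns: 0,
            };
            (row, u64::MAX, 0)
        });
        entry.0.span_count += 1;
        if span.is_error {
            entry.0.error_count += 1;
        }
        if span.parent_span_id.is_none() {
            entry.0.root_span_name = Some(span.name.clone());
        }
        entry.1 = entry.1.min(span.start_ns);
        entry.2 = entry.2.max(span.end_ns);
    }
    groups
        .into_values()
        .map(|(mut row, start, end)| {
            row.duration_ns = end.saturating_sub(start);
            row
        })
        .collect()
}
```

Paginating over the output of a grouping like this means a page size of 10 yields 10 traces, regardless of how many spans each trace contains.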

- Add TraceSummary type with deployment_environment field
- Add query_trace_summaries() and count_traces() to OtelStorage trait
- Implement both in TimescaleDB (GROUP BY trace_id with array_agg)
- Implement both in MockOtelStorage for tests
- Add TraceSummariesResponse handler with proper total count
- Register new route and OpenAPI annotations
- Update TracesList.tsx to use new endpoint (remove client-side groupByTrace)
- Show Environment column as badge when viewing all environments
- Change page size from 50 to 10 traces per page
- Auto-inject OTel env vars in workflow_planner for deployments
- TracesList: filters stack vertically on mobile (flex-col sm:flex-row),
  selects go full-width, hide Kind/Spans/Timestamp columns on mobile,
  compact pagination, overflow-x-auto on table
- TraceDetail: waterfall + detail panel stack vertically on mobile
  (flex-col lg:flex-row), detail panel goes full-width, span name
  column narrower on mobile, min-width on scrollable rows
- Add mobile responsiveness guidelines to CLAUDE.md
…ummaries

The deployment_environment column comes from an OTel resource attribute,
which most SDKs don't set. JOIN the deployments and environments tables to
get the actual environment name, falling back to the resource attribute via
COALESCE. Also qualify all column references with a table alias, since the
query now involves multiple tables.
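
The COALESCE fallback has a direct analogue in Rust's `Option::or`. A minimal sketch, assuming the joined environment name and the resource attribute are both nullable; `resolve_environment` is a hypothetical helper, not a function from this PR.

```rust
/// Pick the environment name the way COALESCE(joined, attribute) would:
/// prefer the name from the joined environments table, otherwise fall back
/// to the OTel resource attribute, otherwise return None (SQL NULL).
pub fn resolve_environment<'a>(
    joined_name: Option<&'a str>,
    resource_attr: Option<&'a str>,
) -> Option<&'a str> {
    joined_name.or(resource_attr)
}
```

With this ordering, a span sent by an SDK that sets no environment attribute still gets the environment recorded for its deployment.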
…ent and proxying capabilities

- Added `temps-external-plugins` crate for managing standalone binary plugins, including discovery, lifecycle management, and HTTP proxying.
- Implemented `temps-plugin-sdk` crate to provide a standardized interface for plugin authors, including manifest definitions and service registration.
- Integrated external plugins into the main Temps application, allowing for dynamic loading and management via Unix domain sockets.
- Updated `Cargo.toml` and `Cargo.lock` to include new dependencies and workspace members for the external plugins system.
- Enhanced the console API to support graceful shutdown of external plugins and added routes for listing running plugins.
- Documented the new external plugin features in the CHANGELOG.md.
dviejokfs merged commit 65b2e46 into main on Feb 27, 2026
9 checks passed
dviejokfs deleted the feat/add-otel branch on April 3, 2026 06:50