
[WEB-7447] feat: migrate CE telemetry from OTLP traces to OTLP metrics #9156

Merged
sriramveeraghanta merged 3 commits into preview from feat/telemetry-traces-to-metrics on May 28, 2026


@mguptahub commented May 28, 2026

Summary

  • Replaces span-based OTLP tracing (tracer.py) with OTLP metrics gauges, mirroring the approach already in plane-ee
  • Adds otlp_endpoints.py — shared gRPC/HTTP endpoint helpers (same as EE)
  • Adds telemetry_metrics.py: push_instance_metrics Celery task using MeterProvider + observable gauges (service name: plane-ce-api)
  • User count excludes bot accounts (is_bot=False)
  • Page count excludes bot-owned private pages only (owned_by__is_bot=True, access=1)
  • Domain derived from WEB_URL env var instead of hardcoded empty field
  • Celery beat entry updated to timedelta schedule with configurable METRICS_PUSH_INTERVAL_MINUTES env var (default 360 min)
  • Adds explicit opentelemetry-exporter-otlp-proto-grpc==1.28.1 dependency
  • Deletes tracer.py and telemetry.py (no longer needed)
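Deriving the domain from WEB_URL can be illustrated with a small sketch. The helper name `domain_from_web_url` is hypothetical, and treating "domain" as `urlparse(...).hostname` (port stripped, lowercased) is an assumption. It also applies the "//" prefix that a later commit in this PR describes for scheme-less values.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def domain_from_web_url(web_url: Optional[str]) -> str:
    """Hypothetical helper: return the host part of a WEB_URL value, or "" if unset."""
    if not web_url or not web_url.strip():
        return ""
    value = web_url.strip()
    # Without a scheme or a leading "//", urlparse() does not populate netloc,
    # so prepend "//" to make it parse the value as host[:port][/path].
    if "://" not in value and not value.startswith("//"):
        value = "//" + value
    # .hostname drops any port and lowercases the host.
    return urlparse(value).hostname or ""


def instance_domain() -> str:
    """Read WEB_URL from the environment and reduce it to a bare host."""
    return domain_from_web_url(os.environ.get("WEB_URL"))
```

For example, both `https://plane.example.com/` and the scheme-less `plane.example.com:8080` reduce to `plane.example.com`.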

Test plan

  • Set OTLP_ENDPOINT=https://telemetry.plane.town and METRICS_PUSH_INTERVAL_MINUTES=1
  • Trigger the task manually: from plane.license.bgtasks.telemetry_metrics import _collect_and_push_metrics; _collect_and_push_metrics()
  • Confirm the log line: Successfully pushed metrics to OTEL collector
  • Verify user count excludes bots, page count excludes bot-owned private pages
  • Confirm metrics appear in Grafana under PLANE_COMMUNITY with correct values
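The two bot filters differ in scope: users are dropped whenever `is_bot` is true, but a page is dropped only when it is both bot-owned and private (`access=1`). In the ORM this could be written roughly as `User.objects.filter(is_bot=False).count()` and `Page.objects.exclude(owned_by__is_bot=True, access=1).count()`; those exact queries are an assumption, not copied from the PR. The plain-Python sketch below uses hypothetical `UserRow`/`PageRow` records to pin down the intended semantics, which is what the "verify" step of the test plan checks.

```python
from dataclasses import dataclass
from typing import Iterable

PRIVATE_ACCESS = 1  # per the PR summary, access=1 marks a private page


@dataclass
class UserRow:
    is_bot: bool


@dataclass
class PageRow:
    owned_by_is_bot: bool
    access: int


def count_users(users: Iterable[UserRow]) -> int:
    # Every bot account is excluded.
    return sum(1 for u in users if not u.is_bot)


def count_pages(pages: Iterable[PageRow]) -> int:
    # Only pages that are BOTH bot-owned AND private are excluded;
    # bot-owned public pages and human-owned private pages still count.
    return sum(
        1 for p in pages
        if not (p.owned_by_is_bot and p.access == PRIVATE_ACCESS)
    )


users = [UserRow(is_bot=False), UserRow(is_bot=True), UserRow(is_bot=False)]
pages = [
    PageRow(owned_by_is_bot=True, access=1),   # bot-owned private: excluded
    PageRow(owned_by_is_bot=True, access=0),   # access=0 assumed public: counted
    PageRow(owned_by_is_bot=False, access=1),  # human-owned private: counted
]
print(count_users(users), count_pages(pages))  # 2 2
```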

Closes WEB-7447

Summary by CodeRabbit

  • New Features

    • Periodic instance and workspace metrics collection and export via OpenTelemetry (OTLP), including counts of users, workspaces, projects, issues, modules, cycles, and pages.
    • Configurable metrics push interval via environment variable.
  • Chores

    • Replaced trace-focused background job with metrics-focused export and updated startup task loading.
    • Added OTLP endpoint utilities and added OTLP metrics exporter to runtime requirements.
  • Removed

    • Legacy tracing initialization utilities and tracer background task.


Replace span-based tracing (tracer.py) with OTLP observable gauges,
mirroring the approach already used in plane-ee. Key changes:

- Add otlp_endpoints.py — shared gRPC/HTTP endpoint helpers
- Add telemetry_metrics.py — push_instance_metrics task using
  MeterProvider + observable gauges (service name: plane-ce-api)
- User count excludes bots (is_bot=False)
- Page count excludes bot-owned private pages only
- Domain derived from WEB_URL env var
- Celery beat entry replaced with timedelta schedule +
  configurable METRICS_PUSH_INTERVAL_MINUTES (default 360 min)
- Add explicit opentelemetry-exporter-otlp-proto-grpc dep
- Delete tracer.py and telemetry.py (no longer needed)

Co-authored-by: Plane AI <noreply@plane.so>
Copilot AI review requested due to automatic review settings May 28, 2026 10:17
@coderabbitai

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32aa8cfc-c819-402d-be0b-a7c9ca060213

📥 Commits

Reviewing files that changed from the base of the PR and between 841561e and 389d4ac.

📒 Files selected for processing (1)
  • apps/api/plane/celery.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/api/plane/celery.py

📝 Walkthrough


This PR replaces scheduled OpenTelemetry tracing with a configurable periodic OTLP metrics exporter: it adds OTLP endpoint helpers, a Celery task that registers observable gauges for instance and workspace counts and exports them, and updates Celery scheduling and wiring to run the new task.

Changes

Telemetry Metrics Collection

  • OTLP Endpoint Utilities (apps/api/plane/utils/otlp_endpoints.py, apps/api/requirements/base.txt): New helpers to derive OTLP gRPC host:port and HTTP metrics URL; adds opentelemetry-exporter-otlp-proto-grpc==1.28.1.
  • Metrics Collection and Export Implementation (apps/api/plane/license/bgtasks/telemetry_metrics.py): New Celery task module that builds an OTLP metrics exporter, constructs Resource/attribute sets from Instance/WEB_URL, registers observable gauges for instance and workspace metrics, forces a synchronous flush with a timeout, logs errors, and always shuts down the provider.
  • Celery Beat Schedule Configuration (apps/api/plane/celery.py): Adds schedule and timedelta usage, parses METRICS_PUSH_INTERVAL_MINUTES from the environment (default 360) with validation and a cap, and schedules push-instance-metrics on a fixed interval.
  • Settings and Management Command Wiring (apps/api/plane/settings/common.py, apps/api/plane/license/management/commands/register_instance.py): Updates CELERY_IMPORTS to load plane.license.bgtasks.telemetry_metrics and enqueues push_instance_metrics during instance registration instead of the removed tracing task.
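For reference, a Celery beat entry with a fixed interval can use a `datetime.timedelta` directly as its `schedule` value. The sketch below is not the PR's code: the task path string assumes Celery's default "<module>.<function>" naming, and the entry key `push-instance-metrics` is the one named in the walkthrough.

```python
from datetime import timedelta

# Default from the PR description; the real value is parsed from the
# METRICS_PUSH_INTERVAL_MINUTES environment variable.
METRICS_PUSH_INTERVAL_MINUTES = 360

# In plane/celery.py a mapping like this would be assigned to (or merged into)
# app.conf.beat_schedule on the Celery app instance.
beat_schedule = {
    "push-instance-metrics": {
        # Assumed task name based on Celery's default naming convention.
        "task": "plane.license.bgtasks.telemetry_metrics.push_instance_metrics",
        "schedule": timedelta(minutes=METRICS_PUSH_INTERVAL_MINUTES),
    },
}

print(beat_schedule["push-instance-metrics"]["schedule"])  # 6:00:00
```

Unlike a crontab schedule, an interval schedule is measured roughly from the task's last run rather than pinned to wall-clock times, which suits a "push every N minutes" telemetry job.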

Sequence Diagram

sequenceDiagram
  participant CeleryBeat as Celery Beat
  participant PushTask as push_instance_metrics
  participant Collector as _collect_and_push_metrics
  participant DB as Database
  participant MeterProvider as MeterProvider
  participant Exporter as OTLP Exporter
  
  CeleryBeat->>PushTask: Trigger
  PushTask->>Collector: Collect and push metrics
  Collector->>DB: Check Instance and telemetry enabled
  Collector->>DB: Query counts
  Collector->>MeterProvider: Register gauges
  MeterProvider->>Exporter: Force flush
  Exporter-->>Collector: Exported metrics
  Collector-->>PushTask: Complete or log exception
  PushTask-->>CeleryBeat: Done

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 From traces to metrics, I hop and sing,
Gauges count the things that make the system spring.
OTLP listens, exporters hum,
Celery drums as intervals come.
Flush, log, and sleep — hop, hop — hooray!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 26.09%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title clearly and concisely describes the main change: migrating CE telemetry from OTLP traces to OTLP metrics, which is exactly what the PR accomplishes across all modified files.
  • Description check: ✅ Passed. The description includes most template sections with substantial detail (Summary, Test plan, References) but is missing explicit checkboxes for Type of Change and lacks dedicated Screenshots/Test Scenarios sections, though testing details are integrated into the Summary.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@makeplane

makeplane Bot commented May 28, 2026

Linked to Plane Work Item(s)

This comment was auto-generated by Plane


Copilot AI left a comment


Pull request overview

This PR migrates Plane Community Edition telemetry from OTLP trace spans to OTLP metrics (observable gauges) and updates Celery scheduling and dependencies to support periodic metrics export to an OTEL collector.

Changes:

  • Replaces the former span-based telemetry task with a new push_instance_metrics OTLP metrics task (instance + workspace gauges, bot filtering, domain derivation).
  • Adds shared OTLP endpoint helpers for consistent gRPC/HTTP targeting.
  • Updates Celery beat scheduling and adds an explicit OTLP gRPC exporter dependency.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Summary per file:
  • apps/api/requirements/base.txt: Adds explicit OTLP gRPC exporter dependency.
  • apps/api/plane/utils/telemetry.py: Removes old tracer initialization utilities.
  • apps/api/plane/utils/otlp_endpoints.py: Adds helpers to derive OTLP gRPC and HTTP metrics endpoints.
  • apps/api/plane/settings/common.py: Updates Celery imports from tracer task to metrics task.
  • apps/api/plane/license/management/commands/register_instance.py: Triggers metrics push after instance registration instead of traces.
  • apps/api/plane/license/bgtasks/tracer.py: Removes old span-based telemetry task.
  • apps/api/plane/license/bgtasks/telemetry_metrics.py: Adds new OTLP metrics collection/export task (instance + workspace gauges).
  • apps/api/plane/celery.py: Updates beat schedule to configurable interval for metrics push.

Comment thread apps/api/plane/utils/otlp_endpoints.py
Comment thread apps/api/plane/license/bgtasks/telemetry_metrics.py Outdated
Comment thread apps/api/plane/license/bgtasks/telemetry_metrics.py Outdated
Comment thread apps/api/plane/license/bgtasks/telemetry_metrics.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/api/plane/celery.py`:
- Around line 24-27: Replace the direct int(...) cast for
METRICS_PUSH_INTERVAL_MINUTES with guarded parsing: read
os.environ.get("METRICS_PUSH_INTERVAL_MINUTES"), try to convert to int inside a
try/except ValueError (and TypeError), enforce a minimum of 1 (clamp values <=0
to 1), and fall back to the safe default 360 if parsing fails; also emit a
warning via the module logger indicating the bad value and the default being
used. Use the METRICS_PUSH_INTERVAL_MINUTES symbol to assign the final validated
integer and keep scheduling logic unchanged.

In `@apps/api/plane/license/bgtasks/telemetry_metrics.py`:
- Around line 218-224: The loop is issuing six separate count queries per
Workspace (Project, Issue, Module, Cycle, WorkspaceMember, Page) causing N+1
load; instead perform grouped counts up-front using Django ORM aggregations
(e.g., Model.objects.filter(...).values("workspace").annotate(cnt=Count("id"))
or conditional Count for Page with exclude filter) to build maps from workspace
id to counts, then iterate the Workspace queryset
(Workspace.objects.all()[:WORKSPACE_METRICS_LIMIT]) and emit metrics using the
precomputed counts; reference the models Project, Issue, Module, Cycle,
WorkspaceMember, Page and the constant WORKSPACE_METRICS_LIMIT in
telemetry_metrics.py when replacing per-workspace .filter(...).count() calls.
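A sketch of the batched approach: one grouped query per model returns rows shaped like `{"workspace": <id>, "cnt": <n>}` (in Django, for example `Project.objects.filter(workspace_id__in=ids).values("workspace").annotate(cnt=Count("id"))`), and the per-workspace loop then reads from in-memory lookups instead of calling `.count()` for each workspace. The function and metric names below are hypothetical; workspaces with no rows for a model default to 0.

```python
from typing import Dict, Hashable, Iterable, List, Mapping


def lookup_from_rows(rows: Iterable[Mapping]) -> Dict[Hashable, int]:
    """Turn grouped rows {"workspace": id, "cnt": n} into {id: n}."""
    return {row["workspace"]: row["cnt"] for row in rows}


def per_workspace_counts(
    workspace_ids: List[Hashable],
    grouped_rows: Mapping[str, Iterable[Mapping]],
) -> Dict[Hashable, Dict[str, int]]:
    """Combine one grouped result per model into per-workspace metric dicts.

    grouped_rows maps a metric name (e.g. "projects") to the rows of a single
    GROUP BY workspace query, so the number of queries is fixed (one per
    model) instead of growing with the number of workspaces.
    """
    lookups = {name: lookup_from_rows(rows) for name, rows in grouped_rows.items()}
    return {
        ws_id: {name: lookup.get(ws_id, 0) for name, lookup in lookups.items()}
        for ws_id in workspace_ids
    }


# Made-up rows in the shape a values("workspace").annotate(cnt=Count("id"))
# query returns.
rows = {
    "projects": [{"workspace": "ws-1", "cnt": 3}],
    "issues": [{"workspace": "ws-1", "cnt": 40}, {"workspace": "ws-2", "cnt": 7}],
}
print(per_workspace_counts(["ws-1", "ws-2"], rows))
# {'ws-1': {'projects': 3, 'issues': 40}, 'ws-2': {'projects': 0, 'issues': 7}}
```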
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aeda61c4-82a4-4e7b-b4bb-13a1d2aa99ae

📥 Commits

Reviewing files that changed from the base of the PR and between 0acb32e and 71a05e7.

📒 Files selected for processing (8)
  • apps/api/plane/celery.py
  • apps/api/plane/license/bgtasks/telemetry_metrics.py
  • apps/api/plane/license/bgtasks/tracer.py
  • apps/api/plane/license/management/commands/register_instance.py
  • apps/api/plane/settings/common.py
  • apps/api/plane/utils/otlp_endpoints.py
  • apps/api/plane/utils/telemetry.py
  • apps/api/requirements/base.txt
💤 Files with no reviewable changes (2)
  • apps/api/plane/utils/telemetry.py
  • apps/api/plane/license/bgtasks/tracer.py

Comment thread apps/api/plane/celery.py
Comment thread apps/api/plane/license/bgtasks/telemetry_metrics.py Outdated
- harden grpc_endpoint_from_url for scheme-less OTLP_ENDPOINT values
  (e.g. "telemetry.plane.so:4317") by prepending "//" before urlparse
- fix WEB_URL domain extraction for scheme-less values with same approach
- replace N+1 workspace count queries (6×N) with 6 batched annotate(Count)
  aggregation queries, which significantly reduces DB load when up to
  WORKSPACE_METRICS_LIMIT workspaces are processed
- add deterministic ordering (order_by created_at) to workspace slice
- harden METRICS_PUSH_INTERVAL_MINUTES env parsing with try/except guard
  and positive-value validation to avoid crash on malformed input

Co-authored-by: Plane AI <noreply@plane.so>
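The scheme-less fix can be shown with `urllib.parse`. A value such as `telemetry.plane.so:4317` contains no "//", so `urlparse()` does not split out a host; on recent Python versions the host is even read as the URL scheme. Prefixing "//" makes it parse as a network location. This is a hypothetical sketch of `grpc_endpoint_from_url` plus an HTTP counterpart: the port fallbacks (443 for https, 4317 otherwise) and the `https://` default for the HTTP URL are assumptions, while `/v1/metrics` is the default OTLP/HTTP metrics path.

```python
from urllib.parse import urlparse

OTLP_GRPC_DEFAULT_PORT = 4317  # conventional OTLP/gRPC port


def _parse(url: str):
    value = url.strip()
    # A bare "host:port" has no "//", so urlparse() would not treat it as a
    # network location; adding "//" makes host and port parse correctly.
    if "://" not in value and not value.startswith("//"):
        value = "//" + value
    return urlparse(value)


def grpc_endpoint_from_url(url: str) -> str:
    """Hypothetical sketch: OTLP endpoint URL -> "host:port" for a gRPC exporter."""
    parsed = _parse(url)
    if not parsed.hostname:
        raise ValueError(f"no host in OTLP endpoint: {url!r}")
    # Assumed fallbacks when the URL carries no explicit port.
    default_port = 443 if parsed.scheme == "https" else OTLP_GRPC_DEFAULT_PORT
    return f"{parsed.hostname}:{parsed.port or default_port}"


def http_metrics_url(url: str) -> str:
    """Hypothetical sketch: OTLP endpoint URL -> OTLP/HTTP metrics URL."""
    parsed = _parse(url)
    scheme = parsed.scheme or "https"  # assumed default for scheme-less values
    base = f"{scheme}://{parsed.netloc}{parsed.path.rstrip('/')}"
    return base + "/v1/metrics"
```

With this sketch, `grpc_endpoint_from_url("telemetry.plane.so:4317")` keeps host and port intact instead of misreading the host as a scheme.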

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/api/plane/celery.py`:
- Around line 26-34: The helper _get_metrics_push_interval_minutes currently
parses METRICS_PUSH_INTERVAL_MINUTES with int(raw) but does not cap excessively
large values, allowing an oversized integer to later raise OverflowError in
timedelta(minutes=...) (see Line 50); update _get_metrics_push_interval_minutes
to validate the parsed value is within a sane range (e.g., >0 and <= a defined
MAX_INTERVAL_MINUTES such as 525600 or another safe upper bound), and if it is
out of range or parsing fails, return the safe default 360 so
METRICS_PUSH_INTERVAL_MINUTES never causes timedelta to overflow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89232e22-0954-4355-9783-9b9f2ec24827

📥 Commits

Reviewing files that changed from the base of the PR and between 71a05e7 and 841561e.

📒 Files selected for processing (3)
  • apps/api/plane/celery.py
  • apps/api/plane/license/bgtasks/telemetry_metrics.py
  • apps/api/plane/utils/otlp_endpoints.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/api/plane/utils/otlp_endpoints.py
  • apps/api/plane/license/bgtasks/telemetry_metrics.py

Comment thread apps/api/plane/celery.py
Add upper-bound check (10_000_000 minutes) and catch OverflowError alongside
ValueError so an arbitrarily large env value cannot crash worker startup via
timedelta(minutes=...) OverflowError.

Co-authored-by: Plane AI <noreply@plane.so>
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@sriramveeraghanta sriramveeraghanta merged commit 095b1aa into preview May 28, 2026
13 checks passed
@sriramveeraghanta sriramveeraghanta deleted the feat/telemetry-traces-to-metrics branch May 28, 2026 13:04

4 participants