[WEB-7447] feat: migrate CE telemetry from OTLP traces to OTLP metrics #9156
Replace span-based tracing (tracer.py) with OTLP observable gauges, mirroring the approach already used in plane-ee. Key changes:

- Add otlp_endpoints.py: shared gRPC/HTTP endpoint helpers
- Add telemetry_metrics.py: push_instance_metrics task using MeterProvider + observable gauges (service name: plane-ce-api)
- User count excludes bots (is_bot=False)
- Page count excludes bot-owned private pages only
- Domain derived from WEB_URL env var
- Celery beat entry replaced with a timedelta schedule and a configurable METRICS_PUSH_INTERVAL_MINUTES (default 360 min)
- Add explicit opentelemetry-exporter-otlp-proto-grpc dep
- Delete tracer.py and telemetry.py (no longer needed)

Co-authored-by: Plane AI <noreply@plane.so>
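The two bot filters differ in scope: a user is dropped whenever it is a bot, while a page is dropped only when it is both bot-owned and private. Below is a minimal pure-Python sketch of those predicates. It assumes hypothetical `User`/`Page` records and that `access == 1` means private, which matches the `access=1` filter named in the PR summary. The actual task applies these rules as Django ORM filters.

```python
from dataclasses import dataclass
from typing import List

PRIVATE_ACCESS = 1  # assumption: mirrors the `access=1` filter for private pages


@dataclass
class User:
    is_bot: bool


@dataclass
class Page:
    owner_is_bot: bool
    access: int


def count_users(users: List[User]) -> int:
    # Roughly User.objects.filter(is_bot=False).count()
    return sum(1 for u in users if not u.is_bot)


def count_pages(pages: List[Page]) -> int:
    # Roughly an exclude(owned_by__is_bot=True, access=1): only pages that are
    # BOTH bot-owned and private are dropped from the count.
    return sum(
        1 for p in pages if not (p.owner_is_bot and p.access == PRIVATE_ACCESS)
    )
```

Under this rule, a bot-owned public page and a human-owned private page are both still counted.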
No actionable comments were generated in the recent review. 🎉
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough

This PR replaces scheduled OpenTelemetry tracing with a configurable periodic OTLP metrics exporter. It adds OTLP endpoint helpers and a Celery task that registers observable gauges for instance and workspace counts and exports them. It also updates Celery scheduling and wiring to run the new task.

Changes: Telemetry Metrics Collection
Sequence Diagram

sequenceDiagram
participant CeleryBeat as Celery Beat
participant PushTask as push_instance_metrics
participant Collector as _collect_and_push_metrics
participant DB as Database
participant MeterProvider as MeterProvider
participant Exporter as OTLP Exporter
CeleryBeat->>PushTask: Trigger
PushTask->>Collector: Collect and push metrics
Collector->>DB: Check Instance and telemetry enabled
Collector->>DB: Query counts
Collector->>MeterProvider: Register gauges
MeterProvider->>Exporter: Force flush
Exporter-->>Collector: Exported metrics
Collector-->>PushTask: Complete or log exception
PushTask-->>CeleryBeat: Done
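The diagram's control flow can be sketched without the OpenTelemetry SDK by passing in the three steps as callables. This is a hypothetical skeleton. The function signature and the failure log message are assumptions; only the success message comes from the PR's test plan. The skeleton checks the telemetry flag, queries counts, and exports them. On any failure it logs instead of raising, which matches the "Complete or log exception" arrow.

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger(__name__)


def collect_and_push_metrics(
    telemetry_enabled: Callable[[], bool],
    query_counts: Callable[[], Mapping[str, int]],
    export: Callable[[Mapping[str, int]], None],
) -> bool:
    """Hypothetical skeleton of the flow above; returns True if export ran."""
    try:
        # 1. Skip entirely when the instance has telemetry disabled.
        if not telemetry_enabled():
            return False
        # 2. Query the counts the gauges will report.
        counts = query_counts()
        # 3. In the real task this step registers observable gauges on a
        #    MeterProvider and force-flushes them to the OTLP exporter.
        export(counts)
        logger.info("Successfully pushed metrics to OTEL collector")
        return True
    except Exception:
        # 4. Failures are logged, not propagated back to Celery beat.
        logger.exception("Failed to push metrics")
        return False
```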
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Linked to Plane Work Item(s). This comment was auto-generated by Plane.
Pull request overview
This PR migrates Plane Community Edition telemetry from OTLP trace spans to OTLP metrics (observable gauges) and updates Celery scheduling and dependencies to support periodic metrics export to an OTEL collector.
Changes:
- Replaces the former span-based telemetry task with a new `push_instance_metrics` OTLP metrics task (instance + workspace gauges, bot filtering, domain derivation).
- Adds shared OTLP endpoint helpers for consistent gRPC/HTTP targeting.
- Updates Celery beat scheduling and adds an explicit OTLP gRPC exporter dependency.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| apps/api/requirements/base.txt | Adds explicit OTLP gRPC exporter dependency. |
| apps/api/plane/utils/telemetry.py | Removes old tracer initialization utilities. |
| apps/api/plane/utils/otlp_endpoints.py | Adds helpers to derive OTLP gRPC and HTTP metrics endpoints. |
| apps/api/plane/settings/common.py | Updates Celery imports from tracer task to metrics task. |
| apps/api/plane/license/management/commands/register_instance.py | Triggers metrics push after instance registration instead of traces. |
| apps/api/plane/license/bgtasks/tracer.py | Removes old span-based telemetry task. |
| apps/api/plane/license/bgtasks/telemetry_metrics.py | Adds new OTLP metrics collection/export task (instance + workspace gauges). |
| apps/api/plane/celery.py | Updates beat schedule to configurable interval for metrics push. |
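The table describes `otlp_endpoints.py` only as deriving gRPC and HTTP metrics endpoints. The sketch below covers the HTTP half. OTLP/HTTP exporters post metrics to the `/v1/metrics` path by default, so a collector base URL such as `https://telemetry.plane.town` maps to `https://telemetry.plane.town/v1/metrics`. The function name, the `https` default for scheme-less input, and the handling of existing paths are all assumptions, not the PR's code.

```python
from urllib.parse import urlparse, urlunparse

OTLP_HTTP_METRICS_PATH = "/v1/metrics"  # default OTLP/HTTP metrics path


def http_metrics_endpoint_from_url(base_url: str) -> str:
    """Hypothetical helper: append the OTLP/HTTP metrics path to a collector URL."""
    raw = base_url.strip()
    if "://" not in raw:
        # Assumption: scheme-less values are treated as https.
        raw = "https://" + raw
    parsed = urlparse(raw)
    path = parsed.path.rstrip("/")
    # Avoid doubling the suffix if the caller already supplied it.
    if not path.endswith(OTLP_HTTP_METRICS_PATH):
        path += OTLP_HTTP_METRICS_PATH
    return urlunparse(parsed._replace(path=path, params="", query="", fragment=""))
```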
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/api/plane/celery.py`:
- Around line 24-27: Replace the direct int(...) cast for
METRICS_PUSH_INTERVAL_MINUTES with guarded parsing: read
os.environ.get("METRICS_PUSH_INTERVAL_MINUTES"), try to convert to int inside a
try/except ValueError (and TypeError), enforce a minimum of 1 (clamp values <=0
to 1), and fall back to the safe default 360 if parsing fails; also emit a
warning via the module logger indicating the bad value and the default being
used. Use the METRICS_PUSH_INTERVAL_MINUTES symbol to assign the final validated
integer and keep scheduling logic unchanged.
In `@apps/api/plane/license/bgtasks/telemetry_metrics.py`:
- Around line 218-224: The loop is issuing six separate count queries per
Workspace (Project, Issue, Module, Cycle, WorkspaceMember, Page) causing N+1
load; instead perform grouped counts up-front using Django ORM aggregations
(e.g., Model.objects.filter(...).values("workspace").annotate(cnt=Count("id"))
or conditional Count for Page with exclude filter) to build maps from workspace
id to counts, then iterate the Workspace queryset
(Workspace.objects.all()[:WORKSPACE_METRICS_LIMIT]) and emit metrics using the
precomputed counts; reference the models Project, Issue, Module, Cycle,
WorkspaceMember, Page and the constant WORKSPACE_METRICS_LIMIT in
telemetry_metrics.py when replacing per-workspace .filter(...).count() calls.
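The batching this comment asks for replaces six queries per workspace with six queries in total. Each grouped query, for example `Project.objects.values("workspace").annotate(cnt=Count("id"))`, yields rows shaped like `{"workspace": <id>, "cnt": <n>}`, and those rows are folded into a lookup map. The stdlib-only sketch below shows the folding and lookup step, with hypothetical row data standing in for the ORM results. Workspaces that have no rows default to zero.

```python
from typing import Any, Dict, Hashable, Iterable, List, Mapping

# Shape of one row from e.g. Project.objects.values("workspace").annotate(cnt=Count("id"))
Row = Mapping[str, Any]


def to_count_map(rows: Iterable[Row]) -> Dict[Hashable, int]:
    """Fold grouped rows into a workspace-id -> count lookup."""
    return {row["workspace"]: row["cnt"] for row in rows}


def workspace_metrics(
    workspace_ids: List[Hashable], grouped: Mapping[str, Iterable[Row]]
) -> Dict[Hashable, Dict[str, int]]:
    """Combine one grouped result per model into per-workspace counts."""
    # Build every lookup map once (one aggregation query per model in the ORM version)...
    maps = {name: to_count_map(rows) for name, rows in grouped.items()}
    # ...then each workspace needs only dictionary lookups, not extra queries.
    # A workspace missing from a grouped result had zero matching rows.
    return {
        ws: {name: counts.get(ws, 0) for name, counts in maps.items()}
        for ws in workspace_ids
    }
```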
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: aeda61c4-82a4-4e7b-b4bb-13a1d2aa99ae
📒 Files selected for processing (8)
- apps/api/plane/celery.py
- apps/api/plane/license/bgtasks/telemetry_metrics.py
- apps/api/plane/license/bgtasks/tracer.py
- apps/api/plane/license/management/commands/register_instance.py
- apps/api/plane/settings/common.py
- apps/api/plane/utils/otlp_endpoints.py
- apps/api/plane/utils/telemetry.py
- apps/api/requirements/base.txt
💤 Files with no reviewable changes (2)
- apps/api/plane/utils/telemetry.py
- apps/api/plane/license/bgtasks/tracer.py
- harden grpc_endpoint_from_url for scheme-less OTLP_ENDPOINT values (e.g. "telemetry.plane.so:4317") by prepending "//" before urlparse
- fix WEB_URL domain extraction for scheme-less values with the same approach
- replace N+1 workspace count queries (6×N) with 6 batched annotate(Count) aggregation queries, which significantly reduces DB load at WORKSPACE_METRICS_LIMIT
- add deterministic ordering (order_by created_at) to the workspace slice
- harden METRICS_PUSH_INTERVAL_MINUTES env parsing with a try/except guard and positive-value validation to avoid a crash on malformed input

Co-authored-by: Plane AI <noreply@plane.so>
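The first two bullets rely on one parsing trick. On Python 3.9+, `urlparse("telemetry.plane.so:4317")` reads `telemetry.plane.so` as the URL scheme. Prefixing `//` pushes the value into the network-location component, where `hostname` and `port` can find it. The sketch below assumes hypothetical signatures for `grpc_endpoint_from_url` and for a `domain_from_web_url` helper. It also assumes port 4317, the conventional OTLP/gRPC port, when no port is given; the real helpers may choose ports differently.

```python
from typing import Optional
from urllib.parse import ParseResult, urlparse

DEFAULT_OTLP_GRPC_PORT = 4317  # assumption: conventional OTLP/gRPC port


def _parse_maybe_schemeless(value: str) -> ParseResult:
    raw = value.strip()
    # Without "//", "host:4317" would be parsed as scheme="host", path="4317".
    if "://" not in raw:
        raw = "//" + raw
    return urlparse(raw)


def grpc_endpoint_from_url(url: str) -> str:
    """Hypothetical: return "host:port" suitable for a gRPC exporter."""
    parsed = _parse_maybe_schemeless(url)
    if not parsed.hostname:
        raise ValueError(f"no host in OTLP endpoint {url!r}")
    return f"{parsed.hostname}:{parsed.port or DEFAULT_OTLP_GRPC_PORT}"


def domain_from_web_url(web_url: Optional[str]) -> str:
    """Hypothetical: extract the bare domain from WEB_URL, or "" if unset."""
    if not web_url:
        return ""
    return _parse_maybe_schemeless(web_url).hostname or ""
```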
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/api/plane/celery.py`:
- Around line 26-34: The helper _get_metrics_push_interval_minutes currently
parses METRICS_PUSH_INTERVAL_MINUTES with int(raw) but does not cap excessively
large values, allowing an oversized integer to later raise OverflowError in
timedelta(minutes=...) (see Line 50); update _get_metrics_push_interval_minutes
to validate the parsed value is within a sane range (e.g., >0 and <= a defined
MAX_INTERVAL_MINUTES such as 525600 or another safe upper bound), and if it is
out of range or parsing fails, return the safe default 360 so
METRICS_PUSH_INTERVAL_MINUTES never causes timedelta to overflow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 89232e22-0954-4355-9783-9b9f2ec24827
📒 Files selected for processing (3)
- apps/api/plane/celery.py
- apps/api/plane/license/bgtasks/telemetry_metrics.py
- apps/api/plane/utils/otlp_endpoints.py
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/api/plane/utils/otlp_endpoints.py
- apps/api/plane/license/bgtasks/telemetry_metrics.py
Add an upper-bound check (10_000_000 minutes) and catch OverflowError alongside ValueError, so an arbitrarily large env value cannot crash worker startup with an OverflowError from timedelta(minutes=...).

Co-authored-by: Plane AI <noreply@plane.so>
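Below is a sketch of the bounded parser this commit describes. The function name is hypothetical; the 10_000_000-minute cap comes from the commit message. As the review requested, any out-of-range value falls back to the default of 360 instead of being clamped. Once the cap applies, `timedelta` cannot overflow: 10,000,000 minutes is under 7,000 days, while `timedelta` allows up to 999,999,999 days. The `OverflowError` catch is therefore purely defensive here.

```python
import logging
from datetime import timedelta
from typing import Optional

logger = logging.getLogger(__name__)

DEFAULT_INTERVAL_MINUTES = 360
MAX_INTERVAL_MINUTES = 10_000_000  # cap named in the commit message


def get_metrics_push_interval_minutes(raw: Optional[str]) -> int:
    """Hypothetical bounded parser for METRICS_PUSH_INTERVAL_MINUTES."""
    if raw is None or not raw.strip():
        return DEFAULT_INTERVAL_MINUTES
    try:
        value = int(raw)
        if not 0 < value <= MAX_INTERVAL_MINUTES:
            raise ValueError(f"out of range: {value}")
        # Defensive: cannot overflow once bounded, but keeps the
        # OverflowError guard the commit describes.
        timedelta(minutes=value)
    except (ValueError, OverflowError):
        logger.warning(
            "Invalid METRICS_PUSH_INTERVAL_MINUTES=%r; using default %d",
            raw,
            DEFAULT_INTERVAL_MINUTES,
        )
        return DEFAULT_INTERVAL_MINUTES
    return value
```

The result would then feed the beat schedule, e.g. `timedelta(minutes=get_metrics_push_interval_minutes(os.environ.get("METRICS_PUSH_INTERVAL_MINUTES")))`.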
Actionable comments posted: 0
Summary
- Replace span-based tracing (`tracer.py`) with OTLP metrics gauges, mirroring the approach already in plane-ee
- Add `otlp_endpoints.py`: shared gRPC/HTTP endpoint helpers (same as EE)
- Add `telemetry_metrics.py`: `push_instance_metrics` Celery task using `MeterProvider` + observable gauges (service name: `plane-ce-api`)
- User count excludes bots (`is_bot=False`)
- Page count excludes bot-owned private pages only (`owned_by__is_bot=True, access=1`)
- Domain derived from the `WEB_URL` env var instead of a hardcoded empty field
- Celery beat entry replaced with a `timedelta` schedule and a configurable `METRICS_PUSH_INTERVAL_MINUTES` env var (default 360 min)
- Add explicit `opentelemetry-exporter-otlp-proto-grpc==1.28.1` dependency
- Delete `tracer.py` and `telemetry.py` (no longer needed)

Test plan

- `OTLP_ENDPOINT=https://telemetry.plane.town`, `METRICS_PUSH_INTERVAL_MINUTES=1`
- `from plane.license.bgtasks.telemetry_metrics import _collect_and_push_metrics; _collect_and_push_metrics()`
- `Successfully pushed metrics to OTEL collector`
- … `PLANE_COMMUNITY` with correct values

Closes WEB-7447
Summary by CodeRabbit
New Features
Chores
Removed