
Add anonymous telemetry heartbeat with PostHog#485

Merged: gjkim42 merged 1 commit into main from kelos-task-390 on Mar 1, 2026


kelos-bot (bot) commented Feb 28, 2026

Summary

  • Add internal/telemetry package that collects anonymous aggregate data (task counts, feature adoption, scale metrics, token/cost usage) and sends it to PostHog
  • Add --telemetry-report, --telemetry-endpoint, and --telemetry-environment flags to the controller
  • Add --disable-heartbeat flag to kelos install for opt-out
  • Disable heartbeat in e2e CI
  • Add telemetry documentation to docs/reference.md

Data collected

All data is anonymous and aggregated: no PII, repo URLs, prompts, or secrets are collected.

| Data | Description |
| --- | --- |
| Installation ID | Random UUID, generated once per cluster |
| Kelos version | Installed controller version |
| Kubernetes version | Cluster K8s version |
| Environment | Configurable label (e.g., production, development) |
| Task counts | Total tasks, breakdown by type and phase |
| Feature adoption | Number of TaskSpawners, AgentConfigs, Workspaces, and source types in use |
| Scale | Number of namespaces with Kelos resources |
| Usage totals | Aggregate cost (USD), input tokens, and output tokens |
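
As a rough illustration of the data listed above, the sketch below shows one way such a report could be modeled and how a random installation ID could be generated. The `Report` type, its field names, and `newInstallationID` are hypothetical and are not taken from this PR's code; the sample values in `main` are placeholders, and persisting the ID so it is generated only once per cluster is left out.

```go
package main

import (
	"crypto/rand"
	"encoding/json"
	"fmt"
)

// Report is a hypothetical shape for the heartbeat payload. The field names
// are illustrative assumptions, not the ones used by internal/telemetry.
type Report struct {
	InstallationID    string         `json:"installation_id"`
	KelosVersion      string         `json:"kelos_version"`
	KubernetesVersion string         `json:"kubernetes_version"`
	Environment       string         `json:"environment"`
	TasksTotal        int            `json:"tasks_total"`
	TasksByPhase      map[string]int `json:"tasks_by_phase"`
	Namespaces        int            `json:"namespaces"`
	CostUSD           float64        `json:"cost_usd"`
	InputTokens       int64          `json:"input_tokens"`
	OutputTokens      int64          `json:"output_tokens"`
}

// newInstallationID returns a random RFC 4122 version 4 UUID built from
// crypto/rand, so it carries no information about the cluster.
func newInstallationID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newInstallationID()
	if err != nil {
		panic(err)
	}
	// Placeholder values, for illustration only.
	r := Report{
		InstallationID: id,
		Environment:    "development",
		TasksTotal:     3,
		TasksByPhase:   map[string]int{"Succeeded": 2, "Failed": 1},
		Namespaces:     1,
	}
	out, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Only counts and totals appear in this shape, which mirrors the stated goal of collecting aggregates rather than per-task details.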

Opting out

kelos install --disable-heartbeat

Test plan

  • make verify passes
  • make test passes, with unit tests covering collection, sending, installation ID management, error handling, the --disable-heartbeat flag, and CronJob filtering
  • CI e2e tests (heartbeat disabled)

Closes #390

🤖 Generated with Claude Code

gjkim42 (Collaborator) commented Feb 28, 2026

/retest

cubic-dev-ai (bot) left a comment

4 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/telemetry/telemetry_test.go">

<violation number="1" location="internal/telemetry/telemetry_test.go:232">
P2: Bug: `t.Fatalf` called from httptest handler goroutine. `t.Fatal`/`t.Fatalf` must only be called from the goroutine running the test. In HTTP handlers spawned by `httptest.NewServer`, use `t.Errorf` instead, or capture the error and assert it in the test goroutine.</violation>
</file>

<file name="internal/telemetry/telemetry.go">

<violation number="1" location="internal/telemetry/telemetry.go:226">
P2: The error response body is discarded, making it hard to debug telemetry send failures. Read and include the response body in the error message, and drain it on the success path for proper connection cleanup.</violation>
</file>

<file name="internal/manifests/install.yaml">

<violation number="1" location="internal/manifests/install.yaml:312">
P2: The Job template is missing `activeDeadlineSeconds`. Without it, if the telemetry collection or HTTP send hangs (e.g., API server degradation, DNS issues), the job has no upper time bound and will keep retrying up to `backoffLimit` with no per-attempt or total deadline. Consider adding `activeDeadlineSeconds: 300` (or similar) to the `jobTemplate.spec` to ensure the job is terminated if it runs too long.</violation>
</file>

<file name="cmd/kelos-controller/main.go">

<violation number="1" location="cmd/kelos-controller/main.go:97">
P2: The one-shot telemetry context has no timeout. If the Kubernetes API server is slow or unresponsive, the process could hang indefinitely. Add a context with a deadline (e.g., 2 minutes) to bound the total runtime of the telemetry report.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

kelos-bot (bot) force-pushed the kelos-task-390 branch from b4f8352 to 30f384c on March 1, 2026 14:50
kelos-bot (bot) changed the title from "Add anonymous phone-home telemetry via daily CronJob" to "Add anonymous phone-home telemetry via PostHog and daily CronJob" on Mar 1, 2026
gjkim42 force-pushed the kelos-task-390 branch 2 times, most recently from c68520b to 7073e2c, on March 1, 2026 23:10
gjkim42 added the priority/important-longterm, triage-accepted, and kind/feature labels on Mar 1, 2026
github-actions (bot) removed the needs-triage, needs-kind, and needs-priority labels on Mar 1, 2026
@gjkim42 gjkim42 enabled auto-merge March 1, 2026 23:12
gjkim42 changed the title from "Add anonymous phone-home telemetry via PostHog and daily CronJob" to "Add anonymous telemetry heartbeat with PostHog" on Mar 1, 2026
Collect anonymous, aggregate usage data (task counts, feature adoption,
scale metrics) via a daily CronJob and send it to PostHog. No personal
data is collected.

- Add telemetry package with collection, PostHog sending, and installation ID persistence
- Add --telemetry-report, --telemetry-endpoint, and --telemetry-environment flags to the controller
- Add --disable-heartbeat flag to kelos install to opt out
- Disable heartbeat in e2e CI
- Add telemetry documentation to docs/reference.md
@gjkim42 gjkim42 disabled auto-merge March 1, 2026 23:17
@gjkim42 gjkim42 enabled auto-merge March 1, 2026 23:18
@gjkim42 gjkim42 added this pull request to the merge queue Mar 1, 2026
Merged via the queue into main with commit 76acd31 Mar 1, 2026
6 checks passed
@gjkim42 gjkim42 deleted the kelos-task-390 branch March 1, 2026 23:40

Development

Successfully merging this pull request may close these issues.

telemetry for real usage detection
