Conversation

@lujunsan
Contributor

PR description:

Problem

Operator-deployed MCP server proxies were not sending tool count metrics to the usage API, despite:

  • The usage metrics middleware being properly configured and active
  • Periodic 15-minute flushes working correctly for local deployments
  • The operator controller itself successfully sending metrics

Root Cause

Operator-deployed proxies run in distroless containers with read-only root filesystems and cannot read the ~/.local/share/toolhive/updates.json file where anonymous IDs are stored.

The flow was:

  1. TryGetAnonymousID() attempts to read the file, but it doesn't exist
  2. It returns an empty string (it is a read-only operation, so it does not generate an ID)
  3. The empty string is cached by NewClient() at startup
  4. SendMetrics() sees the empty anonymous_id and returns nil, skipping the send
  5. Metrics never reach the API
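
The flow above can be reproduced with a minimal Go sketch. This is an illustration, not the code in pkg/usagemetrics/client.go: the lower-case names (tryGetAnonymousID, newClient, sendMetrics), the (bool, error) return shape, and the "anonymous_id" JSON key are all assumptions made for the example, since the PR does not show the real signatures or the layout of updates.json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// tryGetAnonymousID is a hypothetical stand-in for TryGetAnonymousID().
// It only reads: if the file is missing or unreadable it returns "" and
// never generates a new ID.
func tryGetAnonymousID(path string) string {
	data, err := os.ReadFile(path)
	if err != nil {
		return ""
	}
	// Assumed file layout; the real updates.json schema is not shown in the PR.
	var f struct {
		AnonymousID string `json:"anonymous_id"`
	}
	if err := json.Unmarshal(data, &f); err != nil {
		return ""
	}
	return f.AnonymousID
}

// client caches the anonymous ID once, at construction time.
type client struct {
	anonymousID string
}

func newClient(path string) *client {
	return &client{anonymousID: tryGetAnonymousID(path)}
}

// sendMetrics reports whether a send was attempted. With an empty cached ID
// it returns (false, nil): no error is surfaced, so the skip is silent.
func (c *client) sendMetrics(toolCount int) (bool, error) {
	if c.anonymousID == "" {
		return false, nil
	}
	fmt.Printf("would send tool_count=%d anonymous_id=%s\n", toolCount, c.anonymousID)
	return true, nil
}

func main() {
	// Simulates the operator-deployed proxy: the file was never created.
	missing := filepath.Join(os.TempDir(), "toolhive-sketch-missing", "updates.json")
	c := newClient(missing)
	sent, err := c.sendMetrics(3)
	fmt.Printf("sent=%v err=%v\n", sent, err) // prints: sent=false err=<nil>
}
```

Because the skip path returns no error, nothing in the proxy logs or in the caller signals that metrics were dropped, which is consistent with the middleware appearing healthy while no data reached the API.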

Solution

Modified pkg/usagemetrics/client.go to detect operator environments using Kubernetes environment variables. When running in Kubernetes with an empty anonymous_id, the client now uses the default value "operator-proxy" instead of skipping the send.

This allows operator-deployed proxies to send metrics with a consistent identifier while local deployments still wait for the version check to create the anonymous ID file.
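
A minimal sketch of the fallback logic, under stated assumptions: the PR says detection uses Kubernetes environment variables but does not name them, so this example checks KUBERNETES_SERVICE_HOST (which Kubernetes injects into pod containers) as one plausible choice. The helper names runningInKubernetes and resolveAnonymousID are hypothetical and are not taken from the actual change.

```go
package main

import (
	"fmt"
	"os"
)

// operatorProxyID is the default identifier named in the PR description.
const operatorProxyID = "operator-proxy"

// runningInKubernetes reports whether the process appears to run in a pod.
// Assumption: checking KUBERNETES_SERVICE_HOST; the PR may check other variables.
// getenv is injected so the logic can be exercised without touching the real
// environment.
func runningInKubernetes(getenv func(string) string) bool {
	return getenv("KUBERNETES_SERVICE_HOST") != ""
}

// resolveAnonymousID decides which ID to send, and whether to send at all.
//   - cached ID present: use it
//   - empty ID in Kubernetes: fall back to "operator-proxy"
//   - empty ID elsewhere: skip, so local deployments keep waiting for the
//     version check to create the anonymous ID file
func resolveAnonymousID(cached string, getenv func(string) string) (string, bool) {
	if cached != "" {
		return cached, true
	}
	if runningInKubernetes(getenv) {
		return operatorProxyID, true
	}
	return "", false
}

func main() {
	id, send := resolveAnonymousID("", os.Getenv)
	fmt.Printf("anonymous_id=%q send=%v\n", id, send)
}
```

Passing the environment lookup as a function keeps both branches easy to cover in a unit test, which may help with the partially covered lines reported for this file.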

This also means we lose traceability of operator anonymous_ids in usage metrics. The anonymous_id field isn't required to understand usage metrics, but it can still be useful. A better solution might exist, but this unblocks usage metrics gathering for the operator.

Testing

Tested locally with a Kind cluster:

  1. Deployed operator and fetch server
  2. Made tool calls via Cursor
  3. Verified metrics appeared in update service with anonymous_id=operator-proxy
  4. Status 200 responses confirmed successful delivery

Signed-off-by: lujunsan <luisjuncaldev@gmail.com>
@lujunsan lujunsan requested review from ChrisJBurns and dmjb November 18, 2025 11:38
@lujunsan
Contributor Author

@ChrisJBurns I would appreciate your validation and a look at this one. Can you think of a better solution that isn't overly complex?

@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 62.50000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.45%. Comparing base (63c8c63) to head (d7cbb74).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
pkg/usagemetrics/client.go | 62.50% | 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2635      +/-   ##
==========================================
- Coverage   55.47%   55.45%   -0.03%     
==========================================
  Files         312      312              
  Lines       29714    29717       +3     
==========================================
- Hits        16485    16479       -6     
- Misses      11789    11801      +12     
+ Partials     1440     1437       -3     


@lujunsan lujunsan merged commit 564c44a into main Nov 18, 2025
84 of 89 checks passed
@lujunsan lujunsan deleted the fix-operator-proxy-usage-metrics branch November 18, 2025 16:22

3 participants