
feat: wire LLM cost tracking end-to-end — backend, API, WebSocket, and CostDashboard UI#320

Open
nuthalapativarun wants to merge 7 commits into microsoft:main from nuthalapativarun:feat/cost-tracking-issue-2

@nuthalapativarun

Feature: LLM Cost & Token Usage Tracking

This is PR 2 of 2. It depends on #319, which adds LLMCallEvent and CostThresholdExceededEvent to the event bus, so merge #319 first.

UFO already computes the cost of each LLM call in ufo/llm/base.py, and every provider returns it, but callers discard the value. This PR carries that existing data from the LLM call layer through the event bus into a metrics observer, exposes it through a FastAPI REST API and a WebSocket, and renders it in a live-updating React dashboard in the Galaxy Web UI.


What this PR does

Backend — LLM call layer (ufo/llm/)

  • base.py: get_cost_estimator() now returns CostResult(cost, prompt_tokens, completion_tokens), a NamedTuple, instead of a bare float. The public contract is unchanged: get_completions() still returns the cost as a plain float.
  • openai.py, claude.py, gemini.py: Updated to unpack CostResult from get_cost_estimator().
  • llm_call.py: After each provider call, measures wall-clock duration and emits an LLMCallEvent on the Galaxy event bus. Emission is best-effort: if Galaxy is not running, the failure is logged at DEBUG and the call proceeds normally.
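A minimal sketch of the call-layer flow described above, runnable on its own. Only `CostResult` and its three fields come from this PR; `get_completions_sketch`, the `call_provider` callable, the `publish` callback (a stand-in for the Galaxy event bus), and the `LLMCallEvent` fields shown here are hypothetical simplifications.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, NamedTuple, Optional, Tuple

logger = logging.getLogger(__name__)


class CostResult(NamedTuple):
    """Cost and token counts for one provider call (the three fields named in this PR)."""

    cost: float
    prompt_tokens: int
    completion_tokens: int


@dataclass
class LLMCallEvent:
    """Hypothetical event payload; the real fields are defined in #319."""

    agent_name: str
    model: str
    cost: float
    prompt_tokens: int
    completion_tokens: int
    duration_seconds: float


def get_completions_sketch(
    call_provider: Callable[[], Tuple[str, CostResult]],
    publish: Optional[Callable[[LLMCallEvent], None]],
    agent_name: str,
    model: str,
) -> Tuple[str, float]:
    """Time one provider call, emit an LLMCallEvent best-effort, return (response, cost)."""
    start = time.perf_counter()
    response, result = call_provider()
    duration = time.perf_counter() - start

    # CostResult is a tuple, so it unpacks positionally as well as by field name.
    cost, prompt_tokens, completion_tokens = result

    if publish is not None:
        try:
            publish(
                LLMCallEvent(agent_name, model, cost, prompt_tokens, completion_tokens, duration)
            )
        except Exception:
            # Best-effort: a missing or failing event bus must not break the LLM call itself.
            logger.debug("Failed to emit LLMCallEvent", exc_info=True)

    # The caller still receives a plain float cost.
    return response, cost
```

Because `CostResult` is a `NamedTuple`, providers can unpack it or read `.cost` by name, while the value handed back to callers stays a float.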

Backend — Observer (galaxy/session/observers/base_observer.py)

  • SessionMetricsObserver now subscribes to LLM_CALL_COMPLETED events and accumulates:
    • Running totals: cost, prompt tokens, completion tokens, API call count
    • Per-agent and per-model cost breakdowns
    • Rolling call log (capped at last 500 entries)
  • Configurable cost_alert_threshold (default 0.0 = disabled): emits CostThresholdExceededEvent once when the session total exceeds the threshold.
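The observer's bookkeeping can be sketched as follows, assuming a simplified event object. `CostAccumulator`, `LLMCall`, and the `on_threshold` callback (a stand-in for publishing CostThresholdExceededEvent) are hypothetical names; a `deque` with `maxlen=500` is one way to cap the call log and is not necessarily what the PR uses.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, DefaultDict, Deque, Optional


@dataclass
class LLMCall:
    """Hypothetical, simplified LLM_CALL_COMPLETED payload."""

    agent_name: str
    model: str
    cost: float
    prompt_tokens: int
    completion_tokens: int


class CostAccumulator:
    """Sketch of the per-session llm_metrics bookkeeping."""

    def __init__(
        self,
        cost_alert_threshold: float = 0.0,  # 0.0 disables alerting
        on_threshold: Optional[Callable[[float, float], None]] = None,
        max_log: int = 500,
    ) -> None:
        self.total_cost = 0.0
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.api_calls = 0
        self.cost_by_agent: DefaultDict[str, float] = defaultdict(float)
        self.cost_by_model: DefaultDict[str, float] = defaultdict(float)
        # Bounded log: once max_log entries exist, each append drops the oldest one.
        self.call_log: Deque[LLMCall] = deque(maxlen=max_log)
        self.cost_alert_threshold = cost_alert_threshold
        self._on_threshold = on_threshold
        self._alert_sent = False

    def handle(self, call: LLMCall) -> None:
        """Fold one completed call into the running totals."""
        self.total_cost += call.cost
        self.prompt_tokens += call.prompt_tokens
        self.completion_tokens += call.completion_tokens
        self.api_calls += 1
        self.cost_by_agent[call.agent_name] += call.cost
        self.cost_by_model[call.model] += call.cost
        self.call_log.append(call)

        # One-shot alert: fire only the first time the total exceeds a non-zero threshold.
        if (
            self.cost_alert_threshold > 0.0
            and not self._alert_sent
            and self.total_cost > self.cost_alert_threshold
        ):
            self._alert_sent = True
            if self._on_threshold is not None:
                self._on_threshold(self.total_cost, self.cost_alert_threshold)
```

The `_alert_sent` flag is what keeps the alert from repeating on every later call once the session is over budget.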

Backend — API (galaxy/webui/)

  • models/responses.py: Adds LLMCallRecord and SessionCostSummary Pydantic response models.
  • services/metrics_service.py: Thin service that reads from SessionMetricsObserver.metrics["llm_metrics"] and serialises to JSON or CSV.
  • routers/metrics.py: New FastAPI router:
    • GET /api/metrics/cost — returns SessionCostSummary for the active session
    • GET /api/metrics/cost/export?format=json|csv — downloads the full call log
  • Registered in server.py alongside existing routers.
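The export endpoint's serialisation step might look like this, using only the standard library. `export_call_log` and the `CALL_FIELDS` column list are hypothetical; the real columns would follow whatever fields LLMCallRecord defines.

```python
import csv
import io
import json
from typing import Any, Dict, List

# Hypothetical column order for the CSV export.
CALL_FIELDS = [
    "timestamp",
    "agent_name",
    "model",
    "prompt_tokens",
    "completion_tokens",
    "cost",
    "duration_seconds",
]


def export_call_log(calls: List[Dict[str, Any]], fmt: str) -> str:
    """Serialise a call log to JSON or CSV text, mirroring ?format=json|csv."""
    if fmt == "json":
        return json.dumps(calls, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        # extrasaction="ignore" keeps the columns fixed even if a record has extra keys;
        # missing keys are written as empty cells.
        writer = csv.DictWriter(buf, fieldnames=CALL_FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(calls)
        return buf.getvalue()
    raise ValueError(f"unsupported export format: {fmt!r}")
```

A router would then wrap the returned string in a download response with the matching content type; rejecting unknown formats early keeps that mapping simple.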

Backend — WebSocket (galaxy/webui/websocket_observer.py)

  • EventSerializer extended to handle LLMCallEvent (broadcast as message_type: "llm_metrics_update") and CostThresholdExceededEvent (broadcast as message_type: "cost_alert").
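The EventSerializer extension amounts to mapping each new event class to its frontend `message_type`. A dispatch-table sketch follows; the event classes are stand-ins with hypothetical fields, and only the two `message_type` strings come from this PR.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class LLMCallEvent:
    """Stand-in for the #319 event; fields are hypothetical."""

    agent_name: str
    model: str
    cost: float


@dataclass
class CostThresholdExceededEvent:
    """Stand-in for the #319 event; fields are hypothetical."""

    total_cost: float
    threshold: float


# Frontend message_type per event class (the two strings this PR uses).
MESSAGE_TYPES: Dict[type, str] = {
    LLMCallEvent: "llm_metrics_update",
    CostThresholdExceededEvent: "cost_alert",
}


def serialize_event(event: Any) -> Optional[Dict[str, Any]]:
    """Return a JSON-ready dict for the two LLM events, or None for anything else."""
    message_type = MESSAGE_TYPES.get(type(event))
    if message_type is None:
        return None  # other events keep going through the serializer's existing paths
    return {"message_type": message_type, **asdict(event)}
```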

Frontend — Zustand store (src/store/galaxyStore.ts)

  • New LLMMetrics interface and LLMCallRecord type.
  • llmMetrics slice with setLLMMetrics and appendLLMCall actions. appendLLMCall incrementally updates all aggregates client-side (no full-fetch needed on each event).
  • rightPanelTab union extended with 'cost' tab.

Frontend — WebSocket handlers (src/main.tsx)

  • handleLLMMetricsUpdate: maps llm_metrics_update WebSocket messages to appendLLMCall.
  • handleCostAlert: maps cost_alert messages to a pushNotification with severity: "warning".
  • GalaxyEvent interface extended with LLM-specific fields, removing the (event as any) casts from both handlers above.

Frontend — UI (src/components/metrics/)

  • CostByModelChart.tsx: Tailwind-only horizontal bar chart, sorted by cost descending. Reused for both model and agent breakdowns.
  • CostDashboard.tsx: Live-updating panel showing:
    • Summary row: total cost, prompt tokens, completion tokens, API call count
    • Cost by model bar chart
    • Cost by agent bar chart
    • Collapsible recent-calls table (last 50, newest first)
    • Dismissible amber alert banner on cost_alert notifications
  • RightPanel.tsx: Adds a tab bar ("Constellation" / "Cost") to switch between the existing constellation view and the new cost dashboard.
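CostByModelChart and the recent-calls table are TypeScript/React, but the arithmetic behind them is small enough to sketch in Python. This assumes bar widths are scaled relative to the most expensive entry and that the call log is stored oldest-first; `bar_rows` and `recent_calls` are hypothetical helpers, not functions from the PR.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bar_rows(costs: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (key, cost, width_percent) rows sorted by cost, highest first.

    Works for any key (model name or agent name), so one helper serves both charts.
    """
    ordered = sorted(costs.items(), key=lambda kv: kv[1], reverse=True)
    top = ordered[0][1] if ordered else 0.0
    # Widths are relative to the largest cost; avoid dividing by zero when all costs are 0.
    return [(key, cost, cost / top * 100.0 if top > 0 else 0.0) for key, cost in ordered]


def recent_calls(call_log: Sequence[T], limit: int = 50) -> List[T]:
    """Newest-first view of the last `limit` entries of an oldest-first log."""
    return list(reversed(call_log[-limit:]))
```

Scaling against the maximum makes the top entry span the full width, which suits a sorted bar chart; scaling against the sum would instead show each entry's share of the total.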

Architecture

LLM Provider (openai / claude / gemini)
  └─ CostResult(cost, prompt_tokens, completion_tokens)
       └─ llm_call.py → emits LLMCallEvent on EventBus
            └─ SessionMetricsObserver
                 ├─ accumulates llm_metrics dict
                 └─ emits CostThresholdExceededEvent (if threshold set)
                      └─ WebSocketObserver → broadcasts to frontend
                           ├─ "llm_metrics_update" → appendLLMCall (Zustand)
                           └─ "cost_alert"         → pushNotification (Zustand)
                                └─ CostDashboard (React) — live updates
GET /api/metrics/cost         → SessionCostSummary (REST poll fallback)
GET /api/metrics/cost/export  → JSON or CSV download

Testing

  • All existing tests pass (no breaking changes — cost float still returned from get_completions()).
  • GET /api/metrics/cost returns 404 when no session is active; 200 with live data during a session.
  • WebSocket cost updates verified by running a session and observing the CostDashboard panel update in real time.

Commits

- Add CostResult NamedTuple to base.py; get_cost_estimator now returns
  CostResult(cost, prompt_tokens, completion_tokens) instead of float
- Update openai.py (_chat_completion, _responses_completion,
  _chat_completion_operator, OperatorServicePreview) to return CostResult
- Update claude.py to accumulate tokens across n completions, return CostResult
- Update gemini.py to return CostResult
- In get_completions(), wrap chat_completion() with wall-time measurement
  and emit LLMCallEvent via Galaxy event bus (best-effort, non-blocking)
- Return cost float unchanged to callers for backward compatibility
- Add LLMCallEvent/CostThresholdExceededEvent dataclasses and new EventType
  values to galaxy/core/events.py (Issue 1 changes included for this branch)

- Add llm_metrics dict to SessionMetricsObserver.__init__()
  tracking total_cost, token counts, per-agent/model breakdowns,
  and a capped call log (last 500 entries)
- Handle LLMCallEvent in on_event() via _handle_llm_call_event()
- Add cost_alert_threshold param: emits CostThresholdExceededEvent
  once when total_cost exceeds threshold (one-shot, no spam)

Expose LLM cost and token metrics via two HTTP endpoints:
- GET /api/metrics/cost → SessionCostSummary (per-agent, per-model breakdown)
- GET /api/metrics/cost/export?format=json|csv → full call log download

MetricsService reads directly from SessionMetricsObserver._metrics_observer,
keeping the service layer thin. Both endpoints require X-API-Key auth.

…ocket

Add LLMCallEvent and CostThresholdExceededEvent handling to EventSerializer
so the WebSocketObserver forwards them to connected clients with the
frontend-specific message_type fields:
- LLMCallEvent → message_type: "llm_metrics_update"
- CostThresholdExceededEvent → message_type: "cost_alert"

Add LLMMetrics state to the Galaxy store with two actions:
- setLLMMetrics: replace full metrics snapshot
- appendLLMCall: incrementally update totals and per-agent/model costs

Handle two new WebSocket message types in main.tsx:
- "llm_metrics_update" → appendLLMCall (real-time accumulation)
- "cost_alert" → pushNotification with warning severity

New components under src/components/metrics/:
- CostByModelChart: pure-Tailwind horizontal bar chart (no extra deps)
- CostDashboard: summary row (cost/tokens/calls), cost-by-model chart,
  cost-by-agent chart, collapsible recent-calls table, cost alert banner

Mount in RightPanel via a two-tab bar (Constellation | Cost). Extends
rightPanelTab type with 'cost' variant to drive tab routing.

- Fix NameError in MetricsService.get_cost_summary() where observer
  variable was referenced before assignment; refetch from session directly
- Add missing threshold= arg to CostThresholdExceededEvent constructor
  in SessionMetricsObserver (would raise TypeError at runtime)
- Fix publish() -> publish_event() call on event bus in base_observer
- Add LLM event fields to GalaxyEvent interface; remove (event as any)
  casts in handleLLMMetricsUpdate and handleCostAlert handlers
- Log failed LLMCallEvent emissions at DEBUG instead of silently swallowing
- Use stable composite key in RecentCallsTable rows instead of array index
- Fix JSX formatting in RightPanel: </div>} -> </div>\n)} style
- Update CostByModelChart docstring to reflect generic key usage