Skip to content

Add app-attributed usage display#292

Merged
lewtun merged 8 commits into
mainfrom
codex/usage-display
Jun 7, 2026
Merged

Add app-attributed usage display#292
lewtun merged 8 commits into
mainfrom
codex/usage-display

Conversation

@lewtun
Copy link
Copy Markdown
Member

@lewtun lewtun commented Jun 5, 2026

Adds ML Intern-attributed usage aggregation and a header popover showing current session, daily, and monthly spend split across Inference Providers and estimated HF Jobs usage.\n\nChecks run:\n- uv run ruff check .\n- uv run ruff format --check .\n- uv run pytest tests/unit/test_usage.py tests/unit/test_telemetry_usage.py\n- npm run build\n- Browser check for desktop/mobile header and zero-state popover

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

Claude finished @lewtun's task in 3m 28s —— View job


Review: app-attributed usage display

No blocking issues — 2 P1, 3 P2. Solid PR: cross-user isolation is correct, auth is enforced on the session bucket, the live-update path and the persisted-truth refetch reconcile cleanly, and the price catalog is cached. Notes below.


P1 — Production (Mongo) usage path has no unit coverage

tests/unit/test_usage.py exercises only the in-memory runtime path (_NoopStore, test_usage.py:92). The path that actually runs in the deployed Space — MongoSessionStore.load_usage_events (session_persistence.py:375-408) — has zero coverage. That's where the non-trivial logic lives: the visibility != "deleted" filter, the dev short-circuit, the session_id → _id mapping, and the created_at range query. REVIEW.md calls out test coverage for new behavior as a focus area; this is the highest-value test to add. A fake-collection or mongomock-style test asserting the query shape and cross-user scoping would close the gap.

P1 — Month/session aggregation streams every event into Python and sorts for nothing

build_usage_response issues three separate load_usage_events calls (usage.py:246-265), and the Mongo implementation loads the full matching event set into a list and sums it in Python (usage.py:125-161). For a heavy user, the "month" bucket pulls every llm_call/hf_job_complete doc for the month over the wire on each popover open. A Mongo $group aggregation would do the summation server-side and return a handful of rows. Separately, event_cursor...sort("created_at", 1) at session_persistence.py:408 does no useful work — aggregate_usage_events is order-independent — so the sort is pure overhead and can be dropped now regardless of the larger refactor. Fix this →


P2 — HF Jobs cost uses wall-clock, including queue/startup

record_hf_job_complete derives billable_seconds from time.monotonic() - submit_ts (telemetry.py:193-194), which includes scheduling/startup time, not just billable compute. It's surfaced as "HF Jobs estimated" in the UI and the popover already disclaims it isn't your full invoice, so this is acceptable — just noting it will run slightly high versus the real Jobs bill.

P2 — Session-resume wipes runtime usage history (CLI/local only)

session_resume.py:263 resets logged_events to a single resumed_from marker. The in-memory usage path (_events_from_runtime_session, usage.py:182) reads logged_events, so after a resume the runtime totals reset. The deployed Space uses Mongo (events persist in session_events), so this only affects the no-persistence local/CLI mode. Worth a one-line comment near _events_from_runtime_session noting the limitation.

P2 — llm_call/hf_job_complete now handled in both SSE paths

Confirmed no streaming regression: these events were already flowing through send_event → broadcaster (session.py:160-178) and previously hit the transport's default case. Both the AI-SDK transport (sse-chat-transport.ts:320) and the reconnect path (useAgentChat.ts:607) now handle them, and neither enqueues a UI chunk, so text streaming is unaffected. Noting it only because it's easy to miss that the live increment and the onProcessingDone refetch don't double-count (the refetch replaces the whole object).


· branch codex/usage-display

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

Claude finished @lewtun's task in 1m 43s —— View job


Re-review: app-attributed usage display (commit 788d2cc)

No blocking issues — no P0. This follow-up commit reworks usage-window bucketing for local timestamps. Per this repo's re-review policy I'm suppressing the P1/P2 items from the earlier review and reporting only new P0-class concerns on the delta. None found — the change is sound.

What the fix does (verified)

  • session.py:140,165,415 now stamp events with datetime.now().astimezone().isoformat(), so runtime trajectory events carry a tz-aware offset instead of a naive local string. In _normalize_event_timestamp (usage.py:56-66) these hit the dt.tzinfo is not None branch and convert cleanly to UTC, so new events bucket correctly regardless of the tz query param.
  • For genuinely naive timestamps (old events / tests), _normalize_event_timestamp interprets them in the browser timezone (usage.py:63-65) rather than blindly tagging them UTC — this is the actual bug being fixed (local-evening events previously leaking into the wrong day/month bucket). resolve_usage_windows (usage.py:96-116) computes the day/month boundaries in that same tz and converts to UTC, so window edges and event timestamps now share one frame of reference.
  • Invalid/empty tz is handled safely: _coerce_timezone (usage.py:47-53) and resolve_usage_windows (usage.py:102-105) both fall back to UTC on ZoneInfoNotFoundError/ValueError. The route passes tz straight from the query string (routes/agent.py:683,698) — no injection/SSRF surface since ZoneInfo validates against the system tzdata.

Consistency note (not blocking, not re-flagging)

The naive→browser-tz interpretation only runs on the in-memory runtime path; the Mongo path calls store.load_usage_events(...) without timezone_name (usage.py:250-256). That's correct — Mongo persists tz-aware UTC created_at, and the day/month window bounds handed to the range query are already UTC — so production bucketing stays right. Just flagging the asymmetry for future readers.

Couldn't run pytest here (sandbox denied the test command), but CI covers tests/unit/test_usage.py / test_telemetry_usage.py, and the new test_runtime_usage_interprets_naive_timestamps_in_browser_timezone case (test_usage.py:172-202) directly exercises the Europe/Zurich naive-timestamp path.

The two P1s from the prior review (Mongo-path test coverage, in-Python month aggregation) remain open as informational follow-ups — not blocking this PR.
· branch codex/usage-display

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

Claude finished @lewtun's task in 2m 1s —— View job


Re-review: HF billing usage display (commit 41fc0d7)

No blocking issues — no P0. This follow-up adds authoritative HF account billing (usage-v2) alongside the existing app-attributed estimates. Per this repo's re-review convergence policy I'm suppressing the earlier P1/P2 items and reporting only new P0-class concerns on the delta. None found.

What the commit does (verified)

  • New _fetch_hf_billing_usage_v2 (usage.py:267-292) GETs a hardcoded endpoint (HF_BILLING_USAGE_V2_URL, usage.py:14) with only integer startDate/endDate params and a Bearer header — no user-controlled URL, host, or path, so no SSRF/injection surface. Errors are swallowed to None (httpx.HTTPError/ValueError), so a billing outage degrades to available: false rather than failing the whole /usage response.
  • The three window fetches (today/month/current_session) are created as tasks up front and awaited in a loop (usage.py:355-374), so they run concurrently. Each fetch catches its own exceptions, so the await task loop can't propagate and crash build_usage_response.
  • Nano vs micro scaling is correct and consistent with the field names: inference providers → usedNanoUsd / 1e9 (usage.py:62-63,243), jobs → usedMicroUsd / 1e6 (usage.py:66-67,247). Covered directly by test_account_bucket_from_hf_billing_usage_v2 (test_usage.py:82-105).

Auth — checked, not a concern

The token precedence on the usage route (agent.py:697-700) — user request token (no env fallback) → _user_hf_token(user) → request token with env fallback — is identical to the established dataset-upload pattern (agent.py:582-586), so this isn't a new approach. In the deployed Space each authenticated user carries their own cookie/header token, so the user's own billing is queried; the HF_TOKEN env fallback only applies in dev/server contexts. The route is auth-gated and session access is re-checked via _check_session_access before the persisted session-start lookup (agent.py:687-693), so the load_session in _load_persisted_session_started_at (usage.py:296-310) can't read another user's session.

Test coverage — adequate for new behavior

test_hf_account_usage_uses_session_start_for_current_delta (test_usage.py:233-296) monkeypatches the fetch and asserts window selection (session-start vs day vs month boundaries), per-window USD parsing, and the credits block — exactly the non-trivial new logic. The HTTP call itself is mocked, which is reasonable.

Notes (not blocking, not re-flagging)

  • current_session adds a third concurrent billing call and, on the Mongo path, an extra full load_session per popover open (usage.py:303) — consistent with the already-open in-Python-aggregation P1, not a new issue.
  • Field names (usedNanoUsd, usedMicroUsd, numRequests, totalMinutes) are inferred from the usage-v2 shape; if any drift, the bucket silently reads $0 rather than erroring — acceptable given the UI already disclaims these as estimates.

The two prior P1s (Mongo-path test coverage, in-Python month aggregation) remain open as informational follow-ups — not blocking.
· branch codex/usage-display

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 50s —— View job


Re-review: Simplify usage popover (commit 222c86e)

No blocking issues — no P0. This follow-up trims the popover to a single "Current session" section plus Inference credits, dropping the Today/This-month sections, the HF-account-total row, provider-request counts, and the calls/jobs row. Per this repo's re-review convergence policy I'm suppressing the earlier P1/P2 items and reporting only new P0-class concerns on the delta. None found.

What the commit does (verified)

  • AccountUsageSection loses the sessionDelta prop and its conditional "HF account delta/total" + "Provider requests" + "ML Intern calls / jobs" rows; "Inference Providers" is now the strong row (UsageMeter.tsx:91-101). Purely presentational — no data-flow change.
  • The render path now mounts one AccountUsageSection for current_session plus CreditsSection, removing the two extra sections and their <Divider/>s (UsageMeter.tsx:206-218).
  • No dead imports left: Divider is still used by CreditsSection (UsageMeter.tsx:111) and formatCount by the tokens row (UsageMeter.tsx:101), so tsc/npm run build won't trip on unused locals.
  • The pill still reads current_session total with the session telemetry fallback (UsageMeter.tsx:147-148) — unchanged.

Note (not blocking, not re-flagging)

The backend still computes today and month buckets in build_usage_response that the UI no longer renders — extra per-popover work that compounds the already-open in-Python month-aggregation P1 from the first review. Not a new issue and not worth re-flagging under convergence; if that aggregation P1 is picked up as a follow-up, dropping the now-unused buckets would be a natural part of it.

The two prior P1s (Mongo-path test coverage, in-Python month aggregation) remain open as informational follow-ups — not blocking this PR.
· branch codex/usage-display

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 33s —— View job


Re-review: Remove token row from usage popover (commit 94a6924)

No blocking issues — no P0. This follow-up drops the "ML Intern tokens" row and its now-unused formatCount helper from the popover. Per this repo's re-review convergence policy I'm suppressing the earlier P1/P2 items and reporting only new P0-class concerns on the delta. None found.

What the commit does (verified)

  • Removes the local formatCount helper (UsageMeter.tsx:33-35 pre-change) and its sole caller, the ML Intern tokens UsageRow (UsageMeter.tsx:101 pre-change). Pure presentational trim — no data-flow change.
  • No dead references left: grep for formatCount / total_tokens in UsageMeter.tsx returns nothing, and formatUsd remains in active use (UsageMeter.tsx:90,95,114,119,124,166), so tsc / npm run build won't trip on an unused local.
  • telemetry.total_tokens is still produced by the backend but simply no longer rendered — consistent with the prior "Simplify usage popover" commit's pattern of leaving backend buckets in place.

The two prior P1s (Mongo-path test coverage, in-Python month aggregation) remain open as informational follow-ups — not blocking this PR.
· branch codex/usage-display

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 25s —— View job


Re-review: Move session billing note to usage header (commit 3071aa6)

No blocking issues — no P0. This follow-up moves the "Session billing is inferred from HF account usage since session start" note from the popover body into the header caption. Per this repo's re-review convergence policy I'm suppressing the earlier P1/P2 items and reporting only new P0-class concerns on the delta. None found.

What the commit does (verified)

  • The header caption changes from "Current session billing and Inference Providers credits." to "Session billing is inferred from HF account usage since session start." (UsageMeter.tsx:193).
  • The duplicate caption that previously rendered in the body — gated on usage?.hf_account?.available (UsageMeter.tsx:208-212 pre-change) — is removed, so the note now lives in exactly one place.
  • Purely presentational — no data-flow, auth, or aggregation change. The only behavioral nuance is that the note is now unconditional (it previously only showed when HF billing was available); since it's a static disclaimer about how the session figure is derived, showing it in the zero/unavailable state is harmless and arguably clearer.
  • No dead references: the removed block was the sole second use of that string, and Typography/color remain in active use throughout the component, so tsc / npm run build won't trip on an unused local.

The two prior P1s (Mongo-path test coverage, in-Python month aggregation) remain open as informational follow-ups — not blocking this PR.
· branch codex/usage-display

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 1m 12s —— View job


Re-review: Optimize usage popover rollup loading (commit 9f84c9d)

No blocking issues — no P0. This follow-up adds an include_rollups flag so the popover (which now renders only the current-session figure + inference credits) stops paying for the unused today/month rollups, and adds the Mongo-path test coverage that was a prior open P1. Per this repo's re-review convergence policy I'm suppressing the earlier P1/P2 items and reporting only new P0-class concerns on the delta. None found.

What the commit does (verified)

  • New include_rollups: bool = True threads route → build_usage_response → _build_hf_account_usage (routes/agent.py:684,705, usage.py:510,551). The frontend always sends include_rollups=false (usageStore.ts:96), so production takes the compact path; the backend default stays True, so any other caller is unchanged — backward compatible.
  • When false, the two _load_usage_events rollup loads are skipped and today_events/month_events stay empty (usage.py:526-542), and the today HF billing fetch is dropped (usage.py:385-391). The month HF fetch is correctly retained because inference_providers_credits is derived from the month payload (usage.py:421-423) — the credits row the popover still shows. The session/current_session buckets are computed independently of the flag (usage.py:518-524, usage.py:392-398), so the pill's data is unaffected.
  • The skipped buckets degrade to zeros / None rather than missing keys (today/month still aggregate over empty lists at usage.py:564-575; hf_account.today is absent → serialized None), so the response shape is stable for any client still reading those fields. Directly asserted by test_compact_usage_skips_unused_rollup_loads (test_usage.py:362).
  • The sort("created_at", 1) removal at session_persistence.py:408 is safe — aggregate_usage_events is order-independent (confirmed in the first review), so this is pure overhead removal. The new test asserts cursors[0].sort_calls == [] (test_session_persistence.py:218).

Prior P1s — both addressed in this commit

  • Mongo-path test coverage (open since the first review): the new test_load_usage_events_* cases (test_session_persistence.py:170-239) exercise MongoSessionStore.load_usage_events against a fake collection, asserting the visibility != "deleted" filter, the user_id scoping vs. the dev short-circuit, the session_id → _id mapping, the created_at range query, and the dropped sort. This closes the gap.
  • In-Python month aggregation (open since the first review): not eliminated, but the heavy month/today rollup loads are now skipped on the hot popover path, which was the practical cost driver flagged. The summation itself remains in Python for callers that do request rollups — fine as a follow-up.

No new concerns on the delta. Nice to see both standing P1s knocked out here.
· branch codex/usage-display

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 37s —— View job


Re-review: Round usage costs to cents (commit 959ac11)

No blocking issues — no P0. This follow-up rounds all displayed USD figures to cents. Per this repo's re-review convergence policy I'm suppressing the earlier P1/P2 items and reporting only new P0-class concerns on the delta. None found.

What the commit does (verified)

  • formatUsd drops its compact parameter and now always uses maximumFractionDigits: 2 (UsageMeter.tsx:22-32), so sub-dollar figures that previously rendered with up to 4 decimals now render as cents. The < $0.01 floor is unchanged (UsageMeter.tsx:25), so tiny-but-nonzero amounts still read <$0.01 rather than $0.00 — no misleading "free" display.
  • The sole caller passing true — the session-total pill — is updated to the single-arg form (UsageMeter.tsx:166). No remaining references to the removed parameter: all other call sites (UsageMeter.tsx:90,95,114,119,124) already used the one-arg form, so tsc / npm run build won't trip on an arity mismatch or unused local.
  • Purely presentational — no data-flow, auth, or aggregation change.

The two prior P1s (Mongo-path test coverage, in-Python month aggregation) were addressed / reduced in commit 9f84c9d and remain non-blocking. Nothing new to flag on this delta.
· branch codex/usage-display

@lewtun lewtun merged commit 7d506e1 into main Jun 7, 2026
3 checks passed
@lewtun lewtun deleted the codex/usage-display branch June 7, 2026 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant