
feat(trace): add langsmith trace stats command#117

Merged
Andy Young (ayoung19) merged 7 commits into main from andy/trace-stats on May 8, 2026

Andy Young (ayoung19) commented May 8, 2026

Resurrects Palash's #84 (feat/trace-stats-clean) verbatim, with the same commits merged with current main. Closing the loop on the Slack thread: the issues-agent prompt was reverted so it no longer uses trace stats; once this lands, the prompt will be flipped back on.

Adds langsmith trace stats, which reports aggregate health metrics for a project's traces over a time window: run count, error rate, latency p50/p99, prompt/completion/total tokens, and feedback_stats. The optional --cmp-since / --cmp-before / --cmp-last-n-minutes flags fetch a comparison window, shown side by side with the delta and % change in pretty output.

langsmith trace stats --project my-app --last-n-minutes 60
langsmith trace stats --project my-app --since 2026-01-10 --cmp-since 2026-01-03 --cmp-before 2026-01-10
langsmith trace stats --project my-app --filter 'eq(status, "error")'
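The comparison output boils down to a per-metric delta and percent change. A minimal sketch in Go, assuming percent change is computed relative to the comparison window and shown as n/a when the comparison value is zero; computeDelta and formatDelta are hypothetical names, not the CLI's actual functions:

```go
package main

import (
	"fmt"
	"math"
)

// delta holds the change between a primary-window value and its
// comparison-window counterpart. Hypothetical type for illustration.
type delta struct {
	Abs float64 // primary - comparison
	Pct float64 // percent change relative to comparison; NaN if comparison is 0
}

// computeDelta returns primary-comparison and the percent change.
// A zero comparison value makes percent change undefined, so Pct is NaN.
func computeDelta(primary, comparison float64) delta {
	d := delta{Abs: primary - comparison, Pct: math.NaN()}
	if comparison != 0 {
		d.Pct = (primary - comparison) / comparison * 100
	}
	return d
}

// formatDelta renders a delta cell such as "+12.00 (+25.0%)".
func formatDelta(d delta) string {
	if math.IsNaN(d.Pct) {
		return fmt.Sprintf("%+.2f (n/a)", d.Abs)
	}
	return fmt.Sprintf("%+.2f (%+.1f%%)", d.Abs, d.Pct)
}
```

Returning NaN rather than 0 for a zero baseline keeps "no change" and "undefined change" distinguishable in the rendered table.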

total_cost is null

total_cost is intentionally omitted from the SDK select list. The Go SDK models total_cost as a string, but the API returns it as a JSON number, which makes the RunStatsResponse union resolve to RunStatsResponseMap and zero out every field. Excluding total_cost keeps the flat-object response decodable. Pinging the go-sdk owners to fix the union; once that lands we can add the field back. This is documented inline at the call site.
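To illustrate the failure mode, here is a simplified Go stand-in for union variant selection: try the struct variant, and fall back to a map when the struct doesn't decode cleanly. It uses encoding/json rather than the SDK's apijson, and runStatsVariant, statsUnion, and decodeStats are hypothetical, so treat it as an analogy for how a numeric total_cost can push the response onto the map variant, not as the SDK's code:

```go
package main

import (
	"encoding/json"
)

// runStatsVariant mirrors the flat stats object with total_cost typed as a
// string, as the Go SDK models it. Hypothetical, trimmed-down type.
type runStatsVariant struct {
	RunCount  int64   `json:"run_count"`
	ErrorRate float64 `json:"error_rate"`
	TotalCost string  `json:"total_cost"`
}

// statsUnion is a hypothetical two-variant union: exactly one field is set.
type statsUnion struct {
	Stats *runStatsVariant
	Map   map[string]any
}

// decodeStats tries the struct variant first and falls back to the map
// variant if the struct does not decode without error. This is a simplified
// stand-in, not the SDK's apijson selection logic.
func decodeStats(raw []byte) (statsUnion, error) {
	var s runStatsVariant
	if err := json.Unmarshal(raw, &s); err == nil {
		return statsUnion{Stats: &s}, nil
	}
	var m map[string]any
	if err := json.Unmarshal(raw, &m); err != nil {
		return statsUnion{}, err
	}
	return statsUnion{Map: m}, nil
}

// runCount reads run_count the way a caller that only inspects the struct
// variant would: a map result yields zero.
func runCount(u statsUnion) int64 {
	if u.Stats == nil {
		return 0
	}
	return u.Stats.RunCount
}
```

With total_cost present as a number, the struct decode fails and the caller sees zeros; drop the field and the struct variant is chosen.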

Why a new PR

#114 reverted #84's SDK call back to raw HTTP, contrary to the original review consensus and Palash's note in the Slack thread. The cleanest path was to copy #84's commits intact and integrate main via a merge.

Known failing test (pre-existing)

TestTraceMessages_FeedbackStats (in internal/cmd/message_test.go, included on Palash's branch) mocks /v2/traces/messages returning {"traces": [...]}. Main has since renamed that response field to items (commit b36ab5a fix(trace messages): update to v2 API field names). Production code is correct and reads items; only the test mock is stale. It's trivially fixable with a follow-up commit (rename traces → items, cursors → next_cursor). Happy to push that fix on top if you want CI green for the stamp.
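Why the stale mock breaks the test rather than the code: assuming production decodes the page with encoding/json into a struct tagged items / next_cursor (messagesPage and decodePage are hypothetical), unknown keys are silently ignored, so a mock that still sends traces decodes without error into zero items:

```go
package main

import "encoding/json"

// messagesPage is a hypothetical, trimmed-down view of the
// /v2/traces/messages response after the v2 rename (items, next_cursor).
type messagesPage struct {
	Items      []map[string]any `json:"items"`
	NextCursor *string          `json:"next_cursor"`
}

// decodePage decodes a response body. encoding/json ignores unknown keys,
// so a body keyed by "traces" yields an empty Items slice and no error.
func decodePage(body string) (messagesPage, error) {
	var p messagesPage
	err := json.Unmarshal([]byte(body), &p)
	return p, err
}
```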

Release Note

Added the langsmith trace stats command for fetching aggregate run statistics (latency, tokens, error rate, feedback) with optional period comparison; cost (total_cost) is returned as null until the Go SDK union is fixed.

Test Plan

  • Manual smoke against staging: langsmith trace stats --project <real> --last-n-minutes 60 (and pretty/compare modes)
  • Confirm feedback_stats keys come back populated for a project that has feedback
  • Verify total_cost shows as null (until SDK union is fixed)

Palash Shah (Palashio) and others added 7 commits April 29, 2026 12:30
fetchRootPreviews already queries root runs per batch; add
RunQueryParamsSelectFeedbackStats to the select params and propagate
FeedbackStats onto each trace map alongside root_inputs_preview /
root_outputs_preview. Every trace now carries feedback_stats (empty {}
when no feedback exists), making it possible to filter feedback traces
directly from batch files with jq without a separate trace list call.
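A sketch of the filtering this enables, written in Go rather than jq. It assumes, hypothetically, that a batch file is a JSON array of trace objects each carrying feedback_stats; tracesWithFeedback is an illustrative helper, not part of the CLI:

```go
package main

import "encoding/json"

// tracesWithFeedback returns the traces whose feedback_stats object is
// non-empty. Assumes a batch file is a JSON array of trace objects.
func tracesWithFeedback(batch []byte) ([]map[string]any, error) {
	var traces []map[string]any
	if err := json.Unmarshal(batch, &traces); err != nil {
		return nil, err
	}
	var out []map[string]any
	for _, t := range traces {
		// A missing or non-object feedback_stats counts as "no feedback".
		if fs, ok := t["feedback_stats"].(map[string]any); ok && len(fs) > 0 {
			out = append(out, t)
		}
	}
	return out, nil
}
```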
…etching

The /v2/traces/messages endpoint already returns feedback_stats on each
trace. The previous approach fetched it again via /api/v1/runs/query and
overwrote the API value — adding latency and risking data loss if the
runs query failed.

Revert the fetchRootPreviews/attachRootIO changes and let the API
response pass through unchanged. Add a test confirming feedback_stats
is preserved end-to-end.
Adds `langsmith trace stats` — hits POST /api/v1/runs/stats to return
aggregate run count, latency percentiles, token usage, cost, error rate,
and feedback key distributions for a project window.

Supports an optional comparison window (--compare-since/--compare-before/
--compare-last-n-minutes) that fires a second request and renders a
Primary / Comparison / Delta table side-by-side.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The langsmith-go SDK's RunStatsResponseUnion has no discriminator field,
causing apijson to resolve the flat stats response as RunStatsResponseMap
instead of RunStatsResponseRunStats. Switch to c.RawPost with a local
runStats struct that directly matches the API's flat JSON response shape.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
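For reference, a local struct along these lines decodes the flat response directly. The JSON keys are assumptions based on the metrics this PR lists, not a confirmed schema; total_cost is a *float64 because the API sends a number, and pointer fields keep an absent metric distinct from zero. The name runStats echoes the commit message, but the fields shown are illustrative:

```go
package main

import "encoding/json"

// runStats is a hypothetical local struct for the flat
// POST /api/v1/runs/stats response. JSON keys are assumed, not verified.
type runStats struct {
	RunCount         int64          `json:"run_count"`
	ErrorRate        *float64       `json:"error_rate"`
	LatencyP50       *float64       `json:"latency_p50"`
	LatencyP99       *float64       `json:"latency_p99"`
	PromptTokens     int64          `json:"prompt_tokens"`
	CompletionTokens int64          `json:"completion_tokens"`
	TotalTokens      int64          `json:"total_tokens"`
	TotalCost        *float64       `json:"total_cost"` // JSON number, so float64 rather than string
	FeedbackStats    map[string]any `json:"feedback_stats"`
}

// decodeRunStats decodes a raw response body into runStats.
func decodeRunStats(body []byte) (runStats, error) {
	var s runStats
	err := json.Unmarshal(body, &s)
	return s, err
}
```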
Replaces the hand-rolled RawPost + manual JSON decode with the typed
langsmith-go SDK call. The SDK's union discriminator picks
RunStatsResponseRunStats correctly when total_cost is excluded from the
select list. Including it causes the API to return a JSON number (e.g.
8.2e-6) that can't decode exactly into the SDK's string field, so the
discriminator falls through to RunStatsResponseMap and produces all-zero
results.
…erference

Add method checks to mock cases, a default 404 handler, and t.Setenv
guards for LANGSMITH_ENDPOINT and LANGSMITH_API_KEY. Mirrors the
pattern used by TestTraceMessages_Success, which consistently passes
in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Andy Young (ayoung19) merged commit 2bbcb19 into main May 8, 2026
5 of 10 checks passed