feat(trace): add langsmith trace stats command #117
Merged
fetchRootPreviews already queries root runs per batch; add
RunQueryParamsSelectFeedbackStats to the select params and propagate
FeedbackStats onto each trace map alongside root_inputs_preview /
root_outputs_preview. Every trace now carries feedback_stats (empty {}
when no feedback exists), making it possible to filter feedback traces
directly from batch files with jq without a separate trace list call.
…etching The /v2/traces/messages endpoint already returns feedback_stats on each trace. The previous approach fetched it again via /api/v1/runs/query and overwrote the API value — adding latency and risking data loss if the runs query failed. Revert the fetchRootPreviews/attachRootIO changes and let the API response pass through unchanged. Add a test confirming feedback_stats is preserved end-to-end.
Adds `langsmith trace stats`, which hits POST /api/v1/runs/stats to return aggregate run count, latency percentiles, token usage, cost, error rate, and feedback key distributions for a project window. Supports an optional comparison window (--compare-since/--compare-before/--compare-last-n-minutes) that fires a second request and renders a Primary / Comparison / Delta table side-by-side. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
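The Delta column described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `delta`, the sign convention (primary minus comparison), and the handling of a zero comparison value are all assumptions.

```go
package main

import "fmt"

// delta is a hypothetical helper: it returns the absolute change and the
// percent change of the primary window relative to the comparison window.
// ok is false when the comparison value is zero, because the percent change
// is undefined in that case.
func delta(primary, comparison float64) (abs, pct float64, ok bool) {
	abs = primary - comparison
	if comparison == 0 {
		return abs, 0, false
	}
	return abs, abs * 100 / comparison, true
}

func main() {
	// Hypothetical run counts for the primary and comparison windows.
	abs, pct, ok := delta(120, 100)
	fmt.Println("delta:", abs, "percent:", pct, "defined:", ok)

	// A comparison window with no runs yields no meaningful percent change.
	_, _, ok = delta(5, 0)
	fmt.Println("defined with empty comparison window:", ok)
}
```

Returning an explicit `ok` flag lets the renderer print a placeholder instead of a misleading percentage when the comparison window is empty.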
The langsmith-go SDK's RunStatsResponseUnion has no discriminator field, causing apijson to resolve the flat stats response as RunStatsResponseMap instead of RunStatsResponseRunStats. Switch to c.RawPost with a local runStats struct that directly matches the API's flat JSON response shape. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the hand-rolled RawPost + manual JSON decode with the typed langsmith-go SDK call. The SDK's union discriminator picks RunStatsResponseRunStats correctly when total_cost is excluded from the select list — including it causes the API to return a JSON number (e.g. 8.2e-6) that can't decode into the SDK's string field with exact exactness, causing the discriminator to fall through to RunStatsResponseMap and produce all-zero results.
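The type mismatch behind this can be reproduced in isolation. The sketch below uses the standard library's `encoding/json` rather than the SDK's apijson decoder, so it only illustrates the underlying problem (a JSON number cannot decode into a string field); the struct and the `run_count` field name are hypothetical, and the SDK's actual fallback to `RunStatsResponseMap` is not modeled here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statsWithStringCost is a hypothetical model that, like the SDK type
// described above, declares total_cost as a string.
type statsWithStringCost struct {
	RunCount  int    `json:"run_count"` // assumed field name, for illustration
	TotalCost string `json:"total_cost"`
}

// decodeStats tries to decode a flat stats payload into the string-typed model.
func decodeStats(payload string) (statsWithStringCost, error) {
	var s statsWithStringCost
	err := json.Unmarshal([]byte(payload), &s)
	return s, err
}

func main() {
	// total_cost arrives as a JSON number, so decoding reports a type error.
	_, err := decodeStats(`{"run_count": 3, "total_cost": 8.2e-6}`)
	fmt.Println("with total_cost as a number:", err)

	// Leaving total_cost out of the payload lets the same struct decode cleanly.
	s, err := decodeStats(`{"run_count": 3}`)
	fmt.Println("without total_cost:", s.RunCount, err)
}
```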
…erference Add method checks to mock cases, a default 404 handler, and t.Setenv guards for LANGSMITH_ENDPOINT and LANGSMITH_API_KEY. Mirrors the pattern used by TestTraceMessages_Success, which consistently passes in CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
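A mock server in the style this commit describes might look like the sketch below. It is an assumption-laden illustration rather than the test code from this PR: `newMockServer` and `statusFor` are hypothetical helpers, and the response body is a placeholder. In an actual test, `t.Setenv("LANGSMITH_ENDPOINT", srv.URL)` and `t.Setenv("LANGSMITH_API_KEY", ...)` would point the command at the mock and keep ambient environment values from leaking in.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newMockServer returns a test server that answers only
// POST /api/v1/runs/stats and responds 404 to everything else, so a request
// with the wrong method or path fails loudly instead of matching by accident.
func newMockServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch {
		case r.Method == http.MethodPost && r.URL.Path == "/api/v1/runs/stats":
			w.Header().Set("Content-Type", "application/json")
			fmt.Fprint(w, `{"run_count": 3}`) // placeholder payload
		default:
			http.NotFound(w, r)
		}
	}))
}

// statusFor sends one request to the mock and returns the HTTP status code.
func statusFor(baseURL, method, path string) (int, error) {
	req, err := http.NewRequest(method, baseURL+path, strings.NewReader("{}"))
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	srv := newMockServer()
	defer srv.Close()

	cases := []struct{ method, path string }{
		{http.MethodPost, "/api/v1/runs/stats"}, // matches the mock case
		{http.MethodGet, "/api/v1/runs/stats"},  // wrong method
		{http.MethodPost, "/unexpected"},        // unknown path
	}
	for _, c := range cases {
		code, err := statusFor(srv.URL, c.method, c.path)
		fmt.Println(c.method, c.path, code, err)
	}
}
```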
Paarth Ahuja (paarth-a)
approved these changes
May 8, 2026
Resurrects Palash's #84 (`feat/trace-stats-clean`) verbatim: same commits, merged with current main. Closing the loop on the Slack thread: the issues-agent prompt was reverted to not use `trace stats`; once this lands the prompt will be flipped back on.

Adds `langsmith trace stats`, which reports aggregate health metrics for a project's traces over a window: run count, error rate, latency p50/p99, prompt/completion/total tokens, and feedback_stats. The optional `--cmp-since` / `--cmp-before` / `--cmp-last-n-minutes` flags fetch a comparison window, shown side-by-side with delta and % change in pretty output.

langsmith trace stats --project my-app --last-n-minutes 60
langsmith trace stats --project my-app --since 2026-01-10 --cmp-since 2026-01-03 --cmp-before 2026-01-10
langsmith trace stats --project my-app --filter 'eq(status, "error")'

total_cost is null

Intentionally omitted from the SDK select. The Go SDK models `total_cost` as `string`, but the API returns it as a JSON number, which causes the `RunStatsResponse` union to discriminate as `RunStatsResponseMap` and zero out every field. Excluding `total_cost` keeps the flat-object response decodable. Pinging go-sdk owners to fix the union; once it lands we can add the field back. Documented inline at the call site.

Why a new PR

#114 reverted #84's SDK call back to raw HTTP, contrary to the original review consensus and Palash's note in the Slack thread. The cleanest path was to copy #84's commits intact and integrate main via merge.
Known failing test (pre-existing)

`TestTraceMessages_FeedbackStats` (in `internal/cmd/message_test.go`, included on Palash's branch) mocks `/v2/traces/messages` returning `{"traces": [...]}`. Main has since renamed that response field to `items` (commit b36ab5a, "fix(trace messages): update to v2 API field names"). Production code is correct and reads `items`; only the test mock is stale. Trivially fixable with a follow-up commit (rename `traces` → `items`, `cursors` → `next_cursor`). Happy to push that fix on top if you want CI green for the stamp.

Release Note

Added `langsmith trace stats` command for fetching aggregate run statistics (latency, tokens, cost, error rate, feedback) with optional period comparison.

Test Plan

- `langsmith trace stats --project <real> --last-n-minutes 60` (and pretty/compare modes)
- `feedback_stats` keys come back populated for a project that has feedback
- `total_cost` shows as null (until SDK union is fixed)