feat(trace): add langsmith trace stats command#84
Closed
Palash Shah (Palashio) wants to merge 6 commits into
fetchRootPreviews already queries root runs per batch; add
RunQueryParamsSelectFeedbackStats to the select params and propagate
FeedbackStats onto each trace map alongside root_inputs_preview /
root_outputs_preview. Every trace now carries feedback_stats (empty {}
when no feedback exists), making it possible to filter feedback traces
directly from batch files with jq without a separate trace list call.
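The filtering this commit message describes can also be sketched in Go. This is only an illustration: it assumes a batch is a JSON array of trace objects and uses a hypothetical `trace_id` key; only `feedback_stats` (empty `{}` when no feedback exists) comes from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withFeedback returns the traces whose feedback_stats object is non-empty.
// Assumption: the batch is a JSON array of trace objects.
func withFeedback(batch []byte) ([]map[string]any, error) {
	var traces []map[string]any
	if err := json.Unmarshal(batch, &traces); err != nil {
		return nil, err
	}
	var out []map[string]any
	for _, t := range traces {
		// A missing key, a null, or an empty object all count as "no feedback".
		stats, ok := t["feedback_stats"].(map[string]any)
		if ok && len(stats) > 0 {
			out = append(out, t)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical batch contents; "trace_id" is an illustrative key.
	batch := []byte(`[
		{"trace_id": "a", "feedback_stats": {}},
		{"trace_id": "b", "feedback_stats": {"correctness": {"n": 2}}}
	]`)
	kept, err := withFeedback(batch)
	if err != nil {
		panic(err)
	}
	for _, t := range kept {
		fmt.Println(t["trace_id"]) // prints "b"
	}
}
```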
…etching
The /v2/traces/messages endpoint already returns feedback_stats on each trace. The previous approach fetched it again via /api/v1/runs/query and overwrote the API value, adding latency and risking data loss if the runs query failed. Revert the fetchRootPreviews/attachRootIO changes and let the API response pass through unchanged. Add a test confirming feedback_stats is preserved end-to-end.
Adds `langsmith trace stats`, which hits POST /api/v1/runs/stats to return aggregate run count, latency percentiles, token usage, cost, error rate, and feedback key distributions for a project window. Supports an optional comparison window (--compare-since/--compare-before/--compare-last-n-minutes) that fires a second request and renders a Primary / Comparison / Delta table side-by-side.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The langsmith-go SDK's RunStatsResponseUnion has no discriminator field, causing apijson to resolve the flat stats response as RunStatsResponseMap instead of RunStatsResponseRunStats. Switch to c.RawPost with a local runStats struct that directly matches the API's flat JSON response shape.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
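As a rough illustration of the "local struct that matches the flat response" approach, the sketch below decodes a flat JSON object with the standard library. It is not the PR's actual `runStats` type: the field set is a subset, the sample payload is invented, and the JSON keys are taken from the metric names mentioned elsewhere in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runStats is an illustrative local type for a flat stats response.
// Assumption: keys follow the names used in this thread (run_count,
// error_rate, total_cost, feedback_stats); the real response has more fields.
type runStats struct {
	RunCount      int64                      `json:"run_count"`
	ErrorRate     float64                    `json:"error_rate"`
	TotalCost     float64                    `json:"total_cost"`
	FeedbackStats map[string]json.RawMessage `json:"feedback_stats"`
}

// decodeStats decodes a flat stats payload without any union resolution.
func decodeStats(body []byte) (runStats, error) {
	var s runStats
	err := json.Unmarshal(body, &s)
	return s, err
}

func main() {
	// Hypothetical response body.
	body := []byte(`{"run_count": 3, "error_rate": 0.5, "total_cost": 8.2e-6, "feedback_stats": {}}`)
	s, err := decodeStats(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.RunCount, s.ErrorRate, s.TotalCost)
}
```

Because the struct mirrors the response one-to-one, there is no union to resolve, and a numeric `total_cost` decodes directly into a `float64`.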
Quentin Brosse (QuentinBrosse)
requested changes
Apr 30, 2026
Replaces the hand-rolled RawPost + manual JSON decode with the typed langsmith-go SDK call. The SDK's union discriminator picks RunStatsResponseRunStats correctly when total_cost is excluded from the select list — including it causes the API to return a JSON number (e.g. 8.2e-6) that can't decode into the SDK's string field with exact exactness, causing the discriminator to fall through to RunStatsResponseMap and produce all-zero results.
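The number-versus-string mismatch described here can be reproduced with Go's standard `encoding/json`. The SDK uses its own decoder, so this is only an analogy under that caveat; the struct names are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// stringCost models total_cost as a string, as the thread says the SDK does.
type stringCost struct {
	TotalCost string `json:"total_cost"`
}

// numberCost uses json.Number, which accepts a JSON number and keeps its
// literal text.
type numberCost struct {
	TotalCost json.Number `json:"total_cost"`
}

// isTypeMismatch reports whether decoding payload into stringCost fails
// with a JSON type error.
func isTypeMismatch(payload []byte) bool {
	var s stringCost
	var typeErr *json.UnmarshalTypeError
	return errors.As(json.Unmarshal(payload, &s), &typeErr)
}

// costLiteral decodes payload into numberCost and returns the number's text.
func costLiteral(payload []byte) (string, error) {
	var n numberCost
	if err := json.Unmarshal(payload, &n); err != nil {
		return "", err
	}
	return n.TotalCost.String(), nil
}

func main() {
	payload := []byte(`{"total_cost": 8.2e-6}`)
	fmt.Println(isTypeMismatch(payload)) // true: a number cannot fill a string field
	lit, err := costLiteral(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(lit) // 8.2e-6
}
```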
…erference
Add method checks to mock cases, a default 404 handler, and t.Setenv guards for LANGSMITH_ENDPOINT and LANGSMITH_API_KEY. Mirrors the pattern used by TestTraceMessages_Success, which consistently passes in CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
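A minimal sketch of the mock pattern this commit describes, using `net/http/httptest`: the handler checks the method on the expected route and answers 404 for everything else. The handler and helper names and the response body are hypothetical; the actual tests in the PR are not shown in this thread.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newMockAPI returns a handler that serves only the expected route and
// method, and answers 404 for anything else.
func newMockAPI() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/api/v1/runs/stats":
			if r.Method != http.MethodPost {
				http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
				return
			}
			w.Header().Set("Content-Type", "application/json")
			io.WriteString(w, `{"run_count": 3}`) // hypothetical body
		default:
			http.NotFound(w, r)
		}
	})
}

// statusFor sends one request to the mock and returns the status code.
func statusFor(method, path string) int {
	rec := httptest.NewRecorder()
	newMockAPI().ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("POST", "/api/v1/runs/stats")) // 200
	fmt.Println(statusFor("GET", "/api/v1/runs/stats"))  // 405
	fmt.Println(statusFor("GET", "/unexpected"))         // 404
}
```

In an actual test, the handler would typically be wrapped in `httptest.NewServer`, with `t.Setenv("LANGSMITH_ENDPOINT", srv.URL)` and a dummy `LANGSMITH_API_KEY` set so that values from the surrounding environment cannot leak into the test.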
Anirudh Sriram (asrira428)
added a commit
that referenced
this pull request
May 7, 2026
Adds `trace stats` for aggregate health metrics over a project's traces: run_count, error_rate, latency p50/p99, total/prompt/completion tokens, total_cost, and feedback_stats (with per-key score distributions).
Calls the SDK's `Runs.Stats` against `/api/v1/runs/stats` with `is_root=true` so aggregates are per-trace.
Optional --compare-since / --compare-before / --compare-last-n-minutes fetch a second window side-by-side; the pretty renderer shows delta columns. Filters apply to both windows.
`total_cost` is intentionally omitted from `select`: the API returns it as a JSON number while the SDK models it as string, which mis-discriminates the response union and zeroes out everything. Excluding it keeps the flat-object response decodable. Once the SDK is fixed we can add it back.
Implementation lifted from Palash's #84 (which never landed because the branch went stale against main); rebased onto current main, gofmt'd, README docs added, and a flag-wiring test added in trace_test.go.
Co-Authored-By: Palash Shah <palash@langchain.dev>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds `langsmith trace stats`, a new subcommand that fetches aggregate statistics for a project over a time window, with optional comparison to a prior window.
- … `POST /api/v1/runs/stats` and returns run count, latency (p50/p99), token usage, total cost, error rate, and feedback stats
- `--cmp-since` / `--cmp-before` / `--cmp-last-n-minutes` flags enable side-by-side comparison with a prior period, including delta/pct-change columns in pretty output
- … (`--since`, `--before`, `--last-n-minutes`, `--filter`, `--project`)
- … `total_cost` field
Test Plan
- … `message_test.go` covering the stats command
- … `--project` and `--cmp-since`
Release Note
Added `langsmith trace stats` command for fetching aggregate run statistics (latency, tokens, cost, error rate, feedback) with optional period comparison.