feat(trace): add langsmith trace stats command#84

Closed
Palash Shah (Palashio) wants to merge 6 commits into main from feat/trace-stats-clean

@Palashio
Contributor

Summary

Adds langsmith trace stats — a new subcommand that fetches aggregate statistics for a project over a time window, with optional comparison to a prior window.

  • Calls POST /api/v1/runs/stats and returns run count, latency (p50/p99), token usage, total cost, error rate, and feedback stats
  • --cmp-since / --cmp-before / --cmp-last-n-minutes flags enable side-by-side comparison with a prior period, including delta/pct-change columns in pretty output
  • Supports all standard filter flags (--since, --before, --last-n-minutes, --filter, --project)
  • Uses raw HTTP to work around a union deserialization issue in the Go SDK for the total_cost field

Test Plan

  • New unit tests in message_test.go covering the stats command
  • Built and ran manually against production with --project and --cmp-since

Release Note

Added langsmith trace stats command for fetching aggregate run statistics (latency, tokens, cost, error rate, feedback) with optional period comparison.

Palash Shah (Palashio) and others added 4 commits April 29, 2026 12:30
fetchRootPreviews already queries root runs per batch; add
RunQueryParamsSelectFeedbackStats to the select params and propagate
FeedbackStats onto each trace map alongside root_inputs_preview /
root_outputs_preview. Every trace now carries feedback_stats (empty {}
when no feedback exists), making it possible to filter feedback traces
directly from batch files with jq without a separate trace list call.

…etching

The /v2/traces/messages endpoint already returns feedback_stats on each
trace. The previous approach fetched it again via /api/v1/runs/query and
overwrote the API value — adding latency and risking data loss if the
runs query failed.

Revert the fetchRootPreviews/attachRootIO changes and let the API
response pass through unchanged. Add a test confirming feedback_stats
is preserved end-to-end.

Adds `langsmith trace stats` — hits POST /api/v1/runs/stats to return
aggregate run count, latency percentiles, token usage, cost, error rate,
and feedback key distributions for a project window.

Supports an optional comparison window (--compare-since/--compare-before/
--compare-last-n-minutes) that fires a second request and renders a
Primary / Comparison / Delta table side-by-side.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The langsmith-go SDK's RunStatsResponseUnion has no discriminator field,
causing apijson to resolve the flat stats response as RunStatsResponseMap
instead of RunStatsResponseRunStats. Switch to c.RawPost with a local
runStats struct that directly matches the API's flat JSON response shape.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Comment thread internal/cmd/trace_stats.go Outdated

Replaces the hand-rolled RawPost + manual JSON decode with the typed
langsmith-go SDK call. The SDK's union resolution picks
RunStatsResponseRunStats correctly when total_cost is excluded from the
select list. Including it makes the API return a JSON number (e.g.
8.2e-6) that cannot decode as an exact match into the SDK's string
field, so resolution falls through to RunStatsResponseMap and produces
all-zero results.

…erference

Add method checks to mock cases, a default 404 handler, and t.Setenv
guards for LANGSMITH_ENDPOINT and LANGSMITH_API_KEY. Mirrors the
pattern used by TestTraceMessages_Success, which consistently passes
in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Anirudh Sriram (asrira428) added a commit that referenced this pull request May 7, 2026
Adds `trace stats` for aggregate health metrics over a project's traces:
run_count, error_rate, latency p50/p99, total/prompt/completion tokens,
total_cost, and feedback_stats (with per-key score distributions). Calls
the SDK's `Runs.Stats` against `/api/v1/runs/stats` with `is_root=true`
so aggregates are per-trace.

Optional --compare-since / --compare-before / --compare-last-n-minutes
fetch a second window side-by-side; the pretty renderer shows delta
columns. Filters apply to both windows.
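
One plausible way to compute the delta columns, sketched under assumptions: the `delta` helper is hypothetical, the sign convention (primary minus comparison) is a guess, and the renderer's handling of a zero baseline is not stated in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// delta returns the absolute change and the percent change of primary
// relative to comparison. ok is false when the comparison value is zero,
// because a percent change against a zero baseline is undefined.
func delta(primary, comparison float64) (abs, pct float64, ok bool) {
	abs = primary - comparison
	if comparison == 0 {
		return abs, 0, false
	}
	return abs, abs / math.Abs(comparison) * 100, true
}

func main() {
	// Illustrative values only, not real results.
	rows := []struct {
		name                string
		primary, comparison float64
	}{
		{"run_count", 150, 100},
		{"error_rate", 0.02, 0},
	}
	for _, r := range rows {
		abs, pct, ok := delta(r.primary, r.comparison)
		if !ok {
			fmt.Printf("%-12s %8.2f %8.2f %+8.2f %8s\n", r.name, r.primary, r.comparison, abs, "n/a")
			continue
		}
		fmt.Printf("%-12s %8.2f %8.2f %+8.2f %+7.1f%%\n", r.name, r.primary, r.comparison, abs, pct)
	}
}
```

Returning an explicit `ok` flag lets the renderer print a placeholder rather than `+Inf%` or `NaN%` when the comparison window has no data.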

`total_cost` is intentionally omitted from `select`: the API returns it
as a JSON number while the SDK models it as string, which mis-discriminates
the response union and zeroes out everything. Excluding it keeps the
flat-object response decodable. Once the SDK is fixed we can add it back.

Implementation lifted from Palash's #84 (which never landed because the
branch went stale against main); rebased onto current main, gofmt'd,
README docs added, and a flag-wiring test added in trace_test.go.

Co-Authored-By: Palash Shah <palash@langchain.dev>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 participants