Skip to content

feat: add Langfuse metrics API for per-agent token/cost stats and trace tags#47

Merged
sks merged 15 commits intomainfrom
feature/langfuse-metrics-and-trace-tags
Mar 6, 2026
Merged

feat: add Langfuse metrics API for per-agent token/cost stats and trace tags#47
sks merged 15 commits intomainfrom
feature/langfuse-metrics-and-trace-tags

Conversation

@sks
Copy link
Contributor

@sks sks commented Mar 6, 2026

Summary

Adds the ability to query Langfuse for per-agent token usage and cost statistics, and improves trace tagging for better observability.

Langfuse Metrics API

  • New GetAgentStats method on langfuse.Client interface
  • Queries the Langfuse Metrics v1 API (/api/public/metrics) using the observations view grouped by traceName
  • Returns AgentUsageStats per agent: TotalCost, InputTokens, OutputTokens, TotalTokens, Count
  • Supports filtering by agent name and configurable lookback duration
  • Uses v1 endpoint (not v2) since v2 is cloud-only and unavailable on self-hosted Langfuse
  • Handles v1 API quirks: string-encoded token counts, nullable costs, aggregation_measure key format

Trace Tags

  • Added persona/agent name to langfuse.trace.tags in both AG-UI and messenger handler paths
  • AG-UI handler now creates a root span with langfuse.trace.name, langfuse.trace.input, and tags
  • Enables filtering traces by persona in the Langfuse dashboard

Vector Store Spans

  • Added parent spans for vectorstore.add and vectorstore.search operations
  • Embedding spans are now properly nested instead of appearing as orphaned root traces

Tests

  • 12 new Ginkgo test specs covering: multi-agent response, agent name filtering, null costs, empty results, error status codes, invalid JSON, parseAgentStats, numericFromMap, and global function behavior
  • All 54 specs pass (existing + new)

Usage Example

// Get stats for all agents in the last 24 hours
stats, err := langfuse.GetAgentStats(ctx, langfuse.GetAgentStatsRequest{
    Duration: 24 * time.Hour,
})

// Get stats for a specific agent
stats, err := langfuse.GetAgentStats(ctx, langfuse.GetAgentStatsRequest{
    Duration:  7 * 24 * time.Hour,
    AgentName: "codeowner.chat",
})

@sks sks requested a review from a team as a code owner March 6, 2026 00:52
Copilot AI review requested due to automatic review settings March 6, 2026 00:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Langfuse “metrics v1” querying to retrieve per-agent token/cost usage stats, and improves trace/span structure and tagging to make Langfuse observability more useful (persona filtering + proper span nesting for vectorstore ops).

Changes:

  • Add Client.GetAgentStats (+ global wrapper) and implement Langfuse Metrics v1 /api/public/metrics query/parse logic.
  • Add Ginkgo coverage for metrics querying/parsing and global behavior.
  • Improve Langfuse trace tagging (persona/agent) and add parent spans for vectorstore add/search to avoid orphan spans.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pkg/memory/vector/store.go Adds parent spans for vectorstore.add / vectorstore.search and operation attributes.
pkg/langfuse/metrics.go Implements Langfuse Metrics v1 query + parsing helpers and new request/response types.
pkg/langfuse/metrics_test.go Adds test coverage for GetAgentStats, parsing, error cases, and global wrapper behavior.
pkg/langfuse/client.go Extends Client interface and adds package-level GetAgentStats + noop implementation.
pkg/langfuse/langfusefakes/fake_client.go Updates generated fake to include GetAgentStats.
pkg/expert/modelprovider/model.go Removes debug logging for token tailoring enabled.
pkg/app/app.go Adds persona tag to messenger traces and introduces AG-UI root span with Langfuse trace attributes/tags.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sks sks force-pushed the feature/langfuse-metrics-and-trace-tags branch from 0853770 to 0e69215 Compare March 6, 2026 01:03
…ce tags

- Add GetAgentStats to langfuse.Client interface for querying per-agent
  token usage and cost statistics via the Langfuse Metrics v1 API
- New metrics.go with AgentUsageStats, GetAgentStatsRequest types and
  implementation using observations view grouped by traceName
- Handle v1 API quirks: string-encoded token counts, nullable costs,
  aggregation_measure key format
- Comprehensive test coverage (12 new specs) including null costs,
  string parsing, filtering, error handling
- Regenerate counterfeiter fakes for updated Client interface
- Add persona/agent name to langfuse.trace.tags in both AG-UI and
  messenger handler paths for dashboard filtering
- Add vectorstore.add and vectorstore.search parent spans for cleaner
  Langfuse trace nesting
Copilot AI review requested due to automatic review settings March 6, 2026 01:08
@sks sks force-pushed the feature/langfuse-metrics-and-trace-tags branch from 0e69215 to b0d77ea Compare March 6, 2026 01:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Use OTel Baggage to propagate langfuse.trace.tags to all spans in the
trace, including trpc-agent-go internal spans. The Langfuse exporter's
baggageBatchSpanProcessor copies baggage members onto every span at
start time, ensuring tags appear even when the library's own spans
become the trace root.

Also keep the explicit StringSlice attribute on our handler spans as
the primary/correct format (string[] vs comma-separated string).
@codecov
Copy link

codecov bot commented Mar 6, 2026

…g config

- Fix NewClient() not setting config field on client struct, causing
  GetAgentStats to generate URLs with no host (https:///api/...)
- Use actual agent name from context as Langfuse trace root span name
  in orchestrator.Chat() instead of hardcoded 'codeowner.chat'
- Update adaptive loop span in tree_v2 to use agent-prefixed name
  (e.g. 'devops-copilot.adaptive_loop') instead of 'reactree.adaptive_loop'
- Add trace-cost-calculator example that queries Langfuse API for
  per-agent usage stats
Copilot AI review requested due to automatic review settings March 6, 2026 01:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sks added 2 commits March 5, 2026 17:50
- Remove langfuse.trace.tags from vectorstore.add and vectorstore.search
  spans — these trace-level attributes caused Langfuse to promote child
  spans into independent root traces
- Remove langfuse.trace.input and langfuse.trace.tags from the adaptive
  loop span in tree_v2 — same promotion issue
- Use regular span attributes (vectorstore.agent, reactree.agent) instead
  for observability without trace splitting
- Restore agent name prefix on adaptive loop span name
Halguard's generateText() calls model.GenerateContent() directly,
bypassing the trpc-agent-go runner which normally instruments LLM calls.
This meant halguard pre-check and post-check model calls were invisible
in Langfuse traces.

- Add OTel span to generateText following 'chat {model}' naming convention
- Record gen_ai semantic attributes (request model, response model, tokens)
- Spans appear as children of halguard.PreCheck/PostCheck spans
Copilot AI review requested due to automatic review settings March 6, 2026 01:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sks and others added 2 commits March 5, 2026 18:06
- Fix metricsQuery doc comment: says v1 (not v2) endpoint
- Validate req.Duration > 0 in GetAgentStats to prevent zero/negative
  time window queries
- Fix GetAgentStats package-level docstring: says nil,nil (not empty
  slice) when no client is configured
- Add pii.Redact to messenger trace input for consistent PII handling
  (AG-UI path already redacted)
- Merge baggage instead of replacing: withLangfuseTraceBaggage now
  preserves existing upstream baggage values
Copilot AI review requested due to automatic review settings March 6, 2026 02:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sks and others added 2 commits March 5, 2026 18:45
Two-layer fix for sub-agents that output commands as text instead of
executing them via run_shell, or refuse with 'I don't know' when tools
are available.

Layer 1 - Stronger system prompt (subagent_instruction.go):
- Added TOOL USE IS MANDATORY directive
- Prohibit 'I don't know' / 'I don't have access' / 'I cannot' refusals
- Require executing embedded scripts via run_shell, not echoing as markdown
- Explicit instruction that tools exist to gather real data

Layer 2 - Runtime guard (create_agent.go):
- Zero-tool-use guard detects when toolCallCount==0 with tools available
- Annotates output with warning and sets status='tool_use_failure'
- Gives parent agent clear signal to re-spawn the sub-agent

Tests:
- buildSubAgentInstruction: tool-use enforcement, allowlist, core directives
- Zero-tool-use guard: table-driven condition tests, output annotation format
Copilot AI review requested due to automatic review settings March 6, 2026 02:48
sks added 2 commits March 5, 2026 18:50
Root spans in AGUI and messenger paths used otel.Tracer() which
resolves to the global OTel noop provider. The Langfuse exporter
registers its own TracerProvider on trace.Tracer but never calls
otel.SetTracerProvider(), so the root spans were never exported.
All downstream spans (orchestrator, tree, vectorstore) became
orphaned root-level Langfuse traces instead of children.

Switch both paths to trace.Tracer and inline SetAttributes into
Start() via oteltrace.WithAttributes for brevity.
Address PR review comments:
- Set langfuse.trace.name to a.displayName() (agent name) instead of
  'agui chat' / platform label so GetAgentStats can filter by agent.
  Channel/platform info is already in tags.
- Log warnings in withLangfuseTraceBaggage when baggage creation fails
  instead of silently returning the original context.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sks added 2 commits March 5, 2026 19:39
…ontext

- halguard.New: panic on nil textGenerator to fail fast instead of
  allowing a runtime nil-pointer dereference during PostCheck.
- create_agent: use toolNameList (from selectedTools) instead of
  scopedRegistry.ToolNames() in the zero-tool-use guard message so
  the reported tools match what the sub-agent actually had.
- app.go: set orchestratorcontext.WithAgent in the messenger path
  so downstream spans use the correct agent name instead of
  DefaultAgentName.
When a browser tab is present, bridgeBrowserTab creates a new context
derived from the chromedp tab context which has no OTel span context.
This caused child spans (adaptive_loop, invoke_agent, vectorstore.add)
to get new traceIDs and appear as separate top-level traces in Langfuse
instead of being nested under the parent trace.

Fix: inject the parent's OTel span and baggage into the bridged context
so that all downstream spans maintain the correct trace hierarchy and
inherit Langfuse attributes (userId, sessionId, tags, metadata).

Add tests verifying span context, baggage, per-request values, and
deadline propagation through bridgeBrowserTab.
Copilot AI review requested due to automatic review settings March 6, 2026 05:26
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +16 to +29
// singleModelProvider adapts a single model.Model into the
// modelprovider.ModelProvider interface so that expert.ExpertBio.ToExpert can
// use a specific, pre-selected model rather than task-type-based routing.
// This is necessary because halguard selects diverse models for cross-model
// verification and needs to call each one individually.
type singleModelProvider struct {
key string
model model.Model
}

// GetModel always returns the single wrapped model regardless of task type.
func (p *singleModelProvider) GetModel(_ context.Context, _ modelprovider.TaskType) (modelprovider.ModelMap, error) {
return modelprovider.ModelMap{p.key: p.model}, nil
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file defines a local singleModelProvider, but the PR also adds expert/modelprovider.NewSingleModelProvider(...) with the same purpose. Keeping both duplicates logic and invites divergence (e.g., interface changes). Prefer using the shared helper here (and drop the local type), or remove the new helper file if it’s not intended for reuse.

Copilot uses AI. Check for mistakes.
Comment on lines +34 to +40
instruction += "\nTOOL USE IS MANDATORY: You have tools — you MUST use them. " +
"NEVER say 'I don't know', 'I don't have access', or 'I cannot' when you have tools that can gather the information. " +
"If your goal describes a task and you have tools like run_shell, use them to execute the task. " +
"NEVER output commands, scripts, or code as text in your response — always execute them via the appropriate tool. " +
"If your goal contains a shell script or command to run, call run_shell to EXECUTE it — do NOT echo or display the script as markdown. " +
"You have tools for a reason: USE THEM to gather real data, then summarize the results. " +
"\nHITL REJECTION: If a tool call is rejected by the user with feedback suggesting a different tool or approach, " +
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildSubAgentInstruction unconditionally instructs the sub-agent to "call run_shell" when the goal contains a shell command/script. When toolNames does not include run_shell (or when toolNames is nil/empty), this directive conflicts with the allowlist and can cause the sub-agent to attempt an unavailable tool. Suggest gating the run_shell-specific text on toolNames containing run_shell, or rewriting it to reference "an available shell tool" / "if run_shell is available".

Suggested change
instruction += "\nTOOL USE IS MANDATORY: You have tools — you MUST use them. " +
"NEVER say 'I don't know', 'I don't have access', or 'I cannot' when you have tools that can gather the information. " +
"If your goal describes a task and you have tools like run_shell, use them to execute the task. " +
"NEVER output commands, scripts, or code as text in your response — always execute them via the appropriate tool. " +
"If your goal contains a shell script or command to run, call run_shell to EXECUTE it — do NOT echo or display the script as markdown. " +
"You have tools for a reason: USE THEM to gather real data, then summarize the results. " +
"\nHITL REJECTION: If a tool call is rejected by the user with feedback suggesting a different tool or approach, " +
// Detect whether run_shell is actually available so we don't instruct
// the sub-agent to call a tool that is not in its allowlist.
hasRunShell := false
for _, name := range toolNames {
if name == "run_shell" {
hasRunShell = true
break
}
}
if hasRunShell {
instruction += "\nTOOL USE IS MANDATORY: You have tools — you MUST use them. " +
"NEVER say 'I don't know', 'I don't have access', or 'I cannot' when you have tools that can gather the information. " +
"If your goal describes a task and you have tools like run_shell, use them to execute the task. " +
"NEVER output commands, scripts, or code as text in your response — always execute them via the appropriate tool. " +
"If your goal contains a shell script or command to run, call run_shell to EXECUTE it — do NOT echo or display the script as markdown. " +
"You have tools for a reason: USE THEM to gather real data, then summarize the results. "
} else {
instruction += "\nTOOL USE IS MANDATORY: You have tools — you MUST use them. " +
"NEVER say 'I don't know', 'I don't have access', or 'I cannot' when you have tools that can gather the information. " +
"If your goal describes a task and you have tools suitable for executing commands or scripts, use them to execute the task. " +
"NEVER output commands, scripts, or code as text in your response — always execute them via the appropriate available tool. " +
"If your goal contains a shell script or command to run, call an appropriate available shell-execution tool to EXECUTE it — do NOT echo or display the script as markdown. " +
"You have tools for a reason: USE THEM to gather real data, then summarize the results. "
}
instruction += "\nHITL REJECTION: If a tool call is rejected by the user with feedback suggesting a different tool or approach, " +

Copilot uses AI. Check for mistakes.
Comment on lines +310 to +317
shouldFire := func(gi guardInput) bool {
return gi.toolCallCount == 0 && gi.result != "" && gi.status != "error" && !gi.timedOut && gi.numSelectedTools > 0
}

DescribeTable("fires or skips based on conditions",
func(gi guardInput, expectFire bool) {
Expect(shouldFire(gi)).To(Equal(expectFire))
},
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added "zero-tool-use guard" specs mostly re-implement the guard predicate and string formatting rather than exercising the actual createAgentTool.executeInner behavior. This won’t catch regressions like the guard not firing due to event parsing changes, toolNameList mismatches, or status being overwritten later. Consider adding an integration-style unit test that runs executeInner with a stubbed runner/event stream to assert Status becomes tool_use_failure and Output is annotated only when toolCallCount == 0 and tools are available.

Copilot uses AI. Check for mistakes.
…_failure

Sub-agents (especially Gemini models) were echoing shell scripts as text
instead of calling run_shell. Two fixes:

1. Restructure subagent instruction: move MANDATORY tool-use rule to the
   very first sentence (LLMs attend most to prompt start). Add dedicated
   SCRIPT EXECUTION section targeting the exact failure mode of embedded
   bash scripts being echoed as markdown.

2. Auto-retry on tool_use_failure: when zero-tool-use guard fires, retry
   once with a reinforced prompt ('[RETRY] You MUST call run_shell...')
   before returning failure to the parent agent. Uses -retry agent name
   suffix for tracing.

Tests: fix 2 existing tests for new wording, add 5 new specs for SCRIPT
EXECUTION directive and auto-retry logic (149 specs pass).
@sks sks merged commit 8e02fa6 into main Mar 6, 2026
4 checks passed
@sks sks deleted the feature/langfuse-metrics-and-trace-tags branch March 6, 2026 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants