Proposal: optional Langfuse tracing for agent execution #114

@totoyang

Today the only LLM-call observability is _write_llm_log (llmcore.py:790), which appends the raw prompt and response to temp/model_responses/*.txt. That is usable for quick debugging, but there is no way to group calls by task, replay bad cases, or compare model configs on the same run. I'd like to contribute an opt-in Langfuse integration: one agent_runner_loop call becomes a trace, each chat() becomes a generation (with token usage), and each tool dispatch becomes a tool span.
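To make the proposed mapping concrete, here is a rough sketch of the trace shape using a small in-memory stand-in rather than the Langfuse SDK. The names Trace, Observation, run_task, and the usage keys are hypothetical and only illustrate the hierarchy (one run is a trace, each chat() a generation, each tool dispatch a tool span); the real hooks would call Langfuse instead.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Observation:
    """One child of a trace: an LLM call or a tool dispatch (hypothetical type)."""
    kind: str  # "generation" or "tool"
    name: str
    input: Any = None
    output: Any = None
    usage: dict = field(default_factory=dict)  # token counts, generations only


@dataclass
class Trace:
    """Stand-in for one agent_runner_loop call (hypothetical type)."""
    name: str
    observations: list = field(default_factory=list)

    def generation(self, name, input, output, usage):
        # Would be recorded around each chat() call.
        obs = Observation("generation", name, input, output, dict(usage))
        self.observations.append(obs)
        return obs

    def tool(self, name, input, output):
        # Would be recorded around each tool dispatch.
        obs = Observation("tool", name, input, output)
        self.observations.append(obs)
        return obs

    def total_tokens(self):
        # Per-trace aggregation, which flat text logs cannot give directly.
        return sum(o.usage.get("total", 0) for o in self.observations)


def run_task(task):
    """Simulate one run: chat, tool call, chat. All values are made up."""
    trace = Trace(name=task)
    trace.generation("chat", input=task, output="call read_file",
                     usage={"input": 12, "output": 5, "total": 17})
    trace.tool("read_file", input={"path": "notes.txt"}, output="file contents")
    trace.generation("chat", input="file contents", output="done",
                     usage={"input": 20, "output": 10, "total": 30})
    return trace
```

With this shape, grouping by task and summing token usage become a lookup on the trace rather than a grep across text files.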

Why this helps

Cost visibility. Aggregate token usage per task / model / time range — see which tasks burn the most tokens and which model gets the best cache-hit rate.
Bad-case replay. Expand a trace tree in the UI with full context (messages, tool calls, outputs) instead of grepping 10 MB text logs.
Dataset curation. Successful traces export directly to eval sets or fine-tuning corpora — a natural data source for the agent's self-evolution loop.
Model/prompt comparison. Run the same prompt across GLM / MiniMax / Claude and diff the traces side-by-side.
Tool-level analytics. Per-tool failure rate, average latency, call frequency — direct input for evolving L2/L3 memory and Skills.
Collaboration. Paste a trace URL into chat instead of copy-pasting a whole conversation.

Design boundaries

Zero intrusion when disabled. No langfuse_config in mykey.py → langfuse is never imported. No new required dependency, no behavior change.
Minimal change surface. A few hook points in llmcore.py and agent_loop.py only. No new directory, no new module, no OpenTelemetry abstraction layer.
Failure isolation. Any Langfuse-side exception is swallowed; it never propagates to the agent loop.
Subagents are separate processes, so v1 treats each one as its own top-level trace with no cross-process parent linking. Happy to add parent linking as a follow-up.
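The first and third boundaries could look roughly like the sketch below. It is only an illustration under assumptions: load_tracer and safe_hook are hypothetical helper names, the config object stands in for the mykey module, and the only thing taken from the proposal is that a missing langfuse_config means the SDK is never imported and that tracing errors never reach the agent loop.

```python
import functools
import importlib
import logging

log = logging.getLogger("tracing")


def load_tracer(config, sdk_module_name="langfuse"):
    """Return the SDK module if tracing is configured, else None.

    `config` is any object that may carry a `langfuse_config` attribute
    (in the proposal, the mykey module). The SDK import happens only
    after that attribute is found, so disabled installs never touch it.
    """
    if getattr(config, "langfuse_config", None) is None:
        return None
    try:
        return importlib.import_module(sdk_module_name)
    except Exception:
        # SDK missing or broken: degrade to "tracing disabled".
        log.debug("tracing SDK unavailable", exc_info=True)
        return None


def safe_hook(fn):
    """Wrap a tracing hook so its exceptions never reach the caller."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            log.debug("tracing hook failed", exc_info=True)
            return None
    return wrapper
```

Each hook point in llmcore.py and agent_loop.py would then be a call to a function decorated with safe_hook, guarded by a check that load_tracer returned something other than None.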
Verified locally in both enabled/disabled states. If the direction is acceptable I'll open the PR with UI screenshots.
