Feature: Store simplified LLM session traces after all agentic tasks #752

@niti-go

Description

After an agentic task (sync, change, bug, fix), PDD should save a simplified version of the LLM's session to .pdd/agentic-traces/. These files will show the agent's entire thinking/reasoning process and tool call sequence during all pdd agentic tasks.

It will be helpful to have a standardized way of storing the LLM's internal thinking so we can reference it when iterating on pdd agentic LLM prompts. For example, I'm currently improving one_session_agent_LLM.prompt, and I'm finding it helpful to read the LLM traces from one-session sync to identify where the agent is prone to getting stuck in loops and making incorrect assumptions. I then adjust my prompt to prevent this.

Right now, the only way to view agentic traces is to manually dig into provider-specific folders (~/.claude/projects/, ~/.gemini/tmp/), and those files contain a lot of unnecessary information (streaming progress events, full file contents, undo snapshots).

Proposed Design

Each trace will be a compact JSONL file in .pdd/agentic-traces/ containing:

  • Metadata: timestamp, label, provider, model, success, cost, duration
  • Agent thinking/reasoning: the full chain of thought (the primary signal for prompt improvement)
  • Tool calls: name + summarized args + status (not full file contents)
  • Error output: stderr and diagnostic lines from command output
  • File fingerprints: path + line count + sha256 for files read (not full contents)
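A minimal sketch of how a file fingerprint and the JSONL output could be produced. The function names `fingerprint` and `write_trace` and the record keys `path` and `lines` are hypothetical; only the idea of path + line count + sha256 comes from the list above.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path, project_root: Path) -> dict:
    """Return path + line count + sha256 for a file, without its contents.

    Hypothetical helper: the key names are illustrative, not a fixed schema.
    """
    data = path.read_bytes()
    return {
        "path": path.resolve().relative_to(project_root.resolve()).as_posix(),
        "lines": len(data.splitlines()),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def write_trace(records: list[dict], out_path: Path) -> None:
    """Write one compact JSON object per line (JSONL)."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as fh:
        for rec in records:
            # Compact separators keep substrings like "success":false greppable.
            fh.write(json.dumps(rec, separators=(",", ":")) + "\n")
```

Writing with compact separators is a deliberate choice here: a plain `grep` for `"success":false` only works if no space follows the colon.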

What gets removed:

  • Streaming progress events (redundant with final messages)
  • Full file contents from reads/writes
  • Duplicate prompt text across subagent spawns (kept once in metadata)
  • File history snapshots / undo state
  • Full successful command stdout (only keep the tail and diagnostic lines)
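One way the "tail and diagnostic lines" rule for command output could look. This is a sketch under assumptions: the `DIAGNOSTIC` regex and the default of 5 tail lines are made up for illustration, and `summarize_output` is a hypothetical name.

```python
import re

# Hypothetical heuristic for what counts as a "diagnostic" line.
DIAGNOSTIC = re.compile(r"error|exception|traceback|failed|warning", re.IGNORECASE)


def summarize_output(stdout: str, tail_lines: int = 5) -> str:
    """Keep diagnostic lines from the body plus the last `tail_lines` lines."""
    lines = stdout.splitlines()
    cut = max(len(lines) - tail_lines, 0)
    head, tail = lines[:cut], lines[cut:]
    # Lines in the tail are always kept, so only the head is filtered.
    kept = [ln for ln in head if DIAGNOSTIC.search(ln)]
    return "\n".join(kept + tail)
```

Short outputs pass through unchanged, since everything falls inside the tail.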

Redaction:

  • Replaces absolute paths with $PROJECT_ROOT-relative paths
  • Scrubs API keys/tokens matching common patterns (sk-, AIza, ghp_, Bearer)
  • Stores the prompt as a file reference + sha256 rather than inline
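A rough sketch of the path and secret scrubbing. The exact regexes, the minimum token length of 8, and the `[REDACTED]` placeholder are assumptions; only the prefixes (sk-, AIza, ghp_, Bearer) come from the list above. It also assumes "relative" means substituting a literal `$PROJECT_ROOT` for the absolute prefix.

```python
import re

# Assumed patterns; \b avoids false positives such as "task-scheduler...".
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),
    re.compile(r"\bAIza[A-Za-z0-9_-]{8,}"),
    re.compile(r"\bghp_[A-Za-z0-9]{8,}"),
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{8,}"),
]


def redact(text: str, project_root: str) -> str:
    """Replace the absolute project root and scrub token-like strings."""
    root = project_root.rstrip("/")
    if root:
        text = text.replace(root, "$PROJECT_ROOT")
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Pattern-based scrubbing will miss secrets that don't match these prefixes, so it is best treated as a safety net rather than a guarantee.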

The filename will encode the time, operation, and provider: trace_20260321_1457_one_session_sync__task_scheduler_google.jsonl
This way, we can easily find relevant agentic traces by time, operation, or provider from the filename, and by outcome from the meta line:

  • By time: ls -lt .pdd/agentic-traces/
  • By operation: ls .pdd/agentic-traces/*one_session_sync*
  • By provider: ls .pdd/agentic-traces/*_google.jsonl
  • By outcome: grep -l '"success":false' .pdd/agentic-traces/*.jsonl
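For reference, a sketch of building that filename and doing the same lookups from Python. `trace_filename` and `find_traces` are hypothetical helpers, and the mapping of the ":" in the label to "__" is inferred from the example filename rather than specified anywhere.

```python
from datetime import datetime
from pathlib import Path


def trace_filename(ts: datetime, label: str, provider: str) -> str:
    """Build trace_<YYYYMMDD>_<HHMM>_<operation>__<target>_<provider>.jsonl."""
    safe_label = label.replace(":", "__")  # assumed mapping, inferred from the example
    return f"trace_{ts:%Y%m%d_%H%M}_{safe_label}_{provider}.jsonl"


def find_traces(trace_dir: Path, operation: str = "*", provider: str = "*") -> list[Path]:
    """Glob traces by operation and/or provider; "*" matches anything."""
    pattern = f"trace_*_{operation}__*_{provider}.jsonl"
    return sorted(trace_dir.glob(pattern))
```

Because the timestamp comes first and is zero-padded, a plain sorted listing is also chronological.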

Implementation:

  • We will support the three main CLIs: Gemini, Claude, and Codex.
  • We would need per-provider distillation functions (_distill_gemini_trace, _distill_claude_trace, and an equivalent for Codex) that normalize the raw provider-specific trace formats into our common schema.
  • Distillation would run automatically at the end of run_agentic_task inside agentic_common.py.
  • We add .pdd/agentic-traces/ to .gitignore by default.
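A possible shape for the dispatch step, as a sketch only. The distiller bodies are left as stubs because the raw provider formats are not described here; the `DISTILLERS` table, the `distill` wrapper, and the provider key "anthropic" are hypothetical ("google" is taken from the example trace). It also assumes tracing should be best-effort and never fail the task itself.

```python
from typing import Callable, Optional

Distiller = Callable[[list[dict]], list[dict]]


def _distill_gemini_trace(raw_events: list[dict]) -> list[dict]:
    # Stub: the raw Gemini session format is not specified in this issue.
    raise NotImplementedError


def _distill_claude_trace(raw_events: list[dict]) -> list[dict]:
    # Stub: the raw Claude session format is not specified in this issue.
    raise NotImplementedError


# Hypothetical provider keys.
DISTILLERS: dict[str, Distiller] = {
    "google": _distill_gemini_trace,
    "anthropic": _distill_claude_trace,
}


def distill(
    provider: str,
    raw_events: list[dict],
    distillers: Optional[dict[str, Distiller]] = None,
) -> list[dict]:
    """Normalize raw events into the common schema; return [] on any failure."""
    table = DISTILLERS if distillers is None else distillers
    fn = table.get(provider)
    if fn is None:
        return []
    try:
        return fn(raw_events)
    except Exception:
        # Assumed policy: a broken distiller must not break the agentic task.
        return []
```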

Example trace file:

{"type":"meta","ts":"2026-03-21T14:57:45","label":"one_session_sync:task_scheduler","provider":"google","model":"gemini-2.5-pro","success":true,"cost_usd":1.75,"duration_sec":125,"prompt_file":"prompts/task_scheduler_python.prompt","prompt_sha256":"a1b2c3d4..."}
{"type":"thought","ts":"2026-03-21T14:55:46","text":"Reading the instruction file to understand the one-session sync task for task_scheduler."}
{"type":"tool","ts":"2026-03-21T14:55:47","name":"read_file","args_summary":"pdd/task_scheduler.py (248 lines)","status":"ok","sha256":"e5f6a7b8..."}
{"type":"thought","ts":"2026-03-21T14:55:52","text":"Identified exported components: TaskScheduler, Task, TaskStatus, ExecutionReport, CycleError. Now drafting the usage example."}
{"type":"tool","ts":"2026-03-21T14:55:58","name":"write_file","args_summary":"context/task_scheduler_example.py (62 lines)","status":"ok"}
{"type":"tool","ts":"2026-03-21T14:56:01","name":"run_shell_command","args_summary":"python context/task_scheduler_example.py","status":"error","output_tail":"File \"context/task_scheduler_example.py\", line 48\n    print(\"\n          ^\nSyntaxError: unterminated string literal"}
{"type":"thought","ts":"2026-03-21T14:56:03","text":"SyntaxError due to unescaped newlines in print statements. Will remove newlines from string literals."}
{"type":"tool","ts":"2026-03-21T14:56:05","name":"read_file","args_summary":"context/task_scheduler_example.py (62 lines)","status":"ok","sha256":"c9d0e1f2..."}
{"type":"tool","ts":"2026-03-21T14:56:08","name":"replace","args_summary":"context/task_scheduler_example.py (3 replacements)","status":"ok"}
{"type":"tool","ts":"2026-03-21T14:56:12","name":"run_shell_command","args_summary":"python context/task_scheduler_example.py","status":"ok","output_tail":"All tasks completed successfully.\nExecution time: 0.03s"}
{"type":"thought","ts":"2026-03-21T14:56:14","text":"Example runs without errors. Moving to test generation. Will read the spec to build a test plan first."}
{"type":"tool","ts":"2026-03-21T14:56:16","name":"read_file","args_summary":"prompts/task_scheduler_python.prompt (45 lines)","status":"ok","sha256":"a1b2c3d4..."}
{"type":"thought","ts":"2026-03-21T14:56:22","text":"Spec has 6 requirements. Building test plan: dependency ordering, cycle detection, parallel execution, rollback on failure, status reporting, timeout handling."}
{"type":"tool","ts":"2026-03-21T14:56:30","name":"write_file","args_summary":"tests/test_task_scheduler.py (189 lines)","status":"ok"}
{"type":"tool","ts":"2026-03-21T14:56:35","name":"run_shell_command","args_summary":"python -m pytest tests/test_task_scheduler.py -v","status":"error","output_tail":"FAILED tests/test_task_scheduler.py::test_rollback_order - AssertionError: ['c', 'b', 'a'] != ['a', 'b', 'c']\n2 failed, 4 passed in 0.8s"}
{"type":"thought","ts":"2026-03-21T14:56:38","text":"test_rollback_order fails because rollback runs in alphabetical order instead of reverse dependency order. This is a code bug — spec says rollback should reverse. Will fix code."}

All thinking/reasoning text will be kept in full, exactly as the provider recorded it (apart from the redaction described above). The only things that get shortened are tool call arguments (replaced with a summary like "task_scheduler.py (248 lines)" instead of the full file contents) and command output (keep the tail and error lines instead of 500 lines of stdout).
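To illustrate how such a trace could be used when iterating on prompts, here is a small sketch that loads a trace and flags tool calls repeated with identical arguments, a rough signal for the agent looping. The helper names are hypothetical; the record keys (`type`, `name`, `args_summary`, `text`) follow the example trace above.

```python
import json
from collections import Counter


def load_trace(text: str) -> list[dict]:
    """Parse JSONL text into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def thoughts(records: list[dict]) -> list[str]:
    """Return the agent's reasoning text in order."""
    return [r["text"] for r in records if r.get("type") == "thought"]


def repeated_tool_calls(records: list[dict], min_count: int = 2) -> dict:
    """Count identical (name, args_summary) tool calls seen at least min_count times."""
    counts = Counter(
        (r["name"], r.get("args_summary", ""))
        for r in records
        if r.get("type") == "tool"
    )
    return {call: n for call, n in counts.items() if n >= min_count}
```

A repeated call is not always a problem (re-running a script after a fix is expected), so the output is a starting point for reading the surrounding thoughts, not a verdict.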
