After an agentic task (sync, change, bug, fix), PDD should save a simplified version of the LLM's session to .pdd/agentic-traces/. These files will capture the agent's full thinking/reasoning process and tool-call sequence for each pdd agentic task.
It will be helpful to have a standardized way of storing LLM internal thinking processes so we can reference them when iterating on pdd agentic LLM prompts. For example, I'm currently improving one_session_agent_LLM.prompt, and I'm finding it helpful to read the LLM traces from one-session sync to identify where the agent tends to get stuck in loops or make incorrect assumptions, then adjusting the prompt to prevent this.
Right now, the only method to view agentic traces is to manually dig into provider-specific folders (~/.claude/projects/, ~/.gemini/tmp/), and they contain a lot of unnecessary information (streaming progress events, full file contents, undo snapshots).
Proposed Design
Each trace will be a compact JSONL file in .pdd/agentic-traces/ containing:
- Metadata: timestamp, label, provider, model, success, cost, duration
- Agent thinking/reasoning: the full chain of thought (the primary signal for prompt improvement)
- Tool calls: name + summarized args + status (not full file contents)
- Error output: stderr and diagnostic lines from command output
- File fingerprints: path + line count + sha256 for files read (not full contents)
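As a sketch, the common record schema could be modeled with dataclasses roughly like this (the class and function names are illustrative, not existing PDD code):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class MetaRecord:
    """First line of every trace file: one JSON object of run metadata."""
    ts: str
    label: str
    provider: str
    model: str
    success: bool
    cost_usd: float
    duration_sec: float
    prompt_file: str
    prompt_sha256: str
    type: str = "meta"

def to_jsonl_line(record) -> str:
    """Serialize a record as one compact JSONL line (no extra whitespace)."""
    return json.dumps(asdict(record), separators=(",", ":"))
```

Equivalent dataclasses for thought and tool records would follow the same pattern, so every line in the file round-trips through `json.loads`.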
What gets removed:
- Streaming progress events (redundant with final messages)
- Full file contents from reads/writes
- Duplicate prompt text across subagent spawns (kept once in metadata)
- File history snapshots / undo state
- Full successful command stdout (only keep the tail and diagnostic lines)
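The filtering pass could look roughly like this (the event-type names in `DROPPED_TYPES` are placeholders, not any provider's actual schema):

```python
# Raw event types dropped entirely during distillation; names are illustrative.
DROPPED_TYPES = {"stream_delta", "progress", "file_snapshot", "undo_state"}

def keep_event(event: dict) -> bool:
    """True if a raw provider event should survive distillation."""
    return event.get("type") not in DROPPED_TYPES

def summarize_stdout(stdout: str, tail_lines: int = 10) -> str:
    """Keep the tail of command output plus any earlier diagnostic lines."""
    lines = stdout.splitlines()
    markers = ("Error", "Traceback", "FAILED", "warning")
    head = lines[:-tail_lines] if len(lines) > tail_lines else []
    diagnostics = [ln for ln in head if any(m in ln for m in markers)]
    return "\n".join(diagnostics + lines[-tail_lines:])
```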
Redaction:
- Replaces absolute paths with $PROJECT_ROOT-relative paths
- Scrubs API keys/tokens matching common patterns (sk-, AIza, ghp_, Bearer)
- Stores the prompt as a file reference + sha256 rather than inline
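A sketch of the redaction pass, using the token prefixes listed above (the minimum-length thresholds are a guess and would need tuning):

```python
import re

# Token-like strings to scrub; minimum lengths are assumptions, adjust as needed.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),
    re.compile(r"AIza[A-Za-z0-9_-]{8,}"),
    re.compile(r"ghp_[A-Za-z0-9]{8,}"),
    re.compile(r"Bearer\s+[A-Za-z0-9._-]{8,}"),
]

def redact(text: str, project_root: str) -> str:
    """Rewrite absolute paths as $PROJECT_ROOT-relative and scrub secrets."""
    text = text.replace(project_root.rstrip("/"), "$PROJECT_ROOT")
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```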
The filename will encode all lookup dimensions: trace_20260321_1457_one_session_sync__task_scheduler_google.jsonl
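For instance, the filename could be assembled like this (assuming the label keeps its operation:basename form):

```python
from datetime import datetime

def trace_filename(label: str, provider: str, when: datetime) -> str:
    """Build trace_<YYYYMMDD>_<HHMM>_<operation>__<basename>_<provider>.jsonl."""
    operation, _, basename = label.partition(":")
    stamp = when.strftime("%Y%m%d_%H%M")
    return f"trace_{stamp}_{operation}__{basename}_{provider}.jsonl"
```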
This way, we can easily find relevant agentic traces by looking them up by time, operation, provider, and/or outcome:
- By time: ls -lt .pdd/agentic-traces/
- By operation: ls .pdd/agentic-traces/*one_session_sync*
- By provider: ls .pdd/agentic-traces/*_google.jsonl
- By outcome: grep -l '"success":false' .pdd/agentic-traces/*.jsonl
Implementation:
- We support the three main CLIs: Gemini, Claude, and Codex.
- We would need per-provider distillation functions (_distill_gemini_trace, _distill_claude_trace, _distill_codex_trace) that normalize each provider's raw trace format into the common schema.
- This would run automatically at the end of run_agentic_task inside agentic_common.py
- We add .pdd/agentic-traces/ to .gitignore by default
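The hook could dispatch on provider roughly like this; only run_agentic_task exists today, and the distillers below are trivial placeholders for the proposed per-provider functions:

```python
def _distill_gemini_trace(raw_events):
    # Placeholder: the real version would parse Gemini CLI session files.
    return [e for e in raw_events if e.get("type") in {"thought", "tool"}]

def _distill_claude_trace(raw_events):
    # Placeholder: the real version would parse ~/.claude/projects/ session files.
    return [e for e in raw_events if e.get("type") in {"thought", "tool"}]

def _distill_codex_trace(raw_events):
    # Placeholder for the Codex CLI format.
    return [e for e in raw_events if e.get("type") in {"thought", "tool"}]

def distill_trace(provider: str, raw_events: list) -> list:
    """Normalize raw provider events into the common schema; skip unknowns."""
    distillers = {
        "google": _distill_gemini_trace,
        "anthropic": _distill_claude_trace,
        "openai": _distill_codex_trace,
    }
    distiller = distillers.get(provider)
    # An unknown provider skips trace capture rather than failing the task.
    return distiller(raw_events) if distiller else []
```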
Example trace file:
{"type":"meta","ts":"2026-03-21T14:57:45","label":"one_session_sync:task_scheduler","provider":"google","model":"gemini-2.5-pro","success":true,"cost_usd":1.75,"duration_sec":125,"prompt_file":"prompts/task_scheduler_python.prompt","prompt_sha256":"a1b2c3d4..."}
{"type":"thought","ts":"2026-03-21T14:55:46","text":"Reading the instruction file to understand the one-session sync task for task_scheduler."}
{"type":"tool","ts":"2026-03-21T14:55:47","name":"read_file","args_summary":"pdd/task_scheduler.py (248 lines)","status":"ok","sha256":"e5f6a7b8..."}
{"type":"thought","ts":"2026-03-21T14:55:52","text":"Identified exported components: TaskScheduler, Task, TaskStatus, ExecutionReport, CycleError. Now drafting the usage example."}
{"type":"tool","ts":"2026-03-21T14:55:58","name":"write_file","args_summary":"context/task_scheduler_example.py (62 lines)","status":"ok"}
{"type":"tool","ts":"2026-03-21T14:56:01","name":"run_shell_command","args_summary":"python context/task_scheduler_example.py","status":"error","output_tail":"File \"context/task_scheduler_example.py\", line 48\n print(\"\n ^\nSyntaxError: unterminated string literal"}
{"type":"thought","ts":"2026-03-21T14:56:03","text":"SyntaxError due to unescaped newlines in print statements. Will remove newlines from string literals."}
{"type":"tool","ts":"2026-03-21T14:56:05","name":"read_file","args_summary":"context/task_scheduler_example.py (62 lines)","status":"ok","sha256":"c9d0e1f2..."}
{"type":"tool","ts":"2026-03-21T14:56:08","name":"replace","args_summary":"context/task_scheduler_example.py (3 replacements)","status":"ok"}
{"type":"tool","ts":"2026-03-21T14:56:12","name":"run_shell_command","args_summary":"python context/task_scheduler_example.py","status":"ok","output_tail":"All tasks completed successfully.\nExecution time: 0.03s"}
{"type":"thought","ts":"2026-03-21T14:56:14","text":"Example runs without errors. Moving to test generation. Will read the spec to build a test plan first."}
{"type":"tool","ts":"2026-03-21T14:56:16","name":"read_file","args_summary":"prompts/task_scheduler_python.prompt (45 lines)","status":"ok","sha256":"a1b2c3d4..."}
{"type":"thought","ts":"2026-03-21T14:56:22","text":"Spec has 6 requirements. Building test plan: dependency ordering, cycle detection, parallel execution, rollback on failure, status reporting, timeout handling."}
{"type":"tool","ts":"2026-03-21T14:56:30","name":"write_file","args_summary":"tests/test_task_scheduler.py (189 lines)","status":"ok"}
{"type":"tool","ts":"2026-03-21T14:56:35","name":"run_shell_command","args_summary":"python -m pytest tests/test_task_scheduler.py -v","status":"error","output_tail":"FAILED tests/test_task_scheduler.py::test_rollback_order - AssertionError: ['c', 'b', 'a'] != ['a', 'b', 'c']\n2 failed, 4 passed in 0.8s"}
{"type":"thought","ts":"2026-03-21T14:56:38","text":"test_rollback_order fails because rollback runs in alphabetical order instead of reverse dependency order. This is a code bug — spec says rollback should reverse. Will fix code."}
All thinking/reasoning text will be kept in full, exactly as the provider recorded it. The only things that get shortened are tool call arguments (replaced with a summary like "task_scheduler.py (248 lines)" instead of the full file contents) and command output (keep the tail and error lines instead of 500 lines of stdout).