Problem
When a PDD command fails, the debug snapshot saved to .pdd/core_dumps/ doesn't contain enough information to diagnose what went wrong. The current message says "attach when reporting bugs," but the snapshot often lacks the detail needed to actually debug the issue.
I think this is high priority, not only to help developers debug failures, but also to brainstorm improvements to the tool.
Example: `pdd sync summarize_directory` fails with a "5 consecutive fix operations" loop
The core dump shows:
- `"errors": []`: empty, even though the command failed
- `"steps": [{"step": 1, "command": "sync", "cost": 0.74, "model": ""}]`: one opaque entry, no per-attempt breakdown
- No indication of which tests failed or why generated code was rejected
- No LLM prompts/responses captured
What's missing
| Gap | Impact |
| --- | --- |
| Workflow failures not captured in `errors` | `errors` only logs Python exceptions, not logical failures (fix loops, test failures, budget exhaustion). A "Failed" sync produces `errors: []`. |
| No per-step breakdown for compound commands | `sync` internally runs generate→test→fix→test→... but only records one step. Users can't see which sub-operation failed. |
| Empty `model` field | The model used is recorded as `""`, making it impossible to know which LLM was involved. |
| No LLM request/response pairs | The actual prompts sent to and responses received from the LLM are not captured. This is the single most important diagnostic for generation failures. |
| No test output per attempt | When tests fail and trigger fix loops, the test stderr/assertion messages aren't included. |
| Operation log not bundled | PDD already writes rich per-operation logs to `.pdd/meta/{basename}_{lang}_sync.log` (with timestamps, costs, success/failure per operation), but this file isn't included in `file_contents`. |
| Raw ANSI escapes in `terminal_output` | Makes the output unreadable when reviewing the JSON manually. |
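
To make the first three rows concrete, here is a rough sketch of what per-operation step records and a structured logical-failure entry could look like. This is not PDD's actual schema: the class names (`StepRecord`, `LogicalError`, `Snapshot`), the added fields (`operation`, `success`, `failure_summary`, `kind`), the model name and the costs are all hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class StepRecord:
    """One internal operation of a compound command (hypothetical shape)."""
    step: int
    command: str                 # top-level command, e.g. "sync"
    operation: str               # "generate", "test" or "fix"
    model: str                   # the LLM involved, never left as ""
    cost: float
    success: bool
    failure_summary: Optional[str] = None


@dataclass
class LogicalError:
    """A failure that did not raise a Python exception (hypothetical shape)."""
    kind: str                    # e.g. "max_fix_attempts", "budget_exhausted"
    message: str
    step: Optional[int] = None   # step at which the workflow gave up


@dataclass
class Snapshot:
    steps: List[StepRecord] = field(default_factory=list)
    errors: List[LogicalError] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses inside the lists.
        return json.dumps(asdict(self), indent=2)


def example_snapshot() -> Snapshot:
    """Build an illustrative snapshot; model name and costs are placeholders."""
    snap = Snapshot()
    snap.steps.append(StepRecord(1, "sync", "generate", "example-model", 0.10, True))
    snap.steps.append(StepRecord(2, "sync", "test", "example-model", 0.02, False,
                                 failure_summary="1 test failed"))
    snap.steps.append(StepRecord(3, "sync", "fix", "example-model", 0.12, False,
                                 failure_summary="tests still failing after fix"))
    snap.errors.append(LogicalError("max_fix_attempts",
                                    "fix loop stopped after repeated failures",
                                    step=3))
    return snap


if __name__ == "__main__":
    print(example_snapshot().to_json())
```

With a shape like this, a "Failed" sync would no longer produce `errors: []`, and a reader could see which sub-operation failed without re-running the command.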
Proposed improvements
- Capture LLM request/response pairs: at minimum, log the final prompt sent and the raw LLM response for each failed operation. This is critical for diagnosing "the LLM keeps generating broken code" scenarios.
- Include the operation sync log in `file_contents`: this already exists and contains per-operation success/failure/cost/model data. Also capture how long each operation took, so developers know which operations to prioritize when improving speed.
- Strip ANSI escapes from `terminal_output` before saving
- Record logical failures in `errors`: if a sync/fix loop terminates due to max retries, budget exhaustion, or other non-exception failures, log them as structured error entries
- Populate the `model` field in step records
- Expand `steps` for sync: record each internal operation (generate, test, fix) as its own step entry with operation type, model, cost, success/failure, and a summary of the failure reason
- Capture test output: include the test runner's stderr/stdout for each failed test attempt (truncated to ~5KB)
- Include generated code diffs: show what changed between fix attempts
- Include `.pddrc` and `llm_model.csv` configs (if not already done for the specific run)
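
Three of the items above (stripping ANSI escapes, truncating test output, and diffing fix attempts) need only the Python standard library. The sketch below shows one way they might be done. The helper names (`strip_ansi`, `truncate_tail`, `attempt_diff`) are hypothetical and not existing PDD functions, and keeping the tail of the output rather than the head is an assumption, made because test runners usually print failures last.

```python
import difflib
import re

# Matches CSI escape sequences such as "\x1b[31m" (color) or "\x1b[2K" (erase line).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI sequences so terminal_output stays readable inside JSON."""
    return ANSI_CSI.sub("", text)


def truncate_tail(text: str, max_bytes: int = 5 * 1024) -> str:
    """Keep roughly the last max_bytes of UTF-8 encoded output."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half.
    tail = data[-max_bytes:].decode("utf-8", errors="ignore")
    return "[... truncated ...]\n" + tail


def attempt_diff(before: str, after: str, label: str = "generated") -> str:
    """Unified diff of the generated code between two consecutive fix attempts."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{label} (attempt n)",
        tofile=f"{label} (attempt n+1)",
    )
    return "".join(lines)


if __name__ == "__main__":
    print(strip_ansi("\x1b[31mFAILED\x1b[0m test_example"))
    print(attempt_diff("return a - b\n", "return a + b\n"))
```

Applying `strip_ansi` before `truncate_tail` means the size budget is spent on readable text rather than on escape codes.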
Context
Related: #230 (add LLM model CSV to core dump), #391 (confusing error output in core dumps)