
Debug snapshots (core dumps) lack actionable diagnostic information for debugging failures #680

@niti-go

Description


Problem

When a PDD command fails, the debug snapshot saved to .pdd/core_dumps/ doesn't contain enough information to diagnose what went wrong. The current message says "attach when reporting bugs," but the snapshot often lacks the detail needed to actually debug the issue.

I think this is high priority: richer snapshots would help developers debug failures, and they would also surface ideas for improving the tool itself.

Example: pdd sync summarize_directory fails with a "5 consecutive fix operations" loop

The core dump shows:

  • "errors": [] — empty, even though the command failed
  • "steps": [{"step": 1, "command": "sync", "cost": 0.74, "model": ""}] — one opaque entry, no per-attempt breakdown
  • No indication of which tests failed or why generated code was rejected
  • No LLM prompts/responses captured
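To check whether a given snapshot shows these symptoms, a small triage script could help. The sketch below assumes the core dump is a JSON object with top-level "errors", "steps", and "terminal_output" keys shaped like the excerpt above (terminal_output assumed to be a string). find_gaps and check_dump.py are hypothetical names, not part of PDD.

```python
import json
import sys
from pathlib import Path


def find_gaps(dump: dict) -> list[str]:
    """Return notes on missing diagnostics in a parsed core dump (hypothetical helper)."""
    gaps = []
    if not dump.get("errors"):
        gaps.append("errors is empty")
    steps = dump.get("steps", [])
    if len(steps) <= 1:
        gaps.append(f"only {len(steps)} step(s) recorded; no per-operation breakdown")
    for step in steps:
        if not step.get("model"):
            gaps.append(f"step {step.get('step')} ({step.get('command')}) has no model")
    output = dump.get("terminal_output") or ""
    if isinstance(output, str) and "\x1b[" in output:
        gaps.append("terminal_output contains ANSI escapes")
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_dump.py <path to a JSON file in .pdd/core_dumps/>
    data = json.loads(Path(sys.argv[1]).read_text(encoding="utf-8"))
    for gap in find_gaps(data):
        print("-", gap)
```

Run against the summarize_directory snapshot described above, it would flag the empty errors list, the single opaque step, and the empty model field.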

What's missing

| Gap | Impact |
|---|---|
| Workflow failures not captured in errors | errors only records Python exceptions, not logical failures (fix loops, test failures, budget exhaustion). A "Failed" sync produces errors: []. |
| No per-step breakdown for compound commands | sync internally runs generate→test→fix→test→... but records only one step, so users can't see which sub-operation failed. |
| Empty model field | The model used is recorded as "", so there is no way to tell which LLM was involved. |
| No LLM request/response pairs | The prompts sent to the LLM and the responses it returned are not captured. For generation failures, this is the single most important diagnostic. |
| No test output per attempt | When tests fail and trigger fix loops, the test stderr and assertion messages aren't included. |
| Operation log not bundled | PDD already writes detailed per-operation logs to .pdd/meta/{basename}_{lang}_sync.log (timestamps, costs, success/failure per operation), but this file isn't included in file_contents. |
| Raw ANSI escapes in terminal_output | Escape codes make the output unreadable when reviewing the JSON manually. |
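The last two gaps are largely mechanical to close. The sketch below assumes file_contents is a dict mapping file paths to their text (its real shape in PDD may differ); strip_ansi and bundle_sync_log are hypothetical helper names. The regular expression removes CSI sequences (colors, cursor movement) and two-character ESC sequences; OSC sequences such as terminal hyperlinks are only partially removed.

```python
import re
from pathlib import Path

# CSI sequences (ESC [ ... final byte) plus two-character ESC sequences.
ANSI_ESCAPE = re.compile(r"\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])")


def strip_ansi(text: str) -> str:
    """Remove ANSI escape sequences so terminal_output reads as plain text."""
    return ANSI_ESCAPE.sub("", text)


def bundle_sync_log(file_contents: dict[str, str], basename: str, lang: str,
                    meta_dir: str = ".pdd/meta") -> dict[str, str]:
    """Add {meta_dir}/{basename}_{lang}_sync.log to file_contents if the log exists.

    Hypothetical helper: assumes file_contents maps path strings to file text.
    """
    log_path = Path(meta_dir) / f"{basename}_{lang}_sync.log"
    if log_path.is_file():
        file_contents[str(log_path)] = log_path.read_text(encoding="utf-8", errors="replace")
    return file_contents
```

Calling strip_ansi just before the snapshot is written keeps the live terminal colored while the saved JSON stays readable.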

Proposed improvements

  1. Capture LLM request/response pairs: at minimum, log the final prompt sent and the raw LLM response for each failed operation. This is critical for diagnosing "the LLM keeps generating broken code" scenarios.
  2. Include the operation sync log in file_contents: this log already exists and contains per-operation success/failure/cost/model data. Also record how long each operation took, so developers can see which operations most need speed improvements.
  3. Strip ANSI escapes from terminal_output before saving.
  4. Record logical failures in errors: if a sync/fix loop terminates due to max retries, budget exhaustion, or another non-exception failure, log it as a structured error entry.
  5. Populate the model field in step records.
  6. Expand steps for sync: record each internal operation (generate, test, fix) as its own step entry with the operation type, model, cost, success/failure, and a summary of the failure reason.
  7. Capture test output: include the test runner's stderr/stdout for each failed test attempt (truncated to ~5KB).
  8. Include generated code diffs: show what changed between fix attempts.
  9. Include the .pddrc and llm_model.csv configs (if they are not already captured for the specific run).
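Items 4 to 8 above could share one record shape. The following is a minimal sketch, assuming steps and errors are plain lists of dicts serialized into the snapshot JSON. StepRecord, track_step, record_logical_failure, truncate_tail, and code_diff are hypothetical names, not existing PDD APIs; the fields mirror the ones listed in item 6, and "~5KB" is interpreted as the last 5,000 bytes.

```python
import difflib
import time
from contextlib import contextmanager
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical helpers for illustration; none of these names exist in PDD today.
MAX_TEST_OUTPUT_BYTES = 5_000  # the "~5KB" budget from item 7


def truncate_tail(text: str, limit: int = MAX_TEST_OUTPUT_BYTES) -> str:
    """Keep only the last `limit` bytes of test output, marking the cut."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a multi-byte character split by the cut.
    return "[...truncated...]\n" + data[-limit:].decode("utf-8", errors="ignore")


def code_diff(before: str, after: str, name: str = "generated code") -> str:
    """Unified diff of the generated code between two fix attempts."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{name} (before)",
        tofile=f"{name} (after)",
    ))


@dataclass
class StepRecord:
    step: int
    operation: str                        # e.g. "generate", "test", "fix"
    model: str                            # populated instead of ""
    cost: float = 0.0
    success: bool = False                 # the caller must set True explicitly
    duration_s: float = 0.0
    failure_reason: Optional[str] = None
    test_output: Optional[str] = None
    diff: Optional[str] = None


@contextmanager
def track_step(steps: list, operation: str, model: str):
    """Time one sub-operation and append its record to `steps`, even on exceptions."""
    record = StepRecord(step=len(steps) + 1, operation=operation, model=model)
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record.success = False
        record.failure_reason = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        record.duration_s = round(time.perf_counter() - start, 3)
        steps.append(asdict(record))


def record_logical_failure(errors: list, kind: str, message: str, **details) -> None:
    """Log a non-exception failure (max retries, budget exhaustion) as a structured error."""
    errors.append({"type": kind, "message": message, **details})
```

Because a failing test run is a logical failure rather than an exception, track_step leaves success as False unless the caller sets it, and a loop that gives up after five fix attempts would call record_logical_failure so that errors is no longer empty. Keeping the tail of the test output rather than the head is a judgment call: test runners such as pytest print their failure summary near the end of the output.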

Context

Related: #230 (add LLM model CSV to core dump), #391 (confusing error output in core dumps)

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Project status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
