Skip to content

Richer core dumps: structured errors, sync_steps, LLM trace, ANSI strip#712

Merged
gltanaka merged 14 commits into
promptdriven:mainfrom
vishalramvelu:feature/711
Apr 9, 2026
Merged

Richer core dumps: structured errors, sync_steps, LLM trace, ANSI strip#712
gltanaka merged 14 commits into
promptdriven:mainfrom
vishalramvelu:feature/711

Conversation

@vishalramvelu
Copy link
Copy Markdown
Contributor

Summary

Improves PDD debug snapshots (core dumps written to .pdd/core_dumps/pdd-core-*.json) so failures are easier to diagnose without log spelunking.

  • Structured errors: Non-exception failures can be recorded via record_core_dump_error and appear under errors in the dump.
  • Dump schema: Bumps to schema_version 2; aligns steps with invoked subcommands and defaults missing model to "unknown".
  • Context in the JSON: Auto-includes relevant .pdd/meta/*_sync.log / *_run.json content, derives sync_steps from sync logs (skips noise where appropriate), and keeps file_contents for tracked / meta paths.
  • Terminal output: Strips CSI + OSC (and related) sequences so terminal_output is readable plain text.
  • Sync failures: On relevant failure paths, attaches details.llm_trace (last prompt/response, redacted/truncated) and test_output_excerpt where applicable.
  • Budget exhaustion: Surfaces as a structured error suitable for the dump.

Test Results

Test_Core_Dump - Passed
Sync_Orchestration - Passed
Test_Core_Errors - Passed
Test_Cli - Passed

Manual testing

  • From a small fixture repo (or this repo), run a command that fails during sync (e.g. intentional test failure on fix) with core dumps enabled (default on unless --no-core-dump).
  • Open the newest .pdd/core_dumps/pdd-core-*.json and confirm:
    schema_version is 2, errors includes expected structured entries when no Python traceback exists, steps length/order matches the run; missing models show "unknown", terminal_output has no obvious ANSI/OSC garbage, sync_steps present when meta sync logs exist; entries look like the failing operations, on LLM-using failure paths, details.llm_trace appears when designed
  • Run with --no-core-dump and confirm no new dump file (or no write, per existing behavior).

Checklist

  • Code changes limited to PDD CLI / sync / dump behavior (no PDD_CAP prompt files in this PR).
  • uv run pytest passes for the suites/commands above (or full CI-equivalent run).
  • No secrets or raw API keys in llm_trace / file snapshots (redaction/truncation behavior reviewed).
  • Manual spot-check of a real .pdd/core_dumps/*.json after a failed sync.
  • Prompts submitted to PDD_CAP Repo as PR

Fixes #710

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves PDD “core dump” debug snapshots to make sync failures diagnosable from the JSON alone (structured non-exception errors, per-operation sync steps derived from meta logs, failure-only LLM/test traces, and ANSI/OSC stripping for captured terminal output).

Changes:

  • Add structured core-dump errors via record_core_dump_error and record logical/non-exception failure paths in sync.
  • Bump core dump schema to v2 and enrich dumps with auto-included meta artifacts plus derived sync_steps.
  • Capture failure-only debugging context (ANSI/OSC-clean terminal output, truncated test output excerpts, and last LLM prompt/response pair).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pdd/core/errors.py Adds structured core-dump error recording API.
pdd/core/dump.py Bumps schema to v2; auto-includes meta sync/run files; derives sync_steps; ensures steps[*].model defaults to "unknown".
pdd/core/cli.py Expands ANSI stripping to cover CSI + OSC sequences.
pdd/core/llm_trace.py Introduces lightweight, redacted/truncated LLM prompt/response trace capture by operation.
pdd/llm_invoke.py Records best-effort LLM traces for cloud + LiteLLM paths.
pdd/sync_orchestration.py Records logical failures as structured core-dump errors; captures truncated test output excerpts; attaches failure-only LLM traces to operation log entries.
pdd/sync_main.py Records budget exhaustion as a structured core-dump error.
tests/test_core_dump.py Updates expectations for schema v2; adds tests for meta sync/run auto-inclusion, derived sync_steps, and model defaulting.
tests/core/test_cli.py Adds tests covering OSC/cursor-sequence stripping.
tests/test_core_errors.py Adds unit test for structured error recording.
tests/test_sync_orchestration.py Adds tests validating logical-failure error recording, test output excerpt truncation, and LLM trace attachment.
pdd/simple_math.py Adds a new module (appears unrelated to the PR’s stated scope).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pdd/sync_orchestration.py
Comment thread pdd/simple_math.py Outdated
Comment thread pdd/llm_invoke.py Outdated
@gltanaka
Copy link
Copy Markdown
Contributor

target 3/24

@vishalramvelu
Copy link
Copy Markdown
Contributor Author

Units Tests failing because TestConvergencePromptRequirements expects literal substrings in agentic_e2e_fix_orchestrator_python.prompt, but updating that prompt caused a merge conflict with main on the same file. Leaving the prompt as-is for now. We can align prompt text with Issue #903 tests in a follow-up once the branch is rebased cleanly or the file is de-conflicted upstream.

@gltanaka
Copy link
Copy Markdown
Contributor

Hey @vishalramvelu — CI is failing because the PR's version of pdd/prompts/agentic_e2e_fix_orchestrator_python.prompt is missing the Requirement 5a/5b convergence sections ("empty dev-units short-circuit" and "per-cycle file-hash comparison") that the updated tests assert on.

The two failing tests:

  • test_orchestrator_prompt_contains_empty_dev_units_requirement
  • test_orchestrator_prompt_contains_file_hash_comparison_requirement

These requirements exist in upstream main but weren't picked up by the fork. Could you rebase onto the latest upstream main? That should bring in the prompt content and resolve the merge conflicts as well.

@gltanaka
Copy link
Copy Markdown
Contributor

target 3/31

Made-with: Cursor

# Conflicts:
#	pdd/prompts/agentic_bug_step11_e2e_test_LLM.prompt
#	pdd/prompts/agentic_e2e_fix_orchestrator_python.prompt
#	tests/test_issue_633_reproduction.py
Copy link
Copy Markdown
Contributor

@gltanaka gltanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @vishalramvelu — the core changes here look solid and well-tested. Just 3 files that need to be removed before we can merge:

  1. pdd/core_dump_smoke.py — PDD-generated test file that from solution import add and from z3 import .... Neither exists in the package, so this will break imports. Same issue Copilot flagged for simple_math.py (which you removed) — this one slipped through.

  2. context/simple_math_example.py — Another PDD-generated artifact, unrelated to this PR's scope.

  3. uv.lock — The project doesn't use uv (no references in Makefile, CI, or pyproject.toml). This looks like it was committed from your local setup. 3,637 lines we don't need.

Once those are removed, this is good to merge.

@vishalramvelu
Copy link
Copy Markdown
Contributor Author

Hey @vishalramvelu — the core changes here look solid and well-tested. Just 3 files that need to be removed before we can merge:

  1. pdd/core_dump_smoke.py — PDD-generated test file that from solution import add and from z3 import .... Neither exists in the package, so this will break imports. Same issue Copilot flagged for simple_math.py (which you removed) — this one slipped through.
  2. context/simple_math_example.py — Another PDD-generated artifact, unrelated to this PR's scope.
  3. uv.lock — The project doesn't use uv (no references in Makefile, CI, or pyproject.toml). This looks like it was committed from your local setup. 3,637 lines we don't need.

Once those are removed, this is good to merge.

Ok just removed those files.

…ase conflict

- Combine TestCommand (command + optional cwd) from main with
  _run_fix_operation_test_subprocess for fix-phase test capture.
- Update test_sync_orchestration mocks: TestCommand instances, patch
  _run_fix_operation_test_subprocess where fix flow replaced subprocess.run.

Made-with: Cursor
@gltanaka
Copy link
Copy Markdown
Contributor

gltanaka commented Apr 6, 2026

target 4/7

Copy link
Copy Markdown
Contributor

@gltanaka gltanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior review feedback addressed. Verified end-to-end on an isolated cherry-pick branch against gltanaka/pdd main:

  • 359/359 unit tests pass (core dump, errors, CLI, sync orchestration, update)
  • 74/75 cloud batch tasks pass (1 failure is a prompt include tag assertion unrelated to this PR)
  • All 8 Issue #710 acceptance criteria covered and verified

LGTM - ready to merge.

@gltanaka gltanaka merged commit 21e5f0a into promptdriven:main Apr 9, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve core dumps: per-op sync steps, structured failures, and failure-only LLM/test traces

3 participants