feat(worker-synthesis): store synthesis errors and add regeneration by charlie83Gs · Pull Request #58 · openktree/knowledge-tree

charlie83Gs · 2026-03-29T21:57:38Z

Summary

When a synthesizer agent fails to produce text, the workflow now stores the failure as an error state (metadata["synthesis_error"]) instead of a fake fallback string. The original input is preserved in metadata["synthesis_input"] for regeneration.
Adds POST /syntheses/{id}/regenerate API endpoint that dispatches a new regenerate_synthesis_wf Hatchet workflow to re-run the agent on the existing node in-place.
For sub-syntheses: regenerating automatically triggers recombine_supersynthesis_wf on the parent super-synthesis via parent_supersynthesis_id back-references.
Frontend shows an error banner with "Regenerate" button on failed syntheses, a "Failed" badge in the investigations list, and polls for regeneration progress.
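
A minimal sketch of the error-state storage described above, not the PR's actual code. `SynthesisNode`, the fields of `SynthesizerInput`, `store_result`, and the error message text are hypothetical stand-ins; only the `metadata["synthesis_error"]` and `metadata["synthesis_input"]` keys come from the description.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class SynthesizerInput:
    # Hypothetical fields; the real SynthesizerInput is not shown on this page.
    topic: str
    budget: int


@dataclass
class SynthesisNode:
    # Simplified stand-in for a stored synthesis node.
    id: str
    definition: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def store_result(
    node: SynthesisNode, agent_input: SynthesizerInput, text: Optional[str]
) -> SynthesisNode:
    """Store agent output, or an explicit error state when no text came back."""
    if text:
        node.definition = text
        node.metadata.pop("synthesis_error", None)
        return node
    # No fake fallback string: leave the definition empty and record the failure.
    node.definition = None
    node.metadata["synthesis_error"] = "agent produced no text"  # assumed message
    # Keep the original input so the node can be regenerated later.
    node.metadata["synthesis_input"] = asdict(agent_input)
    return node
```

Keeping the input next to the error means a later regeneration needs only the node ID.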

Changes

Backend

_helpers.py (new) — Extracted run_synthesis_agent(), process_and_store_synthesis(), store_synthesis_error(), and run_super_synthesis_combine() helpers to share between original and regeneration workflows
regenerate.py (new) — regenerate_synthesis_wf and recombine_supersynthesis_wf Hatchet workflows
synthesizer.py — Uses helpers, stores error state on failure instead of fallback text
super_synthesizer.py — Same error handling, writes parent_supersynthesis_id back to sub-syntheses
syntheses.py (API) — New regenerate endpoint, status/error_message fields in response schemas
models.py (kt-hatchet) — RegenerateSynthesisInput, RecombineSuperSynthesisInput
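
One way the new status/error_message response fields could be derived from node metadata. This is a sketch under assumptions: `SynthesisResponse`, `to_response`, and the status values `"completed"` and `"failed"` are illustrative and not confirmed by the PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SynthesisResponse:
    id: str
    status: str  # assumed values: "completed" or "failed"
    error_message: Optional[str]


def to_response(node_id: str, metadata: dict[str, Any]) -> SynthesisResponse:
    """Map stored node metadata to the response fields."""
    error = metadata.get("synthesis_error")
    if error is not None:
        # A recorded synthesis error marks the node as failed.
        return SynthesisResponse(id=node_id, status="failed", error_message=str(error))
    return SynthesisResponse(id=node_id, status="completed", error_message=None)
```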

Frontend

Error banner with regenerate button in SynthesisDocument.tsx
"Failed" badge in investigations list
Regeneration progress polling in detail page

Test plan

Frontend: lint, type-check, and 123 tests pass
Backend: worker-synthesis 9 tests pass, API 92 tests pass
Manual: trigger a synthesis failure (tiny budget, no matching nodes), verify error state is shown, click regenerate, verify document is produced

🤖 Generated with Claude Code

When a synthesizer agent fails to produce text, instead of storing a fake fallback string as the definition, the workflow now:
- Creates the node with no definition
- Stores error info in metadata["synthesis_error"]
- Stores the original SynthesizerInput in metadata["synthesis_input"] for later regeneration

Adds two new Hatchet workflows:
- regenerate_synthesis_wf: re-runs the agent on a failed synthesis node in-place, preserving the node ID. If the node is a sub-synthesis with a parent super-synthesis, automatically dispatches recombine.
- recombine_supersynthesis_wf: re-runs only the combine step on an existing super-synthesis using its current sub-synthesis documents.

The super-synthesizer now writes parent_supersynthesis_id back to each sub-synthesis node's metadata, enabling the cascade.

API: adds POST /syntheses/{id}/regenerate endpoint and status/error_message fields to response schemas.

Frontend: shows error banner with regenerate button on failed syntheses, error badge in the investigations list, and polls for regeneration progress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
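
The regenerate-then-recombine cascade can be sketched roughly as follows. This is an in-memory illustration, not the Hatchet workflow itself: `regenerate_in_place`, the dict-based node store, and the `run_agent` / `dispatch_recombine` callables are hypothetical; only the metadata keys come from the commit message.

```python
from typing import Any, Callable, Optional

Node = dict[str, Any]  # simplified stand-in for a stored node


def regenerate_in_place(
    nodes: dict[str, Node],
    node_id: str,
    run_agent: Callable[[dict[str, Any]], Optional[str]],
    dispatch_recombine: Callable[[str], None],
) -> bool:
    """Re-run the agent for an existing node; return True on success."""
    node = nodes[node_id]
    meta = node["metadata"]
    # Reuse the input that was saved when the original run failed.
    text = run_agent(meta["synthesis_input"])
    if not text:
        meta["synthesis_error"] = "agent produced no text"  # assumed message
        return False
    # Same node ID, new definition; clear the stale error.
    node["definition"] = text
    meta.pop("synthesis_error", None)
    # Sub-syntheses carry a back-reference, so the parent can be recombined.
    parent_id = meta.get("parent_supersynthesis_id")
    if parent_id is not None:
        dispatch_recombine(parent_id)
    return True
```

Recombine is dispatched only after a successful regeneration, so a repeated failure leaves the parent document untouched.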

charlie83Gs · 2026-04-19T23:54:04Z

Abandoning stale work based on an old version.

github-actions · 2026-04-19T23:54:19Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.

I have read the CLA Document and I hereby sign the CLA

You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

charlie83Gs · 2026-04-19T23:55:40Z

Closed in favor of #229 — a narrower, additive re-scoping of the same functionality against current main. Reference commit 98d7a3e preserved on feat/synthesis-error-state-regenerate.

…d refresh, typed status, tiebreak

Addresses the second-pass review's four flagged items:
- **find_in_flight_for_graph()** (🔴): new method filtering status IN ('pending', 'running'), the honest variant of most_recent_for_graph for the Phase 7 #58 auto-dispatch "is this graph already processing a migration?" question. most_recent_for_graph keeps its name but the docstring now warns "regardless of status" and points callers asking about in-flight state at the new method.
- **mark_running workflow_run_id refresh** (🟡): optional ``workflow_run_id`` param refreshes the column when re-dispatching a failed hop under a new Hatchet run. Omitting it preserves the existing pointer (retry-without-redispatch). Audit row → live workflow navigation survives the re-dispatch flow.
- **ORDER BY tiebreak** (🟡): list_for_graph + most_recent_for_graph + find_in_flight_for_graph all add id.desc() as secondary sort. Batch inserts with microsecond-resolution collisions now return deterministic orderings.
- **MigrationRunStatus Literal type** (🟡): new type alias in ``kt_db.models`` narrowing status to the closed set ``{'pending','running','success','failed','skipped'}``. GraphMigrationRun column is now ``Mapped[MigrationRunStatus]`` so pyright catches ``"succeded"``-class typos at every write site. No DB migration needed: column stays VARCHAR(16); adding a new state is a code-only change (update Literal + worker).

Tests (7 new): find_in_flight returns None / pending / running, skips terminal states, prefers newest when multiple; mark_running refreshes workflow_run_id when provided, preserves when omitted.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
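
A rough in-memory illustration of the typed status and the in-flight lookup with an id tiebreak. The real method is a repository query; the `Run` dataclass, the list-based store, and `created_at` as the primary sort column are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal, Optional

# Closed set of statuses, as listed in the commit message.
MigrationRunStatus = Literal["pending", "running", "success", "failed", "skipped"]
IN_FLIGHT: tuple[MigrationRunStatus, ...] = ("pending", "running")


@dataclass
class Run:
    # Hypothetical stand-in for an audit row.
    id: int
    graph_id: str
    status: MigrationRunStatus
    created_at: datetime  # assumed primary sort column


def find_in_flight_for_graph(runs: list[Run], graph_id: str) -> Optional[Run]:
    """Return the newest pending/running run for a graph, or None."""
    candidates = [
        r for r in runs if r.graph_id == graph_id and r.status in IN_FLIGHT
    ]
    # Newest first; id breaks ties between identical timestamps.
    candidates.sort(key=lambda r: (r.created_at, r.id), reverse=True)
    return candidates[0] if candidates else None
```

With the `Literal` alias, a type checker such as pyright would flag `Run(1, "g", "succeded", ...)` at the call site, while the stored value remains a plain string.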

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three bugs from PR review:

1. **mark_failed audit row never persisted on hop crash.** ``run_hop`` flushes the failed row but doesn't commit; committing is the caller's job. The workflow's outer ``async with`` closed on the exception path before we committed, rolling back the flushed write, so the history API never surfaced failures from crashes. Fix: catch inside the async-with, commit, then re-raise into the outer handler that flips ``fail_migration`` on the graph row. New test ``test_failed_hop_persists_failed_audit_row`` reads the audit row from a fresh session to prove durability.
2. **Target-ahead-of-plugin silently stamped wrong version.** If the dispatcher asked for v3 but the plugin topped out at v2, the workflow trimmed the plan to v1→v2 yet still stamped ``graph_type_version=3`` at commit, so the sync worker would read v3 on v2 data. Fix: abort up-front when ``target_version > plugin.current_version``, leaving the graph untouched. Updated test ``test_target_ahead_of_plugin_aborts_before_any_hop`` asserts the abort path end-to-end (no hop invoked, no version bump, no read_only flip).
3. **Misleading "per-hop refresh" comment.** Removed. There was no ``ctx.refresh_timeout`` call to justify the comment. Per-hop timeout refresh is a future enhancement and can be wired via a callback if/when we see real long-running hops.

Reviewer's minor items left for follow-ups:
- Advisory-lock gap between begin/commit/fail is noted for #58/#59 integration (``find_in_flight_for_graph`` mitigates at dispatch time).
- ``repr(exc)`` sanitization on the SSE stream: fine as-is for the internal operator view; expose a sanitized version if/when we surface this on a user-facing stream.
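
The first two fixes can be illustrated with a toy model. `FakeSession`, `migrate`, and `ensure_target_supported` are hypothetical names for this sketch; the fake session only mimics the "flushed but uncommitted writes are rolled back on exit" behaviour described above and is not the real database session.

```python
import asyncio


class FakeSession:
    """Toy session: flushed writes are lost unless commit() runs before exit."""

    def __init__(self, durable: list[str]) -> None:
        self.durable = durable
        self.pending: list[str] = []

    async def __aenter__(self) -> "FakeSession":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.pending.clear()  # roll back anything flushed but not committed
        return False  # never swallow the exception

    def flush(self, row: str) -> None:
        self.pending.append(row)

    async def commit(self) -> None:
        self.durable.extend(self.pending)
        self.pending.clear()


async def run_hop(session: FakeSession) -> None:
    # Mirrors the description: flush the failed row, leave the commit to the caller.
    session.flush("failed")
    raise RuntimeError("hop crashed")


async def migrate(durable: list[str]) -> None:
    async with FakeSession(durable) as session:
        try:
            await run_hop(session)
        except Exception:
            await session.commit()  # persist the audit row while the session is open
            raise  # then hand the error to the outer handler
        await session.commit()


def ensure_target_supported(target_version: int, plugin_current_version: int) -> None:
    """Abort before any hop when the plugin cannot reach the requested version."""
    if target_version > plugin_current_version:
        raise ValueError(
            f"target v{target_version} is ahead of plugin v{plugin_current_version}"
        )
```

Committing inside the `async with` matters because the rollback happens in `__aexit__`; once the exception has left the block, the flushed row is already gone.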

… (#269)

charlie83Gs closed this Apr 19, 2026

charlie83Gs mentioned this pull request Apr 19, 2026

Synthesis failure UX: error-state storage + regeneration workflows #229

Open

This was referenced Apr 20, 2026

feat(kt-config): GraphMigration planner (Phase 7 #56) #254

Merged

feat(kt-db): GraphMigrationRunRepository for audit-table access #258

Merged

charlie83Gs added a commit that referenced this pull request Apr 20, 2026

feat(kt-db): find_out_of_date_graphs helper (Phase 7 #58) (#260)

6665254

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

charlie83Gs added a commit that referenced this pull request Apr 20, 2026

feat(kt-db): plan_auto_dispatch helper (Phase 7 #58) (#263)

f2049b8

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

charlie83Gs added a commit that referenced this pull request Apr 20, 2026

feat(kt-db): startup auto-dispatch for out-of-date graphs (Phase 7 #58)…

693bacb

… (#269)

charlie83Gs wants to merge 1 commit into main from feat/synthesis-error-state-regenerate
