feat(worker-synthesis): store synthesis errors and add regeneration #58
Closed
charlie83Gs wants to merge 1 commit into main
When a synthesizer agent fails to produce text, instead of storing a
fake fallback string as the definition, the workflow now:
- Creates the node with no definition
- Stores error info in metadata["synthesis_error"]
- Stores the original SynthesizerInput in metadata["synthesis_input"]
for later regeneration
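
A minimal sketch of the failure path described above, not the PR's actual code. The `Node` and `SynthesizerInput` shapes, the `store_result` name, and the layout of the error dict are all assumptions made for illustration; only the `metadata["synthesis_error"]` and `metadata["synthesis_input"]` keys come from the description.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class SynthesizerInput:
    # Hypothetical stand-in: the real SynthesizerInput fields are not shown here.
    topic: str
    source_ids: list[str]


@dataclass
class Node:
    # Hypothetical node shape; only `definition` and `metadata` come from the text.
    node_id: str
    definition: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def store_result(
    node: Node,
    agent_input: SynthesizerInput,
    text: Optional[str],
    error: Optional[Exception] = None,
) -> Node:
    """Store agent output, or record the failure instead of a fallback string."""
    if text:
        node.definition = text
        # Assumption: a successful run clears any stale failure bookkeeping.
        node.metadata.pop("synthesis_error", None)
        node.metadata.pop("synthesis_input", None)
        return node

    # Failure: leave the definition empty and keep what is needed to retry later.
    node.definition = None
    node.metadata["synthesis_error"] = {
        "type": type(error).__name__ if error else "EmptyOutput",
        "message": str(error) if error else "agent produced no text",
    }
    node.metadata["synthesis_input"] = asdict(agent_input)
    return node
```

Keeping the serialized input next to the error means a later retry does not have to rebuild the agent's input from scratch.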
Adds two new Hatchet workflows:
- regenerate_synthesis_wf: re-runs the agent on a failed synthesis node
in-place, preserving the node ID. If the node is a sub-synthesis with
a parent super-synthesis, automatically dispatches recombine.
- recombine_supersynthesis_wf: re-runs only the combine step on an
existing super-synthesis using its current sub-synthesis documents.
The super-synthesizer now writes parent_supersynthesis_id back to each
sub-synthesis node's metadata, enabling the cascade.
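
The cascade can be pictured roughly as follows. This is a sketch under stated assumptions: nodes are plain dicts, and `run_agent` and `dispatch_recombine` are hypothetical callables standing in for the agent call and the Hatchet dispatch of `recombine_supersynthesis_wf`. None of these signatures are taken from the PR.

```python
from typing import Any, Callable, Optional


def regenerate_in_place(
    node: dict[str, Any],
    run_agent: Callable[[dict[str, Any]], Optional[str]],
    dispatch_recombine: Callable[[str], Any],
) -> bool:
    """Re-run the agent on a failed node, keeping its ID; cascade to the parent."""
    saved_input = node["metadata"].get("synthesis_input")
    if saved_input is None:
        return False  # nothing was stored to regenerate from

    text = run_agent(saved_input)
    if not text:
        return False  # still failing; leave the stored error untouched

    # Update the same node object, so its ID is preserved.
    node["definition"] = text
    node["metadata"].pop("synthesis_error", None)

    # The back-reference written by the super-synthesizer enables the cascade.
    parent_id = node["metadata"].get("parent_supersynthesis_id")
    if parent_id is not None:
        dispatch_recombine(parent_id)
    return True
```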
API: adds POST /syntheses/{id}/regenerate endpoint and status/error_message
fields to response schemas.
Frontend: shows error banner with regenerate button on failed syntheses,
error badge in the investigations list, and polls for regeneration progress.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Abandoning stale work based on an old version.
This was referenced Apr 20, 2026
charlie83Gs added a commit that referenced this pull request Apr 20, 2026
…d refresh, typed status, tiebreak
Addresses the second-pass review's four flagged items:
- **find_in_flight_for_graph()** (🔴): new method filtering status IN
('pending', 'running') — the honest variant of most_recent_for_graph
for the Phase 7 #58 auto-dispatch "is this graph already processing
a migration?" question. most_recent_for_graph keeps its name but
the docstring now warns "regardless of status" and points callers
asking about in-flight state at the new method.
- **mark_running workflow_run_id refresh** (🟡): optional
``workflow_run_id`` param refreshes the column when re-dispatching
a failed hop under a new Hatchet run. Omitting it preserves the
existing pointer (retry-without-redispatch). Audit row → live
workflow navigation survives the re-dispatch flow.
- **ORDER BY tiebreak** (🟡): list_for_graph + most_recent_for_graph +
find_in_flight_for_graph all add id.desc() as secondary sort.
Batch inserts with microsecond-resolution collisions now return
deterministic orderings.
- **MigrationRunStatus Literal type** (🟡): new type alias in
``kt_db.models`` narrowing status to the closed set
``{'pending','running','success','failed','skipped'}``. GraphMigrationRun
column is now ``Mapped[MigrationRunStatus]`` so pyright catches
``"succeded"``-class typos at every write site. No DB migration
needed — column stays VARCHAR(16); adding a new state is a
code-only change (update Literal + worker).
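
As an in-memory illustration of the list above (not the repository's SQLAlchemy code), the sketch below combines the closed status set, the in-flight filter, and the `id` tiebreak. The `Run` dataclass and the `created_at` sort column are assumptions; the commit message only says the primary sort can collide at microsecond resolution.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal, Optional, get_args

# Closed set of states, mirroring the type alias named in the commit message.
MigrationRunStatus = Literal["pending", "running", "success", "failed", "skipped"]
IN_FLIGHT: tuple[MigrationRunStatus, ...] = ("pending", "running")


@dataclass
class Run:
    # Hypothetical stand-in for a GraphMigrationRun row.
    id: int
    graph_id: str
    status: MigrationRunStatus
    created_at: datetime  # assumed primary sort column


def find_in_flight_for_graph(runs: list[Run], graph_id: str) -> Optional[Run]:
    """Return the newest pending/running run for a graph, or None."""
    candidates = [
        r for r in runs if r.graph_id == graph_id and r.status in IN_FLIGHT
    ]
    # Secondary sort on id makes ordering deterministic when timestamps collide.
    candidates.sort(key=lambda r: (r.created_at, r.id), reverse=True)
    return candidates[0] if candidates else None
```

With the `Literal` annotation in place, a type checker such as pyright can reject a misspelled status like `"succeded"` at the call site, while the stored value remains an ordinary string.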
Tests (7 new): find_in_flight returns None / pending / running,
skips terminal states, prefers newest when multiple; mark_running
refreshes workflow_run_id when provided, preserves when omitted.
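
The `mark_running` behaviour covered by these tests might look like the following sketch. `RunRow` is a hypothetical stand-in for the audit row; the real method presumably operates on a database session rather than a bare object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRow:
    # Hypothetical audit row with only the two columns relevant here.
    status: str
    workflow_run_id: Optional[str] = None


def mark_running(row: RunRow, workflow_run_id: Optional[str] = None) -> RunRow:
    """Flip the row to running; refresh the run pointer only when one is given."""
    row.status = "running"
    if workflow_run_id is not None:
        # Re-dispatch under a new Hatchet run: point the row at the live workflow.
        row.workflow_run_id = workflow_run_id
    # Omitted: keep the existing pointer (retry without re-dispatch).
    return row
```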
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 20, 2026
Three bugs from PR review:

1. **mark_failed audit row never persisted on hop crash.** ``run_hop``
   flushes the failed row but doesn't commit; committing is the caller's
   job. The workflow's outer ``async with`` closed on the exception path
   before we committed, rolling back the flushed write, so the history API
   never surfaced failures from crashes. Fix: catch inside the async-with,
   commit, then re-raise into the outer handler that flips
   ``fail_migration`` on the graph row. New test
   ``test_failed_hop_persists_failed_audit_row`` reads the audit row from a
   fresh session to prove durability.
2. **Target-ahead-of-plugin silently stamped wrong version.** If the
   dispatcher asked for v3 but the plugin topped out at v2, the workflow
   trimmed the plan to v1→v2 yet still stamped ``graph_type_version=3`` at
   commit, so the sync worker would read v3 on v2 data. Fix: abort up-front
   when ``target_version > plugin.current_version``, leaving the graph
   untouched. Updated test
   ``test_target_ahead_of_plugin_aborts_before_any_hop`` asserts the abort
   path end-to-end (no hop invoked, no version bump, no read_only flip).
3. **Misleading "per-hop refresh" comment.** Removed. There was no
   ``ctx.refresh_timeout`` call to justify the comment. Per-hop timeout
   refresh is a future enhancement, can be wired via a callback if/when we
   see real long-running hops.

Reviewer's minor items left for follow-ups:
- Advisory-lock gap between begin/commit/fail is noted for #58/#59
  integration (``find_in_flight_for_graph`` mitigates at dispatch time).
- ``repr(exc)`` sanitization on the SSE stream: fine as-is for the
  internal operator view; expose-sanitized version if/when we surface this
  on a user-facing stream.
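
The first fix (commit inside the ``async with``, then re-raise) can be sketched with a toy session. `FakeSession`, `session_scope`, `migrate`, and this `run_hop` are hypothetical illustrations, not the project's code; the toy session only mimics the property that flushed writes are lost on rollback unless committed first.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical in-memory session: flushed rows are lost unless committed."""

    def __init__(self, store: list) -> None:
        self.store = store
        self.pending: list = []

    def flush_row(self, row: dict) -> None:
        self.pending.append(row)

    async def commit(self) -> None:
        self.store.extend(self.pending)
        self.pending.clear()

    async def rollback(self) -> None:
        self.pending.clear()


@asynccontextmanager
async def session_scope(store: list):
    session = FakeSession(store)
    try:
        yield session
    except BaseException:
        # Exception path: anything flushed but not committed is discarded.
        await session.rollback()
        raise


async def run_hop(session: FakeSession, fail: bool) -> None:
    if fail:
        session.flush_row({"status": "failed"})  # flushed, not committed
        raise RuntimeError("hop crashed")
    session.flush_row({"status": "success"})


async def migrate(store: list, fail: bool) -> None:
    async with session_scope(store) as session:
        try:
            await run_hop(session, fail)
        except Exception:
            # Commit the failed audit row before the scope can roll it back,
            # then re-raise so an outer handler still sees the crash.
            await session.commit()
            raise
        await session.commit()
```

Without the inner `except`, the exception would reach `session_scope` with the failed row still pending, and the rollback would discard it.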
Summary
- […] `metadata["synthesis_error"]`) instead of a fake fallback string. The original input is preserved in `metadata["synthesis_input"]` for regeneration.
- […] `POST /syntheses/{id}/regenerate` API endpoint that dispatches a new `regenerate_synthesis_wf` Hatchet workflow to re-run the agent on the existing node in-place.
- […] `recombine_supersynthesis_wf` on the parent super-synthesis via `parent_supersynthesis_id` back-references.

Changes

Backend
- `_helpers.py` (new): Extracted `run_synthesis_agent()`, `process_and_store_synthesis()`, `store_synthesis_error()`, and `run_super_synthesis_combine()` helpers to share between original and regeneration workflows
- `regenerate.py` (new): `regenerate_synthesis_wf` and `recombine_supersynthesis_wf` Hatchet workflows
- `synthesizer.py`: Uses helpers, stores error state on failure instead of fallback text
- `super_synthesizer.py`: Same error handling, writes `parent_supersynthesis_id` back to sub-syntheses
- `syntheses.py` (API): New regenerate endpoint, `status`/`error_message` fields in response schemas
- `models.py` (kt-hatchet): `RegenerateSynthesisInput`, `RecombineSuperSynthesisInput`

Frontend
- `SynthesisDocument.tsx`

Test plan
🤖 Generated with Claude Code