Make StepFailed pickles portable across processes (closes #110)#129
Make StepFailed pickles portable across processes (closes #110)#129
Conversation
rafacm
left a comment
There was a problem hiding this comment.
Looks correct in the runtime path, but I would fix the session transcript documentation before merging.
Correctness
episodes/workflows.py:82-88 builds Exception.args from one formatted string, and episodes/workflows.py:90-114 now reduces every StepFailed subclass to builtins.RuntimeError(message). I checked the DBOS CLI path: workflow steps calls DBOSClient.list_workflow_steps(), DBOS safe-deserializes the error, and the CLI prints JSON with a default encoder that falls back to str(obj). It does not inspect episode_id, error_message, or subclass type, so RuntimeError(str(self)) is sufficient for the CLI/Conductor portability goal.
I also checked typed-shape usage on origin/main. There are no except FetchDetailsFailed / subclass control-flow catches; the only typed checks are tests around _raise_if_failed and class inheritance. Admin rendering also uses string output, not exception class branching.
Minor concern: the legacy fallback at episodes/admin.py:71-75 no longer shows a raw 4 KB blob, but it also does not recover the readable legacy error message from pre-fix pickle bytes. Issue #110's body says legacy rows should be handled gracefully, “probably via a fallback that extracts the readable substring from the stored bytes when full unpickle fails.” This implementation gives an opaque base64 preview inside <could not deserialize: ...>. That is better than the previous wall of base64, but it is not equivalent to rendering the old human-readable failure message.
Tests
The new test at episodes/tests/test_admin.py:132-153 is meaningful: if someone reverts __reduce__ to (self.__class__, (...)), the type(rehydrated) is RuntimeError assertion fails even in-process. I also manually disassembled a representative pickle and confirmed it contains builtins.RuntimeError, not episodes.workflows.
I ran the three touched admin tests locally:
uv run python manage.py test episodes.tests.test_admin.EpisodeAdminTests.test_step_failed_pickle_round_trip_yields_runtime_error episodes.tests.test_admin.EpisodeAdminTests.test_decode_dbos_payload_unpickles_step_output episodes.tests.test_admin.EpisodeAdminTests.test_decode_dbos_payload_passthrough_for_plain_valuesThey pass. I did not rerun the full 372-test suite.
Suggested test follow-up: add a small subprocess/no-source-path assertion or pickle disassembly assertion so the cross-process property is explicit rather than inferred from type(...) is RuntimeError.
Documentation
The required files exist: plan (doc/plans/2026-05-04-stepfailed-portable-pickle.md), feature doc (doc/features/2026-05-04-stepfailed-portable-pickle.md), both session transcripts, and the changelog entry at CHANGELOG.md:7-11.
Blocking documentation issue: the session transcripts do not include verbatim user messages. The planning transcript's user section is a parenthesized summary at doc/sessions/2026-05-04-stepfailed-portable-pickle-planning-session.md:13-15, and the implementation transcript's user section is another summary/reference at doc/sessions/2026-05-04-stepfailed-portable-pickle-implementation-session.md:13-15. The repo instructions require user messages to be “verbatim, unedited text.” These should be replaced with the actual prompt text, or explicitly corrected if the exact text is unrecoverable.
Other findings
No unused imports or obvious regression risks in the code diff. The gAS pickle-looking heuristic at episodes/admin.py:67 is consistent with DBOS's current pickle.dumps(...) default protocol behavior in this environment.
Suggested follow-ups:
- Replace the summarized transcript user sections with the actual verbatim user messages before merge.
- Consider extracting printable strings from failed legacy pickle bytes so old rows show the original step failure text, not only a base64 preview.
- Add an explicit subprocess or pickle-symbol test for the no-
episodes.workflowsimport scenario.
Given the transcript requirement is explicit repo policy for feature PRs, I would not merge until that documentation issue is fixed. The runtime change itself looks sound.
— Codex review (independent second-opinion pass)
5ed7a35 to
3d2c71e
Compare
AI Checks summary❌ 1 fail · ✅ 6 pass · ⏭️ 0 skip ❌ Branching & PR Strategy
PR is from a feature branch and violates the branching strategy. DetailsThe PR branch is not up-to-date with the current Other checks (6 passing · 0 skipped)Show details
|
- Replace summarized user messages in session transcripts with verbatim text. Implementation session declared as agent-orchestrated; parent prompt reproduced verbatim. - Add subprocess pickle-portability test asserting StepFailed deserializes to RuntimeError without episodes.workflows on sys.path. - CHANGELOG: BREAKING note that pre-fix StepFailed pickles will not render human-readable text after this change. Codex follow-up #2 (extract printable strings from legacy pickle bytes) intentionally not addressed — user will clear the dev DB and accepts the breaking-change tradeoff. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Follow-up commit addressing the codex review:
Codex follow-up #2 (extract printable strings from legacy pickle bytes) intentionally skipped. |
AGENTS.md's transcript policy assumed the human-with-Claude workflow where every session has real `### User` messages. Conductor's parallel- launch model breaks that assumption: the implementation session has no direct user, only an agent-orchestrated launching prompt. Add a new "Agent-orchestrated sessions" subsection to AGENTS.md and a matching note to the Feature PR Documentation Bundle AI check. The new convention: use `### Parent agent (orchestrator)` headings instead of `### User`, reproduce the parent-agent prompt verbatim, and declare the session as agent-orchestrated at the top of `## Detailed conversation`. Same verbatim rule applies — summarized parent prompts are still rejected. This eliminates a recurring false-negative on the docs check that was raised on PRs #129, #130, #131, #133. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DBOS persists per-step error payloads as base64-encoded pickle bytes. The standalone ``dbos workflow steps`` CLI and DBOS Conductor run outside the Django process and can't import ``episodes.workflows``, so unpickling a typed ``StepFailed`` subclass raised ``ModuleNotFoundError: No module named 'episodes'`` and the CLI printed "exception object could not be deserialized". Switch ``StepFailed.__reduce__`` to return ``(RuntimeError, (str(self),))``. The on-the-wire pickle is now a plain stdlib ``RuntimeError`` carrying the formatted message — any Python process can deserialize it. Worker-side semantics unchanged: ``raise DownloadStepFailed(...)`` still matches ``except StepFailed`` because ``__reduce__`` only kicks in at pickle time, after the typed exception has done its job. Verified no caller catches by typed subclass — the typed hierarchy is for log readability, not control flow. Admin's ``_decode_dbos_payload()`` previously returned the raw b64 blob when ``pickle.loads`` failed. Now returns ``<could not deserialize: <80-char preview>>`` so legacy rows whose typed-class wire format can no longer be rehydrated render an explanation instead of a wall of base64. Tests: new pickle round-trip asserting RuntimeError-on-the-wire; existing legacy-row test updated for the new fallback message. 372 tests passing. Closes #110. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Replace summarized user messages in session transcripts with verbatim text. Implementation session declared as agent-orchestrated; parent prompt reproduced verbatim. - Add subprocess pickle-portability test asserting StepFailed deserializes to RuntimeError without episodes.workflows on sys.path. - CHANGELOG: BREAKING note that pre-fix StepFailed pickles will not render human-readable text after this change. Codex follow-up #2 (extract printable strings from legacy pickle bytes) intentionally not addressed — user will clear the dev DB and accepts the breaking-change tradeoff. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ebff7f5 to
9d1897f
Compare
Merging with
|
- Trim verbose multi-paragraph comments in episodes/apps.py:_init_dbos per .ai-checks/comment-discipline.md (one-line WHY only). - Add test_apps_dbos_init.py covering role detection for uvicorn, runserver, submit_episode, enrich_entities, migrate, test, shell. - Merge duplicate ### Fixed section under 2026-05-04 in CHANGELOG.md after rebasing onto main (PR #129's StepFailed fix).
- Trim verbose multi-paragraph comments in episodes/apps.py:_init_dbos per .ai-checks/comment-discipline.md (one-line WHY only). - Add test_apps_dbos_init.py covering role detection for uvicorn, runserver, submit_episode, enrich_entities, migrate, test, shell. - Merge duplicate ### Fixed section under 2026-05-04 in CHANGELOG.md after rebasing onto main (PR #129's StepFailed fix).
Summary
StepFailed.__reduce__now returns(RuntimeError, (str(self),))so DBOS-stored exceptions deserialize as a plain stdlibRuntimeErrorin any process — the standalonedbos workflow stepsCLI and DBOS Conductor can rehydrate them without importingepisodes.workflows. Worker-side typed hierarchy unchanged (raise DownloadStepFailed(...)still matchesexcept StepFailed)._decode_dbos_payload()legacy-row fallback now renders<could not deserialize: <80-char preview>>instead of the raw b64 blob.Builds on the partial fix from
9ff6719. Closes #110.Test plan
uv run python manage.py test— 372 tests passingpickle.loads(pickle.dumps(DownloadStepFailed(...)))returns aRuntimeErrorcarrying the formatted messageuv run dbos workflow steps episode-<id>-run-<n>against a freshly-failed workflow renders the formatted error with no "could not be deserialized" warning (not exercised in this PR — would require a live DBOS instance)🤖 Generated with Claude Code