… UNAVAILABLE

An OperatorNode whose cache_mode=OFF never writes any records to the pipeline database. Previously, from_descriptor() would set LoadStatus.READ_ONLY whenever pipeline_db was not None, regardless of whether any data was ever persisted. This caused the downstream FunctionNode to treat the operator as a usable stream (FULL mode) and attempt computation, which failed because the operator had no live data stream.

The fix: only promote to READ_ONLY when the operator has actually persisted data (cache_mode=LOG or REPLAY). A cache_mode=OFF operator remains UNAVAILABLE even when a pipeline_db is present.

With the operator correctly UNAVAILABLE, _load_function_node enters the UNAVAILABLE branch and wires the function pod in CACHE_ONLY mode, allowing it to serve all previously cached results from the DB without touching the unavailable operator or sources.

Changes:
- OperatorNode.from_descriptor: guard READ_ONLY promotion with `cache_mode != CacheMode.OFF`
- Update three existing tests whose expectations were wrong (uncached Join in read_only and full mode → UNAVAILABLE, not READ_ONLY)
- Add TestPLT1158UncachedOperatorStatus with five regression tests, including the pipeline from the issue (two DictSources → join → adder)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
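The guard described in this commit can be sketched in isolation. The snippet below is a minimal illustration, not the project's code: `CacheMode` and `LoadStatus` are stand-in enums containing only the members named in this PR, and `resolve_operator_status` is a hypothetical helper covering only the case where the operator's upstream sources cannot be loaded.

```python
from enum import Enum, auto
from typing import Optional


class CacheMode(Enum):
    # Stand-in enum: only the three modes named in this PR.
    OFF = auto()
    LOG = auto()
    REPLAY = auto()


class LoadStatus(Enum):
    # Stand-in enum: the real one may define more members (assumption).
    UNAVAILABLE = auto()
    READ_ONLY = auto()


def resolve_operator_status(
    pipeline_db: Optional[object], cache_mode: CacheMode
) -> LoadStatus:
    """Hypothetical helper: status of an operator whose upstream is unavailable.

    READ_ONLY is only meaningful when the operator could have persisted
    records, i.e. a database exists AND caching was not OFF.
    """
    if pipeline_db is not None and cache_mode != CacheMode.OFF:
        return LoadStatus.READ_ONLY
    return LoadStatus.UNAVAILABLE


fake_db = object()  # placeholder for a real pipeline database handle
uncached = resolve_operator_status(fake_db, CacheMode.OFF)
logged = resolve_operator_status(fake_db, CacheMode.LOG)
no_db = resolve_operator_status(None, CacheMode.LOG)
```

Before the fix, the equivalent of the first call would have yielded `READ_ONLY` simply because a database handle existed.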
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull request overview
Fixes PLT-1158 by correcting how deserialized uncached operator nodes (e.g., implicit Joins) resolve their LoadStatus when upstream sources are UNAVAILABLE, ensuring downstream function nodes correctly fall back to CACHE_ONLY instead of attempting computation.
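The downstream fallback mentioned above can be pictured with a small sketch. It assumes a simplified model in which a function node picks its mode purely from its upstream status; `FunctionMode` and `choose_function_mode` are hypothetical names, and the real `_load_function_node` may consider more conditions.

```python
from enum import Enum, auto


class LoadStatus(Enum):
    # Stand-in enum with only the statuses discussed in this PR.
    UNAVAILABLE = auto()
    READ_ONLY = auto()


class FunctionMode(Enum):
    # Hypothetical enum for the two function-node modes named in this PR.
    FULL = auto()
    CACHE_ONLY = auto()


def choose_function_mode(upstream_status: LoadStatus) -> FunctionMode:
    """Hypothetical: pick the downstream mode from the upstream operator status."""
    if upstream_status is LoadStatus.UNAVAILABLE:
        # No usable input stream: serve previously cached results only.
        return FunctionMode.CACHE_ONLY
    # Upstream is treated as a usable stream, so computation may be attempted.
    return FunctionMode.FULL


mode_after_fix = choose_function_mode(LoadStatus.UNAVAILABLE)
mode_before_fix = choose_function_mode(LoadStatus.READ_ONLY)
```

Under this model the bug is visible: a wrongly reported `READ_ONLY` operator sends the function node down the `FULL` path even though there is no data to compute from.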
Changes:
- Update `OperatorNode.from_descriptor()` to promote to `READ_ONLY` only when `pipeline_db` exists and `cache_mode != CacheMode.OFF`.
- Update 3 existing serialization tests to expect uncached operators to load as `UNAVAILABLE`.
- Add a new regression test suite covering the PLT-1158 reproduction pipeline and the downstream `CACHE_ONLY` cascade.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/orcapod/core/nodes/operator_node.py` | Fixes operator LoadStatus derivation on load by gating READ_ONLY on non-OFF cache modes. |
| `tests/test_pipeline/test_serialization.py` | Aligns existing expectations with corrected semantics and adds PLT-1158 regression coverage. |
    loaded = Pipeline.load(json_path, mode="full")

    # The implicit join operator has no DB caching → UNAVAILABLE
    join_node = loaded.compiled_nodes["adder"]
In test_uncached_operator_is_unavailable_not_read_only, join_node = loaded.compiled_nodes["adder"] is both unused and misleading (it points to the function node, not the join/operator). Consider removing it or renaming/using it to assert on the function node explicitly, and keep operator assertions scoped to the actual join node if possible to reduce brittleness.
Fixed: the assignment is removed. The operator nodes are now found exclusively via `node_type`, with the comment updated to make that clear.
Remove the spurious `join_node = loaded.compiled_nodes["adder"]` assignment that was both unused and misleading (it referenced the function node, not the operator/join). The comment above it is also tightened to clarify that operator nodes are found via `node_type`, not by label. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
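Selecting operator nodes by type rather than by label might look like the sketch below. This is illustrative only: `StubNode`, the `"source"`/`"operator"`/`"function"` type strings, the node labels other than `adder`, and the string statuses are assumptions. Only the `compiled_nodes` mapping and the `node_type` attribute are named in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StubNode:
    # Stand-in for a compiled pipeline node; field values are assumed.
    label: str
    node_type: str
    load_status: str


# Hypothetical stand-in for `loaded.compiled_nodes` (label -> node).
compiled_nodes: Dict[str, StubNode] = {
    "src_a": StubNode("src_a", "source", "UNAVAILABLE"),
    "src_b": StubNode("src_b", "source", "UNAVAILABLE"),
    "join_0": StubNode("join_0", "operator", "UNAVAILABLE"),
    "adder": StubNode("adder", "function", "CACHE_ONLY"),
}


def operator_nodes(nodes: Dict[str, StubNode]) -> List[StubNode]:
    """Select operators by type, so the lookup does not depend on any label."""
    return [node for node in nodes.values() if node.node_type == "operator"]


ops = operator_nodes(compiled_nodes)
all_unavailable = all(node.load_status == "UNAVAILABLE" for node in ops)
```

Filtering by type avoids the mistake flagged in the review, where a lookup by the label `"adder"` returned the function node instead of the join.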
Review round addressed

Unused variable removed in … The stale …
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Review round 2: no new comments

Copilot's second pass produced no additional comments. The PR is ready for human review.
Summary
Fixes PLT-1158: an implicit operator node (e.g. a `Join` between two source nodes) with no explicit database caching incorrectly resolved to `READ_ONLY` on pipeline load, causing downstream function nodes to fail when attempting computation.

- `OperatorNode.from_descriptor()` promoted any node to `READ_ONLY` when a `pipeline_db` was present, regardless of `cache_mode`. An uncached operator (`cache_mode=OFF`) never writes records to the database, so there is nothing to read back; `UNAVAILABLE` is the correct status.
- The fix guards `READ_ONLY` promotion on `cache_mode != CacheMode.OFF`. Only operators that actively persist results (`LOG` or `REPLAY` mode) are promoted to `READ_ONLY`.
- With the operator correctly `UNAVAILABLE`, `_load_function_node` enters the existing `UNAVAILABLE` branch and wires the downstream function pod in `CACHE_ONLY` mode, allowing it to serve all previously cached results from the DB without touching the unavailable operator or sources.

Changes

| File | Change |
|---|---|
| `src/orcapod/core/nodes/operator_node.py` | Guard `READ_ONLY` promotion with `cache_mode != CacheMode.OFF` in `from_descriptor()` |
| `tests/test_pipeline/test_serialization.py` | Update 3 existing tests (uncached `Join` → `UNAVAILABLE`); add `TestPLT1158UncachedOperatorStatus` with 5 regression tests |

Test plan

- Updated existing tests: `test_full_mode_operator_degrades_when_sources_unavailable`, `test_read_only_operator_with_join`, `test_load_multi_source_operator_pipeline_read_only`
- New `TestPLT1158UncachedOperatorStatus`:
  - `test_uncached_operator_is_unavailable_not_read_only`: operator status is `UNAVAILABLE`
  - `test_source_nodes_are_unavailable`: pre-condition check
  - `test_function_node_gets_cache_only_when_operator_is_unavailable`: downstream cascade
  - `test_cache_only_function_node_serves_cached_results`: end-to-end, cached results served correctly
  - `test_read_only_mode_uncached_operator_is_also_unavailable`: consistent across load modes
- The regression suite uses the pipeline from the issue (two `DictSource`s → uncached `Join` → `adder` function pod)

Closes PLT-1158
🤖 Generated with Claude Code