feat(sources): SpiralDBTableSource with PK as default tag columns#111

Merged
eywalker merged 2 commits into dev from
eywalker/plt-1073-implement-source-based-on-spiraldb-with-pk-as-default-tag
Mar 26, 2026

@kurodo3 kurodo3 Bot commented Mar 26, 2026

Summary

  • Implements SpiralDBTableSource, a read-only RootSource backed by a SpiralDB table
  • PK (key-schema) columns are used as tag columns by default; explicit tag_columns override at construction time
  • Tables with no key schema and no explicit tag_columns raise ValueError (no implicit ROWID fallback — SpiralDB has no equivalent)
  • Config round-trip (to_config / from_config) supported; registered as "spiraldb_table" in the pipeline serialization registry
  • Follows the same SQLiteTableSource pattern: opens SpiralDBConnector, delegates to DBTableSource, closes connector immediately after eager load
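
The tag-column rule described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in spiraldb_table_source.py: resolve_tag_columns is a hypothetical helper name, and treating any non-None tag_columns as an explicit override is an assumption.

```python
from typing import Optional, Sequence, Tuple


def resolve_tag_columns(
    key_columns: Sequence[str],
    tag_columns: Optional[Sequence[str]] = None,
) -> Tuple[str, ...]:
    """Hypothetical helper: choose the tag columns for a table source.

    key_columns stands in for the table's key-schema (PK) columns.
    """
    # An explicit tag_columns argument overrides the key schema (assumption:
    # "explicit" means anything other than None).
    if tag_columns is not None:
        return tuple(tag_columns)
    # No key schema and no explicit tags: there is nothing to fall back on.
    if not key_columns:
        raise ValueError(
            "table has no key schema; pass tag_columns explicitly"
        )
    # Default: the PK columns become the tag columns, in key order.
    return tuple(key_columns)
```

With a composite key such as ("site", "day") and no override, this sketch returns both key columns; passing tag_columns=["site"] returns only ("site",).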

Files changed

File | Change
src/orcapod/core/sources/spiraldb_table_source.py | New: SpiralDBTableSource class
src/orcapod/core/sources/__init__.py | Export + __all__ entry
src/orcapod/pipeline/serialization.py | Register "spiraldb_table" in source registry
tests/test_core/sources/test_spiraldb_table_source.py | New: 50 tests (all passing)

Test plan

  • All 50 new tests pass (mocked SpiralDB connector, no live instance required)
  • Full test suite (2718 tests) passes with no regressions
  • Covers: import/export, protocol conformance, PK as default tags, composite PK, explicit tag override, no-PK error, empty/missing table errors, connector lifecycle, stream behaviour, deterministic hashing, config round-trip, pipeline integration

Closes PLT-1073

🤖 Generated with Claude Code

…olumns

Add SpiralDBTableSource — a read-only RootSource backed by a SpiralDB
table. PK (key-schema) columns are used as tag columns by default;
explicit tag_columns override this at construction time. Tables with no
key schema and no explicit tag_columns raise ValueError.

The class follows the same pattern as SQLiteTableSource: it opens a
SpiralDBConnector, delegates to DBTableSource for fetching and stream
building, then closes the connector immediately (eager load). Config
round-trip (to_config / from_config) and serialization registration
under the "spiraldb_table" key are included.

50 unit and integration tests added, all passing.

Closes PLT-1073

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eywalker eywalker requested a review from Copilot March 26, 2026 02:55
codecov Bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 46.42857% with 15 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/orcapod/core/sources/spiraldb_table_source.py | 42.30% | 15 Missing ⚠️

Copilot AI left a comment

Pull request overview

Adds first-class support for reading SpiralDB tables as OrcaPod sources, integrating the new source into the existing source ecosystem and pipeline serialization.

Changes:

  • Introduces SpiralDBTableSource, a read-only DBTableSource wrapper that opens a SpiralDBConnector, eagerly loads a table, and closes the connector immediately.
  • Registers the new source type as "spiraldb_table" in the pipeline source registry and exports it from orcapod.core.sources (and thus orcapod.sources).
  • Adds a comprehensive new test suite covering schema/tag behavior, error cases, hashing, config round-trip, and pipeline integration.
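
The open, eager-load, close lifecycle mentioned in the first bullet might look roughly like the sketch below. All names here (FakeConnector, fetch_table, EagerTableSource) are hypothetical stand-ins rather than the OrcaPod API, and wrapping the load in try/finally is an assumption about how the connector is guaranteed to close.

```python
from typing import Any, Dict, List


class FakeConnector:
    """Hypothetical stand-in for a database connector."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows
        self.closed = False

    def fetch_table(self, table_name: str) -> List[Dict[str, Any]]:
        if self.closed:
            raise RuntimeError("connector is closed")
        # Return a copy so the source owns its data after the connector closes.
        return list(self._rows)

    def close(self) -> None:
        self.closed = True


class EagerTableSource:
    """Loads the whole table in __init__, then releases the connector."""

    def __init__(self, connector: FakeConnector, table_name: str) -> None:
        self.table_name = table_name
        try:
            # Eager load: all rows are materialised before __init__ returns.
            self.rows = connector.fetch_table(table_name)
        finally:
            # Close even if the load fails, so no connection is leaked.
            connector.close()
```

Once construction finishes, the source holds its rows in memory and no longer needs an open connection, which is why anything that touches the connector afterwards has to be avoided.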

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
src/orcapod/core/sources/spiraldb_table_source.py | New SpiralDB-backed table source with config serialization and connector lifecycle handling.
src/orcapod/core/sources/__init__.py | Exports SpiralDBTableSource via module imports and __all__.
src/orcapod/pipeline/serialization.py | Registers "spiraldb_table" in SOURCE_REGISTRY so pipelines can deserialize it.
tests/test_core/sources/test_spiraldb_table_source.py | New test coverage validating behavior and integration (mocked connector).

Comment on lines +140 to +150
    def to_config(self) -> dict[str, Any]:
        """Serialize source configuration to a JSON-compatible dict."""
        base = super().to_config()
        base.pop("connector", None)
        return {
            **base,
            "source_type": "spiraldb_table",
            "project_id": self._project_id,
            "dataset": self._dataset,
            "overrides": self._overrides,
        }

Copilot AI Mar 26, 2026

SpiralDBTableSource.to_config() currently calls super().to_config(), which invokes DBTableSource.to_config() and therefore self._connector.to_config(). SpiralDBConnector.to_config() raises RuntimeError once the connector is closed, and SpiralDBTableSource closes the connector in __init__, so to_config() (and thus pipeline serialization) fails for real SpiralDBTableSource instances. Fix by overriding to_config() to build the dict without calling DBTableSource.to_config() or connector.to_config(), or by capturing the connector config while the connector is still open and reusing it afterwards.

Contributor Author

Fixed. SpiralDBTableSource.to_config() no longer calls super().to_config() (and thus no longer calls the closed connector). The method now builds the dict directly from the already-captured instance attributes (self._table_name, self._tag_columns, etc.) plus self._identity_config(), bypassing the connector entirely. See spiraldb_table_source.py for the updated docnote explaining the reasoning.
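
For readers without the diff open, the shape of the bug and the fix can be sketched with stand-in classes. StubConnector, BaseTableSource and EagerSource are hypothetical names used only for illustration, and the config keys other than "source_type" and "project_id" are assumptions rather than the exact keys in spiraldb_table_source.py.

```python
from typing import Any, Dict, Sequence


class StubConnector:
    """Stand-in connector whose to_config() fails once closed."""

    def __init__(self) -> None:
        self._closed = False

    def close(self) -> None:
        self._closed = True

    def to_config(self) -> Dict[str, Any]:
        if self._closed:
            raise RuntimeError("connector is closed")
        return {"connector_type": "stub"}


class BaseTableSource:
    """Stand-in base class: its to_config() delegates to the connector."""

    def __init__(self, connector: StubConnector, table_name: str,
                 tag_columns: Sequence[str]) -> None:
        self._connector = connector
        self._table_name = table_name
        self._tag_columns = tuple(tag_columns)

    def to_config(self) -> Dict[str, Any]:
        return {
            "connector": self._connector.to_config(),  # raises when closed
            "table_name": self._table_name,
            "tag_columns": list(self._tag_columns),
        }


class EagerSource(BaseTableSource):
    """Closes its connector in __init__, so it must not call super().to_config()."""

    def __init__(self, project_id: str, table_name: str,
                 tag_columns: Sequence[str]) -> None:
        connector = StubConnector()
        super().__init__(connector, table_name, tag_columns)
        self._project_id = project_id
        connector.close()  # eager load is done; the connector is never reopened

    def to_config(self) -> Dict[str, Any]:
        # Build the dict only from attributes captured in __init__.
        return {
            "source_type": "spiraldb_table",
            "project_id": self._project_id,
            "table_name": self._table_name,
            "tag_columns": list(self._tag_columns),
        }
```

Calling BaseTableSource.to_config() on an EagerSource instance reproduces the original failure, while the override succeeds because it never touches the closed connector.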

Comment on lines +534 to +550
class TestConfigSerialization:
    def test_to_config_source_type(self):
        from orcapod.core.sources import SpiralDBTableSource

        connector = _make_mock_connector()
        with _patch_connector(connector):
            src = SpiralDBTableSource(_PROJECT_ID, _TABLE_NAME)
        assert src.to_config()["source_type"] == "spiraldb_table"

    def test_to_config_has_no_connector_key(self):
        from orcapod.core.sources import SpiralDBTableSource

        connector = _make_mock_connector()
        with _patch_connector(connector):
            src = SpiralDBTableSource(_PROJECT_ID, _TABLE_NAME)
        assert "connector" not in src.to_config()

Copilot AI Mar 26, 2026

The to_config tests currently use a plain MagicMock connector, so they won't catch that the real SpiralDBConnector.to_config() raises once the connector is closed (and SpiralDBTableSource closes it during __init__). Consider using a small fake connector (or configuring the mock) where close() flips a flag and to_config() raises when closed, then assert that SpiralDBTableSource.to_config() still works and does not call connector.to_config().

Contributor Author

Fixed. _make_mock_connector() now simulates the real SpiralDBConnector lifecycle: close() flips a _closed flag and to_config() raises RuntimeError when closed. Added test_to_config_works_after_connector_is_closed as an explicit regression guard — it asserts the connector is closed after __init__ and then calls to_config() to verify it does not raise.
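
A lifecycle-aware mock of this kind can be built with unittest.mock side effects. The sketch below is an illustration only: make_lifecycle_mock_connector is a hypothetical name, not the project's _make_mock_connector(), and the returned config dict is made up.

```python
from unittest.mock import MagicMock


def make_lifecycle_mock_connector() -> MagicMock:
    """Hypothetical mock: close() flips a flag, to_config() raises when closed."""
    connector = MagicMock()
    connector._closed = False

    def _close() -> None:
        connector._closed = True

    def _to_config() -> dict:
        if connector._closed:
            raise RuntimeError("connector is closed")
        return {"connector_type": "mock"}  # placeholder config

    # side_effect functions run on each call; their return value (or raised
    # exception) becomes the result of the mocked method.
    connector.close.side_effect = _close
    connector.to_config.side_effect = _to_config
    return connector
```

Because the methods are still mocks, a test can both observe the RuntimeError after close() and use assertions such as connector.to_config.assert_not_called() to confirm the source never reaches the connector.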

SpiralDBTableSource.to_config() was delegating to super().to_config()
(DBTableSource), which called self._connector.to_config(). Because
SpiralDBConnector.to_config() raises RuntimeError once closed, and the
connector is always closed in __init__ after the eager data load, every
call to to_config() on a real SpiralDBTableSource instance would raise.

Fix: override to_config() to build the dict directly from the already-
captured instance attributes, bypassing the closed connector entirely.

Also update _make_mock_connector() in the test suite so close() flips a
flag and to_config() raises RuntimeError when closed, matching real
SpiralDBConnector behaviour. Add test_to_config_works_after_connector_is_closed
as an explicit regression guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kurodo3 Bot commented Mar 26, 2026

Review round 1 — changes made

Two issues addressed in commit 7c117c9:

Bug fix — to_config() calling closed connector

SpiralDBTableSource.to_config() was delegating to super().to_config() → DBTableSource.to_config() → self._connector.to_config(). Since SpiralDBConnector.to_config() calls _require_open() and the connector is always closed during __init__, every invocation of to_config() on a real instance would raise RuntimeError. Fixed by overriding to_config() to build the dict directly from the stored instance attributes (self._table_name, self._tag_columns, self._system_tag_columns, self._record_id_column, self.source_id, self._identity_config()), with no connector involvement.

Test — lifecycle-aware mock

Updated _make_mock_connector() so close() flips a _closed flag and to_config() raises RuntimeError when closed — matching the real SpiralDBConnector. Added test_to_config_works_after_connector_is_closed as a regression guard that verifies the connector is closed after __init__ and that to_config() still succeeds without raising.

All 51 tests pass.

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


@eywalker eywalker merged commit 72c4373 into dev Mar 26, 2026
8 of 9 checks passed
@eywalker eywalker deleted the eywalker/plt-1073-implement-source-based-on-spiraldb-with-pk-as-default-tag branch March 26, 2026 03:33