feat(spiraldb): implement SpiralDBConnector (PLT-1074) #109

Merged: eywalker merged 24 commits into dev from feat/plt-1074-spiraldb-connector on Mar 25, 2026
Conversation

@kurodo3
Contributor

@kurodo3 kurodo3 Bot commented Mar 25, 2026

Summary

  • Implements SpiralDBConnector — a DBConnectorProtocol backed by SpiralDB (pyspiral), enabling ConnectorArrowDatabase and DBTableSource to work with SpiralDB
  • Connector is dataset-scoped: a (project_id, dataset) pair is fixed at construction; all table names are plain strings within that dataset
  • Full Arrow-native pipeline: schema introspection, full-table scans via scan().to_record_batches(), upsert-by-key writes, and skip_existing=True via a scan+filter approximation (see the usage sketch below)
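
For orientation, a minimal usage sketch follows. The constructor signature is an assumption inferred from the description above (dataset-scoped construction, plain-string table names), not confirmed API; iter_batches is one of the DBConnectorProtocol methods discussed later in this PR.

```python
# Hypothetical usage sketch; signature details are assumptions, not confirmed API.
from orcapod.databases import SpiralDBConnector

# Dataset-scoped: the (project_id, dataset) pair is fixed at construction.
connector = SpiralDBConnector(
    project_id="test-orcapod-362211",  # dev project referenced in the test plan
    dataset="example_dataset",         # hypothetical dataset name
)

# Within that dataset, table names are plain strings.
for batch in connector.iter_batches("runs"):  # full-table Arrow scan
    print(batch.num_rows)
```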

What's in this PR

  • src/orcapod/databases/spiraldb_connector.py (new) — SpiralDBConnector with lazy spiral module loading, all DBConnectorProtocol methods, documented known limitations (timezone stripping, O(n) skip_existing, single-column PK creation)
  • src/orcapod/databases/__init__.py (updated) — exports SpiralDBConnector
  • tests/test_databases/test_spiraldb_connector.py (new) — 45 unit tests, fully mocked (no network; spiral module is patched out entirely)
  • tests/test_databases/test_spiraldb_connector_integration.py (new) — 3 integration tests against test-orcapod-362211 dev project, skipped unless SPIRAL_INTEGRATION_TESTS=1

Test Plan

  • 45 unit tests pass (mocked spiral, no network required — safe for CI)
  • 3 integration tests skip cleanly without SPIRAL_INTEGRATION_TESTS=1
  • Full tests/test_databases/ suite (290 tests) passes with no regressions
  • Run the integration tests with SPIRAL_INTEGRATION_TESTS=1 (after spiral login) against the dev project test-orcapod-362211

Known Limitations (documented in code)

  • Timezone info silently dropped from timestamp PK columns by SpiralDB at storage time (affects skip_existing=True reliability for tz-aware timestamps)
  • skip_existing=True requires a full table scan (O(n)); see the sketch after this list
  • create_table_if_not_exists supports single-column PKs only; composite-PK tables must be created externally
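
To make the scan+filter approximation concrete, here is a small self-contained sketch (assumed shape with in-memory stand-in tables, not the exact connector code). Note how the timezone limitation above interacts with it: a tz-aware timestamp that comes back tz-naive from storage no longer matches its key tuple, so the row would be re-inserted.

```python
import pyarrow as pa

# Sketch of skip_existing=True: scan existing primary-key tuples into a set
# (the O(n) part), then keep only incoming rows whose key is not present.
pk_cols = ["id"]

existing = pa.table({"id": [1, 2], "val": ["a", "b"]})  # stand-in for a full scan
existing_keys = {tuple(row[k] for k in pk_cols) for row in existing.to_pylist()}

records = pa.table({"id": [2, 3], "val": ["b2", "c"]})  # id=2 should be skipped
mask = pa.array(
    [tuple(row[k] for k in pk_cols) not in existing_keys
     for row in records.to_pylist()],
    type=pa.bool_(),  # explicit bool type avoids null inference on empty input
)
print(records.filter(mask).to_pylist())  # [{'id': 3, 'val': 'c'}]
```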

Related Issues

  • Closes PLT-1074
  • Depends on PLT-1163 (SpiralDB devenv exploration — completed)
  • Follow-on: PLT-1167 (dataset-routing connector variant)

🤖 Generated with Claude Code

@codecov

codecov Bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 24.57627% with 89 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/orcapod/databases/spiraldb_connector.py | 23.93% | 89 Missing ⚠️ |


kurodo3 Bot and others added 19 commits March 25, 2026 17:33
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…w feedback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or (PLT-1074)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or (PLT-1074)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or (PLT-1074)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… (PLT-1074)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… __init__

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pk_columns, get_column_info)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…TION_TESTS=1)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… PK skip_existing limitation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kurodo3 kurodo3 Bot force-pushed the feat/plt-1074-spiraldb-connector branch from b39d051 to 7d7f743 on March 25, 2026 17:33
@eywalker eywalker requested a review from Copilot March 25, 2026 17:44
Contributor

Copilot AI left a comment


Pull request overview

Adds a new SpiralDB-backed implementation of DBConnectorProtocol so existing database abstractions (ConnectorArrowDatabase, DBTableSource) can operate on SpiralDB via Arrow-native scans/writes, alongside unit + opt-in integration tests and supporting design/plan docs.

Changes:

  • Implement SpiralDBConnector with lazy spiral loading, schema introspection, full-table scans, table creation, upsert, and config serialization.
  • Export SpiralDBConnector from orcapod.databases.
  • Add mocked unit tests plus env-gated live integration tests; add design spec and implementation plan docs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/orcapod/databases/spiraldb_connector.py | New SpiralDB connector implementing DBConnectorProtocol methods (scan/write/introspection/config). |
| src/orcapod/databases/__init__.py | Re-export SpiralDBConnector for the public import path. |
| tests/test_databases/test_spiraldb_connector.py | Mocked unit tests covering protocol conformance and connector behavior. |
| tests/test_databases/test_spiraldb_connector_integration.py | Opt-in integration tests against the dev SpiralDB project. |
| superpowers/specs/2026-03-25-spiraldb-connector-design.md | Design spec documenting intended behavior/limitations. |
| superpowers/plans/2026-03-25-spiraldb-connector.md | Implementation plan for the connector and tests. |


Comment on lines +308 to +311
mask = pa.array([
    tuple(row[k] for k in pk_cols) not in existing_keys
    for row in records.to_pylist()
])

Copilot AI Mar 25, 2026


When records is empty and skip_existing=True, records.to_pylist() produces an empty list so pa.array([]) will infer a null-typed array. records.filter(mask) then fails because the filter must be boolean. Consider early-returning when records.num_rows == 0 and/or constructing the mask with an explicit boolean type (e.g., type=pa.bool_()).

Suggested change
mask = pa.array([
    tuple(row[k] for k in pk_cols) not in existing_keys
    for row in records.to_pylist()
])
mask = pa.array(
    [
        tuple(row[k] for k in pk_cols) not in existing_keys
        for row in records.to_pylist()
    ],
    type=pa.bool_(),
)

Contributor Author


Fixed. Added type=pa.bool_() to the pa.array() call so the filter always receives a boolean array, even when records is empty.
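
For readers unfamiliar with the failure mode, a small self-contained repro of the inference issue (standard pyarrow behaviour, not code from this PR):

```python
import pyarrow as pa

tbl = pa.table({"a": pa.array([], type=pa.int64())})

mask = pa.array([])  # empty input: pyarrow infers a null-typed array, not boolean
# tbl.filter(mask)   # expected to raise, since the filter mask must be boolean

mask = pa.array([], type=pa.bool_())  # explicit bool type, as in the fix
print(tbl.filter(mask).num_rows)      # 0; filtering the empty table now works
```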

Comment on lines +3 to +7
These tests are skipped unless ``SPIRAL_INTEGRATION_TESTS=1`` env var is set
AND valid credentials are present in ``~/.config/pyspiral/auth.json``
(obtained via ``spiral login``). If the env var is set but credentials are
absent, the tests fail rather than skip — the operator is expected to ensure
auth is in place when enabling integration tests.

Copilot AI Mar 25, 2026


The module docstring says tests are skipped unless both SPIRAL_INTEGRATION_TESTS=1 and credentials are present, but the actual skip condition only checks the env var. If credentials are missing and the env var is set, these tests will fail (likely during connector init / first network call). Please adjust the docstring to match the implemented behavior (skipped only by env var; fails if auth is missing).

Suggested change
These tests are skipped unless ``SPIRAL_INTEGRATION_TESTS=1`` env var is set
AND valid credentials are present in ``~/.config/pyspiral/auth.json``
(obtained via ``spiral login``). If the env var is set but credentials are
absent, the tests fail rather than skip — the operator is expected to ensure
auth is in place when enabling integration tests.
These tests are skipped unless the ``SPIRAL_INTEGRATION_TESTS=1`` env var is set.
They require valid credentials in ``~/.config/pyspiral/auth.json`` (obtained via
``spiral login``). If the env var is set but credentials are absent, the tests
fail rather than skip — the operator is expected to ensure auth is in place
when enabling integration tests.

Contributor Author


Fixed. Rewrote the opening sentence to state the skip condition clearly (env var only), then described the credential requirement separately. The clarifying sentence about fail-vs-skip was already present and is retained.
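
For reference, a minimal sketch of such a module-level gate where the skip decision depends only on the env var (assumed form; the actual test file may phrase it differently):

```python
import os

import pytest

# Skip is decided by the env var alone; missing credentials then surface as a
# hard failure at connect time, which is the documented behaviour.
pytestmark = pytest.mark.skipif(
    os.environ.get("SPIRAL_INTEGRATION_TESTS") != "1",
    reason="set SPIRAL_INTEGRATION_TESTS=1 to enable SpiralDB integration tests",
)
```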

## Overview

Implement `SpiralDBConnector`, a concrete implementation of `DBConnectorProtocol`
backed by [pyspiral](https://pypi.org/project/pyspiral/) (`spiral` module, `>=0.11.6`).

Copilot AI Mar 25, 2026


This spec states pyspiral requirement as >=0.11.6, but pyproject.toml currently defines the spiraldb extra as pyspiral>=0.11.0. Please align the spec with the repo’s actual dependency constraint (or update the constraint in pyproject.toml in the same change if 0.11.6 is truly required).

Suggested change
backed by [pyspiral](https://pypi.org/project/pyspiral/) (`spiral` module, `>=0.11.6`).
backed by [pyspiral](https://pypi.org/project/pyspiral/) (`spiral` module, `>=0.11.0`).

Contributor Author


Fixed. Updated both occurrences in the spec (>=0.11.6 → >=0.11.0) to match the constraint in pyproject.toml.


**Architecture:** `SpiralDBConnector` bridges the SQL-style `DBConnectorProtocol` interface to SpiralDB's expression-based scan/write API. It is dataset-scoped — a `(project_id, dataset)` pair is fixed at construction. All other layers (`ConnectorArrowDatabase`, `DBTableSource`) are unmodified; only the connector varies. `pyspiral` (`spiral` module) is lazy-loaded so `import orcapod` does not fail without it.

**Tech Stack:** Python ≥ 3.10, `pyspiral >= 0.11.6` (optional extra `orcapod[spiraldb]`), `pyarrow`, `orcapod.utils.lazy_module.LazyModule`, `pytest`, `unittest.mock`

Copilot AI Mar 25, 2026


The plan lists the tech stack as “Python ≥ 3.10”, but the project’s pyproject.toml specifies requires-python = ">=3.11.0". Please update this document to reflect the actual minimum supported Python version to avoid confusion for anyone following the plan.

Suggested change
**Tech Stack:** Python ≥ 3.10, `pyspiral >= 0.11.6` (optional extra `orcapod[spiraldb]`), `pyarrow`, `orcapod.utils.lazy_module.LazyModule`, `pytest`, `unittest.mock`
**Tech Stack:** Python ≥ 3.11, `pyspiral >= 0.11.6` (optional extra `orcapod[spiraldb]`), `pyarrow`, `orcapod.utils.lazy_module.LazyModule`, `pytest`, `unittest.mock`

Contributor Author


Fixed. Updated Python version from 3.10 → 3.11 to match requires-python = ">=3.11.0" in pyproject.toml.


**Architecture:** `SpiralDBConnector` bridges the SQL-style `DBConnectorProtocol` interface to SpiralDB's expression-based scan/write API. It is dataset-scoped — a `(project_id, dataset)` pair is fixed at construction. All other layers (`ConnectorArrowDatabase`, `DBTableSource`) are unmodified; only the connector varies. `pyspiral` (`spiral` module) is lazy-loaded so `import orcapod` does not fail without it.

**Tech Stack:** Python ≥ 3.10, `pyspiral >= 0.11.6` (optional extra `orcapod[spiraldb]`), `pyarrow`, `orcapod.utils.lazy_module.LazyModule`, `pytest`, `unittest.mock`

Copilot AI Mar 25, 2026


This plan references pyspiral >= 0.11.6, but the repo currently constrains the spiraldb extra to pyspiral>=0.11.0 in pyproject.toml. Please align the plan with the actual dependency constraint (or update the constraint if 0.11.6 is required).

Suggested change
**Tech Stack:** Python ≥ 3.10, `pyspiral >= 0.11.6` (optional extra `orcapod[spiraldb]`), `pyarrow`, `orcapod.utils.lazy_module.LazyModule`, `pytest`, `unittest.mock`
**Tech Stack:** Python ≥ 3.10, `pyspiral >= 0.11.0` (optional extra `orcapod[spiraldb]`), `pyarrow`, `orcapod.utils.lazy_module.LazyModule`, `pytest`, `unittest.mock`

Contributor Author


Fixed. Updated pyspiral >= 0.11.6 → >= 0.11.0 to match the actual constraint in pyproject.toml.

- Fix potential pa.array null-type crash when records is empty and
  skip_existing=True by passing explicit type=pa.bool_() to pa.array()
- Correct integration test docstring: skip condition is env var only;
  clarify that missing credentials cause failure, not skip
- Align pyspiral version in spec and plan docs to >=0.11.0 (matches
  pyproject.toml), not >=0.11.6
- Correct Python minimum version in plan doc from 3.10 to 3.11 (matches
  pyproject.toml requires-python)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kurodo3
Contributor Author

kurodo3 Bot commented Mar 25, 2026

Review round addressed (commit 91be361)

All 5 Copilot comments resolved in a single commit:

| # | File | Change |
| --- | --- | --- |
| 1 | src/orcapod/databases/spiraldb_connector.py | Bug fix: added type=pa.bool_() to pa.array() in upsert_records; prevents the null-type inference crash when records is empty and skip_existing=True |
| 2 | tests/test_databases/test_spiraldb_connector_integration.py | Doc fix: rewrote the module docstring opening to separate the skip condition (env var only) from the credential requirement; eliminates the misleading "AND" that implied credentials were part of the skip logic |
| 3 | superpowers/specs/2026-03-25-spiraldb-connector-design.md | Doc fix: corrected two occurrences of >=0.11.6 → >=0.11.0 to match pyproject.toml |
| 4 | superpowers/plans/2026-03-25-spiraldb-connector.md | Doc fix: corrected Python version from ≥ 3.10 → ≥ 3.11 to match requires-python = ">=3.11.0" in pyproject.toml |
| 5 | superpowers/plans/2026-03-25-spiraldb-connector.md | Doc fix: corrected pyspiral >= 0.11.6 → >= 0.11.0 to match pyproject.toml (same line as #4) |

45 unit tests still pass.

kurodo3 Bot and others added 3 commits March 25, 2026 18:21
- Add spiral-integration job to run-tests.yml: runs on push/manual
  trigger only (not PRs), requires SPIRAL_WORKLOAD_ID secret, uses
  GitHub OIDC (permissions.id-token: write) to authenticate as workload
  work_fbjjcy (policy work_policy_p9cxfh, bound to nauticalab/orcapod-python)
- Parameterise integration test project/server via SPIRAL_PROJECT_ID and
  SPIRAL_SERVER_URL env vars (default to test-orcapod-362211 / api.spiraldb.dev)
  so CI can target the right environment without a code change
- Update integration test docstring to document both local-dev (auth.json)
  and CI (OIDC workload) auth paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- iter_batches: drop batch_size kwarg from to_record_batches() — the
  pyspiral Plan API does not accept batch_size; batch sizing is controlled
  by the Spiral execution engine. Log a warning if a non-default value is
  passed for protocol compliance.

- get_column_info: normalize pa.string() → pa.large_string() and
  pa.binary() → pa.large_binary(). SpiralDB stores string columns as
  string regardless of whether large_string was written, causing schema
  comparisons in ConnectorArrowDatabase.flush() to fail. Consistent with
  SQLiteConnector and PostgreSQLConnector which always return large_string.

Update unit tests to reflect the new behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents both issues in detail under a new "Discovered during live
integration testing" section in Known Limitations:

1. string/large_string mismatch: SpiralDB stores all string columns as
   utf8/pa.string() regardless of the written type. Without normalization,
   ConnectorArrowDatabase.flush() raised a false schema mismatch ValueError.
   Fix: get_column_info normalizes string→large_string, binary→large_binary.
   Note: iter_batches scan results are NOT normalized (intentional).

2. batch_size kwarg rejected by Plan.to_record_batches(): the pyspiral API
   does not accept batch_size at all. Fix: drop the kwarg; emit a warning
   if a non-default value is passed for protocol compliance.

Also updates:
- get_column_info code block to show normalization helper
- iter_batches section to document batch_size behaviour accurately
- Type Notes section split into raw SpiralDB types vs get_column_info
  normalization, with an explicit warning that iter_batches returns raw types
- Integration test section to document OIDC CI setup and that these tests
  were the source of both bug discoveries
- Unit test plan row for get_column_info

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
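
To illustrate the normalization these commits describe, a minimal sketch; the helper name is assumed for illustration, while the real logic lives in get_column_info:

```python
import pyarrow as pa

# Assumed helper name; mirrors the normalization described above. SpiralDB
# reports string/binary, while callers expect large_string/large_binary.
def _normalize_type(t: pa.DataType) -> pa.DataType:
    if t == pa.string():
        return pa.large_string()
    if t == pa.binary():
        return pa.large_binary()
    return t

assert _normalize_type(pa.string()) == pa.large_string()
assert _normalize_type(pa.binary()) == pa.large_binary()
assert _normalize_type(pa.int64()) == pa.int64()  # other types pass through
```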
@eywalker eywalker requested a review from Copilot March 25, 2026 19:53
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.



Comment on lines +7 to +9
- CI (GitHub Actions): ``SPIRAL_WORKLOAD_ID=work_fbjjcy`` env var — workload
``work_fbjjcy`` (policy ``work_policy_p9cxfh``) has editor access on the dev
project and is bound to ``nauticalab/orcapod-python`` via GitHub OIDC.

Copilot AI Mar 25, 2026


The docstring includes concrete workload/policy IDs (work_fbjjcy, work_policy_p9cxfh) and the repo name binding. These identifiers are likely to become stale and may be considered sensitive infrastructure details; it’s usually better to describe the requirement generically ("set SPIRAL_WORKLOAD_ID with a workload that has editor access") and keep specific IDs in a private runbook or CI configuration docs.

Suggested change
- CI (GitHub Actions): ``SPIRAL_WORKLOAD_ID=work_fbjjcy`` env var — workload
``work_fbjjcy`` (policy ``work_policy_p9cxfh``) has editor access on the dev
project and is bound to ``nauticalab/orcapod-python`` via GitHub OIDC.
- CI (GitHub Actions): ``SPIRAL_WORKLOAD_ID`` env var pointing to a workload
that has editor access on the dev project and is bound to this repository via
GitHub OIDC (see internal CI/runbook docs for the exact workload/policy IDs).

Contributor Author


Fixed. Replaced the specific workload/policy IDs and repository binding with a generic description: 'SPIRAL_WORKLOAD_ID env var pointing to a workload that has editor access on the dev project and is bound to this repository via GitHub OIDC (see internal CI/runbook docs for the exact workload/policy IDs).' The concrete IDs now live only in repo secrets/vars and internal runbook docs, not in version-controlled source.

Comment on lines +228 to +229
yield from reader


Copilot AI Mar 25, 2026


iter_batches() yields pyspiral-native string/binary column types, while ConnectorArrowDatabase writes and buffers records using large_string/large_binary (e.g. it injects __record_id as pa.large_string()) and both SQLite/PostgreSQL connectors also yield large_* types. This can break ConnectorArrowDatabase.get_all_records() / get_records_by_ids() when it needs to pa.concat_tables([committed, pending]), because committed batches from Spiral will be string but pending uses large_string, causing a schema mismatch error. Consider normalizing string→large_string and binary→large_binary in iter_batches output (e.g. cast per batch) so committed and pending tables remain schema-compatible.

Suggested change
yield from reader
# Normalize pyspiral-native string/binary columns to large_string/large_binary
# so that schemas remain compatible with ConnectorArrowDatabase and other
# connectors that use large_* types (important for pa.concat_tables).
for batch in reader:
    schema = batch.schema
    new_fields = []
    needs_cast = False
    for field in schema:
        ftype = field.type
        if ftype == pa.string():
            new_fields.append(pa.field(field.name, pa.large_string(), field.nullable, field.metadata))
            needs_cast = True
        elif ftype == pa.binary():
            new_fields.append(pa.field(field.name, pa.large_binary(), field.nullable, field.metadata))
            needs_cast = True
        else:
            new_fields.append(field)
    if needs_cast:
        target_schema = pa.schema(new_fields, metadata=schema.metadata)
        batch = batch.cast(target_schema)
    yield batch

Contributor Author


Fixed. iter_batches now casts each batch before yielding: fields with pa.string() are cast to pa.large_string() and pa.binary() to pa.large_binary(). Only batches that actually contain these types incur the overhead (needs_cast guard). Consistent with normalization already applied in get_column_info, so ConnectorArrowDatabase pa.concat_tables([committed, pending]) will always see matching schemas. Added test_string_and_binary_normalized_to_large and updated test_full_scan_yields_batches to assert large_string output.

Comment on lines +42 to +57
spiral-integration:
  runs-on: ubuntu-latest
  # Only run on push/manual trigger — not on PRs — to avoid consuming SpiralDB
  # resources on every review cycle. The workload work_fbjjcy is bound to
  # nauticalab/orcapod-python via GitHub OIDC (policy work_policy_p9cxfh).
  if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
  needs: test
  permissions:
    id-token: write  # required to request a GitHub OIDC token for SpiralDB auth
    contents: read
  env:
    SPIRAL_WORKLOAD_ID: ${{ secrets.SPIRAL_WORKLOAD_ID }}
    SPIRAL_INTEGRATION_TESTS: "1"
    SPIRAL_PROJECT_ID: ${{ vars.SPIRAL_PROJECT_ID }}
    SPIRAL_SERVER_URL: ${{ vars.SPIRAL_SERVER_URL }}


Copilot AI Mar 25, 2026


The spiral-integration job will still run even when SPIRAL_WORKLOAD_ID / SPIRAL_PROJECT_ID / SPIRAL_SERVER_URL are unset; in GitHub Actions missing secrets.*/vars.* resolve to empty strings, which will override the test defaults and likely cause a hard failure. Consider adding an additional job-level if: guard requiring a non-empty secrets.SPIRAL_WORKLOAD_ID (and/or required vars), and/or providing expression fallbacks (e.g. ${{ vars.SPIRAL_SERVER_URL || 'http://api.spiraldb.dev' }}) so the env doesn’t get set to an empty string.

Contributor Author


Fixed. Added secrets.SPIRAL_WORKLOAD_ID != '' to the job's if: condition so it skips gracefully when the secret is not configured. Also added || fallback defaults for SPIRAL_PROJECT_ID and SPIRAL_SERVER_URL so empty variables resolve to the dev-environment defaults (test-orcapod-362211 and http://api.spiraldb.dev) rather than empty strings.

…missing SPIRAL_WORKLOAD_ID, scrub stale workload IDs from docs

- `iter_batches`: per-batch cast of `pa.string()→pa.large_string()` and
  `pa.binary()→pa.large_binary()` so `ConnectorArrowDatabase.get_all_records()`
  and `get_records_by_ids()` can call `pa.concat_tables([committed, pending])`
  without a schema-mismatch error. Matches the normalization already applied
  in `get_column_info`.

- CI (`run-tests.yml`): add `secrets.SPIRAL_WORKLOAD_ID != ''` guard to the
  `spiral-integration` job `if:` condition so the job skips gracefully when the
  secret is not configured. Also add `||` fallback defaults for
  `SPIRAL_PROJECT_ID`/`SPIRAL_SERVER_URL` so an empty variable resolves to the
  dev-environment defaults rather than an empty string.

- Integration-test docstring: replace specific workload/policy IDs
  (`work_fbjjcy`, `work_policy_p9cxfh`) with a generic description to avoid
  stale references as CI infrastructure evolves.

- Tests: update `test_full_scan_yields_batches` to assert `large_string` output
  (not raw `string`); add `test_string_and_binary_normalized_to_large` to
  explicitly cover the per-batch cast path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kurodo3
Contributor Author

kurodo3 Bot commented Mar 25, 2026

Round 3 review comments addressed

Three issues fixed in commit ae7e5d9:

1. iter_batches string/binary type normalization

iter_batches now casts each batch before yielding — pa.string()→pa.large_string() and pa.binary()→pa.large_binary() — matching the normalization already applied in get_column_info. This prevents the pa.concat_tables schema-mismatch crash that ConnectorArrowDatabase.get_all_records() / get_records_by_ids() would hit when mixing committed (Spiral-native string) and pending (large_string) records. Only batches that actually contain these types incur any overhead (needs_cast guard). New unit test test_string_and_binary_normalized_to_large covers this; test_full_scan_yields_batches updated accordingly.
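
A compact illustration of the crash this prevents (standard pyarrow behaviour, not code from this PR):

```python
import pyarrow as pa

# Same column name, different string widths: the committed-vs-pending mismatch.
committed = pa.table({"x": pa.array(["a"], type=pa.string())})
pending = pa.table({"x": pa.array(["b"], type=pa.large_string())})

try:
    pa.concat_tables([committed, pending])
except pa.ArrowInvalid as e:
    print("schema mismatch:", e)

# Casting the committed side to large_string (what the connector now does per
# batch) makes the schemas identical, so concatenation succeeds.
committed = committed.cast(pending.schema)
print(pa.concat_tables([committed, pending]).num_rows)  # 2
```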

2. CI guard for missing SPIRAL_WORKLOAD_ID

Added secrets.SPIRAL_WORKLOAD_ID != '' to the spiral-integration job's if: condition — the job now skips gracefully when the secret isn't configured rather than running and hard-failing. Also added || fallback defaults for SPIRAL_PROJECT_ID and SPIRAL_SERVER_URL so empty vars resolve to the dev-environment defaults.

3. Stale workload/policy IDs removed from integration test docstring

Replaced the specific IDs (work_fbjjcy, work_policy_p9cxfh) and repo-binding detail with a generic description. The concrete IDs now live only in repo secrets/vars and internal runbook docs, not in version-controlled source.

@eywalker eywalker merged commit 345acb4 into dev Mar 25, 2026
4 of 5 checks passed
@eywalker eywalker deleted the feat/plt-1074-spiraldb-connector branch March 25, 2026 20:07