Conversation
Spec covers ENG-258: adding storage_options + UPath support to DeltaTableDatabase, test strategy using testcontainers/MinIO, and a new GitHub Actions CI workflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion Plan for ENG-258: covers TDD task breakdown, exact file paths, complete code snippets, and CI workflow setup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rts, test coverage) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ge in S3 fixtures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add storage_options parameter and parse_base_path/is_cloud_uri integration - Replace _get_table_path with _get_table_uri (returns str, works for local and cloud) - Pass storage_options to DeltaTable() and write_deltalake() calls - Skip mkdir for cloud paths in flush_batch - Add list_sources() method (raises NotImplementedError for cloud paths) - Update S3 test to expect NotImplementedError instead of AttributeError Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
860ac75 to
57a42b5
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Colons appear in semantic-version hash paths (e.g. 'semantic_v0.1:abc123') used throughout the pipeline. The colon check in _validate_record_path was overly strict since _sanitize_path_component already handles colons on Windows by replacing them with '!'. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… step bitnami/minio is no longer reliably available. Use the official minio/minio image started via 'docker run' with 'server /data' — this approach does not require service container syntax and gives full control over the start command. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pygraphviz>=1.14 requires graphviz C headers (graphviz/cgraph.h) which are not present on ubuntu-latest by default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| def __init__( | ||
| self, | ||
| base_path: str | Path, | ||
| base_path: "str | Path | UPath", |
There was a problem hiding this comment.
remove "" around type hint -- make sure to use from __future__ import annotations so there's no need to put quotes
There was a problem hiding this comment.
Fixed — added from __future__ import annotations and removed all quoted annotations throughout the file.
| # For cloud paths: create_base_path is silently ignored (no directory needed). | ||
|
|
||
| # Batch management | ||
| self._delta_table_cache: dict[str, "deltalake.DeltaTable"] = {} |
| ) | ||
|
|
||
| def _get_delta_table(self, record_path: tuple[str, ...]) -> DeltaTable | None: | ||
| def _get_delta_table(self, record_path: tuple[str, ...]) -> "deltalake.DeltaTable | None": |
|
|
||
| def flush(self) -> None: | ||
| """Flush all pending batches.""" | ||
| # TODO: capture and re-raise exceptions at the end |
There was a problem hiding this comment.
remove this TODO if already addressed
There was a problem hiding this comment.
TODO removed — implemented collect-and-re-raise in flush(): all batches are attempted, errors are collected, and a RuntimeError is raised at the end listing the failed keys.
| record_path = tuple(record_key.split("/")) | ||
| try: | ||
| self.flush_batch(record_path) | ||
| except Exception as e: |
There was a problem hiding this comment.
this suppresses exception too broadly -- make it more specific to the expected case (i.e. failure to flush batch).
There was a problem hiding this comment.
Fixed together with the TODO above: flush() now collects exceptions across all batches and re-raises at the end, so nothing is silently suppressed.
|
|
||
| table_path = self._get_table_path(record_path) | ||
| table_path.mkdir(parents=True, exist_ok=True) | ||
| table_uri = self._get_table_uri(record_path) |
There was a problem hiding this comment.
consider incorporating path creation as part of the _get_table_uri so we can minimize the number of times self._is_cloud has to be explicitly checked.
There was a problem hiding this comment.
Done — added a create_dir: bool = False parameter to _get_table_uri that handles the mkdir internally for local paths (no-op for cloud). flush_batch now calls _get_table_uri(record_path, create_dir=True) and no longer checks self._is_cloud directly for this.
| return scheme in _CLOUD_SCHEMES | ||
|
|
||
|
|
||
| def _extract_upath_options(upath: "UPath") -> dict[str, str]: |
| is_upath = False | ||
|
|
||
| if is_upath: | ||
| uri = str(base_path) |
There was a problem hiding this comment.
extract the common component of uri = str(base_path) and then simplify to derived = _extract_upath... if is_upath else {}
There was a problem hiding this comment.
Fixed — extracted the common uri = str(base_path) and collapsed the branch to derived = _extract_upath_options(base_path) if is_upath else {}.
…sh collect+reraise, _get_table_uri create_dir Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Closes ENG-258.
storage_options: dict[str, str] | Noneparameter toDeltaTableDatabase, mirroring thedeltalakeAPIUPathasbase_path; credentials embedded in the UPath are auto-translated from fsspec (key/secret/endpoint_url) todeltalake'sAWS_*format via the newstorage_utils.pyhelpermkdir,Path.exists) onnot self._is_cloud_get_table_path→_get_table_urireturningstrfor both local and cloud pathslist_sources()method (local-only; raisesNotImplementedErrorfor cloud paths)test_storage_utils, local CRUD tests, and MinIO/S3 integration testsbitnami/minioservice container — no external S3 server neededTest plan
uv run pytest tests/test_databases/test_storage_utils.py— passes locallyuv run pytest tests/test_databases/test_delta_table_database.py— passes locallyuv run pytest tests/test_databases/test_delta_table_database_s3.py— passes with Docker (or in CI)🤖 Generated with Claude Code