Flesh out DuckDBTransformer as a full execution backend #151

@amc-corey-cox

Context

DuckDBTransformer currently has:

  • map_database() works, but delegates everything to SQLCompiler, which currently supports only a limited subset of the TransformationSpecification
  • map_object() raises NotImplementedError

As the SQLCompiler grows (see #150), DuckDBTransformer should become a viable alternative
backend for executing transformations — particularly for large tabular datasets where set-based
SQL operations would be significantly faster than Python row-by-row interpretation.

Proposed work

  1. Wire up expanded SQLCompiler output. As #150 (Expand SQLCompiler to cover more of the TransformationSpecification) adds features, ensure map_database() exercises them correctly.
  2. Consider map_object() semantics — does it make sense to support single-object transforms via DuckDB, or should this backend only operate at the database/table level? If database-only, make that explicit in the API rather than leaving a NotImplementedError.
  3. CLI integration — evaluate whether map-data should support a --backend duckdb flag for cases where input is a DuckDB database or large tabular dataset. This is a future consideration, not immediate work.
  4. Backend parity tracking — document which TransformationSpecification features are supported by which backend (Python vs SQL), possibly as a matrix in the docs or compliance suite.
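One possible shape for items 2 and 4, offered as a rough sketch rather than a proposal for the actual class hierarchy: each backend declares the operations it supports, an unsupported call fails with a dedicated error, and the same declarations generate the parity matrix. Every name here (ToyBackend, BackendCapabilityError, parity_matrix, the capability strings) is hypothetical, and the capability sets are placeholders, not an audit of the real backends.

```python
class BackendCapabilityError(NotImplementedError):
    """Raised when a backend is asked for an operation it deliberately omits."""


class ToyBackend:
    """Hypothetical base class: each backend declares what it supports."""

    name = "base"
    capabilities = frozenset()

    def require(self, capability):
        # Fail early with a message that names the backend and the operation.
        if capability not in self.capabilities:
            raise BackendCapabilityError(
                f"The {self.name} backend does not support {capability}; "
                f"supported operations: {sorted(self.capabilities)}"
            )

    def map_object(self, obj):
        self.require("map_object")
        return self._map_object(obj)

    def _map_object(self, obj):
        raise NotImplementedError


class ToyPythonBackend(ToyBackend):
    name = "python"
    capabilities = frozenset({"map_object"})  # placeholder value

    def _map_object(self, obj):
        return dict(obj)  # stand-in for real row-level logic


class ToyDuckDBBackend(ToyBackend):
    name = "duckdb"
    capabilities = frozenset({"map_database"})  # placeholder value


def parity_matrix(backends):
    """Render a Markdown table of capability versus backend."""
    features = sorted(set().union(*(b.capabilities for b in backends)))
    header = "| Feature | " + " | ".join(b.name for b in backends) + " |"
    divider = "|" + " --- |" * (len(backends) + 1)
    rows = [
        "| " + feature + " | "
        + " | ".join("yes" if feature in b.capabilities else "no" for b in backends)
        + " |"
        for feature in features
    ]
    return "\n".join([header, divider, *rows])


print(parity_matrix([ToyPythonBackend, ToyDuckDBBackend]))
```

The error subclasses NotImplementedError in this sketch so that any caller already catching the current exception keeps working; whether that compatibility is wanted is an open question.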

Design notes

The SQL backend is fundamentally set-based (operates on tables) while the Python backend is
row-based (operates on individual objects). These are complementary rather than competing — the
right backend depends on the data source and scale. The TransformationSpecification should remain
the single source of truth regardless of backend.
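To make the "single source of truth" point concrete, here is a toy illustration of one spec driving both a row-based and a set-based path. It makes several assumptions: TOY_SPEC is a made-up column-rename dict and not the real TransformationSpecification, transform_rows and compile_select are hypothetical helpers, and the standard-library sqlite3 module stands in for DuckDB so the sketch runs without extra dependencies. The generated SELECT is plain SQL that should also be accepted by DuckDB.

```python
import sqlite3

# Hypothetical toy spec: target column -> source column.
# This is NOT the real TransformationSpecification format.
TOY_SPEC = {
    "source_table": "person",
    "columns": {"identifier": "id", "label": "name"},
}


def transform_rows(spec, rows):
    """Row-based path: apply the spec to one dict at a time."""
    return [
        {target: row[source] for target, source in spec["columns"].items()}
        for row in rows
    ]


def compile_select(spec):
    """Set-based path: compile the same spec into a single SELECT."""
    select_list = ", ".join(
        f'"{source}" AS "{target}"' for target, source in spec["columns"].items()
    )
    table = spec["source_table"]
    # ORDER BY 1 sorts on the first output column for a stable comparison.
    return f'SELECT {select_list} FROM "{table}" ORDER BY 1'


rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Grace", "age": 45},
]

conn = sqlite3.connect(":memory:")  # stand-in for a DuckDB connection
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person VALUES (:id, :name, :age)", rows)

python_result = transform_rows(TOY_SPEC, rows)
sql_result = [dict(r) for r in conn.execute(compile_select(TOY_SPEC))]

# Both paths are derived from the same spec and should agree.
assert python_result == sql_result
```

A comparison like the final assertion could serve as the basis for a compliance check between backends, although real specs involve far more than column renames.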

Labels: enhancement, help wanted