Context
DuckDBTransformer currently has:
- map_database(): works, but delegates everything to SQLCompiler, which supports a limited subset
- map_object(): raises NotImplementedError
As the SQLCompiler grows (see #150), DuckDBTransformer should become a viable alternative
backend for executing transformations, particularly for large tabular datasets where set-based
SQL operations would be significantly faster than Python row-by-row interpretation.
Proposed work
- [...] map_database() exercises them correctly
- Consider map_object() semantics: does it make sense to support single-object transforms via DuckDB, or should this backend only operate at the database/table level? If database-only, make that explicit in the API rather than leaving a NotImplementedError.
- CLI integration: evaluate whether map-data should support a --backend duckdb flag for cases where the input is a DuckDB database or a large tabular dataset. This is a future consideration, not immediate work.
- Backend parity tracking: document which TransformationSpecification features are supported by which backend (Python vs SQL), possibly as a matrix in the docs or compliance suite.
Design notes
The SQL backend is fundamentally set-based (operates on tables) while the Python backend is
row-based (operates on individual objects). These are complementary rather than competing — the
right backend depends on the data source and scale. The TransformationSpecification should remain
the single source of truth regardless of backend.
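To make the set-based versus row-based distinction concrete, the sketch below applies one toy mapping (rename a column and derive a value) in both styles. Everything in it is an assumption for illustration: the persons table, the field names, and the mapping are hypothetical; Python's built-in sqlite3 stands in for DuckDB so the example runs without extra dependencies; and neither function reflects how linkml-map's Python transformer or SQLCompiler is actually implemented.

```python
import sqlite3

# Hypothetical source rows; not taken from linkml-map.
ROWS = [
    {"id": 1, "given_name": "Ada", "height_cm": 170.0},
    {"id": 2, "given_name": "Grace", "height_cm": 165.0},
]


def map_row(row: dict) -> dict:
    """Row-based style: apply the mapping to one object at a time."""
    return {
        "id": row["id"],
        "name": row["given_name"],           # rename given_name -> name
        "height_m": row["height_cm"] / 100,  # derived value
    }


def load(rows: list[dict]) -> sqlite3.Connection:
    """Load the rows into an in-memory table (sqlite3 as a DuckDB stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE persons (id INTEGER, given_name TEXT, height_cm REAL)"
    )
    conn.executemany(
        "INSERT INTO persons VALUES (:id, :given_name, :height_cm)", rows
    )
    return conn


def map_table(conn: sqlite3.Connection) -> list[dict]:
    """Set-based style: a single SQL statement transforms the whole table."""
    cur = conn.execute(
        "SELECT id, given_name AS name, height_cm / 100.0 AS height_m "
        "FROM persons ORDER BY id"
    )
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, values)) for values in cur.fetchall()]


row_based = [map_row(r) for r in ROWS]
set_based = map_table(load(ROWS))
assert row_based == set_based  # same mapping, different execution strategy
```

Both paths are driven by the same mapping and yield the same records; only the execution strategy differs, which is the sense in which the specification can stay the single source of truth.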
Related
- src/linkml_map/transformer/duckdb_transformer.py
- src/linkml_map/compiler/sql_compiler.py
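If the database-only answer is chosen for map_object(), one possible way to make that explicit is to split the interface so that a database-level transformer never advertises map_object() at all. The names below (ObjectTransformer, DatabaseTransformer, supports_objects, the toy classes, and the map_database(source, target) signature) are hypothetical and do not mirror the real class hierarchy in linkml-map; this is only a sketch of the design option.

```python
from abc import ABC, abstractmethod
from typing import Any


class ObjectTransformer(ABC):
    """Hypothetical interface for row-based backends."""

    @abstractmethod
    def map_object(self, obj: dict[str, Any]) -> dict[str, Any]:
        ...


class DatabaseTransformer(ABC):
    """Hypothetical interface for set-based backends."""

    @abstractmethod
    def map_database(self, source: str, target: str) -> None:
        ...


class ToyPythonTransformer(ObjectTransformer):
    def map_object(self, obj: dict[str, Any]) -> dict[str, Any]:
        # Toy mapping: rename given_name -> name.
        return {"name": obj["given_name"]}


class ToyDuckDBTransformer(DatabaseTransformer):
    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def map_database(self, source: str, target: str) -> None:
        # A real backend would compile and execute SQL here;
        # this toy version only records the request.
        self.calls.append((source, target))


def supports_objects(transformer: object) -> bool:
    """Capability check a caller can run before choosing a backend."""
    return isinstance(transformer, ObjectTransformer)
```

With a split like this, a caller that needs single-object transforms can check the capability up front instead of discovering a NotImplementedError at call time.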