refactor(pkg-py): SQLAlchemySource.get_schema() to use ColumnMeta pattern#203

Merged
cpsievert merged 3 commits into main from refactor/sqlalchemy-get-schema
Jan 26, 2026

@cpsievert cpsievert commented Jan 26, 2026

Refactors SQLAlchemySource.get_schema() to use the same ColumnMeta + format_schema() pattern already used by IbisSource, DataFrameSource, and PolarsLazySource. (#193, #191)

Changes

  • Store column info at initialization: Added self._columns_info to cache inspector results
  • Add _make_column_meta() static method: Classifies SQLAlchemy types into ColumnMeta (kind + sql_type)
  • Add _add_column_stats() method: Collects statistics (min/max, distinct counts) via SQL
  • Add _fetch_categorical_values() helper: Fetches unique values for categorical columns
  • Simplify get_schema(): Now just 5 lines using the helper methods and format_schema()
  • Remove _get_sql_type_name(): Type classification now handled in _make_column_meta()
  • Fix format_schema() to handle None values: Only display Range when min/max are not None
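As a rough illustration of the ColumnMeta + format_schema() pattern described above, here is a minimal, self-contained sketch. It is not the code from this PR: the `ColumnMeta` fields beyond `kind` and `sql_type`, the kind labels, the type-name sets, and the output layout are all assumptions made for the example, and type names are plain strings rather than SQLAlchemy type objects.

```python
from dataclasses import dataclass, field
from typing import Any, List

# Hypothetical stand-in for ColumnMeta. Only `kind` and `sql_type` are named
# in the PR description; every other field here is an assumption.
@dataclass
class ColumnMeta:
    name: str
    sql_type: str
    kind: str  # assumed labels: "numeric", "date", "text", "other"
    min_val: Any = None
    max_val: Any = None
    categories: List[str] = field(default_factory=list)

# Assumed groupings, based on the column types listed under Testing.
NUMERIC_TYPES = {"INTEGER", "FLOAT", "NUMERIC"}
DATE_TYPES = {"DATE", "TIMESTAMP"}

def make_column_meta(name: str, type_name: str) -> ColumnMeta:
    """Classify a SQL type name into a ColumnMeta (kind + sql_type)."""
    sql_type = type_name.upper()
    if sql_type in NUMERIC_TYPES:
        kind = "numeric"
    elif sql_type in DATE_TYPES:
        kind = "date"
    elif sql_type == "TEXT":
        kind = "text"
    else:
        kind = "other"
    return ColumnMeta(name=name, sql_type=sql_type, kind=kind)

def format_schema(table_name: str, columns: List[ColumnMeta]) -> str:
    """Render column metadata as text, skipping Range when min/max are None."""
    lines = [f"Table: {table_name}", "Columns:"]
    for col in columns:
        lines.append(f"- {col.name} ({col.sql_type})")
        has_range = col.min_val is not None and col.max_val is not None
        if col.kind in ("numeric", "date") and has_range:
            lines.append(f"  Range: {col.min_val} to {col.max_val}")
        if col.categories:
            values = ", ".join(repr(v) for v in col.categories)
            lines.append(f"  Categorical values: {values}")
    return "\n".join(lines)
```

With a split like this, each data source only has to produce `ColumnMeta` objects and fill in their statistics; the text formatting, including the None guard on the range, lives in one shared function.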

Benefits

  • Consistency: All DataSource implementations now follow the same pattern
  • Maintainability: Schema generation logic is broken into focused, reusable methods
  • Preparation: Sets the foundation for future adapter-based abstractions (e.g., Snowflake semantic views)
  • Test coverage: All existing tests pass, verifying identical output to previous implementation

Testing

  • ✅ All Python tests pass (247 passed, 21 skipped)
  • ✅ Ruff checks pass
  • ✅ Verified identical schema output for various column types (INTEGER, FLOAT, NUMERIC, TEXT, DATE, TIMESTAMP, BOOLEAN)
  • ✅ Verified categorical value handling
  • ✅ Verified empty table handling (no "Range: None to None")
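The empty-table case can be reproduced outside the project with a small sketch. This one uses the standard-library `sqlite3` module instead of SQLAlchemy so that it runs anywhere, and the helpers `column_range()` and `range_line()` are hypothetical names for this example, not methods from the PR. It shows why the guard matters: `MIN`/`MAX` over zero rows still return one row, with both values NULL.

```python
import sqlite3

def column_range(conn: sqlite3.Connection, table: str, column: str):
    """Return (min, max) for a column; both are None when the table is empty."""
    # Identifiers cannot be bound as parameters, so they are quoted inline.
    # Acceptable for a sketch with trusted names; not for untrusted input.
    query = f'SELECT MIN("{column}"), MAX("{column}") FROM "{table}"'
    return conn.execute(query).fetchone()

def range_line(lo, hi):
    """Build the Range line, or None when either bound is missing."""
    if lo is None or hi is None:
        return None
    return f"Range: {lo} to {hi}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# Aggregates over an empty table yield a single row of NULLs.
empty = column_range(conn, "t", "x")

conn.executemany("INSERT INTO t VALUES (?)", [(3,), (7,)])
filled = column_range(conn, "t", "x")

print(range_line(*empty))   # no Range line for the empty table
print(range_line(*filled))  # Range line once rows exist
```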

🤖 Generated with Claude Code

Refactor SQLAlchemySource.get_schema() to match the structure used by
IbisSource, DataFrameSource, and PolarsLazySource:

- Add _make_column_meta() for SQLAlchemy type classification
- Add _add_column_stats() for statistics collection via SQL
- Use shared format_schema() for output formatting
- Remove redundant _get_sql_type_name() method

This creates consistency across all DataSource implementations and
prepares for future adapter-based abstractions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Comment thread reprex.py Outdated
Comment thread pkg-r/tests/testthat/_problems/test-SnowflakeSource-107.R Outdated
Comment thread pkg-r/tests/testthat/_problems/test-SnowflakeSource-91.R Outdated
Remove reprex.py and test stub files that were accidentally included.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@cpsievert cpsievert changed the title refactor: SQLAlchemySource.get_schema() to use ColumnMeta pattern refactor(pkg-py): SQLAlchemySource.get_schema() to use ColumnMeta pattern Jan 26, 2026
@cpsievert cpsievert merged commit f3a4360 into main Jan 26, 2026
7 checks passed
@cpsievert cpsievert deleted the refactor/sqlalchemy-get-schema branch January 26, 2026 17:41
2 participants