Skip to content

Enhance SQLAlchemy engine with safe_execute and inspector methods for SnowFlake#8920

Merged
mscolnick merged 7 commits intomarimo-team:mainfrom
mauro-cerzosimo:mcerz/enhance-sqlalchemy-snowflake
Mar 31, 2026
Merged

Enhance SQLAlchemy engine with safe_execute and inspector methods for SnowFlake#8920
mscolnick merged 7 commits intomarimo-team:mainfrom
mauro-cerzosimo:mcerz/enhance-sqlalchemy-snowflake

Conversation

@mauro-cerzosimo
Copy link
Copy Markdown
Contributor

📝 Summary

This pull request adds multi-database engine support, starting with Snowflake, and improves type safety and exception handling in the SQLAlchemy engine integration.

Continues the work from #8824.
Closes

🔍 Description of Changes

Multi-Database Support

  • Introduced _get_inspector context manager that yields an appropriate SQLAlchemy Inspector for a given database. For dialects that require a USE DATABASE command (e.g., Snowflake), it opens a connection, executes the command, and yields an inspector bound to that connection. For all other dialects, it falls back to the existing self.inspector.
  • Added _get_snowflake_database_names to fetch all available databases when the target database is not specified or not found.
  • Refactored existing methods (_get_schema_names, _get_table_names, _get_columns, _fetch_primary_keys, _fetch_indexes) to accept a database parameter and use _get_inspector, enabling per-database inspection across all operations.

Type Safety & Code Quality

  • Corrected type annotations for inspector-related methods, using SQLAlchemy's official ReflectedColumn and ReflectedIndex types to resolve mypy errors.
  • Improved exception handling in _get_snowflake_database_names by chaining exceptions with raise ... from err (B904 compliance).

📋 Checklist

  • I have read the contributor guidelines.
  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Tests have been added for the changes made.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Pull request title is a good summary of the changes — it will be used in the release notes.

@vercel
Copy link
Copy Markdown

vercel bot commented Mar 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Mar 31, 2026 2:50pm

Request Review

Copy link
Copy Markdown
Collaborator

@Light2Dark Light2Dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great, thanks. Couple of comments

from sqlalchemy import inspect, text

_use_database_dialect_command: dict[str, str] = {
"snowflake": f"""USE DATABASE {f'"{database}"' if re.search(r"[^A-Za-z0-9_]", database) else database}""",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[^A-Za-z0-9_] looks like this is a common constant, let's extract it out and put at the top

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +69 to +80
def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
@functools.wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> Any:
try:
return func(*args, **kwargs)
except silent_exceptions:
return fallback
except Exception:
getattr(LOGGER, log_level)(message, exc_info=True)
return fallback

return wrapper
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this strongly typed, maybe with Generics.

Else, we are overriding the function return types. I get this typecheck error

Function decorator obscures type of function because its type is Any basedpyright[reportAny]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds multi-database catalog discovery to the SQLAlchemy engine (starting with Snowflake) and introduces a safe_execute helper to standardize exception handling/logging during introspection.

Changes:

  • Added safe_execute decorator and applied it across schema/table/detail discovery helpers.
  • Implemented per-database inspection via _get_inspector, including Snowflake USE DATABASE switching and SHOW DATABASES discovery.
  • Expanded unit tests to cover safe_execute, inspector helpers, and Snowflake database listing behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
marimo/_sql/engines/sqlalchemy.py Adds safe_execute, Snowflake database discovery, and per-database inspector context manager used across introspection.
tests/_sql/test_sqlalchemy.py Adds tests for safe_execute, _get_inspector, per-database schema/table/detail helpers, and Snowflake database-name discovery.



# ------------------------
# Decoracotors
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in section header: "Decoracotors" should be "Decorators".

Suggested change
# Decoracotors
# Decorators

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines +306 to +311
database_names = [
str(row[name_col_index])
if re.search(r"[^A-Za-z0-9_]", str(row[name_col_index]))
else str(row[name_col_index]).lower()
for row in result.fetchall()
]
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This normalization lowercases any database name that doesn’t contain “special” characters, but Snowflake quoted identifiers can be mixed-case (e.g. MyDb) without special characters. Lowercasing those will change the identifier and can make subsequent USE DATABASE / inspection fail. Consider preserving the exact name returned by SHOW DATABASES and only applying case-insensitive matching for display/default selection, or detect mixed-case and avoid lowercasing/ensure quoting.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but I think the current behavior is actually correct for Snowflake's semantics.

Snowflake stores unquoted identifiers internally as UPPERCASE. So when SHOW DATABASES returns DB_A, it means the database was created with an unquoted identifier (e.g., CREATE DATABASE db_a or CREATE DATABASE DB_A) — Snowflake normalized it to DB_A. Lowercasing it to db_a is safe because Snowflake resolves unquoted identifiers case-insensitively, so USE DATABASE db_a and USE DATABASE DB_A are equivalent.

Comment on lines +130 to +135
import re

from sqlalchemy import inspect, text

_use_database_dialect_command: dict[str, str] = {
"snowflake": f"""USE DATABASE {f'"{database}"' if re.search(r"[^A-Za-z0-9_]", database) else database}""",
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snowflake identifier quoting here is not robust: it doesn’t escape embedded double-quotes and it also treats mixed-case identifiers (e.g. a quoted Snowflake database like "MyDb") as safe to use unquoted, which can resolve to a different database (Snowflake uppercases unquoted identifiers). Prefer using SQLAlchemy’s identifier preparer (or a dedicated quoting helper) to quote/escape identifiers correctly rather than interpolating into the SQL string.

Suggested change
import re
from sqlalchemy import inspect, text
_use_database_dialect_command: dict[str, str] = {
"snowflake": f"""USE DATABASE {f'"{database}"' if re.search(r"[^A-Za-z0-9_]", database) else database}""",
from sqlalchemy import inspect, text
from sqlalchemy.sql.compiler import IdentifierPreparer
# Robustly quote the database identifier for Snowflake, preserving case
# and correctly escaping embedded double-quotes.
if self.dialect == "snowflake":
# Prefer SQLAlchemy's identifier preparer when available.
try:
preparer: IdentifierPreparer = self._connection.dialect.identifier_preparer # type: ignore[attr-defined]
quoted_database = preparer.quote(database)
except Exception:
# Fallback: minimal, but correct, SQL identifier quoting.
escaped_database = database.replace('"', '""')
quoted_database = f'"{escaped_database}"'
else:
quoted_database = database
_use_database_dialect_command: dict[str, str] = {
"snowflake": f"USE DATABASE {quoted_database}",

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as befpre

Comment on lines +415 to +417
schema_names = self._get_schema_names(
database if database is not None else ""
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_schemas(database=None) (used by get_default_database() for connection validation) currently calls _get_schema_names(""), which for Snowflake will try to execute USE DATABASE with an empty identifier via _get_inspector and fail. This can cause Snowflake connections without a URL database to be treated as invalid. Consider letting _get_inspector accept Optional[str] and short-circuit to self.inspector (or skip USE DATABASE) when database is falsy, or adjust get_default_database()’s validity check for Snowflake (e.g., SHOW DATABASES).

Suggested change
schema_names = self._get_schema_names(
database if database is not None else ""
)
if database is None:
# When no database is specified, avoid issuing a "USE DATABASE"
# statement with an empty identifier (e.g., on Snowflake).
# In this context (e.g., connection validation), we treat the
# absence of an explicit database as having no enumerated schemas.
schema_names: list[str] = []
else:
schema_names = self._get_schema_names(database)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_lower = self.default_database.lower()
for db in database_names:
if db.lower() == default_lower:
return [str(db).lower()]
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the default database is present, this returns [str(db).lower()] even if the matched database name was preserved as case-sensitive (e.g., quoted/mixed-case or containing special characters). That lowercasing can make subsequent USE DATABASE fail for case-sensitive identifiers and is inconsistent with the earlier normalization rules. Return the matched db value as-is (or apply the same normalization logic used when building database_names).

Suggested change
return [str(db).lower()]
return [db]

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mauro-cerzosimo
Copy link
Copy Markdown
Contributor Author

@Light2Dark I’ve refactored and improved the code in the latest commit, focusing on identifier quoting, normalization, and handling edge cases for database/schema retrieval in SQLAlchemyEngine. All recent changes are now ready for a new review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

except silent_exceptions:
return fallback
except Exception:
getattr(LOGGER, log_level)(message, exc_info=True)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safe_execute can raise unexpectedly if log_level is not a valid logger attribute: getattr(LOGGER, log_level)(...) will raise AttributeError, which bypasses the intended fallback behavior. Consider validating log_level (e.g., map allowed strings to bound methods, or default to LOGGER.warning) so the decorator remains safe even with unexpected input.

Suggested change
getattr(LOGGER, log_level)(message, exc_info=True)
log_method = getattr(LOGGER, log_level, LOGGER.warning)
log_method(message, exc_info=True)

Copilot uses AI. Check for mistakes.
Comment on lines +123 to +138
def _quote_identifier(self, identifier: str) -> str:
"""Quote an identifier based on the SQL dialect's quoting rules."""
dialect_quoting: dict[str, tuple[re.Pattern[str], str, str]] = {
"snowflake": (_SNOWFLAKE_NEEDS_QUOTING_RE, '"', '"'),
}

if self.dialect not in dialect_quoting:
return identifier

pattern, open_quote, close_quote = dialect_quoting[self.dialect]
if pattern.search(identifier) or identifier != identifier.lower():
escaped = identifier.replace(
close_quote, close_quote + close_quote
)
return f"{open_quote}{escaped}{close_quote}"
return identifier
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snowflake quoting logic is inconsistent with _get_snowflake_database_names(): that method preserves case-sensitive identifiers as returned by SHOW DATABASES, including identifiers that may be all-lowercase (created via quoted identifiers). For such names, _quote_identifier currently returns the identifier unquoted (because it only quotes when identifier != identifier.lower()), which can make USE DATABASE <name> resolve to a different (uppercased) identifier or fail. Consider adjusting the normalization/quoting strategy so any case-sensitive Snowflake database name (including all-lowercase) is quoted when issuing USE DATABASE (e.g., keep raw SHOW names for execution, or track whether a name should be quoted).

Copilot uses AI. Check for mistakes.
Comment on lines +358 to +367
def _get_database_names(self) -> list[str]:
"""Get database names using dialect-specific queries.

Returns a single-element list with the default database when
the dialect has no dedicated discovery mechanism.
"""
if self.dialect.lower() == "snowflake":
return self._get_snowflake_database_names()

return [self.default_database] if self.default_database else []
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_database_names() treats self.default_database == "" as absent because it uses a truthiness check. get_default_database() explicitly uses empty-string to represent a detached-but-valid connection, so this change can cause get_databases() to return an empty list for valid connections (e.g., sqlite:///). Consider checking is not None instead of truthiness so the empty-string sentinel still yields a single database entry.

Copilot uses AI. Check for mistakes.
@mscolnick mscolnick merged commit 4635204 into marimo-team:main Mar 31, 2026
44 checks passed
@github-actions
Copy link
Copy Markdown

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.21.2-dev100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants