feat: add ListSQLSchemas to support lazy schema fetching in datasource panel#8824
Conversation
- Introduced ListSQLSchemasCommand to retrieve schemas from SQL databases. - Implemented ListSQLSchemasRequest for handling schema list requests. - Added SQLSchemaListPreviewNotification for broadcasting schema list results. - Enhanced runtime to handle schema list requests and notifications. - Updated connection utilities to support schema list updates in connections. - Implemented get_schemas method in various SQL engines (Adbc, Clickhouse, DuckDB, Ibis, PyIceberg, Redshift, SQLAlchemy) to return schemas. - Added API endpoint for previewing SQL schema lists. - Updated OpenAPI specifications and generated schemas accordingly. - Added tests for schema list functionality, including performance benchmarks and validation of schema updates in connections.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
|
recheck |
There was a problem hiding this comment.
Pull request overview
Adds a dedicated schema-listing request/notification flow (ListSQLSchemas) so the datasource panel can fetch schemas lazily per-database (instead of coupling schema discovery to the initial database fetch), improving scalability for multi-database engines like Snowflake.
Changes:
- Introduces
ListSQLSchemasCommand+SQLSchemaListPreviewNotificationand wires them through runtime, server API, OpenAPI, and frontend request registries/websocket handling. - Adds connection-state update plumbing for replacing a database’s schema list (
update_schema_list_in_connection) and consumes it in session view + frontend reducer. - Updates engine catalog interfaces to include
get_schemasand implements it for SQLAlchemy/Ibis (placeholders for other engines), with accompanying tests.
Reviewed changes
Copilot reviewed 43 out of 44 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/_sql/test_sqlalchemy.py | Updates schema-discovery test to call public get_schemas. |
| tests/_sql/test_ibis.py | Updates schema-discovery test to call public get_schemas. |
| tests/_sql/test_connection_utils.py | Adds tests/bench for update_schema_list_in_connection. |
| tests/_session/state/test_session_view.py | Adds coverage for schema-list preview notifications updating connection state. |
| tests/_server/api/endpoints/test_datasources.py | Tests new /preview_sql_schema_list endpoint + auth behavior. |
| tests/_runtime/test_runtime_datasets.py | Adds runtime callback tests for schema-list preview outcomes/errors. |
| tests/_internal/snapshots/internal_api.txt | Updates internal API snapshot for new command/notification. |
| scripts/generate_schemas.py | Ensures schema generator includes new notification type. |
| packages/openapi/src/api.ts | Adds new endpoint/types for schema-list preview + optional SQLMetadata.schema. |
| packages/openapi/api.yaml | Updates OpenAPI spec for new request/notification and endpoint. |
| marimo/_sql/engines/types.py | Adds EngineCatalog.get_schemas abstract method. |
| marimo/_sql/engines/sqlalchemy.py | Renames _get_schemas → get_schemas and updates call sites. |
| marimo/_sql/engines/redshift.py | Aligns get_schemas signature with interface and updates internal calls. |
| marimo/_sql/engines/pyiceberg.py | Adds placeholder get_schemas implementation. |
| marimo/_sql/engines/ibis.py | Renames _get_schemas → get_schemas and updates call sites. |
| marimo/_sql/engines/duckdb.py | Adds placeholder get_schemas implementation. |
| marimo/_sql/engines/clickhouse.py | Adds placeholder get_schemas implementations. |
| marimo/_sql/engines/adbc.py | Adds placeholder get_schemas implementation. |
| marimo/_sql/connection_utils.py | Adds update_schema_list_in_connection helper. |
| marimo/_session/state/session_view.py | Applies schema-list preview notifications to connection state. |
| marimo/_server/models/models.py | Adds ListSQLSchemasRequest model for API dispatch. |
| marimo/_server/api/endpoints/datasources.py | Adds /preview_sql_schema_list endpoint. |
| marimo/_schemas/generated/notifications.yaml | Updates generated notification schemas for optional SQLMetadata.schema + new notification. |
| marimo/_runtime/runtime.py | Registers and implements preview_sql_schema_list callback. |
| marimo/_runtime/commands.py | Adds ListSQLSchemasCommand and includes it in the command union. |
| marimo/_messaging/notification.py | Makes SQLMetadata.schema optional and adds SQLSchemaListPreviewNotification. |
| marimo/_internal/notifications.py | Exports new notification from internal notification module. |
| marimo/_internal/commands.py | Exports new command from internal commands module. |
| marimo/_cli/development/commands.py | Includes new command/notification in generated server API schema. |
| frontend/src/core/websocket/useMarimoKernelConnection.tsx | Resolves sql-schema-list-preview deferred requests. |
| frontend/src/core/wasm/bridge.ts | Adds Pyodide control request for schema-list preview. |
| frontend/src/core/network/types.ts | Adds ListSQLSchemasRequest typing + request interface method. |
| frontend/src/core/network/requests-toasting.tsx | Adds toast message for schema-list preview failures. |
| frontend/src/core/network/requests-static.ts | Adds static stub for previewSQLSchemaList. |
| frontend/src/core/network/requests-network.ts | Adds network request for /preview_sql_schema_list. |
| frontend/src/core/network/requests-lazy.ts | Adds lazy action mapping for previewSQLSchemaList. |
| frontend/src/core/kernel/messages.ts | Adds SQLSchemaListPreview notification message type. |
| frontend/src/core/islands/main.ts | Allows island message routing for schema-list preview notifications. |
| frontend/src/core/islands/bridge.ts | Adds bridge stub for previewSQLSchemaList. |
| frontend/src/core/datasets/request-registry.ts | Adds PreviewSQLSchemaList deferred request registry. |
| frontend/src/core/datasets/data-source-connections.ts | Adds reducer action to inject schema lists into state. |
| frontend/src/core/datasets/tests/data-source.test.ts | Adds reducer tests for addSchemaList. |
| frontend/src/components/datasources/datasources.tsx | Lazily requests schemas when expanding a database with no schemas loaded. |
| frontend/src/mocks/requests.ts | Mocks previewSQLSchemaList for frontend tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
recheck |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 43 out of 44 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| // Custom loading state, we need to wait for the data to propagate once requested | ||
| // useAsyncData's loading state may return false before data has propagated | ||
| const [schemasLoading, setSchemasLoading] = React.useState(false); |
There was a problem hiding this comment.
If PreviewSQLSchemaList.request() throws or addSchemaList() throws, setSchemasLoading(false) is never called. Since the UI checks isPending || schemasLoading, the loading spinner persists forever instead of showing the error state.
setSchemasLoading(true);
try {
const previewSchemaList = await PreviewSQLSchemaList.request({...});
if (!previewSchemaList?.schemas) {
throw new Error("No schemas available");
}
addSchemaList({...});
} finally {
setSchemasLoading(false);
}There was a problem hiding this comment.
Would it be better to derive this state from !data or isFetching?
There was a problem hiding this comment.
fix: handle error when no schemas are available in SchemaList component
This change ensures setSchemasLoading(false) is always called, even if an error is thrown, so the UI never gets stuck in a loading state and can show errors.
Why I didnt derive loading from data/isFetching?
Manual loading state is needed here because async data may not be reflected in the UI immediately, and we want to show the spinner until schemas are actually added.
|
|
||
| if (!previewSchemaList?.schemas) { | ||
| setSchemasLoading(false); | ||
| throw new Error("No schemas available"); |
There was a problem hiding this comment.
what about databases that truly have no schema? should we be throwing an error?
There was a problem hiding this comment.
fix: handle case when no schemas are available in SchemaList component
Yes, that's right—I deleted the error throw so that databases with no schemas are handled gracefully. Now, the UI just shows the "No schemas available" state instead of an error. This supports schemaless and empty databases as valid cases.
|
|
||
| const newConn = updatedState.connectionsMap.get("conn1" as ConnectionName); | ||
| const newDb1 = newConn?.databases.find((db) => db.name === "db1"); | ||
| expect(newDb1?.schemas).toEqual(newSchemaList); |
There was a problem hiding this comment.
this test looks like it tests the same thing twice?
There was a problem hiding this comment.
The two expect assertions are structurally identical — they both check that db1.schemas equals the schema list that was just passed to addSchemaList. So they're testing the same shape of behavior.
However, they're not exactly the same test. The intent is:
- First call: Set schemas to
["public", "analytics"]→ verify they're stored. - Second call: Overwrite schemas with
["public", "sales"]→ verify the update replaced the old list.
The subtle thing being tested is that addSchemaList replaces (not appends to) the existing schema list. If addSchemaList had a bug where it appended instead of replacing, the second assertion would fail because the schemas would be ["public", "analytics", "public", "sales"].
marimo/_messaging/notification.py
Outdated
| connection: str | ||
| database: str | ||
| schema: str | ||
| schema: Optional[str] = None |
There was a problem hiding this comment.
why does this need to be optional now?
There was a problem hiding this comment.
feat: introduce SQLDatabaseMetadata for improved database metadata ha…
fix: update SQLMetadata schema attribute to be required
Right at the start I thought about reusing SqlMetadata with schema = None, but that felt off — it would push nullability checks to every consumer. A better solution is to create a dedicated SQLDatabaseMetadata class that owns that complexity.
|
Hi @mscolnick, thank you for the review! I've pushed a new commit addressing all your comments. |
| include_table_details: bool, | ||
| ) -> list[Schema]: | ||
| """Get schemas from the engine.""" | ||
|
|
There was a problem hiding this comment.
| # Redshift doesn't use the concept of databases, just catalogs | |
| catalog = database |
There was a problem hiding this comment.
fix: update RedshiftEngine to use catalogs instead of databases for s…
The last commit updates RedshiftEngine to use catalogs (not databases) for schema retrieval, fixing how schemas are fetched.
| const { addSchemaList } = useDataSourceActions(); | ||
| const [schemasRequested, setSchemasRequested] = React.useState(false); | ||
|
|
||
| // Custom loading state, we need to wait for the data to propagate once requested |
There was a problem hiding this comment.
I think I can take a pass to remove custom useStates in follow-ups. I see the same pattern used for addTableList
There was a problem hiding this comment.
Sounds good, makes sense to consolidate those custom loading states. Happy to help with that in a follow-up if needed.
|
Hi @Light2Dark , thank you for the review! I've pushed a new commit addressing your comment. 🙇 |
📝 Summary
Add dedicated
ListSQLSchemasfunctionality to decouple schema fetching from database fetching in the datasource panel. This is the first step toward a lazy-loading approach where databases are listed first, and schemas are fetched only when a user clicks on a specific database — improving performance for engines like Snowflake that span many databases.Closes #
🔍 Description of Changes
Problem
When working with Snowflake (and similar multi-database engines) without specifying a database in the SQLAlchemy connection, the datasource panel currently fetches all schemas and groups them under a single unnamed database. Ideally, the panel should display all available databases, but implementing that today (e.g., by modifying
sqlalchemy.py) would require looping over every database to collect their schemas upfront, leading to significant performance degradation.Approach
This PR introduces a dedicated schema listing step, decoupled from the existing database+schema fetch. This lays the groundwork for a future PR where the panel will:
By having this lazy-loading mechanism in place first, we avoid the performance hit of eagerly iterating over all databases. The same pattern can later be applied to other engines (Databricks, Trino, etc.).
Changes
ListSQLSchemasCommand,ListSQLSchemasRequest, andSQLSchemaListPreviewNotificationfor handling schema list retrieval and broadcasting.get_schemasmethod across multiple SQL engine backends. Currently, only SQLAlchemy and Ibis has a real implementation; for the other engines (Adbc, Clickhouse, DuckDB, PyIceberg, Redshift) I added a placeholder that returns[]. Real implementations for those can be added in follow-up PRs as needed.📋 Checklist