Skip to content

feat: add ListSQLSchemas to support lazy schema fetching in datasource panel#8824

Merged
mscolnick merged 13 commits intomarimo-team:mainfrom
mauro-cerzosimo:mcerz/panel-load-schemas
Mar 26, 2026
Merged

feat: add ListSQLSchemas to support lazy schema fetching in datasource panel#8824
mscolnick merged 13 commits intomarimo-team:mainfrom
mauro-cerzosimo:mcerz/panel-load-schemas

Conversation

@mauro-cerzosimo
Copy link
Copy Markdown
Contributor

@mauro-cerzosimo mauro-cerzosimo commented Mar 23, 2026

📝 Summary

Add dedicated ListSQLSchemas functionality to decouple schema fetching from database fetching in the datasource panel. This is the first step toward a lazy-loading approach where databases are listed first, and schemas are fetched only when a user clicks on a specific database — improving performance for engines like Snowflake that span many databases.

Closes #

🔍 Description of Changes

Problem

When working with Snowflake (and similar multi-database engines) without specifying a database in the SQLAlchemy connection, the datasource panel currently fetches all schemas and groups them under a single unnamed database. Ideally, the panel should display all available databases, but implementing that today (e.g., by modifying sqlalchemy.py) would require looping over every database to collect their schemas upfront, leading to significant performance degradation.

Approach

This PR introduces a dedicated schema listing step, decoupled from the existing database+schema fetch. This lays the groundwork for a future PR where the panel will:

  1. First fetch and display all databases.
  2. Only when a user expands/clicks a specific database, fetch its schemas on demand.

By having this lazy-loading mechanism in place first, we avoid the performance hit of eagerly iterating over all databases. The same pattern can later be applied to other engines (Databricks, Trino, etc.).

Changes

  • New command/request/notification: Introduced ListSQLSchemasCommand, ListSQLSchemasRequest, and SQLSchemaListPreviewNotification for handling schema list retrieval and broadcasting.
  • Runtime updates: Enhanced the runtime to handle schema list requests and dispatch corresponding notifications.
  • Connection utilities: Updated connection utilities to support schema list updates in the connection state.
  • Engine implementations: Added a get_schemas method across multiple SQL engine backends. Currently, only SQLAlchemy and Ibis has a real implementation; for the other engines (Adbc, Clickhouse, DuckDB, PyIceberg, Redshift) I added a placeholder that returns []. Real implementations for those can be added in follow-up PRs as needed.
  • API: Added a new API endpoint for previewing SQL schema lists.
  • OpenAPI / generated schemas: Updated OpenAPI specifications and regenerated schemas accordingly.
  • Tests: Added tests for schema list functionality, including performance benchmarks and validation of schema updates in connections.

Note: Because this change introduces a new cross-cutting capability, it touches a number of files. Apologies for the PR size — I tried to keep things as focused as possible while ensuring all engines and layers are consistently updated.

📋 Checklist

  • I have read the contributor guidelines.
  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Tests have been added for the changes made.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Pull request title is a good summary of the changes - it will be used in the release notes.

- Introduced ListSQLSchemasCommand to retrieve schemas from SQL databases.
- Implemented ListSQLSchemasRequest for handling schema list requests.
- Added SQLSchemaListPreviewNotification for broadcasting schema list results.
- Enhanced runtime to handle schema list requests and notifications.
- Updated connection utilities to support schema list updates in connections.
- Implemented get_schemas method in various SQL engines (Adbc, Clickhouse, DuckDB, Ibis, PyIceberg, Redshift, SQLAlchemy) to return schemas.
- Added API endpoint for previewing SQL schema lists.
- Updated OpenAPI specifications and generated schemas accordingly.
- Added tests for schema list functionality, including performance benchmarks and validation of schema updates in connections.
@vercel
Copy link
Copy Markdown

vercel bot commented Mar 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Mar 26, 2026 0:14am

Request Review

@github-actions github-actions bot added the bash-focus Area to focus on during release bug bash label Mar 23, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 23, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@mauro-cerzosimo
Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@mauro-cerzosimo
Copy link
Copy Markdown
Contributor Author

recheck

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a dedicated schema-listing request/notification flow (ListSQLSchemas) so the datasource panel can fetch schemas lazily per-database (instead of coupling schema discovery to the initial database fetch), improving scalability for multi-database engines like Snowflake.

Changes:

  • Introduces ListSQLSchemasCommand + SQLSchemaListPreviewNotification and wires them through runtime, server API, OpenAPI, and frontend request registries/websocket handling.
  • Adds connection-state update plumbing for replacing a database’s schema list (update_schema_list_in_connection) and consumes it in session view + frontend reducer.
  • Updates engine catalog interfaces to include get_schemas and implements it for SQLAlchemy/Ibis (placeholders for other engines), with accompanying tests.

Reviewed changes

Copilot reviewed 43 out of 44 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/_sql/test_sqlalchemy.py Updates schema-discovery test to call public get_schemas.
tests/_sql/test_ibis.py Updates schema-discovery test to call public get_schemas.
tests/_sql/test_connection_utils.py Adds tests/bench for update_schema_list_in_connection.
tests/_session/state/test_session_view.py Adds coverage for schema-list preview notifications updating connection state.
tests/_server/api/endpoints/test_datasources.py Tests new /preview_sql_schema_list endpoint + auth behavior.
tests/_runtime/test_runtime_datasets.py Adds runtime callback tests for schema-list preview outcomes/errors.
tests/_internal/snapshots/internal_api.txt Updates internal API snapshot for new command/notification.
scripts/generate_schemas.py Ensures schema generator includes new notification type.
packages/openapi/src/api.ts Adds new endpoint/types for schema-list preview + optional SQLMetadata.schema.
packages/openapi/api.yaml Updates OpenAPI spec for new request/notification and endpoint.
marimo/_sql/engines/types.py Adds EngineCatalog.get_schemas abstract method.
marimo/_sql/engines/sqlalchemy.py Renames _get_schemasget_schemas and updates call sites.
marimo/_sql/engines/redshift.py Aligns get_schemas signature with interface and updates internal calls.
marimo/_sql/engines/pyiceberg.py Adds placeholder get_schemas implementation.
marimo/_sql/engines/ibis.py Renames _get_schemasget_schemas and updates call sites.
marimo/_sql/engines/duckdb.py Adds placeholder get_schemas implementation.
marimo/_sql/engines/clickhouse.py Adds placeholder get_schemas implementations.
marimo/_sql/engines/adbc.py Adds placeholder get_schemas implementation.
marimo/_sql/connection_utils.py Adds update_schema_list_in_connection helper.
marimo/_session/state/session_view.py Applies schema-list preview notifications to connection state.
marimo/_server/models/models.py Adds ListSQLSchemasRequest model for API dispatch.
marimo/_server/api/endpoints/datasources.py Adds /preview_sql_schema_list endpoint.
marimo/_schemas/generated/notifications.yaml Updates generated notification schemas for optional SQLMetadata.schema + new notification.
marimo/_runtime/runtime.py Registers and implements preview_sql_schema_list callback.
marimo/_runtime/commands.py Adds ListSQLSchemasCommand and includes it in the command union.
marimo/_messaging/notification.py Makes SQLMetadata.schema optional and adds SQLSchemaListPreviewNotification.
marimo/_internal/notifications.py Exports new notification from internal notification module.
marimo/_internal/commands.py Exports new command from internal commands module.
marimo/_cli/development/commands.py Includes new command/notification in generated server API schema.
frontend/src/core/websocket/useMarimoKernelConnection.tsx Resolves sql-schema-list-preview deferred requests.
frontend/src/core/wasm/bridge.ts Adds Pyodide control request for schema-list preview.
frontend/src/core/network/types.ts Adds ListSQLSchemasRequest typing + request interface method.
frontend/src/core/network/requests-toasting.tsx Adds toast message for schema-list preview failures.
frontend/src/core/network/requests-static.ts Adds static stub for previewSQLSchemaList.
frontend/src/core/network/requests-network.ts Adds network request for /preview_sql_schema_list.
frontend/src/core/network/requests-lazy.ts Adds lazy action mapping for previewSQLSchemaList.
frontend/src/core/kernel/messages.ts Adds SQLSchemaListPreview notification message type.
frontend/src/core/islands/main.ts Allows island message routing for schema-list preview notifications.
frontend/src/core/islands/bridge.ts Adds bridge stub for previewSQLSchemaList.
frontend/src/core/datasets/request-registry.ts Adds PreviewSQLSchemaList deferred request registry.
frontend/src/core/datasets/data-source-connections.ts Adds reducer action to inject schema lists into state.
frontend/src/core/datasets/tests/data-source.test.ts Adds reducer tests for addSchemaList.
frontend/src/components/datasources/datasources.tsx Lazily requests schemas when expanding a database with no schemas loaded.
frontend/src/mocks/requests.ts Mocks previewSQLSchemaList for frontend tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mauro-cerzosimo
Copy link
Copy Markdown
Contributor Author

recheck

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 43 out of 44 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


// Custom loading state, we need to wait for the data to propagate once requested
// useAsyncData's loading state may return false before data has propagated
const [schemasLoading, setSchemasLoading] = React.useState(false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If PreviewSQLSchemaList.request() throws or addSchemaList() throws, setSchemasLoading(false) is never called. Since the UI checks isPending || schemasLoading, the loading spinner persists forever instead of showing the error state.

setSchemasLoading(true);
try {
  const previewSchemaList = await PreviewSQLSchemaList.request({...});

  if (!previewSchemaList?.schemas) {
    throw new Error("No schemas available");
  }

  addSchemaList({...});
} finally {
  setSchemasLoading(false);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to derive this state from !data or isFetching?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: handle error when no schemas are available in SchemaList component
This change ensures setSchemasLoading(false) is always called, even if an error is thrown, so the UI never gets stuck in a loading state and can show errors.

Why I didnt derive loading from data/isFetching?
Manual loading state is needed here because async data may not be reflected in the UI immediately, and we want to show the spinner until schemas are actually added.


if (!previewSchemaList?.schemas) {
setSchemasLoading(false);
throw new Error("No schemas available");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about databases that truly have no schema? should we be throwing an error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: handle case when no schemas are available in SchemaList component
Yes, that's right—I deleted the error throw so that databases with no schemas are handled gracefully. Now, the UI just shows the "No schemas available" state instead of an error. This supports schemaless and empty databases as valid cases.


const newConn = updatedState.connectionsMap.get("conn1" as ConnectionName);
const newDb1 = newConn?.databases.find((db) => db.name === "db1");
expect(newDb1?.schemas).toEqual(newSchemaList);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test looks like it tests the same thing twice?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two expect assertions are structurally identical — they both check that db1.schemas equals the schema list that was just passed to addSchemaList. So they're testing the same shape of behavior.

However, they're not exactly the same test. The intent is:

  1. First call: Set schemas to ["public", "analytics"] → verify they're stored.
  2. Second call: Overwrite schemas with ["public", "sales"] → verify the update replaced the old list.

The subtle thing being tested is that addSchemaList replaces (not appends to) the existing schema list. If addSchemaList had a bug where it appended instead of replacing, the second assertion would fail because the schemas would be ["public", "analytics", "public", "sales"].

connection: str
database: str
schema: str
schema: Optional[str] = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need to be optional now?

Copy link
Copy Markdown
Contributor Author

@mauro-cerzosimo mauro-cerzosimo Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feat: introduce SQLDatabaseMetadata for improved database metadata ha…

fix: update SQLMetadata schema attribute to be required

Right at the start I thought about reusing SqlMetadata with schema = None, but that felt off — it would push nullability checks to every consumer. A better solution is to create a dedicated SQLDatabaseMetadata class that owns that complexity.

@mauro-cerzosimo
Copy link
Copy Markdown
Contributor Author

Hi @mscolnick, thank you for the review! I've pushed a new commit addressing all your comments.

Light2Dark
Light2Dark previously approved these changes Mar 24, 2026
include_table_details: bool,
) -> list[Schema]:
"""Get schemas from the engine."""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Redshift doesn't use the concept of databases, just catalogs
catalog = database

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: update RedshiftEngine to use catalogs instead of databases for s…

The last commit updates RedshiftEngine to use catalogs (not databases) for schema retrieval, fixing how schemas are fetched.

const { addSchemaList } = useDataSourceActions();
const [schemasRequested, setSchemasRequested] = React.useState(false);

// Custom loading state, we need to wait for the data to propagate once requested
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can take a pass to remove custom useStates in follow-ups. I see the same pattern used for addTableList

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, makes sense to consolidate those custom loading states. Happy to help with that in a follow-up if needed.

@Light2Dark Light2Dark added the enhancement New feature or request label Mar 24, 2026
@mauro-cerzosimo
Copy link
Copy Markdown
Contributor Author

Hi @Light2Dark , thank you for the review! I've pushed a new commit addressing your comment. 🙇

@Light2Dark Light2Dark requested a review from mscolnick March 26, 2026 04:53
@mscolnick mscolnick merged commit 89192e7 into marimo-team:main Mar 26, 2026
36 of 49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bash-focus Area to focus on during release bug bash enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants