[Speed] Optimization for MetaData.create_all() existence checks in multi-tenant/large schema environments #13295

Tschuppi81 · 2026-05-12T19:52:43Z

Describe the use case

Feature Request: Optimize `MetaData.create_all()` Existence Checks for Multi-Tenant Environments

The Use Case / Why

In multi-tenant applications using a "schema-per-tenant" strategy (common in PostgreSQL), MetaData.create_all() becomes a performance bottleneck. Even with checkfirst=True, SQLAlchemy performs a serial existence check for every individual table in the MetaData collection.

When a schema contains N tables, this results in N individual queries to pg_catalog or information_schema. In environments where these checks occur frequently across many schemas, the cumulative network round-trips create a massive N+1 overhead that is often flagged by performance monitoring tools (e.g., Sentry).

The Evidence

Using SQLAlchemy 2.x and PostgreSQL, running the attached reproduce_metadata_n_plus_1.py script shows that a create_all() call for a schema with 10 tables results in the following serial "waterfall" of queries, even when the tables already exist:

--- ATTEMPTING CONNECTION TO: postgresql://dev:devpassword@localhost/onegov ---
INFO:sqlalchemy.engine.Engine:select pg_catalog.version()
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:select current_schema()
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:show standard_conforming_strings
INFO:sqlalchemy.engine.Engine:[raw sql] {}
INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_1 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00005s] {}
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_2 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00004s] {}
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_3 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00003s] {}
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_4 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00002s] {}
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_5 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00002s] {}
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_6 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00002s] {}
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_7 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00002s] {}
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_8 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00002s] {}
INFO:sqlalchemy.engine.Engine:DROP TABLE IF EXISTS test_table_9 CASCADE
INFO:sqlalchemy.engine.Engine:[generated in 0.00002s] {}

--- STARTING CREATE_ALL (checkfirst=False) ---
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_1 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00005s] {}
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_2 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00004s] {}
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_3 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00004s] {}
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_4 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00004s] {}
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_5 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00006s] {}
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_6 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00006s] {}
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_7 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00007s] {}
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_8 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00006s] {}
INFO:sqlalchemy.engine.Engine:
CREATE TABLE test_table_9 (
	id SERIAL NOT NULL, 
	data VARCHAR, 
	PRIMARY KEY (id)
)


INFO:sqlalchemy.engine.Engine:[no key 0.00006s] {}
INFO:sqlalchemy.engine.Engine:COMMIT

--- FINISHED SUCCESSFULLY ---

This confirms that create_all() utilizes a visitor pattern that does not currently offer a bulk reflection or a single "fetch all tables in schema" optimization to minimize round-trips.

The Proposed Optimization / Workaround

I have mitigated this by manually checking for a single marker table via inspect(engine).has_table() and skipping the create_all() call entirely when that table exists. This reduces the initialization overhead from O(N) existence queries to O(1).

Please refer to the attached reproduce_fix_n_plus_1.py script.
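For readers without the attachment, the marker-table workaround can be sketched roughly as follows. This is not the attached script: the helper name create_all_if_marker_missing is made up for illustration, and in-memory SQLite stands in for PostgreSQL so the snippet runs without a server. It also assumes that the presence of the marker table implies that every other table exists, which only holds if tables are always created together.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, inspect


def create_all_if_marker_missing(engine, metadata, marker_table, schema=None):
    """Hypothetical helper: call create_all() only when the marker table is absent.

    Returns True if create_all() ran, False if it was skipped.
    """
    # One existence query instead of one per table in the MetaData.
    if inspect(engine).has_table(marker_table, schema=schema):
        return False
    metadata.create_all(engine)
    return True


# Demo on in-memory SQLite (an assumption made so the sketch is runnable anywhere).
engine = create_engine("sqlite://")
metadata = MetaData()
for i in range(1, 4):
    Table(
        f"test_table_{i}",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("data", String),
    )

first = create_all_if_marker_missing(engine, metadata, "test_table_1")
second = create_all_if_marker_missing(engine, metadata, "test_table_1")
```

The first call creates the tables and the second is skipped after a single has_table() query. The trade-off is that a schema missing only some non-marker tables would go unnoticed.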

Attachments

reproduce_metadata_n_plus_1.py

reproduce_fix_n_plus_1.py

Request

Could SQLAlchemy provide a native way to optimize these checks?

An option for a bulk "pre-check" of all tables in the MetaData for a given schema in a single query.
A more efficient "check-all-or-nothing" mode for create_all() to avoid the per-table visitor pattern for checks.
A high-level metadata.exists(engine) helper.
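As a rough illustration of the first option, a bulk pre-check can already be approximated in user code with existing APIs: Inspector.get_table_names() lists the tables of a schema in one call, and create_all() accepts tables= and checkfirst=False. The helper name create_missing_tables is hypothetical, in-memory SQLite is used only so the sketch runs without PostgreSQL, and the sketch ignores temporary tables, views, and schema_translate_map setups.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, inspect


def create_missing_tables(engine, metadata, schema=None):
    """Hypothetical helper: create only the tables that one listing call reports as absent."""
    # A single listing call replaces the per-table existence checks.
    existing = set(inspect(engine).get_table_names(schema=schema))
    missing = [
        table
        for table in metadata.sorted_tables
        if table.schema == schema and table.name not in existing
    ]
    if missing:
        # checkfirst=False: existence was already decided above.
        metadata.create_all(engine, tables=missing, checkfirst=False)
    return [table.name for table in missing]


# Demo on in-memory SQLite (an assumption made so the sketch is runnable anywhere).
engine = create_engine("sqlite://")
metadata = MetaData()
tables = [
    Table(
        f"test_table_{i}",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("data", String),
    )
    for i in range(1, 4)
]
tables[0].create(engine)  # pretend one table already exists

created = create_missing_tables(engine, metadata)
created_again = create_missing_tables(engine, metadata)
```

Unlike the marker-table approach, this still notices a partially created schema, at the cost of listing every table in the schema even when the MetaData maps only a few of them.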

Databases / Backends / Drivers targeted

postgresql

Example Use

Refer to the attached script reproduce_metadata_n_plus_1.py.

Additional context

I work on a multi-tenant application called onegov-cloud where each schema represents a tenant: https://github.com/OneGov/onegov-cloud

CaselIT · 2026-05-12T20:07:37Z

Maintainer

Hi,

It's not clear what you are looking for. I thought it was one thing, but the log shows no existence checks.

Most of what you are looking for is already in the inspector.

Overall I don't think create_all is something that's used often at runtime, so I'm not sure how performance critical it is.

In any case, can you state what issue you are trying to solve?

18 replies

Daverball May 15, 2026

It's not so much that they can't debug, it's just that the missing table will have no obvious connection to the abstraction that created it from the point in the code that's emitting the error, so it will be difficult to debug without the required knowledge of those abstractions, no matter how experienced you are.

I'm not gonna argue that these abstractions are good. I'd rather we didn't have them. We do use less obfuscated constructs in our other applications, so we can get away with more explicit workflows, where we don't have to rely on create_all to avoid confusing errors. It's just not worth changing at this point, since it has been working well for us in the current state. So I'm hesitant to make any changes for only very minor benefits, when the potential downside is wasted hours of work, every time someone new runs into issues with it.

But I also completely see your point, I wanted to create this issue, not because it will solve our problems, but because it seemed like an obvious improvement that would benefit every user of SQLAlchemy, albeit to a much lesser degree, than it would benefit us. But just like with every change, there is an associated cost, whether it's the raw work of implementing or the associated complexity it introduces and if that cost is too great right now, to justify the change, that's completely fine.

Either way, thanks for your time and consideration!

zzzeek May 15, 2026
Maintainer

well, through the discussion here I did some of the thinking for the improvement already, and the main point is that it would be implemented using has_tables(tables), which abstracts down to has_table() for dialects that don't participate in the newer architecture. this would make the change a lot easier, as we wouldn't have to worry as much about backwards compatibility.

that is, this discussion asking for the feature has already "started" the feature

CaselIT May 15, 2026
Maintainer

I'm trying to think of when this would cause issues, and I can't really think of a case where it would be worse than the current behavior.

The only case that comes to mind is a user who has a db with a really large number of tables and maps only a very limited subset of them in the metadata. In this case the query to get all tables may be somewhat slow, but I don't think it will be too dramatic.

So I could see us making such a change. Any thoughts, Mike?

Ah, of course, like Mike said, the devil is in the temp tables (which, if I'm not mistaken, can't even be enumerated in some dbs).

The has_multi_table makes sense to me. It would likely return a dict table->bool (or an iterable of 2-tuples).

zzzeek May 15, 2026
Maintainer

right what do we even do for has_tables(names) on a DB that does not enumerate TEMP tables? now we have to run the per-table query for every name anyway right?
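To make the shape under discussion concrete, here is a minimal sketch of a bulk check that returns a dict table->bool and falls back to a per-table query for names the bulk listing did not report. It is not SQLAlchemy's API: has_tables here is a hypothetical free function built on the existing Inspector.get_table_names() and Inspector.has_table(), and in-memory SQLite is used only so the snippet runs.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, inspect


def has_tables(inspector, names, schema=None):
    """Hypothetical helper: map each table name to whether it exists."""
    listed = set(inspector.get_table_names(schema=schema))
    result = {}
    for name in names:
        # Names absent from the bulk listing (for example temp tables that a
        # backend does not enumerate) fall back to the per-table check.
        result[name] = name in listed or inspector.has_table(name, schema=schema)
    return result


# Demo on in-memory SQLite (an assumption made so the sketch is runnable anywhere).
engine = create_engine("sqlite://")
metadata = MetaData()
Table("tenant_marker", metadata, Column("id", Integer, primary_key=True))
metadata.create_all(engine)

result = has_tables(inspect(engine), ["tenant_marker", "missing_table"])
```

With this fallback, the per-table query runs only for names missing from the listing, so the saving is largest when most tables already exist. When none exist, the cost is the listing query plus one query per name, which is the concern raised above.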

CaselIT May 15, 2026
Maintainer

I'm not sure if has_name does something strange if the table is a temp one. Like mssql may do a different query when the table starts with #

CaselIT · 2026-05-19T21:50:43Z

Maintainer

I've created #13311

0 replies
