[Speed] Optimization for MetaData.create_all() existence checks in multi-tenant/large schema environments #13295
Unanswered
Tschuppi81
asked this question in
Usage Questions
Replies: 2 comments 18 replies
-
|
Hi, It's not clear what you are looking for. I thought it was one thing but the log show no existence check. Most of what you are looking for is already in the inspector. Overall I don't think create_all is something that's used often at runtime, so I'm not sure how performance critical it is. In amy case can you state what's the issue you are trying to solve? |
Beta Was this translation helpful? Give feedback.
18 replies
-
|
I've created #13311 |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Describe the use case
Feature Request: Optimize
MetaData.create_all()Existence Checks for Multi-Tenant EnvironmentsThe Use Case / Why
In multi-tenant applications using a "schema-per-tenant" strategy (common in PostgreSQL),
MetaData.create_all()becomes a performance bottleneck. Even withcheckfirst=True, SQLAlchemy performs a serial existence check for every individual table in theMetaDatacollection.When a schema contains
Ntables, this results inNindividual queries topg_catalogorinformation_schema. In environments where these checks occur frequently across many schemas, the cumulative network round-trips create a massiveN+1overhead that is often flagged by performance monitoring tools (e.g., Sentry).The Evidence
Using SQLAlchemy 2.x and PostgreSQL, a
create_all()call for a schema with 10 tables results in the following serial "waterfall" of queries, even when the tables already exist, executing attachedreproduce_metadata_n_plus_1.pyscript:This confirms that
create_all()utilizes a visitor pattern that does not currently offer a bulk reflection or a single "fetch all tables in schema" optimization to minimize round-trips.The Proposed Optimization / Workaround
I have mitigated this by implementing a check — manually querying the existence of a single marker table via
inspect(engine).has_table()to decide whether to skip thecreate_all()call entirely. This reduces initialization overhead fromO(N)toO(1).Please refer to
reproduce_fix_n_plus_1.pyscript attached.Attachments
reproduce_metadata_n_plus_1.py
reproduce_fix_n_plus_1.py
Request
Could SQLAlchemy provide a native way to optimize these checks?
MetaDatafor a given schema in a single query.create_all()to avoid the per-table visitor pattern for checks.metadata.exists(engine)helper.Databases / Backends / Drivers targeted
postgresql
Example Use
refer to attached script reproduce_metadata_n_plus_1.py
Additional context
I work on a multi-tenant application called onegov-cloud where each schema represents a tenant: https://github.com/OneGov/onegov-cloud
Beta Was this translation helpful? Give feedback.
All reactions