
fix(ingestion): release Engine resources on database switch#27625

Merged
ulixius9 merged 3 commits into open-metadata:fix/snowflake-engine-release-on-db-switch from ulixius9:fix/snowflake-engine-release-on-db-switch
Apr 22, 2026
Conversation

@ulixius9
Member

Summary

  • CommonDbSourceService.set_inspector / close now close every checked-out connection in _connection_map and clear _inspector_map before disposing the engine. engine.dispose() alone does not release ConnectionFairy objects or free Inspector.info_cache; the prior code dropped the map references without closing them, leaving the old engine pinned via _ConnectionRecord → pool → engine. Across many databases this leaks tens of MB per switch (the info_cache of reflection results is large), and at interpreter shutdown the orphaned fairies trigger a RecursionError in _finalize_fairy when returning to a disposed pool's Condition lock.
  • MultiDBSource._execute_database_query and SnowflakeSource.get_database_names_raw now eagerly .fetchall() instead of streaming. Lazy iteration kept a cursor alive across set_inspector calls; since set_inspector disposes the engine the cursor is bound to, that cursor would have been invalidated once we started closing connections. Fetching the DB-name list up front (a small, finite set) closes the cursor before any engine disposal can happen.
  • scoped_session is rebound to the new engine on each set_inspector — otherwise sessionmaker.bind retains the disposed engine.
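A minimal sketch of this release pattern follows. It is not the PR's actual code: the class below is a hypothetical stand-in whose map names mirror the identifiers above, and only SQLAlchemy's public API (`Engine.connect`, `Engine.dispose`, `inspect`, `scoped_session`) is assumed.

```python
import gc
import weakref

from sqlalchemy import create_engine, inspect
from sqlalchemy.orm import scoped_session, sessionmaker


class SourceSketch:
    """Hypothetical stand-in for CommonDbSourceService's per-engine state."""

    def __init__(self):
        self.engine = create_engine("sqlite://")
        self._connection_map = {}  # thread id -> checked-out Connection
        self._inspector_map = {}   # thread id -> Inspector (holds info_cache)
        self.session = scoped_session(sessionmaker(bind=self.engine))

    def _release_engine(self):
        """Close fairies and clear caches BEFORE dispose: dispose alone
        does not close checked-out connections or free info_cache."""
        if self.engine is None:
            return
        for conn in self._connection_map.values():
            try:
                conn.close()
            except Exception:
                pass  # tolerate already-closed connections
        self._connection_map.clear()
        self._inspector_map.clear()
        if self.session is not None:
            self.session.remove()  # drop the thread-local Session
            self.session = None    # sessionmaker.bind would pin the engine
        self.engine.dispose()
        self.engine = None         # no dangling disposed-engine reference


src = SourceSketch()
src._connection_map[0] = src.engine.connect()  # simulate a worker's fairy
src._inspector_map[0] = inspect(src.engine)
engine_ref = weakref.ref(src.engine)

src._release_engine()
gc.collect()
print("engine collected:", engine_ref() is None)  # expect True: GC-eligible
```

This mirrors the acceptance test's shape: stash a checked-out connection, take a weakref to the engine, release, collect, and check the weakref is dead.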

Test plan

  • New unit tests in ingestion/tests/unit/topology/database/test_common_db_source.py and test_snowflake.py — 12 tests, all pass (39 total in the two files, 0 regressions).
  • Acceptance test test_old_engine_becomes_gc_eligible_after_release creates a real SQLAlchemy engine, stashes a fairy in _connection_map, takes a weakref, calls _release_engine, drops strong refs, runs gc.collect(), and asserts the weakref is dead. This fails against the prior kill_active_connections-only code path and passes with the fix — the direct regression guard.
  • _release_engine covered against a real in-memory SQLite engine: closes every map entry (including entries keyed by arbitrary worker thread ids), clears the inspector map, removes the session, disposes the pool, idempotent when engine is None, tolerates already-closed connections.
  • .fetchall() behavior validated: test_generator_survives_engine_dispose_mid_iteration advances the generator, disposes the engine, confirms remaining rows still yield — i.e. the cursor is no longer live at the point set_inspector disposes the engine.
  • Snowflake-side fetchall verified by mocking self.connection and asserting .fetchall() called exactly once with results yielded in order.
  • Recommended before merge: run an ingestion against a multi-DB Snowflake account and confirm (a) container RSS no longer grows linearly per DB switch, and (b) the Exception ignored in: <function _ConnectionRecord.checkout.<locals>.<lambda>> stderr warning at shutdown no longer appears.
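The eager-fetch behavior those tests pin down can be sketched as follows. This is a simplified stand-in for `_execute_database_query`, assuming plain SQLAlchemy; the table, names, and query are illustrative only.

```python
from sqlalchemy import create_engine, text


def execute_database_query(engine, query: str):
    """Yield one value per row, fetching eagerly so no cursor is
    still open when the caller later disposes/switches the engine."""
    with engine.connect() as conn:
        rows = conn.execute(text(query)).fetchall()  # cursor drained here
    # The connection is already back in the pool; iteration is safe.
    for row in rows:
        yield row[0]


engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE dbs (name TEXT)"))
    conn.execute(text("INSERT INTO dbs VALUES ('analytics'), ('raw')"))

gen = execute_database_query(engine, "SELECT name FROM dbs ORDER BY name")
first = next(gen)      # 'analytics'
engine.dispose()       # mid-iteration disposal, as on a DB switch
remaining = list(gen)  # still yields: nothing was streaming
print(first, remaining)
```

Because the `with` block drains the cursor before the first `yield`, disposing the engine between `next()` calls cannot invalidate anything — exactly the property the mid-iteration tests assert.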

Related symptom

A production ingestion against a 39-database Snowflake account exhibited container memory growing from ~1 GiB to ~4 GiB (pod memory limit, OOMKill territory) across a single run, followed by RecursionError: maximum recursion depth exceeded at interpreter shutdown in sqlalchemy/pool/base.py:_finalize_fairy and snowflake/connector/vendored/urllib3/connectionpool.py:_close_pool_connections. Both tracebacks bottom out in threading.Condition / RLock acquisition — the signature of weakref finalizers firing on pools that had been disposed while their fairies were still in flight. This PR eliminates the SQLAlchemy-side path by closing fairies explicitly before dispose.

🤖 Generated with Claude Code

Close fairies in _connection_map and clear _inspector_map before
engine.dispose() in CommonDbSourceService.set_inspector/close. Dispose
alone does not free Inspector.info_cache or release checked-out
ConnectionFairies, leaving the old engine GC-pinned across DB switches
and triggering _finalize_fairy RecursionError at interpreter shutdown.

Eagerly fetch multi-DB name queries (MultiDBSource._execute_database_query
and SnowflakeSource.get_database_names_raw) so the cursor closes before
the caller invokes set_inspector, which disposes the engine the cursor
was bound to.

Also rebind scoped_session to the new engine so it doesn't keep the
disposed one alive via sessionmaker.bind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ulixius9 ulixius9 requested a review from a team as a code owner April 22, 2026 10:15
Copilot AI review requested due to automatic review settings April 22, 2026 10:15
@github-actions github-actions bot added the Ingestion and safe to test labels Apr 22, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes SQLAlchemy engine/resource leaks when multi-database sources switch databases during ingestion by explicitly closing checked-out connections and clearing inspector/session state before disposing the old engine, and by ensuring database-name queries don’t leave cursors streaming across engine switches.

Changes:

  • Add CommonDbSourceService._release_engine() and use it from set_inspector() and close() to close all checked-out connections, clear inspector caches, remove scoped sessions, and dispose the engine.
  • Update MultiDBSource._execute_database_query() and SnowflakeSource.get_database_names_raw() to eagerly .fetchall() to avoid a live cursor across engine disposal.
  • Add unit tests covering engine release/GC eligibility and eager-fetch behavior for MultiDB/Snowflake paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • ingestion/src/metadata/ingestion/source/database/common_db_source.py: Introduces _release_engine() and uses it for DB switching/cleanup to prevent engine/inspector/session leaks.
  • ingestion/src/metadata/ingestion/source/database/multi_db_source.py: Eagerly fetches DB-name query results to avoid cursor lifetime issues across engine disposal.
  • ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py: Applies eager fetch to Snowflake database-name enumeration.
  • ingestion/tests/unit/topology/database/test_common_db_source.py: Adds unit + acceptance coverage for engine release semantics and eager-fetch behavior.
  • ingestion/tests/unit/topology/database/test_snowflake.py: Adds Snowflake-specific tests to assert .fetchall() usage and yielded ordering.

Comment thread ingestion/tests/unit/topology/database/test_common_db_source.py
Comment thread ingestion/tests/unit/topology/database/test_common_db_source.py Outdated
Comment thread ingestion/tests/unit/topology/database/test_common_db_source.py
@github-actions
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

- Set self.engine = None after dispose in _release_engine (gitar-bot):
  prevents close() from leaving a dangling disposed-engine reference
  that would produce a confusing pool error on accidental later access.

- _FakeSource now has close() and is wrapped in a fixture that cleans
  up its checked-out connection (Copilot #1): avoids resource warnings
  and an interfering fairy across test teardown.

- Rewrite test_generator_survives_engine_dispose_mid_iteration as
  test_generator_survives_connection_close_mid_iteration (Copilot #2):
  Engine.dispose() does not close checked-out connections, so the old
  test did not reproduce what _release_engine actually does. The real
  regression is the explicit conn.close() on the fairy in
  _connection_map before dispose. The new test closes the connection
  mid-iteration, which is what fetchall() needs to survive.

- Switch the query in _FakeSource.get_database_names_raw and the
  seeded INSERT assertions to the TEXT name column (Copilot #3):
  _execute_database_query is typed Iterable[str]; testing on integer
  ids obscured the actual contract.

- Update test_disposes_pool to assert surrogate.engine is None after
  release (follows from the new self.engine = None behavior) and
  verify the original pool's checkedout() is 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gitar-bot

gitar-bot Bot commented Apr 22, 2026

Code Review ✅ Approved · 1 resolved / 1 finding

Explicitly releasing engine resources during a database switch resolves potential reference leakage. No open issues found.

✅ 1 resolved
Edge Case: _release_engine leaves disposed engine ref; close() may confuse callers

📄 ingestion/src/metadata/ingestion/source/database/common_db_source.py:162-176
After _release_engine(), self.engine remains set to the disposed Engine object (not None). This is fine in set_inspector() where it's immediately reassigned, but in close() it leaves a dangling disposed-engine reference. If any code accidentally accesses self.connection or self.engine after close(), it will get a confusing error from the disposed pool rather than a clean AttributeError or None check failure. The method already sets self.session = None — doing the same for self.engine after dispose() would be consistent and defensive.


@ulixius9 ulixius9 changed the base branch from main to fix/snowflake-engine-release-on-db-switch April 22, 2026 10:36
@ulixius9 ulixius9 merged commit f9df440 into open-metadata:fix/snowflake-engine-release-on-db-switch Apr 22, 2026
31 checks passed
@sonarqubecloud

@github-actions
Contributor

🔴 Playwright Results — 1 failure, 10 flaky

✅ 3700 passed · ❌ 1 failed · 🟡 10 flaky · ⏭️ 89 skipped

Shard | Passed | Failed | Flaky | Skipped
✅ Shard 1 | 481 | 0 | 0 | 4
🟡 Shard 2 | 654 | 0 | 2 | 7
🟡 Shard 3 | 665 | 0 | 1 | 1
🟡 Shard 4 | 645 | 0 | 3 | 27
🔴 Shard 5 | 609 | 1 | 1 | 42
🟡 Shard 6 | 646 | 0 | 3 | 8

Genuine Failures (failed on all attempts)

Pages/Glossary.spec.ts › Add and Remove Assets (shard 5)
Test timeout of 180000ms exceeded.
🟡 10 flaky tests (passed on retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/BulkImport.spec.ts › Database Schema (shard 2, 1 retry)
  • Features/Table.spec.ts › Tags term should be consistent for search (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › Enum (shard 4, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Table (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel.spec.ts › Should follow Data Consumer role policies for ownerless searchIndex (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Tag.spec.ts › Verify Owner Add Delete (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

harshach pushed a commit that referenced this pull request Apr 22, 2026
…27627)

* fix(ingestion): release Engine resources on database switch (#27625)

* fix(ingestion): release engine resources on database switch

Close fairies in _connection_map and clear _inspector_map before
engine.dispose() in CommonDbSourceService.set_inspector/close. Dispose
alone does not free Inspector.info_cache or release checked-out
ConnectionFairies, leaving the old engine GC-pinned across DB switches
and triggering _finalize_fairy RecursionError at interpreter shutdown.

Eagerly fetch multi-DB name queries (MultiDBSource._execute_database_query
and SnowflakeSource.get_database_names_raw) so the cursor closes before
the caller invokes set_inspector, which disposes the engine the cursor
was bound to.

Also rebind scoped_session to the new engine so it doesn't keep the
disposed one alive via sessionmaker.bind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* py format

* fix(ingestion): address PR review feedback from gitar-bot and Copilot

- Set self.engine = None after dispose in _release_engine (gitar-bot):
  prevents close() from leaving a dangling disposed-engine reference
  that would produce a confusing pool error on accidental later access.

- _FakeSource now has close() and is wrapped in a fixture that cleans
  up its checked-out connection (Copilot #1): avoids resource warnings
  and an interfering fairy across test teardown.

- Rewrite test_generator_survives_engine_dispose_mid_iteration as
  test_generator_survives_connection_close_mid_iteration (Copilot #2):
  Engine.dispose() does not close checked-out connections, so the old
  test did not reproduce what _release_engine actually does. The real
  regression is the explicit conn.close() on the fairy in
  _connection_map before dispose. The new test closes the connection
  mid-iteration, which is what fetchall() needs to survive.

- Switch the query in _FakeSource.get_database_names_raw and the
  seeded INSERT assertions to the TEXT name column (Copilot #3):
  _execute_database_query is typed Iterable[str]; testing on integer
  ids obscured the actual contract.

- Update test_disposes_pool to assert surrogate.engine is None after
  release (follows from the new self.engine = None behavior) and
  verify the original pool's checkedout() is 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ingestion): keep connection_obj in sync with engine across DB switches

self.connection_obj is set once in __init__ to the initial engine and
never updated. After set_inspector rebuilds self.engine, connection_obj
still points at the disposed original engine — pinning its dialect and
compiled_cache alive for the source's lifetime.

Rebind connection_obj when creating the new engine in set_inspector,
and clear it in _release_engine so close() leaves nothing dangling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ulixius9 added a commit that referenced this pull request Apr 23, 2026
ulixius9 added a commit that referenced this pull request Apr 23, 2026
