Skip to content

fix(memory): roll back pooled MySQL connections on release (follow-up to #296)#303

Merged
luigisaetta merged 1 commit into
mainfrom
fix/mysql-pool-idle-in-transaction
Jun 4, 2026
Merged

fix(memory): roll back pooled MySQL connections on release (follow-up to #296)#303
luigisaetta merged 1 commit into
mainfrom
fix/mysql-pool-idle-in-transaction

Conversation

@fede-kamel
Copy link
Copy Markdown
Contributor

Summary

Follow-up to #296. Fixes a connection-pool transaction-management bug in the MySQL checkpointer that causes an idle-in-transaction → metadata-lock deadlock (and latent stale reads), and fixes the cold-start adapter integration test that was hiding it.

The bug

_MySQLConnectionPool returned connections to the pool without ending their transaction. With the default autocommit=False, every read (load, exists, list_threads, count, get_metadata, query_by_metadata, search_data) — and the SELECT 1 health check added in #296 — opens a transaction that holds a SHARED_READ metadata lock on the tables it touches. A connection pooled in that state sits idle in transaction, which means:

  1. A later DROP/ALTER blocks indefinitely behind the pending EXCLUSIVE MDL.
  2. The next borrower reuses the pinned REPEATABLE READ snapshot and can read stale data.

How it was found

The new test_mysql_adapter_concurrent_same_thread_cold_start from #296 never actually ran the pool: it mutated a frozen AgentState (state.agent_id = ...), raising ValidationError before touching MySQL — so CI (unit-only) stayed green and the integration suite errored out early. Once that test is corrected to use the immutable model_copy(update=...) pattern, it reaches the pool and hangs.

Reproduced against MySQL 9.6:

performance_schema.metadata_locks
+----------------+-------------+-------------+
| OBJECT_NAME    | LOCK_TYPE   | LOCK_STATUS |
+----------------+-------------+-------------+
| t_xxxxxxxx     | SHARED_READ | GRANTED     |   <- idle pooled conns (Sleep, trx RUNNING)
| t_xxxxxxxx     | SHARED_READ | GRANTED     |
| ...            | ...         | ...         |
| t_xxxxxxxx     | EXCLUSIVE   | PENDING     |   <- DROP TABLE, blocked ~forever
+----------------+-------------+-------------+

The fix

  • Roll back any open transaction when a connection is returned to the pool (_release), so every pooled connection is handed out clean; discard it if the rollback fails. This is standard pool hygiene and fixes both the MDL deadlock and the stale-read window regardless of the autocommit setting.
  • Factor the connector error tuple into a shared _connection_errors() helper used by both _is_healthy and _release (a dead-connection rollback() raises mysql.connector.Error, which the previous narrow except would have let escape the finally).
  • Fix the cold-start adapter test to use model_copy(update=...) instead of mutating a frozen model.

Validation

  • MySQL integration suite (LOCUS_MYSQL_INTEGRATION=1) against MySQL 9.6: 11 passed, stable across repeated runs (previously: 1 hang + 1 error).
  • Standalone stress repro of the previously-hanging scenario: 5/5 runs complete.
  • Full unit suite: pass, total coverage 93.2% (≥90% gate); added 2 regression tests for _release (rollback-on-return, discard-on-rollback-failure).
  • ruff, ruff format --check, and hatch run typecheck all clean.

@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jun 3, 2026
The MySQL connection pool returned connections to the pool without
ending their transaction. Under the default autocommit=False, every
read (load/exists/list_threads/count/get_metadata/query_by_metadata/
search_data) and the SELECT 1 health check opens a transaction that
holds a SHARED_READ metadata lock on the tables it touched. A pooled
connection left in that state sits idle-in-transaction:

- a later DROP/ALTER blocks indefinitely on the pending EXCLUSIVE MDL
  (proved against MySQL 9.6: DROP TABLE hung behind GRANTED SHARED_READ
  locks held by sleeping locus connections), and
- the next borrower reuses the pinned REPEATABLE READ snapshot and can
  read stale data.

Roll back on return so every pooled connection is handed out clean, and
discard it if the rollback fails. Factor the connector error tuple into
a shared _connection_errors() helper used by both _is_healthy and the
new _release.

Also fix the cold-start adapter integration test, which mutated a frozen
AgentState (state.agent_id = ...) and raised before exercising the pool;
use the immutable model_copy(update=...) pattern. This is the test that
surfaced the hang once it could actually run.

Follow-up to #296.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@luigisaetta
Copy link
Copy Markdown
Contributor

Nice fix. Rolling back on pool release is the right boundary: it protects reads, health checks, and any future code path that borrows a connection without having to remember transaction cleanup locally.

Copy link
Copy Markdown
Contributor

@luigisaetta luigisaetta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok approved, great catch!

@luigisaetta luigisaetta merged commit 770b6eb into main Jun 4, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants