Skip to content

Thin mode: pool grows beyond max after TIMEDWAIT requests time out #581

@chahn

Description

@chahn
  1. What versions are you using?
oracledb.__version__: 3.4.2
platform.platform: macOS-15.7.5-arm64-arm-64bit-Mach-O
sys.maxsize > 2**32: True
platform.python_version: 3.13.9

Oracle Database 21c Express Edition Release 21.0.0.0.0 (21.3.0.0.0), thin mode.

  1. Is it an error or a hang or a crash?

Error (incorrect behavior). The connection pool grows beyond the configured max when using POOL_GETMODE_TIMEDWAIT. After timed-out acquire requests, pool.opened significantly exceeds pool.max.

  1. What error(s) or behavior you are seeing?

With a pool configured as max=5 and getmode=POOL_GETMODE_TIMEDWAIT, after a burst of timed-out requests, pool.opened grows to 25 (5x the configured max). The excess connections are real TCP connections to Oracle and consume database resources.

We observed this in production where max=20 but pool.opened grew to 60.

Expected behavior: pool.opened should never exceed pool.max with POOL_GETMODE_TIMEDWAIT. Only POOL_GETMODE_FORCEGET should allow exceeding max.

Root cause is in BaseThinPoolImpl._get_next_request() in src/oracledb/impl/thin/pool.pyx:

cdef PooledConnRequest _get_next_request(self):
    for request in self._requests:
        if not request.waiting \          # <-- BUG: bypasses max check
                or request.requires_ping \
                or request.is_replacing \
                or request.is_extra \
                or self._open_count < self.max:
            request.in_progress = request.waiting
            return request
        break

When a TIMEDWAIT request times out, request.waiting is set to False (in acquire()'s finally block), but the request remains in self._requests. The background task calls _get_next_request(), which finds these stale requests. The not request.waiting condition evaluates to True, short-circuiting the entire or-chain -- including the self._open_count < self.max guard.

The background task then calls _process_request(), which creates a new connection via _create_conn_impl(). In _post_process_request(), self._open_count += 1 is executed (incrementing beyond max), then request.reject() is called (since not request.waiting), which returns the connection to the free list. The connection was created for a request nobody is waiting for, but _open_count was already incremented.

For N timed-out requests, this creates N extra connections beyond max.

Note: This is related to the previously reported #390 which was closed as stale. The commit referenced as the fix (b13418a) addressed a different symptom (connections stuck in the busy list, #392) but did not fix the _get_next_request() logic. The current code on main still has this bug.

Suggested fix -- do not process timed-out requests when they have no special flags (ping/replace/extra). Separate handling of timed-out requests from active requests, and discard stale requests from the queue:

cdef PooledConnRequest _get_next_request(self):
    cdef PooledConnRequest request
    while self._requests:
        request = self._requests[0]
        if not request.waiting:
            if request.requires_ping or request.is_replacing \
                    or request.is_extra:
                request.in_progress = False
                return request
            self._requests.pop(0)
            request.bg_processing = False
            continue
        if request.requires_ping or request.is_replacing \
                or request.is_extra \
                or self._open_count < self.max:
            request.in_progress = True
            return request
        break
  1. Does your application call init_oracle_client()?

No. Using thin mode only.

  1. Include a runnable Python script that shows the problem.

No database schema is needed. The bug is in the pool's connection management logic and only requires the ability to connect to any Oracle database.

"""
Reproduction: pool.opened grows beyond pool.max with POOL_GETMODE_TIMEDWAIT.

When timed-out requests remain in the pool's internal request queue,
the background task creates connections for them without checking the
max limit, causing pool.opened to exceed pool.max.

The bug requires the background task to be woken up AFTER the stale
requests accumulate. In production, this happens naturally when new
requests arrive. Here we simulate it by sending additional requests
after the first batch times out.
"""

import oracledb
import threading
import time

# Connect to any Oracle database
DSN = "localhost:1521/XEPDB1"
USER = "system"
PASSWORD = "oracle"

POOL_MAX = 5
TIMEOUT_REQUESTS = 20

pool = oracledb.create_pool(
    user=USER,
    password=PASSWORD,
    dsn=DSN,
    min=2,
    max=POOL_MAX,
    increment=1,
    getmode=oracledb.POOL_GETMODE_TIMEDWAIT,
    wait_timeout=1000,  # 1 second (in ms)
)

print(f"Pool created:   opened={pool.opened}  busy={pool.busy}  max={pool.max}")

# Step 1: Fill the pool to max -- all connections held busy
held_connections = [pool.acquire() for _ in range(POOL_MAX)]
print(f"Pool filled:    opened={pool.opened}  busy={pool.busy}")

# Step 2: Send requests that will all time out (pool is full)
timeout_count = 0
lock = threading.Lock()


def try_acquire():
    global timeout_count
    try:
        conn = pool.acquire()
        pool.release(conn)
    except oracledb.DatabaseError:
        with lock:
            timeout_count += 1


threads = [threading.Thread(target=try_acquire) for _ in range(TIMEOUT_REQUESTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Timed out:      {timeout_count}/{TIMEOUT_REQUESTS} requests got DPY-4005")
print(f"After timeouts: opened={pool.opened}  busy={pool.busy}")

# Step 3: Send a few more requests to wake the background task.
#
# Each new request calls fulfill() -> _add_request() -> _notify_bg_task().
# The background task wakes up and iterates _requests. The stale timed-out
# requests (waiting=False) are at the front of the queue. The condition
# "not request.waiting" in _get_next_request() evaluates to True,
# short-circuiting the "_open_count < self.max" guard. The background
# task creates a new connection for each stale request, incrementing
# _open_count beyond max.
print("Triggering background task with new requests...")
trigger_threads = [threading.Thread(target=try_acquire) for _ in range(3)]
for t in trigger_threads:
    t.start()
for t in trigger_threads:
    t.join()

# Step 4: Wait for background task to finish processing stale requests
time.sleep(15)

print(f"After bg task:  opened={pool.opened}  busy={pool.busy}")

# Release held connections
for conn in held_connections:
    pool.release(conn)
time.sleep(2)

print(f"After release:  opened={pool.opened}  busy={pool.busy}")

if pool.opened > pool.max:
    excess = pool.opened - pool.max
    print(f"\nBUG: pool.opened ({pool.opened}) exceeds pool.max ({pool.max})"
          f" by {excess} connections.")
    print(f"Expected pool.opened <= {pool.max}."
          f" Only POOL_GETMODE_FORCEGET should exceed max.")
else:
    print(f"\npool.opened ({pool.opened}) is within pool.max ({pool.max}).")

pool.close(force=True)

Verified output (oracledb 3.4.2, Python 3.13.9, Oracle 21c XE, thin mode):

Pool created:   opened=0  busy=0  max=5
Pool filled:    opened=5  busy=5
Timed out:      20/20 requests got DPY-4005
After timeouts: opened=5  busy=5
Triggering background task with new requests...
After bg task:  opened=28  busy=5     <-- BUG: should be 5
After release:  opened=28  busy=0     <-- 28 > max(5)

BUG: pool.opened (28) exceeds pool.max (5) by 23 connections.
Expected pool.opened <= 5. Only POOL_GETMODE_FORCEGET should exceed max.

The 23 excess connections = 20 stale requests + 3 trigger requests, each creating a connection beyond max.


We're happy to answer any questions or provide a pull request with a fix if that would be helpful.
Thanks a lot!

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions